gene ortholog conversion
4.6 years ago
Learner ▴ 250

I have a basic question that I have not been able to answer myself. I have several genes from mouse or worm and I am trying to find their orthologs. I use BioMart, which is based on the Ensembl annotation. My questions are as follows:

1- How can I validate that an ortholog is a true one? Should I check for a common ancestral gene?

2- Orthologs often have similar names across species, but they can be located on several chromosomes. Which one should I select if I want to study them at the DNA and chromosome level?

What do you suggest as a gold standard for ortholog conversion?

gene RNA-Seq genome • 1.7k views
4.6 years ago

By definition, orthologs are genes that derive from a common ancestor by speciation. If you use BioMart to retrieve orthologs as inferred by Ensembl Compara, these genes have already been deemed to derive from a common ancestor. If you want to assess this yourself, the phylogenetic trees are available on the Ensembl gene pages or through the Compara API. You also have to understand that ortholog relationships are not necessarily one-to-one. There is a diagram on the Ensembl Compara page that explains the different types of homolog relationships. The gold standard for orthology inference is to do it from a phylogenetic tree, since by definition you have to identify a common ancestor. This is what Ensembl does.
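To make the point about non one-to-one relationships concrete, here is a minimal sketch in base R. It assumes you already have a table of ortholog pairs; the gene IDs and the `classify` helper are invented for illustration, and the labels are only loosely modelled on the one-to-one, one-to-many and many-to-many categories. It is not how Ensembl Compara computes them, since Compara works from gene trees.

```r
# Toy table of (mouse gene, human gene) ortholog pairs; all IDs are invented.
pairs <- data.frame(
  mouse = c("mA", "mB", "mB", "mC", "mD"),
  human = c("hA", "hB1", "hB2", "hC", "hC"),
  stringsAsFactors = FALSE
)

# For every row, count how many partners each gene has in the other species.
n_human_per_mouse <- as.integer(table(pairs$mouse)[pairs$mouse])
n_mouse_per_human <- as.integer(table(pairs$human)[pairs$human])

# Hypothetical helper: label a pair from the two partner counts.
classify <- function(n_h, n_m) {
  if (n_h == 1 && n_m == 1) {
    "one2one"
  } else if (n_h > 1 && n_m > 1) {
    "many2many"
  } else {
    "one2many"
  }
}

pairs$type <- mapply(classify, n_human_per_mouse, n_mouse_per_human)
print(pairs)
```

Only the `mA`/`hA` pair comes out as one-to-one here; every other pair involves a gene with more than one partner, which is why a single "converted" gene is not always well defined.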

4.6 years ago
Rahul Sharma ▴ 650

As mentioned by Jean-Karim, orthologs are genes that derive from a common ancestor by speciation. You may also see multiple copies (paralogs) derived from gene duplication events. Ensembl generates a list of all orthologs, but if you want to filter on a certain percentage identity or pick the most similar ortholog (which is not really recommended), you can retrieve the "perc_id" attributes in BioMart and filter on them. Here is sample code that I use to generate a list of orthologs:

library(biomaRt)

# Read the input IDs into a character vector. IDs are NCBI gene reference ids;
# they must be of the type named in the "filters" argument of getBM() below.
refseq_ids <- as.character(read.table("./Ids_contain_non_synonymous_SNPs.ids", stringsAsFactors = FALSE)$V1)

# Connect to the August 2017 Ensembl archive. Only the mouse mart is queried below.
mouse = useMart(biomart = "ENSEMBL_MART_ENSEMBL",dataset="mmusculus_gene_ensembl", host = "aug2017.archive.ensembl.org")
human = useMart(biomart = "ENSEMBL_MART_ENSEMBL",dataset="hsapiens_gene_ensembl", host = "aug2017.archive.ensembl.org")
cow = useMart(biomart = "ENSEMBL_MART_ENSEMBL",dataset="btaurus_gene_ensembl", host = "aug2017.archive.ensembl.org")
rat = useMart(biomart = "ENSEMBL_MART_ENSEMBL",dataset="rnorvegicus_gene_ensembl", host = "aug2017.archive.ensembl.org")
pig = useMart(biomart = "ENSEMBL_MART_ENSEMBL",dataset="sscrofa_gene_ensembl", host = "aug2017.archive.ensembl.org")
chimp = useMart(biomart = "ENSEMBL_MART_ENSEMBL",dataset="ptroglodytes_gene_ensembl", host = "aug2017.archive.ensembl.org")

#getBM(attributes=c("refseq_mrna", "ensembl_peptide_id", "mgi_symbol"), filters = "refseq_mrna", values = refseq_ids, mart= mouse)

# For each target species: the homologous peptide and its percent identity
attributes = c("external_gene_name", "ensembl_peptide_id", "hsapiens_homolog_ensembl_peptide", "hsapiens_homolog_perc_id",
"rnorvegicus_homolog_ensembl_peptide", "rnorvegicus_homolog_perc_id",
"btaurus_homolog_ensembl_peptide", "btaurus_homolog_perc_id",
"sscrofa_homolog_ensembl_peptide", "sscrofa_homolog_perc_id",
"ptroglodytes_homolog_ensembl_peptide", "ptroglodytes_homolog_perc_id")

# Query the mouse dataset for orthologs of the input IDs
Orthologs = getBM(attributes = attributes, filters = "refseq_mrna", values = refseq_ids, mart = mouse, uniqueRows = TRUE)
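
The percent-identity filtering mentioned above could then be applied to the returned table. The following is only a sketch on a toy data frame standing in for a getBM() result: the gene names, peptide IDs, identity values and the cutoff of 70 are all invented, and it assumes missing homologs appear as NA in the perc_id column.

```r
# Toy stand-in for a getBM() result; all values are invented.
orthologs <- data.frame(
  external_gene_name = c("GeneA", "GeneA", "GeneB", "GeneC"),
  hsapiens_homolog_ensembl_peptide = c("P1", "P2", "P3", ""),
  hsapiens_homolog_perc_id = c(92.5, 61.0, 78.3, NA),
  stringsAsFactors = FALSE
)

min_identity <- 70  # arbitrary cutoff, chosen only for this example

# Rows that actually have a human homolog (assumption: no homolog -> NA).
has_hit <- !is.na(orthologs$hsapiens_homolog_perc_id)

# Option 1: keep every homolog at or above the identity cutoff.
filtered <- orthologs[has_hit & orthologs$hsapiens_homolog_perc_id >= min_identity, ]

# Option 2: keep only the most similar homolog per gene
# (as noted above, not really recommended).
ordered <- orthologs[has_hit, ]
ordered <- ordered[order(ordered$external_gene_name,
                         -ordered$hsapiens_homolog_perc_id), ]
best <- ordered[!duplicated(ordered$external_gene_name), ]

print(filtered)
print(best)
```

Option 2 silently discards the second GeneA homolog, which is exactly the information you lose when a one-to-many relationship is forced into a one-to-one conversion.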


@Rahul Sharma I think you should change the filters and the values in the last comment :-))