Question

Counting the number of paralogs in various species

2

11.1 years ago

Pappu ★ 2.1k

I am trying to calculate the number of paralogs for a few genes in different species in Ensembl. I am wondering if there is any tool which can do it automatically. Thank you.

python ensembl sequence • 4.2k views

Updated 11.1 years ago by Prakki Rama ★ 2.7k • written 11.1 years ago by Pappu ★ 2.1k

3

11.1 years ago

Vitis ★ 2.6k

Have you tried Ensembl Compara? I think they use a pipeline to construct gene trees for gene families and call orthologs and paralogs from them. You can use the Ensembl API to access that information.

http://www.ensembl.org/info/genome/compara/homology_method.html

http://www.ensembl.org/info/docs/api/compara/compara_tutorial.html

Updated 5.5 years ago by Ram 45k • written 11.1 years ago by Vitis ★ 2.6k

1

11.1 years ago

Prakki Rama ★ 2.7k

Ensembl REST is very useful, but I think it needs a little parsing to count the paralogs.

Here is what I quickly tried:

1. Chose my databases in BioMart.
2. Under 'Filters', pasted my gene IDs into 'ID list limit'.
3. Under 'Attributes', selected 'Homologs', then ticked 'Ensembl Gene ID' under 'Gene' and under 'Paralogs'.
4. Clicked 'Count', then 'Results'.
5. Downloaded the results file.
6. Ran the following UNIX command:

cut -f 1 biomart_results.txt | sort | uniq -c   # cut splits on tabs by default, matching a TSV export

Limitation: if you have more than 500 IDs, you have to run it in several batches.
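Since the question is tagged python, the same per-gene count can be sketched in Python. This is only a minimal sketch under stated assumptions: the download is a tab-separated file with one header row, the gene ID in the first column and the paralog gene ID in the second, and a gene without paralogs appears as a row with an empty second column. The function name `count_paralogs` is made up for illustration.

```python
import csv
from collections import defaultdict


def count_paralogs(lines, delimiter="\t"):
    """Return {gene_id: number of distinct paralog IDs}.

    `lines` is any iterable of text lines (an open file works).
    Assumed layout: header row, gene ID in column 1, paralog ID in column 2.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    next(reader, None)  # skip the header row
    paralogs = defaultdict(set)
    for row in reader:
        if not row:
            continue  # ignore blank lines
        gene_id = row[0].strip()
        paralog_id = row[1].strip() if len(row) > 1 else ""
        paralogs[gene_id]  # register the gene even if it has no paralogs
        if paralog_id:
            paralogs[gene_id].add(paralog_id)
    return {gene: len(ids) for gene, ids in paralogs.items()}


# Example with the file name used in the command above:
# with open("biomart_results.txt", newline="") as handle:
#     for gene, n in count_paralogs(handle).items():
#         print(gene, n)
```

Unlike `uniq -c`, this counts distinct paralog IDs and reports 0 for a gene whose paralog column is empty, so duplicated rows do not inflate the numbers.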

Updated 5.5 years ago by Ram 45k • written 11.1 years ago by Prakki Rama ★ 2.7k

Ram · Accepted Answer · 2014-06-06

Since your question suggests you would like to use Python for this, you could also use the Ensembl REST API.

It's easy to get all orthologues for a gene, for example for the human ABCD1 gene:

http://beta.rest.ensembl.org/homology/symbol/human/ABCD1?content-type=application/json;format=condensed;type=orthologues

or all paralogues for the same gene:

http://beta.rest.ensembl.org/homology/symbol/human/ABCD1?content-type=application/json;format=condensed;type=paralogues
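A rough Python sketch of calling such a URL and counting the returned entries could look like the following. Several things here are assumptions rather than facts from this post: the server name `https://rest.ensembl.org` (the links above point at the older beta host), the response layout `{"data": [{"id": ..., "homologies": [...]}]}`, and the helper names, which are made up for illustration. Please check the REST documentation before relying on it.

```python
import json
import urllib.request

# Assumption: the current public server; the links above use beta.rest.ensembl.org.
SERVER = "https://rest.ensembl.org"


def homology_url(species, symbol, homology_type):
    """Build a homology-by-symbol URL in the same form as the links above."""
    return (
        f"{SERVER}/homology/symbol/{species}/{symbol}"
        f"?content-type=application/json;format=condensed;type={homology_type}"
    )


def fetch_homologies(species, symbol, homology_type):
    """Download and decode one homology response (needs network access)."""
    url = homology_url(species, symbol, homology_type)
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def count_homologies(payload):
    """Count homology entries, assuming {"data": [{"homologies": [...]}]}."""
    return sum(len(entry.get("homologies", [])) for entry in payload.get("data", []))


# Example (not run here because it needs network access):
# payload = fetch_homologies("human", "ABCD1", "paralogues")
# print(count_homologies(payload))
```

Keeping the counting separate from the download makes it easy to test the parsing on a saved JSON file first.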

So, using these REST calls, I think it should be fairly easy to start with a particular gene in, e.g., human, find its orthologues in the other Ensembl species, and then get the number of paralogues for each of those genes.
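That workflow might be sketched roughly as below. To keep the sketch independent of the network, the download step is passed in as a callable `fetch(path, homology_type)` that returns the decoded JSON; this callable, the function name `paralogue_counts`, the ID-based path `id/<species>/<gene_id>`, and the `species` and `id` fields of each homology entry are all assumptions for illustration, not something confirmed in this thread.

```python
def paralogue_counts(species, symbol, fetch):
    """Return {target_species: {orthologue_gene_id: number_of_paralogues}}.

    `fetch(path, homology_type)` is a hypothetical callable that returns the
    decoded JSON for /homology/<path> with the given type.
    """
    # Step 1: orthologues of the starting gene, looked up by symbol.
    orthologues = fetch(f"symbol/{species}/{symbol}", "orthologues")
    counts = {}
    for entry in orthologues.get("data", []):
        for homology in entry.get("homologies", []):
            target_species = homology["species"]  # assumed field name
            gene_id = homology["id"]              # assumed field name
            # Step 2: paralogues of each orthologue, looked up by gene ID.
            paralogues = fetch(f"id/{target_species}/{gene_id}", "paralogues")
            n = sum(
                len(e.get("homologies", [])) for e in paralogues.get("data", [])
            )
            counts.setdefault(target_species, {})[gene_id] = n
    return counts
```

With a real `fetch` this makes one request per orthologue, so for many genes it is worth adding a short pause between requests.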

The REST documentation for each endpoint shows how to call it from Python code.

Hope this helps.