Question: Counting the number of paralogs in various species
4.5 years ago by
Pappu1.8k wrote:

I am trying to calculate the number of paralogs for a few genes in different species in Ensembl. I am wondering if there is any tool which can do it automatically. Thank you.

ensembl sequence python • 1.7k views
4.5 years ago by
Bert Overduin3.6k
Edinburgh Genomics, The University of Edinburgh
Bert Overduin3.6k wrote:

Since your question suggests you would like to use Python for this, you could also use the Ensembl REST API.

It's easy to get all orthologues for a gene, for example for the human ABCD1 gene:;format=condensed;type=orthologues

or all paralogues for the same gene:;format=condensed;type=paralogues

So, using these REST requests, it should be quite easy to start with a particular gene in, for example, human, find its orthologues in the other Ensembl species, and then get the number of paralogues for each of those genes.

The REST documentation for each request shows how to use it from Python code.
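A minimal Python sketch of the counting step could look like the following. It is only an illustration under some assumptions: the server address (`https://rest.ensembl.org`), the `homology/symbol` endpoint path, and the response layout (a `data` list whose entries carry a `homologies` list with a `species` field) are taken to match the current public REST service and should be checked against the REST documentation. The function names are made up for this example.

```python
import json
import urllib.request
from collections import Counter

# Assumption: address of the public Ensembl REST server.
SERVER = "https://rest.ensembl.org"


def fetch_homologies(species, symbol, homology_type="paralogues"):
    """Fetch condensed homology records for one gene symbol (needs network access)."""
    # Assumption: endpoint path and query parameters as described in the REST docs.
    url = (
        f"{SERVER}/homology/symbol/{species}/{symbol}"
        f"?type={homology_type};format=condensed"
    )
    request = urllib.request.Request(
        url, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def count_by_species(payload):
    """Count homology records per target species in a decoded JSON response."""
    counts = Counter()
    for entry in payload.get("data", []):
        for homology in entry.get("homologies", []):
            # Assumption: each record names its target species under "species".
            counts[homology.get("species", "unknown")] += 1
    return counts
```

Calling `count_by_species(fetch_homologies("human", "ABCD1"))` would then give the number of paralogues for that gene; with `homology_type="orthologues"` the same function gives the number of orthologues found in each species.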

Hope this helps.





4.5 years ago by
New York
Vitis1.6k wrote:

Have you tried Ensembl Compara? I think they use a pipeline to construct gene trees for gene families and to call orthologs and paralogs. You can use the Ensembl API to access that information.


4.5 years ago by
Prakki Rama2.2k
Prakki Rama2.2k wrote:

Ensembl REST is very useful, but I think it needs a little parsing to count the paralogs.

Here is what I quickly tried:

1) Chose my databases in BioMart

2) In 'Filters', pasted my gene IDs into 'ID list limit'

3) In 'Attributes', selected 'Homologs', then marked 'Ensembl Gene ID' in both the 'Gene' and 'Paralogs' sections

4) Clicked 'Count' and then 'Results'

5) Downloaded the results file

6) Ran the following UNIX command:

cut -d " " -f 1 biomart_results.txt | sort | uniq -c

*Limitation: if you have more than 500 IDs, you will need to run it multiple times.
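Since the question is tagged Python, the same count can be sketched in Python. This assumes the BioMart export is tab-separated with a header row, the query gene ID in the first column, and the paralog gene ID in the second; adjust the delimiter and columns if your export differs. The function name is made up for this example.

```python
import csv
from collections import Counter


def count_paralogs(lines, delimiter="\t"):
    """Count non-empty paralog entries per gene ID (first column)."""
    reader = csv.reader(lines, delimiter=delimiter)
    next(reader, None)  # skip the header row (assumed to be present)
    counts = Counter()
    for row in reader:
        if not row:
            continue  # ignore blank lines
        gene_id = row[0]
        paralog_id = row[1] if len(row) > 1 else ""
        # A gene with an empty paralog column is still listed, with a count of 0.
        counts[gene_id] += 1 if paralog_id.strip() else 0
    return counts
```

To use it, open the file and pass the handle, for example `with open("biomart_results.txt", newline="") as handle: counts = count_paralogs(handle)`. Unlike `uniq -c`, which counts rows, this reports 0 for a gene whose only row has an empty paralog column.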

