Counting the number of paralogs in various species
9.9 years ago
Pappu ★ 2.1k

I am trying to calculate the number of paralogs for a few genes in different species in Ensembl. I am wondering if there is any tool which can do it automatically. Thank you.

python ensembl sequence • 3.6k views
9.9 years ago
Bert Overduin ★ 3.7k

Since your question suggests you would like to use Python for this, you could also use the Ensembl REST API.

It's easy to get all orthologues for a gene, for example for the human ABCD1 gene:

http://beta.rest.ensembl.org/homology/symbol/human/ABCD1?content-type=application/json;format=condensed;type=orthologues

or all paralogues for the same gene:

http://beta.rest.ensembl.org/homology/symbol/human/ABCD1?content-type=application/json;format=condensed;type=paralogues

So, using these REST statements, I think it should be quite easy for you to start out with a particular gene in e.g. human, find the orthologues in the other Ensembl species and then get the number of paralogues for those genes.

The REST documentation for the statement in question shows how to use it from Python code.
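As a rough illustration of the counting step, here is a minimal Python sketch. It assumes the condensed JSON response has a top-level `data` list whose entries each carry an `id` and a `homologies` list; check that layout against an actual response before relying on it. The helper names (`homology_url`, `fetch_json`, `count_homologies`) are made up for this example, and the server address is simply the one from the URLs above.

```python
import json
import urllib.request

# Server taken from the URLs above; adjust it if the service has moved.
REST_SERVER = "http://beta.rest.ensembl.org"


def homology_url(species, symbol, homology_type="paralogues"):
    """Build a homology query URL in the same form as the examples above."""
    return (
        f"{REST_SERVER}/homology/symbol/{species}/{symbol}"
        f"?content-type=application/json;format=condensed;type={homology_type}"
    )


def fetch_json(url):
    """Download a URL and parse the body as JSON (needs network access)."""
    with urllib.request.urlopen(url) as handle:
        return json.load(handle)


def count_homologies(response):
    """Return {gene ID: number of homology entries}.

    Assumed layout: {"data": [{"id": ..., "homologies": [...]}, ...]}
    """
    counts = {}
    for entry in response.get("data", []):
        gene_id = entry.get("id")
        counts[gene_id] = counts.get(gene_id, 0) + len(entry.get("homologies", []))
    return counts


# Example use (not run here because it needs network access):
#   print(count_homologies(fetch_json(homology_url("human", "ABCD1"))))
```

The same `count_homologies` call works on an orthologue response, so it can be reused for each species once you have the gene symbols or IDs there.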

Hope this helps.

9.9 years ago
Vitis ★ 2.5k

Have you tried Ensembl Compara? I think they use a pipeline to construct gene trees for gene families and call orthologs and paralogs from them. You can use the Ensembl API to access that information.

http://www.ensembl.org/info/genome/compara/homology_method.html

http://www.ensembl.org/info/docs/api/compara/compara_tutorial.html

9.9 years ago
Prakki Rama ★ 2.7k

Ensembl REST is very useful, but I think it needs a little parsing to count the paralogs.

I quickly tried this:

  1. Chose my databases in Biomart.
  2. In 'Filters', pasted my gene IDs into 'ID list limit'.
  3. In 'Attributes', selected 'Homologs' and marked 'Ensembl Gene ID' in the 'Gene' and 'Paralogs' sections.
  4. Clicked 'Count' and then 'Results'.
  5. Downloaded the results file.
  6. Ran the following UNIX command:

cut -d " " -f 1 biomart_results.txt | sort | uniq -c

Limitation: if you have more than 500 IDs, you need to run it multiple times.
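If you would rather do the counting step in Python, a minimal sketch is below. It assumes the Biomart export is tab-separated with a header row, with the gene ID in the first column and the paralog gene ID in the second; change the `delimiter` argument if your export uses something else. The function name `count_paralogs` is made up for this example. Unlike a plain count of lines per gene, it counts distinct paralog IDs and reports 0 for genes whose paralog column is empty.

```python
import csv
from collections import defaultdict


def count_paralogs(lines, delimiter="\t", has_header=True):
    """Count distinct, non-empty paralog IDs per gene ID.

    Assumes column 1 is the gene ID and column 2 is the paralog gene ID.
    """
    paralogs = defaultdict(set)
    reader = csv.reader(lines, delimiter=delimiter)
    if has_header:
        next(reader, None)  # skip the column names
    for row in reader:
        if not row or not row[0].strip():
            continue  # ignore blank lines
        gene = row[0].strip()
        paralogs[gene]  # register the gene even if it has no paralogs
        if len(row) > 1 and row[1].strip():
            paralogs[gene].add(row[1].strip())
    return {gene: len(ids) for gene, ids in paralogs.items()}


# Example use on the downloaded file:
#   with open("biomart_results.txt", newline="") as handle:
#       for gene, n in sorted(count_paralogs(handle).items()):
#           print(gene, n)
```

Because it works on any iterable of lines, results from several Biomart runs can be concatenated and counted together, provided any repeated header lines are removed first.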
