OrthoDB - How to retrieve ortholog groups for only the selected species
5.6 years ago
al-ash ▴ 190

I'm using the JSON API to retrieve from OrthoDB the ortholog cluster ids for which genes are present in the selected species (e.g. Blattella germanica and Zootermopsis nevadensis). However, the retrieval is apparently not limited by the argument species=6973%2C136037 that I'm using; instead, it returns all ortholog cluster ids predicted at the "Insecta" node. I would expect there to be a straightforward way to limit the retrieval to the selected species, but I cannot figure it out (I'm a novice user).

I expected that selecting only some species and then checking the "Present in all species" option would do the job, but this returns no hits (http://www.orthodb.org/?singlecopy=1&level=50557&species=6973%2C136037&universal=1). If I instead leave the retrieval unrestricted and filter the retrieved ortholog clusters manually with grep, I do obtain ortholog gene clusters for the selected species.

I'm using the following command to retrieve the ortholog cluster ids:

wget -O retrievedOGs.txt 'http://www.orthodb.org/search?singlecopy=1&limit=40000&level=50557&species=6973%2C136037'


In the next step, I retrieve the table of gene annotations by looping through the cluster ids saved in the variable ORTHOLOGCLUSTERS:

wget -O - 'http://www.orthodb.org/tab?id='"$ORTHOLOGCLUSTERS"'&level=50557&species=6973%2C136037' > OGtable.txt


In the output file (OGtable.txt) I get lines without any annotation information for those ortholog clusters that have no genes in the selected species. I could filter the required information from this table, but that is wasteful: I'm fetching many ortholog cluster ids from OrthoDB that carry no annotation, which also takes quite some time because of their limit on the number of downloaded hits per second.
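
For illustration, the looped retrieval described above could be sketched in Python roughly as follows. The endpoint and the `id`, `level` and `species` parameters are taken from the wget command; the function names, the one-second pause and the injectable `fetch` argument are assumptions made for this sketch, not part of OrthoDB's API or its documented rate limit.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint copied from the wget command in this post.
BASE_URL = "http://www.orthodb.org/tab"


def build_tab_url(cluster_id, level="50557", species=("6973", "136037")):
    """Build the /tab URL for one ortholog cluster id."""
    query = urlencode(
        {"id": cluster_id, "level": level, "species": ",".join(species)}
    )
    return f"{BASE_URL}?{query}"


def fetch_tables(cluster_ids, delay_seconds=1.0, fetch=None):
    """Yield (cluster_id, response_text) pairs, pausing between requests.

    delay_seconds is an assumed value; adjust it to whatever rate the
    server actually tolerates.
    """
    if fetch is None:
        # Default: a plain HTTP GET, decoded as UTF-8 text.
        fetch = lambda url: urlopen(url).read().decode("utf-8")
    for index, cluster_id in enumerate(cluster_ids):
        if index:
            time.sleep(delay_seconds)
        yield cluster_id, fetch(build_tab_url(cluster_id))
```

Passing a custom `fetch` makes it possible to dry-run the loop (for example, just printing the URLs) before sending any requests to the server.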

Is there a more efficient way to limit the retrieval of ortholog cluster ids to only the species of interest? Thanks!

OrthoDB JSON API ortholog • 4.2k views
3.3 years ago
mreijnders • 0

I know this question is 2.3 years old, but in case anyone stumbles upon this and needs an answer...

You are correct that the filters apply to the whole clade, not just your selected species. This is because they are pre-computed, not computed on the spot. To get what you want, you can download the flatfiles and extract the ortholog groups that meet your requirements with some simple scripts. Alternatively, you could use the API to query with a specific profile. In our lab we made some small scripts that do this, among other things. They can be found at: https://gitlab.com/rmwaterhouse/OrthoDB-API-Scripting

The flatfiles can be found at: https://www.orthodb.org/?page=filelist
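
As a rough idea of what such a "simple script" over the flatfiles might look like, here is a minimal Python sketch. It rests on several assumptions that should be checked against the README that ships with the flatfiles: that there is a tab-separated OG-to-gene mapping file with the OG id in the first column and the gene id in the second, that gene ids start with the NCBI taxid followed by an underscore (e.g. `6973_0:...`), and that OG ids end in `at<level>` (e.g. `at50557` for Insecta). The function names are hypothetical.

```python
import csv
from collections import Counter, defaultdict


def taxid_of(gene_id):
    """Assumed gene id layout: '<taxid>_<version>:<gene>'; return the taxid."""
    return gene_id.split("_", 1)[0]


def ogs_present_in_all(og2genes_lines, wanted_taxids, og_suffix=None,
                       single_copy=False):
    """Return sorted OG ids that have genes in every wanted species.

    og2genes_lines: iterable of tab-separated 'og_id<TAB>gene_id' lines.
    og_suffix: optional filter on the OG id ending (assumed 'at<level>').
    single_copy: if True, require exactly one gene per wanted species.
    """
    wanted = set(wanted_taxids)
    counts = defaultdict(Counter)  # og_id -> {taxid: number of genes}
    for row in csv.reader(og2genes_lines, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        og_id, gene_id = row[0], row[1]
        if og_suffix and not og_id.endswith(og_suffix):
            continue
        taxid = taxid_of(gene_id)
        if taxid in wanted:
            counts[og_id][taxid] += 1
    selected = []
    for og_id, per_species in counts.items():
        if set(per_species) != wanted:
            continue  # at least one wanted species is missing
        if single_copy and any(n != 1 for n in per_species.values()):
            continue
        selected.append(og_id)
    return sorted(selected)
```

With a real file this could be called as `ogs_present_in_all(open(path), ["6973", "136037"], og_suffix="at50557", single_copy=True)`, where `path` points to the downloaded mapping file. Only the resulting OG ids would then need to be queried through the API.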


Am I understanding this correctly: if you query 'honeybee', you'll get OGs for 'Insecta', so in theory you could generate a FASTA file of just the Insecta orthologs?