Hi there,
I'm stuck on a probably simple problem that has nonetheless cost me a lot of time. I still have little experience with this kind of thing and, although I have read similar questions on this and other forums, I didn't manage to solve the issue myself.
I need to do some analyses at both mRNA and protein level on entities linked by paralogy/orthology relationships.
I decided to use the groups identified and classified in EggNOG. I downloaded the protein sequences directly through the API, but as far as I know the site offers no direct way to do the same for the corresponding mRNAs. For this reason, I extracted the list of IDs, which unfortunately mixes different notations and looks like this:
>117187.FVEG_01726T0
>118797.XP_007458112.1
>128390.XP_009472214.1
>12957.ACEP12726-PA
>132113.XP_003494933.1
>132908.ENSPVAP00000005474
For some of them, it should be possible to use the Entrez E-utilities esearch/efetch, but I get errors, and the sequences I did manage to download did not match any of the IDs I provided in the file. Since I have more than 4,000 entries, mapping them by hand is not feasible. Could you kindly suggest a way to work around this? Thank you in advance!
Which API are you referring to?
The EggNOG one, as I needed to download the FASTA, alignment, and tree files. Sorry for not being clearer in the question itself.
Unfortunately, the data appear to come from different sources, and EggNOG does not seem to provide a comprehensive list. For example, FVEG_01726T0 is from Fusarium, so you can find it in Ensembl Fungi. ACEP12726-PA is from Atta cephalotes. I guess there is no easy solution here.
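That said, a first step that might reduce the manual work is to strip the numeric prefix (which appears to be an NCBI taxonomy ID) and sort the remaining identifiers by their likely source, so that each group can be queried at the appropriate database. The following is only a rough sketch: the two regular expressions are assumptions based on the six example IDs in the question, and the function names are made up for illustration. It does not download anything.

```python
import re
from collections import defaultdict

# Assumed patterns, inferred from the example IDs in the question.
# RefSeq protein accessions look like XP_007458112.1 (also NP_, YP_, ...).
REFSEQ_PROTEIN = re.compile(r"^[XNYAW]P_\d+(\.\d+)?$")
# Ensembl protein IDs look like ENSPVAP00000005474 (ENS + species code + P + 11 digits).
ENSEMBL_PROTEIN = re.compile(r"^ENS[A-Z]*P\d{11}$")


def split_eggnog_id(header):
    """Split '>117187.FVEG_01726T0' into ('117187', 'FVEG_01726T0')."""
    # Only the first dot separates the prefix, so version suffixes like '.1' survive.
    taxid, _, protein_id = header.lstrip(">").strip().partition(".")
    return taxid, protein_id


def guess_source(protein_id):
    """Return a coarse label for where the ID probably comes from."""
    if REFSEQ_PROTEIN.match(protein_id):
        return "refseq"
    if ENSEMBL_PROTEIN.match(protein_id):
        return "ensembl"
    return "other"  # e.g. species-specific gene models; needs manual checking


def bucket_ids(headers):
    """Group (taxid, protein_id) pairs by guessed source."""
    buckets = defaultdict(list)
    for header in headers:
        taxid, protein_id = split_eggnog_id(header)
        buckets[guess_source(protein_id)].append((taxid, protein_id))
    return dict(buckets)


if __name__ == "__main__":
    example = [
        ">117187.FVEG_01726T0",
        ">118797.XP_007458112.1",
        ">128390.XP_009472214.1",
        ">12957.ACEP12726-PA",
        ">132113.XP_003494933.1",
        ">132908.ENSPVAP00000005474",
    ]
    for source, ids in bucket_ids(example).items():
        print(source, ids)
```

The "refseq" group could then be tried with efetch using the bare accession (without the numeric prefix), and the "ensembl" group at Ensembl. The "other" group is where the per-species lookup described above would still be needed, although grouping by the numeric prefix should at least show how many distinct species are involved.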
Well, that's unfortunate. Thank you anyway for trying to help me. Have a happy New Year!