Question: EMBL-EBI gene IDs do not match data downloaded from other databases
Bioin10 wrote (22 months ago):

Dear Biostars,

I would like to use the Sorghum Expression Atlas data set E-MTAB-3839. I previously downloaded Sorghum data from other databases such as Phytozome, and the problem is that the gene IDs in the EMBL data differ from those in the data from other sources:

Gene ID example from other databases: Sobic.001G000100
Gene ID example from the ArrayExpress data: SORBI_3003G276100

Is there any way to convert or map EMBL/ArrayExpress gene IDs to Phytozome Sorghum gene IDs? Kindly help me resolve this issue. Thank you.


The correct observation should be: the IDs from the others are different than those from EBI/EMBL ;)

The best thing to do is to check whether Phytozome offers alternative IDs. Otherwise, have a look at the locus_tag info in the EMBL data; in theory, that field should better reflect the IDs used by other databases.
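If an EMBL-format flat file is available locally, the locus_tag values can be pulled out with a few lines of Python. This is a minimal sketch, assuming a standard EMBL flat file in which each feature qualifier starts on its own `FT` line (for example `FT   /locus_tag="..."`); the file name in the usage comment is hypothetical, and whether a particular dump carries locus_tag qualifiers at all is not guaranteed.

```python
import re
import sys
from typing import Iterable, List

# Matches an EMBL feature-table qualifier line such as:
# FT                   /locus_tag="ABC_0001"
LOCUS_TAG = re.compile(r'^FT\s+/locus_tag="([^"]+)"')


def locus_tags(lines: Iterable[str]) -> List[str]:
    """Return unique locus_tag values in the order they first appear."""
    seen = set()
    tags: List[str] = []
    for line in lines:
        match = LOCUS_TAG.match(line)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            tags.append(match.group(1))
    return tags


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical file name): python locus_tags.py sorghum.embl
    with open(sys.argv[1]) as handle:
        for tag in locus_tags(handle):
            print(tag)
```

Comparing a few of these tags by eye with the Phytozome IDs is usually enough to tell whether the field helps.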

lieven.sterck

Thanks for your suggestions; unfortunately, neither of them worked for the Sorghum data.

22 months ago by Bioin10

OK, if you can't find a textual link between the two sets of IDs, your only fallback is probably to create the mapping yourself.

One approach: get the CDS FASTA files of both annotations (EMBL and Phytozome), blastn them against each other, and then build a correspondence table of the IDs from the hits. This will work in most cases, but it is unlikely to be 100% watertight.
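Once both BLAST searches have been run with tabular output (`-outfmt 6`), the correspondence table can be built from the hits. The sketch below keeps only reciprocal best hits (a pair is kept when each sequence is the other's top hit by bit score). That filter is an added assumption, a common refinement rather than part of the suggestion above, and the file names are hypothetical. CDS files are usually per transcript, so the resulting IDs are transcript IDs and may still need collapsing to gene IDs.

```python
import csv
import sys
from typing import Dict, Iterable, List, Tuple

# Run both searches first, for example (file names are hypothetical):
#   blastn -query ensembl_cds.fa -subject phytozome_cds.fa -outfmt 6 -out a_vs_b.tsv
#   blastn -query phytozome_cds.fa -subject ensembl_cds.fa -outfmt 6 -out b_vs_a.tsv


def best_hits(rows: Iterable[List[str]]) -> Dict[str, Tuple[str, float]]:
    """Keep the highest-bitscore subject for each query.

    Assumes the default 12-column -outfmt 6 layout: column 1 is qseqid,
    column 2 is sseqid, column 12 is bitscore. On ties, the first hit seen wins.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for row in rows:
        if len(row) < 12 or row[0].startswith("#"):
            continue
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def reciprocal_best_hits(a_vs_b, b_vs_a) -> Dict[str, str]:
    """Return {id_in_A: id_in_B} for pairs that are each other's best hit."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return {
        a_id: b_id
        for a_id, (b_id, _) in ab.items()
        if b_id in ba and ba[b_id][0] == a_id
    }


def read_tsv(path: str) -> List[List[str]]:
    with open(path, newline="") as handle:
        return list(csv.reader(handle, delimiter="\t"))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Prints a two-column correspondence table: ID in A <TAB> ID in B.
    table = reciprocal_best_hits(read_tsv(sys.argv[1]), read_tsv(sys.argv[2]))
    for a_id, b_id in sorted(table.items()):
        print(f"{a_id}\t{b_id}")
```

IDs that drop out of the reciprocal filter (paralogs, split or merged gene models) are exactly the cases where the mapping is not watertight, so they are worth inspecting separately.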

lieven.sterck

I will try this approach. Thank you.

22 months ago by Bioin10
Answer: Denise - Open Targets (UK, Hinxton, EMBL-EBI) wrote (22 months ago):

The gene IDs in Expression Atlas are the same as those in Ensembl Plants, e.g. SORBI_3003G276100. The gene annotation in Ensembl Plants seems to be provided by Phytozome, so if the IDs do not match Phytozome's, it is worth bringing this up with both Ensembl Plants and Phytozome. I had a look at Ensembl Plants BioMart and could not see an option to convert Ensembl Plants (or ArrayExpress) IDs to Phytozome IDs. You can convert them to NCBI IDs if that is of any help: for SORBI_3003G276100, we have 8069790 as the Entrez Gene ID.
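If NCBI IDs are good enough, a two-column BioMart export (Ensembl Plants gene ID plus NCBI/Entrez gene ID) can be applied to the expression table's IDs. This is a minimal sketch; the column header names below are assumptions and should be replaced with whatever the exported file actually uses. Genes with no NCBI ID, or with more than one, are returned as unresolved rather than guessed.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical header names; check the header line of your own BioMart export.
GENE_COL = "Gene stable ID"
NCBI_COL = "NCBI gene ID"


def load_mapping(rows: Iterable[Dict[str, str]]) -> Dict[str, Set[str]]:
    """Collect every NCBI ID listed for each Ensembl Plants gene ID."""
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        gene = (row.get(GENE_COL) or "").strip()
        ncbi = (row.get(NCBI_COL) or "").strip()
        if gene and ncbi:
            mapping[gene].add(ncbi)
    return dict(mapping)


def translate(
    ids: Iterable[str], mapping: Dict[str, Set[str]]
) -> Tuple[Dict[str, str], List[str]]:
    """Map IDs that have exactly one NCBI ID; return the rest as unresolved."""
    resolved: Dict[str, str] = {}
    unresolved: List[str] = []
    for gene in ids:
        targets = mapping.get(gene, set())
        if len(targets) == 1:
            resolved[gene] = next(iter(targets))
        else:
            unresolved.append(gene)
    return resolved, unresolved


def read_biomart_tsv(path: str) -> Dict[str, Set[str]]:
    """Load a tab-separated BioMart export with a header line."""
    with open(path, newline="") as handle:
        return load_mapping(csv.DictReader(handle, delimiter="\t"))
```

Usage would be along the lines of `translate(expression_ids, read_biomart_tsv("mart_export.txt"))`, where both the variable and the file name are hypothetical.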

