Question: EMBL-EBI gene IDs do not match those in data downloaded from other databases
22 months ago, Bioin wrote:

Dear Biostars,

I would like to use the Sorghum Expression Atlas data (E-MTAB-3839), downloaded from https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-3839/. I have previously downloaded Sorghum data from other databases such as Phytozome. The problem is that the gene IDs in the EMBL data differ from those in the data from the other sources.

Gene ID example from other databases: Sobic.001G000100
Gene ID example from the ArrayExpress data: SORBI_3003G276100

Is there any way to convert or map EMBL/ArrayExpress gene IDs to Phytozome Sorghum gene IDs? Kindly help me resolve this issue. Thank you.

Tags: assembly, snp, next-gen, genome, gene

The more accurate way to put it: the IDs from the other databases are different from those from EBI/EMBL ;)

The best thing to do is to check whether Phytozome offers alternative IDs. Otherwise, have a look at the locus_tag info in the EMBL data; that one should (in theory) correspond more closely to the IDs used by other databases.
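A quick way to test for a textual link between the two ID schemes is to strip each ID down to a shared "core" and see whether the cores overlap. The sketch below assumes, based only on the two example IDs in the question, that both schemes end in a three-digit chromosome number, a "G", and a six-digit locus number (e.g. 003G276100). That pattern is an assumption and is not confirmed anywhere in this thread, and even a matching core does not guarantee that the two gene models are identical across annotation versions. The function names are hypothetical.

```python
import re

# ASSUMPTION: IDs end in <3-digit chromosome>G<6-digit locus>, as in the
# two examples from the question. Adjust the pattern if your IDs differ.
CORE = re.compile(r"\d{3}G\d{6}$")


def core_id(gene_id):
    """Return the trailing chromosome/locus core of an ID, or None."""
    match = CORE.search(gene_id.strip())
    return match.group(0) if match else None


def match_by_core(ids_a, ids_b):
    """Pair up IDs from two lists that share the same core."""
    by_core = {}
    for b in ids_b:
        core = core_id(b)
        if core is not None:
            by_core.setdefault(core, []).append(b)

    pairs = []
    for a in ids_a:
        for b in by_core.get(core_id(a), []):
            pairs.append((a, b))
    return pairs
```

If most IDs in one list find a partner in the other, a textual mapping is probably usable. If hardly any do, a sequence-based mapping is the safer route.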

written 22 months ago by lieven.sterck

Thanks for your suggestions. Unfortunately, neither of them worked for the Sorghum data.

written 22 months ago by Bioin

OK, if you can't find a textual link between the two IDs, you will probably have to fall back on creating the mapping yourself.

One approach: get the CDS FASTA files of the annotations from both EMBL and Phytozome, blastn them against each other, and then build a correspondence table for the IDs from the hits. This will work in most cases, but it is unlikely to be a 100% watertight approach.
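The correspondence-table step could be sketched as follows. This is a minimal example, not a definitive implementation: it assumes blastn was run in both directions with tabular output (`-outfmt 6`, where column 1 is the query ID, column 2 the subject ID, and column 12 the bit score), and it keeps only reciprocal best hits, which is one common but conservative way to pair sequences. The function names are hypothetical. Note that CDS FASTA headers are often transcript IDs rather than gene IDs, so you may need to trim a transcript suffix before comparing.

```python
def best_hits(blast_lines):
    """Map each query ID to its (subject ID, bit score) with the top bit score.

    Expects lines in BLAST tabular format (-outfmt 6, default columns).
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (id_a, id_b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    pairs = []
    for id_a, (id_b, _) in forward.items():
        if reverse.get(id_b, (None, 0.0))[0] == id_a:
            pairs.append((id_a, id_b))
    return sorted(pairs)
```

In practice you would pass the two open result files, e.g. `reciprocal_best_hits(open("embl_vs_phytozome.tsv"), open("phytozome_vs_embl.tsv"))` (hypothetical file names), and write the pairs out as a two-column table. Genes with close paralogs or differing gene models may be missed or mispaired, so spot-check the result.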

written 22 months ago by lieven.sterck

I will try this approach. Thank you.

written 22 months ago by Bioin

22 months ago, Denise - Open Targets (UK, Hinxton, EMBL-EBI) wrote:

The gene ID in Expression Atlas is the same as in Ensembl Plants, i.e. SORBI_3003G27610. It seems the gene annotation in Ensembl Plants is provided by Phytozome, so if the IDs do not match those in Phytozome, it's worth bringing this up with both Ensembl Plants and Phytozome. I had a look at Ensembl Plants BioMart and could not see an option to convert Ensembl Plants (or ArrayExpress) IDs to Phytozome IDs. You can convert them to NCBI IDs, if that is of any help. For SORBI_3003G27610, we have 8069790 as the Entrez Gene ID.
