Converting BLAST Alignments (NCBI database) to Gene ID
5.5 years ago
jeremy.cox.2 ▴ 90

Hello All,

This is probably a "newbie" question.

I am trying to take some standard BLAST output and map the alignments to Gene IDs so that I can do enrichment/network analysis.

I am doing something out of the ordinary: I am looking at multiple microorganisms at once. I think this might make the conversion much harder, because some databases may not include homologs or hypothetical proteins. However, I am very new to this problem and have no previous knowledge of Gene ID systems.

Here is my output from blasting against an NCBI database. (I have thousands of lines; this is just a random example.)

queryNAME  gi|367018053|ref|NC_016508.1|   90.20   51      5       0       1       51      788427  788477  1e-09   67.6
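The line above has twelve whitespace-separated columns, which matches BLAST+'s standard tabular layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). Assuming that is the format in use (the post does not say so explicitly), a minimal Python sketch for turning each line into named fields could look like this. `BlastHit` and `parse_hit` are illustrative names, not part of any library.

```python
from typing import NamedTuple


class BlastHit(NamedTuple):
    """One row of BLAST tabular output (assumed: the 12 standard columns)."""
    qseqid: str
    sseqid: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


# One converter per column, in column order.
_CASTS = (str, str, float, int, int, int, int, int, int, int, float, float)


def parse_hit(line: str) -> BlastHit:
    """Split on any whitespace (tabs in real output, spaces in the pasted example)."""
    fields = line.split()
    if len(fields) != len(_CASTS):
        raise ValueError(f"expected {len(_CASTS)} columns, got {len(fields)}")
    return BlastHit(*(cast(value) for cast, value in zip(_CASTS, fields)))


example = ("queryNAME  gi|367018053|ref|NC_016508.1|   90.20   51      5       0"
           "       1       51      788427  788477  1e-09   67.6")
hit = parse_hit(example)
print(hit.sseqid, hit.sstart, hit.send, hit.evalue)
```

Splitting on any whitespace is a convenience for pasted text; for real files it would be safer to split on tabs only, since that is the delimiter BLAST writes.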

 

I can easily find this record in the NCBI nucleotide database:

http://www.ncbi.nlm.nih.gov/nuccore/367018053
and then the corresponding gene record:
http://www.ncbi.nlm.nih.gov/gene?cmd=Retrieve&dopt=full_report&list_uids=11505342
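A note on the step from the nucleotide record to a gene: NC_ is the RefSeq prefix for complete genomic molecules, so the hit is a position on a long sequence, and the subject coordinates (788427 to 788477 in the example) decide which gene, if any, it falls in. The sketch below shows one way that lookup could be done once gene coordinates for the record are available. It is only an illustration: `GeneFeature`, `genes_overlapping`, and the gene names and coordinates are all hypothetical placeholders, not values from NC_016508.

```python
from typing import List, NamedTuple


class GeneFeature(NamedTuple):
    """A gene's span on the subject sequence (1-based, inclusive)."""
    gene_id: str
    start: int
    end: int


def genes_overlapping(features: List[GeneFeature], sstart: int, send: int) -> List[str]:
    """Return IDs of all genes whose span overlaps the hit's subject range."""
    # Minus-strand hits are reported with sstart > send, so normalise first.
    lo, hi = sorted((sstart, send))
    return [f.gene_id for f in features if f.start <= hi and f.end >= lo]


# HYPOTHETICAL feature table, made up purely for illustration.
features = [
    GeneFeature("geneA", 100, 900),
    GeneFeature("geneB", 1200, 2000),
    GeneFeature("geneC", 1900, 2500),
]

print(genes_overlapping(features, 1950, 1990))  # hit inside two overlapping genes
print(genes_overlapping(features, 1000, 1100))  # intergenic hit: empty list
print(genes_overlapping(features, 900, 850))    # minus-strand hit
```

A linear scan is fine for a handful of genes; with thousands of hits per genome, sorting the features and using a bisect or an interval tree would be the usual next step.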

I can easily parse all of this from NCBI using EDirect:
efetch -db nuccore -id "NC_016508" -mode xml

 

So, I now have three names:

GI    367018053
ACCESSION NC_016508
Gene symbol   TDEL0H00120
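The GI and the accession do not need a web lookup, since both are already in the subject ID column of the BLAST line. A small Python sketch for pulling them apart is below. It assumes the ID follows the pipe-delimited `gi|<number>|ref|<accession.version>|` pattern seen in the example; `split_subject_id` is a made-up helper name, and other ID layouts would need their own handling.

```python
from typing import Dict, Optional


def split_subject_id(sseqid: str) -> Dict[str, Optional[str]]:
    """Split an ID such as 'gi|367018053|ref|NC_016508.1|' into its parts.

    Assumes alternating tag|value pairs, as in the example; IDs with extra
    or empty fields are not handled here.
    """
    parts = sseqid.strip("|").split("|")
    tagged = dict(zip(parts[0::2], parts[1::2]))  # {'gi': ..., 'ref': ...}
    # The RefSeq accession carries a version suffix after the dot.
    accession, _, version = tagged.get("ref", "").partition(".")
    return {
        "gi": tagged.get("gi"),
        "accession": accession or None,
        "version": version or None,
    }


ids = split_subject_id("gi|367018053|ref|NC_016508.1|")
print(ids)
```

Keeping the version separate makes it easy to query with either the bare accession (NC_016508) or the versioned one (NC_016508.1).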

There are many posts about the many Gene ID converters available (https://www.biostars.org/p/22/). However, I seem to be in a Catch-22: I don't know which database these IDs belong to, and that is exactly what a converter needs to know before it can map them to another system. (I know in general what these are, but apparently I need to be very specific when selecting from a long list of possibilities.) On the other hand, maybe I am unsuccessful because this is a hypothetical gene, so there is nothing to convert it to in other lists.

Can anyone offer some guidance on (1) how to convert these successfully and (2) more generally, are there special issues to consider when not using a single organism?

 

BLAST NCBI GENE ID • 3.4k views

I've used tblastx against RefSeq databases for similar work. Are you using one of these?

5.5 years ago

All these are GenBank identifiers. They are explained here.


Yes. So, for example, I would expect these IDs to convert using the UniProt converter:

http://www.uniprot.org/uploadlists/

However, submitting them as either "GI number*" or "EMBL/GenBank/DDBJ" returns no results.


It looks like you don't get IDs that correspond/map to proteins. As you point out, your example is a hypothetical gene so it may not be represented by a protein in UniProt.

If you're trying to identify UniProt proteins, why not blastx your nucleotide sequences directly against a UniProt database?
