Retrieve genbank description from a GI number
8.0 years ago
Joe 21k

What's the most straightforward (and fast) way of retrieving GenBank descriptions based on a list of GI numbers?

I've got the blast tabular output for 54k hits, each with a GI number:

2-2129  gi|514995435|gb|KC795686.1| 100.000 136 0   0   115 250 2358    2493    6.78e-62    246
2-2129  gi|514995431|gb|KC795685.1| 100.000 136 0   0   115 250 2358    2493    6.78e-62    246
2-2129  gi|514883433|gb|JX127248.1| 100.000 136 0   0   115 250 97570   97435   6.78e-62    246
2-2129  gi|500229602|gb|KC619528.1| 100.000 136 0   0   115 250 14268   14403   6.78e-62    246
2-2129  gi|493665175|dbj|AP012055.1|    100.000 136 0   0   115 250 64464   64329   6.78e-62    246
2-2129  gi|403311657|gb|JX262235.1| 100.000 136 0   0   115 250 3322    3187    6.78e-62    246
2-2129  gi|403311642|gb|JX262232.1| 100.000 136 0   0   115 250 3343    3208    6.78e-62    246
2-2129  gi|399573441|gb|JX182975.1| 100.000 136 0   0   115 250 33  168 6.78e-62    246
2-2129  gi|394343076|gb|CP003683.1| 100.000 136 0   0   115 250 2064023 2064158 6.78e-62    246
2-2129  gi|384875611|gb|JQ394799.1| 100.000 136 0   0   115 250 2921    2786    6.78e-62    246

But I'm trying to identify what they are at a glance. Each line corresponds to a unique BLAST hit, as I've already done some filtering: 600,000 unique collapsed reads (from collapsed fastqs) gave 191 million BLAST hits (95% ID), which I subsequently collapsed to 54k unique GI hits.

What I'd ideally like to do is get the GenBank descriptions that correspond to each, and write them to the corresponding line in the BLAST output (or at least write out a new file with the ANI, query ID and so on).

From googling, this link, "Fetching Genbank Entries For List Of Accession Numbers", seems to be on a similar track, but I'm not sure what the syntax would be to retrieve the descriptions.

blast entrez • 1.9k views

NCBI has phased out GI numbers as of this month. You should use the accession numbers for retrieving the descriptions.

You can put the accession numbers in a file (one entry per line; sort and deduplicate them to save time) and then use the blastdbcmd tool from the BLAST+ package to retrieve the descriptions with the following command (you will need the BLAST indexes from NCBI for this to work):

 blastdbcmd -db /path_to/nt -outfmt '%a %t' -entry_batch file_with_acc_numbers -out descriptions_file
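The "one entry per line, sorted and unique" file mentioned above could be built from the BLAST tabular output with something like the sketch below. It assumes the subject ID sits in column 2 and has the `gi|<GI>|<db>|<accession>|` layout shown in the question; the function names and any file names are hypothetical, not part of BLAST+.

```python
def extract_ids(blast_lines, field="accession"):
    """Return sorted unique IDs taken from column 2 of BLAST tabular lines.

    Assumes subject IDs look like gi|514995435|gb|KC795686.1| (as in the
    question). field="gi" returns the GI numbers instead of accessions.
    """
    index = {"gi": 1, "accession": 3}[field]
    ids = set()
    for line in blast_lines:
        cols = line.split()
        if len(cols) < 2:
            continue  # skip blank or malformed lines
        parts = cols[1].split("|")
        if len(parts) > index and parts[index]:
            ids.add(parts[index])
    return sorted(ids)


def write_ids(blast_path, out_path, field="accession"):
    """Write one unique ID per line, ready for blastdbcmd -entry_batch."""
    with open(blast_path) as handle:
        ids = extract_ids(handle, field)
    with open(out_path, "w") as out:
        out.write("\n".join(ids) + "\n")
```

Calling `write_ids("blast_hits.tsv", "file_with_acc_numbers")` (the first file name is made up) would produce the batch file that the command above reads.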

Yeah, I had heard that; I was hoping I might not be too late.

Nevertheless, I think I've a semi-working solution now. I ran it against a local (and old) nr database, so I suspect that's why it gave me GIs by default. At any rate, I think I'd have needed too many queries to be manageable on the Entrez API with their restrictions, but as it was a local DB, the following works:

 blastdbcmd -db /blastdb/blast/nt -entry_batch uni1_uniquesortedGIs.txt -outfmt '%g %t' -target_only -out unit1_GI_matches.txt
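To get the descriptions back onto the matching BLAST lines, a small join on the GI number is one option. This is a minimal sketch, assuming the blastdbcmd output has one `GI title` pair per line (which is what `-outfmt '%g %t'` should give) and that the GI is the second `|`-separated field of column 2 in the BLAST output; the function names and the `NA` placeholder are my own choices.

```python
def load_descriptions(lines):
    """Map GI -> title from lines of the form '<GI> <title>'."""
    descriptions = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the first space only: the title itself contains spaces.
        gi, _, title = line.partition(" ")
        descriptions[gi] = title.strip()
    return descriptions


def annotate(blast_lines, descriptions, missing="NA"):
    """Yield each BLAST line with its description appended as a last column."""
    for line in blast_lines:
        line = line.rstrip("\n")
        cols = line.split()
        if len(cols) < 2:
            yield line  # pass blank or malformed lines through unchanged
            continue
        parts = cols[1].split("|")
        gi = parts[1] if len(parts) > 1 else ""
        yield line + "\t" + descriptions.get(gi, missing)
```

Reading `unit1_GI_matches.txt` into `load_descriptions` and then passing the BLAST file through `annotate` would give a new table with the query ID, identity and so on, plus the description at the end of each row. Hits whose GI is absent from the local database come out as `NA` rather than being dropped.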


Seems we both posted the solution at the same time!
