I have a list of about 7000 NCBI GI numbers from one program, and I want to feed them into the next program, but that program requires RefSeq accession numbers, i.e., those that start with `NC_`.
For example, this is what I have: `154350369 594021901 811154183 407962962 407955691 218540569`
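With about 7000 IDs, one URL with everything in it is impractical, and NCBI's E-utilities documentation recommends HTTP POST rather than GET once a request carries more than about 200 UIDs. A minimal sketch of reading the list and splitting it into batches, assuming the GIs are separated by whitespace or commas; `parse_gis`, `chunk_ids` and the batch size of 200 are illustrative choices, not anything NCBI defines.

```perl
use strict;
use warnings;

# Hypothetical helper: split text on whitespace/commas, keep numeric
# tokens only, and drop repeats while preserving the original order.
sub parse_gis {
    my ($text) = @_;
    my (@gis, %seen);
    for my $token (split /[\s,]+/, $text) {
        next if $token eq '';
        unless ($token =~ /^\d+$/) {
            warn "Skipping non-numeric token: $token\n";
            next;
        }
        push @gis, $token unless $seen{$token}++;
    }
    return @gis;
}

# Hypothetical helper: cut a list of IDs into array refs of at most $size.
sub chunk_ids {
    my ($ids, $size) = @_;
    my @batches;
    for (my $i = 0; $i < @$ids; $i += $size) {
        my $end = $i + $size - 1;
        $end = $#$ids if $end > $#$ids;
        push @batches, [ @{$ids}[$i .. $end] ];
    }
    return @batches;
}

# In practice the text would come from reading the 7000-line file.
my @gis     = parse_gis("154350369 594021901 811154183 407962962 407955691 218540569");
my @batches = chunk_ids(\@gis, 200);
printf "%d GIs in %d batch(es)\n", scalar @gis, scalar @batches;
```

Each batch can then be joined with commas (or sent as separate `id` parameters) in one EFetch or ELink request.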
What is the best approach? I have an earlier file with the sequences attached to the GI numbers, but using BLAST I only managed to get back the same NCBI accessions (possibly user error?).
Using the following code (stolen from the docs), it is easy to go from a GI number to a GenBank accession, but again, the result is still a GenBank accession, not a RefSeq one. I think I have to use ELink and then ESearch with the search term `srcdb_refseq[prop]` to pick out the right linked record? (Not that I'm sure how to do this.) Or would BLAST be easier? (If I could figure out how to use the command-line version!)
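The ELink step described above could be sketched as one request per batch. Two points from the E-utilities documentation help here: giving each GI its own `id=` parameter (instead of one comma-separated `id=`) makes ELink return a separate link set per GI, which keeps the GI-to-RefSeq mapping one-to-one; and `cmd=acheck` lists the links available for a record. The link name `nuccore_nuccore_gbrs` (GenBank to RefSeq) is an assumption to confirm with `acheck` before relying on it, and `build_elink_url` / `build_acheck_url` are hypothetical helpers.

```perl
use strict;
use warnings;

my $BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';

# Hypothetical helper: one id= parameter per GI so that ELink returns
# one link set per input GI instead of a single merged set.
sub build_elink_url {
    my ($gis, $linkname) = @_;
    # Assumed GenBank -> RefSeq link name; verify with cmd=acheck first.
    $linkname //= 'nuccore_nuccore_gbrs';
    for my $gi (@$gis) {
        die "Not a numeric GI: $gi\n" unless $gi =~ /^\d+$/;
    }
    my $ids = join '&', map { "id=$_" } @$gis;
    return $BASE . "elink.fcgi?dbfrom=nuccore&db=nuccore&linkname=$linkname&$ids";
}

# Hypothetical helper: ask ELink which links exist for a single GI.
sub build_acheck_url {
    my ($gi) = @_;
    return $BASE . "elink.fcgi?dbfrom=nuccore&id=$gi&cmd=acheck";
}

print build_acheck_url(154350369), "\n";
print build_elink_url([154350369, 594021901]), "\n";
```

Not every GenBank record has a RefSeq counterpart, so some link sets may come back empty. If the assumed link name does not exist, linking broadly and then filtering with `srcdb_refseq[prop]` in ESearch, as suggested above, is the fallback.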
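```perl
use strict;
use warnings;
use LWP::Simple;

# Comma-separated GIs with no spaces, so the URL stays valid
my $gi_list = '154350369,594021901,811154183,407962962';
# Assemble the EFetch URL (rettype=acc returns one accession.version per line)
my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';
my $url  = $base . "efetch.fcgi?db=nucleotide&id=$gi_list&rettype=acc";
# Send the GET request; LWP::Simple::get returns undef on failure
my $output = get($url);
die "Could not fetch $url\n" unless defined $output;
print $output;
```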
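Because `rettype=acc` returns one accession per line, a quick sanity check is to split that output into accessions that already start with `NC_` and those that still need the linking step. A minimal sketch; `split_by_prefix` and `fetch_accessions` are hypothetical helpers, the POST variant is not called at top level, and the sample input (`NC_000913.3` and its GenBank counterpart `U00096.3`, E. coli K-12 MG1655) is illustrative, not the output for the GIs above. Prefix checking says nothing about whether a GenBank record *has* a RefSeq counterpart.

```perl
use strict;
use warnings;

# Hypothetical helper: classify efetch rettype=acc output by prefix.
# Returns two array refs: (accessions starting with NC_, everything else).
sub split_by_prefix {
    my ($efetch_text) = @_;
    my (@nc, @other);
    for my $line (split /\n/, $efetch_text) {
        $line =~ s/^\s+|\s+$//g;    # trim spaces and stray \r
        next if $line eq '';
        if ($line =~ /^NC_\d+/) {
            push @nc, $line;
        } else {
            push @other, $line;
        }
    }
    return (\@nc, \@other);
}

# Hypothetical helper: fetch accessions for one batch via HTTP POST,
# which the E-utilities docs recommend over GET for long ID lists.
sub fetch_accessions {
    my (@gis) = @_;
    require LWP::UserAgent;
    my $ua  = LWP::UserAgent->new;
    my $res = $ua->post(
        'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi',
        { db => 'nuccore', id => join(',', @gis), rettype => 'acc' },
    );
    die 'efetch failed: ', $res->status_line, "\n" unless $res->is_success;
    return $res->decoded_content;
}

my ($nc, $other) = split_by_prefix("NC_000913.3\nU00096.3\n");
print "RefSeq: @$nc\nNeeds linking: @$other\n";
```

In a real run, the string passed to `split_by_prefix` would be the return value of `fetch_accessions` for each batch.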
Very confused and a sanity check needed! Thank you!