Question: How To Retrieve Genbank Records With Range Of Accession Numbers
6
8.6 years ago by
Daniel Standage3.9k
Davis, California, USA
Daniel Standage3.9k wrote:

A publication I was reading provided two ranges of GenBank accession numbers for supplementary data.

The ESTs from GR_Ea and GR_Eb were deposited in GenBank under accession nos. CO069431–CO100583 and CO100584–CO132899.

If I search GenBank by a single accession number I have no problem pulling up a record, but I obviously don't want to do this for thousands of EST records. Is there a way to provide a range of accession numbers (as above) and retrieve all of these records from GenBank at once? I am using GenBank's web interface right now, but I would also like to know how to do this on the command line.

Thanks!

genbank • 29k views
modified 2.1 years ago by cmdcolin1.2k • written 8.6 years ago by Daniel Standage3.9k
10
8.6 years ago by
Rm7.9k
Danville, PA
Rm7.9k wrote:

Try a range query on the accession field:

http://www.ncbi.nlm.nih.gov/nucest?term=CO069431:CO100583[accn]

Alternatively, put the accession numbers in a file and upload it to NCBI Batch Entrez:

http://www.ncbi.nlm.nih.gov/sites/batchentrez?db=Nucleotide

For ESTs, use db=Nucest:

http://www.ncbi.nlm.nih.gov/sites/batchentrez?db=Nucest
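
For the file-upload route, the list of accession numbers can be generated rather than typed. A minimal sketch, assuming the accessions in each range are consecutive, share the two-letter prefix, and have a six-digit numeric part, and that Batch Entrez accepts one accession per line. The function name `accession_range` and the output file names are made up for illustration:

```shell
#!/bin/sh
# accession_range PREFIX START END
# Print one accession per line, zero-padding the numeric part to six digits.
# START and END must be plain decimal integers without leading zeros
# (a leading zero would make printf read the number as octal).
accession_range() {
    prefix=$1
    n=$2
    while [ "$n" -le "$3" ]; do
        printf '%s%06d\n' "$prefix" "$n"
        n=$((n + 1))
    done
}

# The two ranges from the question, one file per range.
accession_range CO 69431 100583 > GR_Ea_accessions.txt
accession_range CO 100584 132899 > GR_Eb_accessions.txt
```

Either file could then be uploaded on the Batch Entrez page. Which range belongs to which library is a guess from the order in the quoted sentence, so rename the files if that turns out to be wrong.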

modified 8.6 years ago • written 8.6 years ago by Rm7.9k
2

Yet another pearl from the sea of NCBI...

written 8.6 years ago by Khader Shameer18k
1

Cool! I didn't know about this 'accn' field!

written 8.6 years ago by Pierre Lindenbaum121k
1

Useful link: How To: Download a large, custom set of records from NCBI: http://www.ncbi.nlm.nih.gov/guide/howto/dwn-records/

written 8.6 years ago by Rm7.9k

Great. This is what I was looking for. The filters are powerful...now I just need a reason to take the time to learn them!

written 8.6 years ago by Daniel Standage3.9k

http://www.ncbi.nlm.nih.gov/entrez/query/static/help/genehelp.html#display_table

written 8.6 years ago by Rm7.9k
2
8.6 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum121k wrote:

You could try the following shell script (only your first range is shown here):

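# Walk the numeric part of the first range, CO069431 to CO100583.
j=69431
while [ "$j" -le 100583 ]
do
   # Rebuild the accession: "CO" plus the number zero-padded to six digits.
   acn=$(printf "CO%06d" "$j")
   # -L follows redirects, in case the http:// endpoint redirects to https://.
   curl -L "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=${acn}&rettype=fasta"
   j=$((j+1))
done

The first two records returned:
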
>gi|48738912|gb|CO069431.1|CO069431 GR__Ea26A01.r GR__Ea Gossypium raimondii cDNA clone GR__Ea26A01 3', mRNA sequence
GTGACCAGAGGCTACTTGATGCTAGCCTCTCGAGACCTCAGGCGTGCTAGAGCCGCAGCTCTCAACATCG
TCCCGACTTCACTGGTGCGGCAAAGGCCGTGGCCCTTGTACTCCCTACTCTCAAAGGCAAACTTAACGGC
ATCGCATTGCGTGTACCAACACCAAATGTGTCGGTGGTGGACCTAGTGGTCCAGGTTTCAAAGAAGACGT
TTGCTGAAGAGGTGAACGCTGCTTTCAAAGAGAGTGCAGAGAAAGAGCTACAGGGTATACTTTCAGTGTG
TGAAGAACCCCTCGTTTCAGTGGACTTCAGGTGCTCTGATGTGTCCTCCACCGTTGATGCATCACTCACC
ATGGTCATGGGAGATGACATGGTTAAGGTGATTGCTTGGTATGACAATGAGTGGGGCTACTCTCAAAGGG
TTGTGGATTTGGCTGACATTGTTGCCAATAGCTGGAAGTGATTTCAATGTGCTATACATACATATATGCA
TAACAATGTCACCGATGGTTGATTTTTGCATGCTCACTTCATTTTTATTCTTTCGGCTTCAGCAATTTCT
CATTTTGTCAAGGCTACTATATAATCTGTAATGTAATGTGGGATACATACATTCTCTAATATGCTTATGG
AATAAA

>gi|48738913|gb|CO069432.1|CO069432 GR__Ea26A02.f GR__Ea Gossypium raimondii cDNA clone GR__Ea26A02 5', mRNA sequence
AAAAAAAATTGGCCCTTTTTTTTAAAAAAAAGAGAAAAAGGGTCTTTGCCCCCAAAAAAAAAACCCCCCA
GGAATTTTTTCCCAAAATTCGGGGGACCCCCAAAAATTAAACAGGGAAATTGGCAATTTTACCCCCCCCC
CCCCCCCGGGGGGGGAAATTTAAGGGGAAAAAACCCAAAACAAAAGGGGGGCCCCCGGGTGGGGGGGGGA
CCCAATTCAGGACCCCCCCCCTCGGGGGGTCAAAAACCCGGGTTAAAAAACTTAAGAAACCCCTTTCCCA
GTTTCAGGGAAAATTTCTCCCCCCTTTTCGGGGGCTTCATTGGCTTTTTCAGCAGGGGGAAAGACATTTT
CCCATTCTTCCCTTCCAAAAAAAAACCCCGGCCCAAATTGGGGGGCCCCCCGCACCTGTCAAGGGGGGCA
CCAGGGGGCGGGCCCAGGGTTTCTTTAAAAAAAATGGGCAAAAAGGGGAAAGCTAATCCGGGCCCCCTAA
ACCCAAAAGCTTGTTTCCCTGGCCCCCC
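
Fetching one record per request means tens of thousands of HTTP round trips. efetch also accepts a comma-separated list in `id`, so the same loop can ask for many records at once. A rough sketch under stated assumptions: the batch size of 200, the `https` endpoint, and the function name `efetch_urls` are choices made here and are not part of the script above, and NCBI's guidelines on request rates still apply.

```shell
#!/bin/sh
# efetch_urls START END BATCH
# Print one efetch URL per batch of consecutive CO accessions.
# (Function name and batch size are illustrative assumptions.)
efetch_urls() {
    base='https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'
    j=$1
    while [ "$j" -le "$2" ]; do
        ids=''
        k=0
        # Collect up to BATCH accessions into a comma-separated list.
        while [ "$k" -lt "$3" ] && [ "$j" -le "$2" ]; do
            ids="${ids:+$ids,}$(printf 'CO%06d' "$j")"
            j=$((j + 1))
            k=$((k + 1))
        done
        printf '%s?db=nucleotide&id=%s&rettype=fasta\n' "$base" "$ids"
    done
}
```

One possible way to use it, pausing between requests:

efetch_urls 69431 100583 200 | while read -r url; do curl -s "$url"; sleep 1; done > GR_Ea.fasta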
written 8.6 years ago by Pierre Lindenbaum121k
1
2.1 years ago by
cmdcolin1.2k
United States
cmdcolin1.2k wrote:

You can use the NCBI EDirect tools (brew install homebrew/science/edirect) and run something like:

while read -r p; do echo "$p"; esearch -db nucleotide -query "$p" | efetch -format fasta > "$p.fasta"; done < file_with_ids.txt

or, more simply:

while read -r p; do echo "$p"; efetch -db nucleotide -id "$p" -format fasta > "$p.fasta"; done < file_with_ids.txt

I mention both only because I have seen esearch piped to efetch in the NCBI docs elsewhere, but if you already have the ID it seems easier to pass it to efetch directly.

Note that you might also need to install the Perl module Mozilla::CA manually from CPAN, since the Homebrew formula doesn't seem to handle that properly.
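
The range syntax from Rm's answer should also work as an esearch query, which would avoid both the loop and the ID file. A hedged sketch: `accn_range_query` and `fetch_range` are helper names invented here, and whether these ESTs are indexed under `-db nucleotide` or `-db nucest` is an assumption to check against your EDirect installation.

```shell
#!/bin/sh
# accn_range_query FIRST LAST
# Build an Entrez query matching every accession from FIRST to LAST.
accn_range_query() {
    printf '%s:%s[accn]' "$1" "$2"
}

# fetch_range FIRST LAST OUTFILE
# Search once for the whole range, then stream all hits as FASTA.
# Requires the EDirect tools (esearch, efetch) on the PATH.
fetch_range() {
    esearch -db nucleotide -query "$(accn_range_query "$1" "$2")" |
        efetch -format fasta > "$3"
}
```

For the first range in the question this would be called as:

fetch_range CO069431 CO100583 GR_Ea.fasta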

written 2.1 years ago by cmdcolin1.2k
1

Thanks for the command. It was very helpful!!

written 9 months ago by Prakki Rama2.3k
0
8.6 years ago by
Lee Katz2.9k
Atlanta, GA
Lee Katz2.9k wrote:

Pretty much the same answer as in a previous question, http://biostar.stackexchange.com/questions/3109/downloading-fasta-files/3120#3120

use strict;
use warnings;
use Bio::DB::GenBank;

# You could make an array of the IDs you need and loop over it;
# a single record is fetched here.
my $gb  = Bio::DB::GenBank->new();
my $seq = $gb->get_Seq_by_id('MUSIGHBA1');    # unique ID

# Subsequence coordinates as [start, end] pairs (BioPerl positions are 1-based).
my @seqCoords = (
    [1,    100],
    [1000, 1100],
);
my $subseq = $seq->subseq($seqCoords[0][0], $seqCoords[0][1]);

# Then look at the BLAST modules and SearchIO to see how to start blasting and parsing:
# http://www.bioperl.org/wiki/HOWTOs
written 8.6 years ago by Lee Katz2.9k