retrieve all nt sequences for a taxid list
22 months ago
pe_se ▴ 10

Hi, I have a list of ~2000 taxids and would like to retrieve all available nucleotide sequences for each taxon to build a reference database. With Batch Entrez I only get an error, even when submitting a single taxid or accession number (.txt or .xml): "An illegal character in a token. Possible wrong file format. Request processing canceled." It also doesn't work with this Perl script:

perl -e 'use LWP::Simple;getstore("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&rettype=fasta&retmode=text&id=".join(",",qw(410645, 410645, ...)),"seqs.fasta");'

with "..." being the list of taxids. Even with only a few IDs, it retrieves far fewer sequences than are available on GenBank. Not sure what's wrong there.

Can anyone advise how to compile those (with little to no coding skills...)? Thanks!

ncbi taxid efetch • 631 views

Two options, neither completely trivial: either use the command-line E-utilities in a shell script and loop over all taxids read from a file, or download the whole NT database (which you might already have) along with the NCBI taxonomy, and create an accession list for each taxid to pass to BLAST.

22 months ago
GenoMax 141k

You can use Entrez Direct (EDirect):

$ more id
2104
2093
3256

$ for i in `cat id`; do echo ${i}; esearch -db nuccore -query "${i}[taxID]" ; done
2104
<ENTREZ_DIRECT>
  <Db>nuccore</Db>
  <WebEnv>MCID_62b1aedbfe64814cb17d73bd</WebEnv>
  <QueryKey>1</QueryKey>
  <Count>40356</Count>
  <Step>1</Step>
</ENTREZ_DIRECT>
2093
<ENTREZ_DIRECT>
  <Db>nuccore</Db>
  <WebEnv>MCID_62b1aedcb1afcf6af8447fe6</WebEnv>
  <QueryKey>1</QueryKey>
  <Count>119993</Count>
  <Step>1</Step>
</ENTREZ_DIRECT>
3256
<ENTREZ_DIRECT>
  <Db>nuccore</Db>
  <WebEnv>MCID_62b1aedc97bf993ae11f75af</WebEnv>
  <QueryKey>1</QueryKey>
  <Count>22818</Count>
  <Step>1</Step>
</ENTREZ_DIRECT>

To actually retrieve the sequences, do something like this:

$ for i in `cat id`; do echo ${i}; esearch -db nuccore -query "${i}[taxID]" | efetch -format fasta >> ${i}.fa; done
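
For a list of ~2000 taxids, the same loop can be wrapped in a small function. This is only a sketch under a few assumptions: EDirect (esearch, efetch) is installed and on your PATH, the taxids are in a file with one per line, and fetch_taxids is a made-up helper name, not part of EDirect. It differs from the one-liner above in three ways: it skips blank lines, it overwrites each output file instead of appending (so a rerun does not duplicate records), and it redirects esearch's stdin from /dev/null, since esearch can otherwise read the rest of the id file when run inside a while-read loop. As a side note, efetch's id parameter expects sequence identifiers rather than taxids, which is likely why the Perl one-liner in the question returned so few records.

```shell
# Sketch: write one FASTA file per taxid listed in a file (one taxid per line).
# Assumes EDirect's esearch and efetch are on PATH.
# fetch_taxids is a hypothetical helper name used for illustration only.
fetch_taxids() {
    id_file=$1          # file with one taxid per line
    out_dir=${2:-.}     # output directory, defaults to the current one
    mkdir -p "$out_dir"

    # Read line by line; the "|| [ -n ... ]" keeps a last line without newline.
    while IFS= read -r taxid || [ -n "$taxid" ]; do
        # Strip stray whitespace and skip empty lines.
        taxid=$(printf '%s' "$taxid" | tr -d '[:space:]')
        [ -z "$taxid" ] && continue

        echo "fetching taxid ${taxid}" >&2

        # esearch gets /dev/null as stdin so it cannot swallow the id file.
        # ">" overwrites any existing file from a previous run.
        esearch -db nuccore -query "${taxid}[taxID]" < /dev/null \
            | efetch -format fasta > "${out_dir}/${taxid}.fa"
    done < "$id_file"
}
```

Usage would be something like `fetch_taxids id fasta_out`, which should leave fasta_out/2104.fa, fasta_out/2093.fa and so on. Given the counts above (tens of thousands of records per taxid), expect this to take a while for the full list.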