Question: Search for last Nuccore Entries
emmanuel.bouilhol (France/Bordeaux/CBiB) wrote 4.7 years ago:

Hello everyone,

I'm trying to find an elegant way to retrieve all sequences from Nuccore (the NCBI Nucleotide database) that have been added within a given time span (for example, the last week).

So far I have found the genome report files, which list all genomes for a given class of organism: ftp://ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/viruses.txt (it could be parsed to see what is new...)

I found that efetch and esearch can search PubMed with date parameters, but date searches don't seem to be allowed for nuccore...

That's all I've got.

Any good ideas are welcome.

Thanks for your help 

 

5heikki (Finland) wrote 4.7 years ago:

With Entrez Direct, you can search for what has been published since October 2015:

esearch -db nuccore -query '("2015/10/01"[Publication Date] : "2015/11/09"[Publication Date])'
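For the "last week" case from the question, the date range could be computed rather than typed by hand. The following is only a sketch: `build_query` is a hypothetical helper (not part of Entrez Direct), and the `date` invocations assume GNU date with a BSD/macOS fallback.

```shell
#!/usr/bin/env bash
# build_query is a hypothetical helper (not an Entrez Direct command).
# It formats two YYYY/MM/DD dates as a [Publication Date] range query.
build_query() {
    local start="$1" end="$2"
    printf '("%s"[Publication Date] : "%s"[Publication Date])' "$start" "$end"
}

# GNU date first; fall back to BSD/macOS syntax if -d is not supported.
week_ago=$(date -d '7 days ago' +%Y/%m/%d 2>/dev/null || date -v-7d +%Y/%m/%d)
today=$(date +%Y/%m/%d)

query=$(build_query "$week_ago" "$today")
echo "$query"

# With Entrez Direct installed, the search itself would then be:
#   esearch -db nuccore -query "$query"
```

Keeping the query in a variable also makes it easy to reuse the exact same string for the count check and the download.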

Well done, piped with efetch it's perfect:

esearch -db nuccore -query '("2015/11/08"[Publication Date] : "2015/11/09"[Publication Date])' | efetch -format fasta

Many Thanks!

Reply by emmanuel.bouilhol, written 4.7 years ago

Unfortunately, it's far from perfect. efetch quite often fails on larger downloads and doesn't necessarily even emit a warning. I would download the GIs instead of FASTA and, to begin with, check that the number of downloaded GIs is the same as the count reported by:

esearch -db nuccore -query '("2015/10/01"[Publication Date] : "2015/11/09"[Publication Date])' | xtract -element Count
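The comparison itself could be scripted roughly as follows. This is a sketch, not a tested pipeline: `counts_match` is a hypothetical helper, and the commented usage assumes that `efetch -format uid` prints one ID per line and that `xtract -pattern ENTREZ_DIRECT -element Count` extracts the hit count from the esearch output.

```shell
#!/usr/bin/env bash
# counts_match is a hypothetical helper: it succeeds only if the ID file has
# exactly as many non-empty lines as the count reported by esearch.
counts_match() {
    local expected="$1" id_file="$2" got
    got=$(grep -c . "$id_file")
    if [ "$got" -eq "$expected" ]; then
        echo "OK: $got IDs"
    else
        echo "MISMATCH: expected $expected, got $got" >&2
        return 1
    fi
}

# Intended use (needs network access and Entrez Direct; assumptions as above):
#   q='("2015/10/01"[Publication Date] : "2015/11/09"[Publication Date])'
#   expected=$(esearch -db nuccore -query "$q" | xtract -pattern ENTREZ_DIRECT -element Count)
#   esearch -db nuccore -query "$q" | efetch -format uid > ids.txt
#   counts_match "$expected" ids.txt
```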

Then I'd use split to cut the list of GIs into files of e.g. 500 lines each and loop over those:

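for f in *.splitFile
do
    # Join the IDs in this chunk into one comma-separated string
    IDs=$(tr '\n' ',' < "$f" | sed 's/,$//')
    # Post the IDs to the history server, then fetch them as FASTA
    epost -db nuccore -id "$IDs" | efetch -format fasta > "$f.fna"
done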
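The loop above expects chunk files named *.splitFile. One way to produce them from a single ID list is sketched here; `split_ids` and the `chunk_` prefix are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# split_ids is a hypothetical helper: it cuts an ID list into chunks of at
# most $2 lines (default 500), named chunk_aa.splitFile, chunk_ab.splitFile, ...
split_ids() {
    local id_file="$1" per_file="${2:-500}" chunk
    split -l "$per_file" "$id_file" chunk_
    # Rename in a loop instead of relying on GNU split's --additional-suffix.
    for chunk in chunk_*; do
        [ -e "$chunk" ] || continue                      # nothing was written
        case "$chunk" in *.splitFile) continue ;; esac   # skip renamed files
        mv "$chunk" "$chunk.splitFile"
    done
}

# Example: split_ids ids.txt 500
```

Run it in an empty working directory so that chunks left over from an earlier run are not mixed into the download loop.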

In addition, you need to build some kind of check for these batch downloads. E.g. each output file should have as many headers as there were lines in the ID file. All is well then, as long as the download didn't fail in the middle of the last sequence :)
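A minimal version of that header check might look like this. `batch_complete` is a hypothetical helper, and as noted above it cannot detect a download that was cut off in the middle of the last sequence.

```shell
#!/usr/bin/env bash
# batch_complete is a hypothetical helper: it succeeds if the FASTA file has
# one ">" header line per non-empty line of the ID file.
batch_complete() {
    local id_file="$1" fasta="$2" n_ids n_headers
    n_ids=$(grep -c . "$id_file")
    n_headers=$(grep -c '^>' "$fasta")
    [ "$n_ids" -eq "$n_headers" ]
}

# Intended use after the download loop:
#   for f in *.splitFile; do
#       batch_complete "$f" "$f.fna" || echo "incomplete batch: $f" >&2
#   done
```

Chunks reported as incomplete can simply be fetched again, since each one is small.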

Reply by 5heikki, written 4.7 years ago