How to find newly submitted accessions in NCBI
9 weeks ago
LDT ▴ 220

Dear all,

I want to automate the identification of newly submitted plant accessions in NCBI. I have been scanning the NCBI FTP server, but I have not yet found a location that lists all SRA accessions.

https://ftp.ncbi.nlm.nih.gov/

Does anybody have an idea where I could find this list?

ncbi • 494 views
9 weeks ago

If you are looking for SRA accession numbers, you should search the SRA database,

or query it from the command line. I gave it a go:

esearch -db sra -query '"2022/11/28"[Publication Date]' | efetch -format runinfo > 2022-11-18.csv

How many lines?

cat 2022-11-18.csv | wc -l

prints:

4638

Looks like today, Nov 28, 2022, there were 4638 datasets deposited at SRA ... whoa, I did not expect that. I am extraordinarily surprised, to be honest. That is a lot of data.

What is the size of all that data?

 cat 2022-11-18.csv | csvcut -c size_MB | grep -v size | datamash sum 1

prints:

 2471916

which comes to about 2.4 terabytes.
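For anyone who wants to repeat these two checks without csvcut or datamash, here is a minimal sketch. The helper names `count_rows` and `mb_to_tb` are made up for illustration. It assumes the runinfo CSV has one header line and one record per line, and that 1 TB is taken as 1,000,000 MB. Note that a plain `wc -l` also counts the header line.

```shell
# Count data rows in a runinfo CSV, skipping the header line and any blank lines.
# (count_rows is a hypothetical helper name, not part of EDirect.)
count_rows() {
    tail -n +2 "$1" | grep -c .
}

# Convert a size in megabytes to terabytes, assuming decimal units
# (1 TB = 1,000,000 MB). mb_to_tb is also a hypothetical helper name.
mb_to_tb() {
    awk -v mb="$1" 'BEGIN { printf "%.2f\n", mb / 1000000 }'
}
```

Usage: `count_rows 2022-11-18.csv` gives the number of runs, and `mb_to_tb 2471916` prints `2.47`.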


This is extremely cool, Istvan, and I want to thank you for being so helpful to us. One question: is there a way to restrict the search to plants, animals, or bacteria?


Technically there is a TaxID field in the output produced by the runinfo option in the command above, but sadly it is not populated for many entries (certainly not for new ones). I checked on that yesterday. You can add a TaxID number to the query in the first part of the command.
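Adding a TaxID to the query could look roughly like the sketch below. It assumes the standard Entrez `txid<N>[Organism:exp]` syntax, where `:exp` also matches all descendant taxa, and uses 33090 (Viridiplantae, green plants) as an example. The function names `build_query` and `fetch_runinfo` are hypothetical wrappers, and I have not verified how well the filter covers the newest submissions.

```shell
# Build an Entrez query for one publication date restricted to one taxon.
# $1 = date as YYYY/MM/DD, $2 = NCBI TaxID (e.g. 33090 for Viridiplantae).
# build_query is a hypothetical helper name used only for this sketch.
build_query() {
    printf '"%s"[Publication Date] AND txid%s[Organism:exp]\n' "$1" "$2"
}

# Run the search with EDirect and print the runinfo table to stdout.
# Requires esearch and efetch on the PATH and network access.
fetch_runinfo() {
    esearch -db sra -query "$(build_query "$1" "$2")" | efetch -format runinfo
}
```

Usage: `fetch_runinfo 2022/11/28 33090 > plants-2022-11-28.csv`. Swapping in another TaxID, such as one for a bacterial or animal clade, restricts the search to that group instead.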


thank you so much GenoMax :)

9 weeks ago
GenoMax 125k

NCBI publishes a file containing SRA accession numbers. It is updated daily (the file is almost a gigabyte, so a largish download). It appears to list accession numbers going a long way back and to be current up to a given date.

$ head NCBI_SRA_Datalist 
Submission  Run Date
DRA000001   DRR000001   2014-05-26T10:22:28Z
DRA000002   DRR000002   2014-05-26T11:00:19Z
DRA000003   DRR000003   2014-05-26T11:07:49Z
DRA000003   DRR000004   2014-05-26T11:07:46Z

$ tail NCBI_SRA_Datalist 

SRA1548151  SRR22428598 2022-11-28T18:25:46Z
SRA1548154  SRR22428656 2022-11-28T18:34:47Z
SRA1548154  SRR22428657 2022-11-28T18:33:44Z
SRA1548154  SRR22428658 2022-11-28T18:33:31Z
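
To pull out only the runs added on a particular day, something like the following sketch should work. It assumes the file is laid out as in the excerpt above: one header line, then whitespace-separated Submission, Run and Date columns. `runs_on_day` is a hypothetical helper name.

```shell
# Print run accessions (column 2) whose timestamp (column 3) starts with
# the given day, e.g. 2022-11-28. The header line is skipped.
# $1 = path to the accession list, $2 = day as YYYY-MM-DD.
runs_on_day() {
    file="$1"
    day="$2"
    awk -v day="$day" 'NR > 1 && index($3, day) == 1 { print $2 }' "$file"
}
```

Usage: `runs_on_day NCBI_SRA_Datalist 2022-11-28 > new_runs.txt`. The excerpt shows no organism column, so narrowing the result to plants would still need a taxonomy lookup for each run, for example through an Entrez query.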

This is so cool! I was wondering how I could find, for example, the new plant species from there. Do you have an idea? Thank you so much for your time and suggestions.
