Question: How to download raw data in batch from NCBI based on Series Accession number or Platform ID
3 months ago by
biolab920
China
biolab920 wrote:

Dear all,

I have a list of NCBI GEO Series Accession numbers and Platform IDs and want to download the corresponding raw data in batch. A previous Biostars post gives a good example of batch download (How to download raw sequence data from GEO/SRA), but that solution starts from a project ID rather than a GEO Series Accession number. Does anyone know how to do this? Thank you very much!

sra • 238 views
ADD COMMENTlink modified 6 weeks ago by Istvan Albert ♦♦ 73k • written 3 months ago by biolab920

Can you post an example of the Accession number you are interested in? @Istvan's solution with eUtils should be able to accommodate your needs.

ADD REPLYlink written 3 months ago by genomax32k

Thank you for your comment, genomax! The GEO Series Accession number looks like GSE65022, and the Platform ID looks like GPL19657. I want to get the corresponding SRA run numbers, which look like SRR4024915.

ADD REPLYlink modified 3 months ago • written 3 months ago by biolab920

This may be helpful: Batch Entrez

ADD REPLYlink written 3 months ago by Buffo470

Hi Buffo, thanks for your comment! However, after uploading a list of Platform IDs (e.g., GPL19657), I could not get the SRA run numbers, which look like SRR4024915.

ADD REPLYlink written 3 months ago by biolab920

Hi, in case you are only interested in the SRR IDs, the SRA Run Selector is a very good option. Enter GSE65022 in the Run Selector and it should pull all the relevant metadata for you. For example, see this URL: https://www.ncbi.nlm.nih.gov/Traces/study/?acc=GSE65022&go=go

ADD REPLYlink written 6 weeks ago by microfuge710
6 weeks ago by
Istvan Albert ♦♦ 73k
University Park, USA
Istvan Albert ♦♦ 73k wrote:

You can connect GEO to the SRA run info like so:

esearch -query GSE65022 -db gds | elink -target sra | efetch -format runinfo

From that you can build a command that automates the download, like so (this gets only the first 10 spots of each run, for easy testing):

esearch -query GSE65022 -db gds | elink -target sra | efetch -format runinfo | cut -d ',' -f 1 | grep SRR | xargs fastq-dump -X 10 --split-files

Remove the -X 10 limit to get all the data.
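
Since the question starts from a list of accessions, the same pipeline can be wrapped in a loop. The following is only a sketch under stated assumptions, not part of the original answer: it assumes NCBI Entrez Direct (esearch, elink, efetch) is on the PATH, and the function names, the script name gse_to_runs.sh, the input file gse_list.txt (one GSE accession per line), and the output file runs.txt are all hypothetical. The filter also accepts ERR and DRR run prefixes in addition to SRR.

```shell
#!/usr/bin/env bash
# Sketch: resolve a list of GEO Series accessions to SRA run accessions.
# Assumes NCBI Entrez Direct (esearch, elink, efetch) is installed.
# Function and file names here are hypothetical, chosen for illustration.

# Read runinfo CSV on stdin and print the run accessions from column 1.
# The header line ("Run,...") and blank lines do not match and are dropped.
extract_runs() {
    cut -d ',' -f 1 | grep -E '^[SED]RR[0-9]+$' || true
}

# Print the run accessions linked to one GEO Series accession.
runs_for_series() {
    esearch -db gds -query "$1" | elink -target sra | efetch -format runinfo | extract_runs
}

# Only run the loop when a list file is given as the first argument.
if [ "$#" -gt 0 ]; then
    while read -r gse; do
        [ -n "$gse" ] || continue
        # Redirect stdin so esearch does not swallow the rest of the list.
        runs_for_series "$gse" < /dev/null
    done < "$1" > runs.txt
fi
```

Run as, for example, bash gse_to_runs.sh gse_list.txt, then check runs.txt before downloading. The run IDs can be passed to fastq-dump the same way as above, e.g. xargs fastq-dump -X 10 --split-files < runs.txt, again dropping -X 10 for the full data.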

ADD COMMENTlink modified 6 weeks ago • written 6 weeks ago by Istvan Albert ♦♦ 73k

Thank you very much, Istvan. The command you provided is really helpful!

ADD REPLYlink written 6 weeks ago by biolab920