Best way to find all downloadable publicly available raw fastq files from SRA/GEO for a specific cell type (not having a GEO accession number)
4.2 years ago
1888 ▴ 70

Hi,

What do people use to search GEO or SRA for raw fastq files made available by past publications for a specific cell type? I am currently searching the SRA database with keywords, using the search term:

GM12878[All Fields] AND cluster_public[prop]

However, this misses some downloadable data.
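For reference, the same keyword search can be run programmatically. This is only a rough sketch, assuming the NCBI E-utilities esearch endpoint with db=sra and a JSON response; the function names are made up for illustration, and it will miss the same records as the web search:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities esearch endpoint (assumed to accept db=sra, retmode=json).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_sra_search_url(cell_type, retmax=500):
    """Build an esearch URL for the SRA database using the term from the post."""
    term = f"{cell_type}[All Fields] AND cluster_public[prop]"
    params = {"db": "sra", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def search_sra(cell_type, retmax=500):
    """Run the search (needs network access) and return the matching SRA UIDs."""
    with urlopen(build_sra_search_url(cell_type, retmax)) as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]
```

Calling search_sra("GM12878") should return internal SRA UIDs rather than SRR accessions, so they would still need to be resolved to runs in a second step.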

There is a nice post here (How to download raw sequence data from GEO/SRA) on retrieving these files with fastq-dump, but it assumes that the GEO accession number or the SRA project page is already known.

Is there another search engine that people currently use to find these more efficiently? I see there is an R package called GEOquery that helps with GEO searches. I haven't looked at it yet, but if it can limit results to certain files and cell types, it may be the best option.

Thanks much!

search-tools • 2.2k views

Are you able to use the SRA search and send the results to the SRA Run Selector? From there you can collect a list of SRA runs (accessions starting with SRR, ERR, or DRR), which you can pass to prefetch.

I have used SRAdb and SRAdbV2 in the past, but these packages are no longer maintained.
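As a rough illustration of the run-list step, here is a small sketch that cleans an accession list exported from the Run Selector and turns it into prefetch calls. It assumes the export is plain text with one accession per line and that prefetch (from the SRA Toolkit) is on your PATH; the helper names are hypothetical:

```python
import re

# Run accessions start with SRR (NCBI), ERR (ENA), or DRR (DDBJ).
RUN_RE = re.compile(r"^[SED]RR\d+$")


def parse_run_accessions(text):
    """Keep valid run accessions, in order, dropping blanks and duplicates."""
    runs = []
    seen = set()
    for line in text.splitlines():
        accession = line.strip()
        if RUN_RE.match(accession) and accession not in seen:
            seen.add(accession)
            runs.append(accession)
    return runs


def prefetch_commands(runs):
    """Build one 'prefetch <accession>' argument list per run."""
    return [["prefetch", accession] for accession in runs]
```

Each argument list can be handed to subprocess.run(cmd, check=True) to actually start the download.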


Thanks! The sra-explorer is great. What about GEO? It looks like some datasets are available only on GEO and not SRA. Adding the search term "filetype fastq"[Properties]) AND cluster_public[prop] to the GEO DataSets or GEO Profiles search boxes at NCBI doesn't filter for publicly available fastq files.


"It looks like some datasets available only on GEO and not SRA"

Can you provide an example?


Hi, for example GSE96107... unless I am confused about the conversion of this GEO accession to SRA...


That just looks like the top-level series accession for multiple samples. If you look at the 90+ individual samples, each of them has an SRA accession.
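To make the series-to-sample-to-SRA relationship concrete, here is a minimal sketch that pulls the SRA experiment accession for each sample out of a GEO SOFT-format text dump of a series. It assumes the usual SOFT layout, where each sample block starts with a "^SAMPLE = GSM..." line and links to SRA through a "!Sample_relation = SRA: ..." line; check this against a real record before relying on it. The function name is hypothetical:

```python
import re

# Assumed SOFT layout: a sample header line, then attribute lines for it.
SAMPLE_RE = re.compile(r"^\^SAMPLE = (GSM\d+)")
SRA_RELATION_RE = re.compile(r"^!Sample_relation = SRA:.*?([SED]RX\d+)")


def map_samples_to_sra(soft_text):
    """Return {GSM accession: [SRA experiment accessions]} from SOFT text."""
    mapping = {}
    current_sample = None
    for line in soft_text.splitlines():
        sample_match = SAMPLE_RE.match(line)
        if sample_match:
            current_sample = sample_match.group(1)
            mapping.setdefault(current_sample, [])
            continue
        relation_match = SRA_RELATION_RE.match(line)
        if relation_match and current_sample is not None:
            mapping[current_sample].append(relation_match.group(1))
    return mapping
```

Samples that come back with an empty list would be the ones worth checking by hand for data that really is GEO-only.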

4.2 years ago
GenoMax 141k

You can try sra-explorer ( https://sra-explorer.info/ ) from Phil Ewels.


Agreed, sra-explorer is the most convenient way. It also offers links for downloading either sra or fastq files from NCBI or the European Nucleotide Archive (ENA). For very fast downloads it provides Aspera links. For help on setting up Aspera (or on downloading fastq files directly from ENA, independent of sra-explorer), see Fast download of FASTQ files from the European Nucleotide Archive (ENA).
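If you want the ENA fastq links without going through sra-explorer, something along these lines may work. It is a sketch that assumes the ENA Portal API filereport endpoint with result=read_run and the fastq_ftp field, which (as far as I know) returns semicolon-separated paths without a scheme; the function names are made up:

```python
import csv
import io
from urllib.parse import urlencode

# ENA Portal API file report endpoint (assumed parameters, see note above).
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def build_filereport_url(accession):
    """Build a filereport URL for a run, experiment, or study accession."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp",
        "format": "tsv",
    }
    return f"{ENA_FILEREPORT}?{urlencode(params)}"


def parse_filereport(tsv_text):
    """Return {run accession: [fastq URLs]} from the TSV report text."""
    links = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        paths = [path for path in row["fastq_ftp"].split(";") if path]
        # Add a scheme when the report gives bare host/path strings.
        links[row["run_accession"]] = [
            path if "://" in path else f"ftp://{path}" for path in paths
        ]
    return links
```

Fetch the URL with any HTTP client and pass the response text to parse_filereport; paired-end runs should come back with two links each.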
