How to automate the conversion of a large number of .sra files to .fastq?
5.7 years ago
Jeff M • 0

Hello,

Sorry if this is a rather basic question, but I'm completely new to the field of bioinformatics and have essentially no coding experience. I've found some similar questions asked previously, but the solutions provided either don't seem to work (possibly because of SRA Toolkit updates?) or are written to be run in a Unix environment (Bash, I think), while I'm working in Windows. I would appreciate any advice on my issue.

I'm trying to download a rather large RNA-seq dataset (GSE62772) for reanalysis: I want to download the FASTQ files, align them with kallisto, and analyze them for differential expression. I know how to download and convert individual runs using fastq-dump, but I can't figure out how to do this for a large number of samples in an automated way. A previous answer suggested:

fastq-dump --split-3 --gzip $(<SraAccList.txt)

but this doesn't work for me; it gives an error saying it couldn't recognize the input. I was, however, able to use the accession list to download the .sra files with:

prefetch --option-file SraAccList.txt

However, at this point I have no idea how to convert these to FASTQ other than one at a time. I've seen some answers on here, e.g.

cat SRR_list.txt | xargs -n 1 bash get_SRR_data.sh

but from what I can tell this is meant to be run in a Unix environment, whereas so far I've been running everything through the Windows Command Prompt. Is there any way to do the same thing on Windows? I've also seen resources suggesting downloading and converting the files through R, but I think I'd end up with the same problem: I would need to run kallisto manually on 166 files, which doesn't seem reasonable. Since the accession numbers are all sequential, it should be possible to use a for loop, but I'm not familiar enough with any language (I've only worked in MATLAB) to know the best way of doing this.
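The `$(<SraAccList.txt)` part of the one-liner quoted above is bash syntax, so the Windows Command Prompt cannot expand it and fastq-dump never receives the accession list. Under bash (for example inside WSL), a loop over the list does the same job one accession at a time. This is a minimal sketch, assuming the runs were already fetched with `prefetch --option-file SraAccList.txt`, that `fastq-dump` is on the PATH and finds the prefetched copies when run from the same directory, and that the list has one accession per line. The function names `read_accessions` and `convert_all` and the `FASTQ_DUMP` override variable are hypothetical, added only for illustration and dry runs.

```shell
#!/usr/bin/env bash
# Sketch: convert every run listed in an accession file to gzipped FASTQ.
# Assumes SRA Toolkit's fastq-dump is installed and the runs were already
# downloaded with: prefetch --option-file SraAccList.txt

# Print accessions one per line, dropping Windows carriage returns (\r)
# and blank lines. A list saved by a Windows program ends lines with \r\n,
# and the stray \r would otherwise become part of each accession.
read_accessions() {
  tr -d '\r' < "$1" | grep -v '^[[:space:]]*$'
}

# convert_all LIST [OUTDIR]
# FASTQ_DUMP (hypothetical) overrides the fastq-dump command, e.g. for a dry run.
convert_all() {
  local list="$1" outdir="${2:-fastq}" acc
  mkdir -p "$outdir"
  read_accessions "$list" | while IFS= read -r acc; do
    echo "Converting $acc" >&2
    # </dev/null stops the command from reading the rest of the list on stdin.
    "${FASTQ_DUMP:-fastq-dump}" --split-3 --gzip --outdir "$outdir" "$acc" < /dev/null
  done
}
```

After loading the functions (e.g. with `source convert.sh`, a hypothetical file name), `convert_all SraAccList.txt fastq` converts everything into a `fastq` folder, and `FASTQ_DUMP=echo convert_all SraAccList.txt` only prints the commands that would run.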

Does anyone have suggestions for the best (and least involved) way of converting and analyzing a large number of files? This isn't something I intend to do regularly, so I've been looking for quick solutions that don't involve learning a new language or environment, but I'm starting to think that might not be possible. Any help would be greatly appreciated!
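Running kallisto on all 166 samples can be scripted the same way. This is a minimal sketch, assuming bash, paired-end data converted with `fastq-dump --split-3` (which names the mates `<ACC>_1.fastq.gz` and `<ACC>_2.fastq.gz`), and a kallisto index that already exists (built with something like `kallisto index -i transcripts.idx transcripts.fa.gz`). Whether this dataset is paired-end is an assumption to check; for single-end runs, kallisto needs `--single -l <mean fragment length> -s <sd>` and one FASTQ file per sample. The `quant_all` function and the `KALLISTO` override variable are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: run `kallisto quant` once per paired-end sample.
# Assumes files named <ACC>_1.fastq.gz / <ACC>_2.fastq.gz (fastq-dump --split-3
# naming) and an existing kallisto index.

# quant_all INDEX [FASTQ_DIR] [OUT_DIR]
# KALLISTO (hypothetical) overrides the kallisto command, e.g. for a dry run.
quant_all() {
  local index="$1" fqdir="${2:-fastq}" outroot="${3:-kallisto}" r1 r2 acc
  for r1 in "$fqdir"/*_1.fastq.gz; do
    [ -e "$r1" ] || continue                # glob matched nothing
    acc=$(basename "$r1" _1.fastq.gz)       # e.g. SRR0000001
    r2="$fqdir/${acc}_2.fastq.gz"
    if [ ! -e "$r2" ]; then
      echo "Skipping $acc: mate file $r2 not found" >&2
      continue
    fi
    mkdir -p "$outroot/$acc"                # one output folder per sample
    "${KALLISTO:-kallisto}" quant -i "$index" -o "$outroot/$acc" "$r1" "$r2"
  done
}
```

For example, `quant_all transcripts.idx fastq kallisto` writes one folder per sample under `kallisto/`, each containing kallisto's `abundance.tsv` for the differential expression step.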

RNA-Seq • 1.5k views

If you're going to do any real bioinformatics on Windows, you need to acquaint yourself with the Windows Subsystem for Linux (WSL). It will make your life much easier.

5.7 years ago

The FASTQ files are available directly from ENA here: https://www.ebi.ac.uk/ena/data/view/PRJNA265099

I found it by searching, at ENA, for the BioProject ID listed on the GEO accession record.

Kevin
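Since ENA already hosts the FASTQ files, the .sra conversion step can be skipped entirely. A sketch under assumptions: it queries the ENA Portal API `filereport` endpoint for all runs in the project, reads the `fastq_ftp` column (one or more `;`-separated paths per run), and downloads each file with `wget`. The endpoint URL and field names are an assumption based on the ENA Portal API and should be checked against its documentation if the query returns nothing; `ena_fastq_urls` and `download_project` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: download all FASTQ files for a project straight from ENA.
# ASSUMPTION: ENA Portal API filereport endpoint and its fastq_ftp field.

# Read an ENA filereport TSV on stdin; print one ftp:// URL per FASTQ file.
# fastq_ftp may hold several ';'-separated paths (e.g. _1 and _2 of a pair).
ena_fastq_urls() {
  awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "fastq_ftp") col = i; next }
    col && $col != "" {
      n = split($col, paths, ";")
      for (j = 1; j <= n; j++) print "ftp://" paths[j]
    }'
}

# download_project ACCESSION  (e.g. PRJNA265099)
download_project() {
  local project="$1" url
  curl -s "https://www.ebi.ac.uk/ena/portal/api/filereport?accession=${project}&result=read_run&fields=run_accession,fastq_ftp" \
    | ena_fastq_urls \
    | while IFS= read -r url; do
        wget -nc "$url" < /dev/null   # -nc: skip files already downloaded
      done
}
```

Run `download_project PRJNA265099` from the folder where the files should go; the downloaded `_1`/`_2` files use the same naming as `fastq-dump --split-3`, so they can go straight to kallisto.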


The bulk download Java tool button on the page @Kevin linked can be used to download all the files at once.

Another option would be to use sra-explorer from Phil Ewels (sra-explorer: find SRA and FastQ download URLs in a couple of clicks) to get download links for all files in bulk. Search for PRJNA265099.
