1
0
11 months ago
j_eag ▴ 10

Hi (beginner here so go easy on me). I'm practicing different ways of downloading. I have various questions despite doing a lot of googling on the matter.

1)

I'm trying to run something like this (I know these aren't the exact commands for prefetch):

prefetch $(<SRAacclist.txt) --gzip --outdir /scratch/eg5/trial2/sns/fqdata

When I run prefetch $(<SRAacclist.txt) my files do get downloaded of course, but they're not zipped or in the folder I want them to be in. It also downloads extra sra folders, when all I want is the fastq file. How can I specify this?

2) All my modules are loaded (edirect, sra etc.), yet I keep getting a not found error for "--format"

esearch -db sra -query PRJNA386935 | efetch -format runinfo | cut -d "," -f 1 > SRR.numbers


Any ideas?

3) For downloading from SRA to an HPC cluster folder: prefetch vs parallel vs wget vs fastq-dump. What do you guys think? So far prefetch has been the fastest, but fastq-dump seems to be the most easily 'customizable'.

sra bioinformatics rna-seq fastq sequencing • 1.4k views
0

prefetch has no gzip option afaik, and one would make no sense because the sra format is already binary and compressed. There is also no --outdir, but:

-o|--output-file <FILE>            write file to FILE when downloading single file
-O|--output-directory <DIRECTORY>  save files to DIRECTORY/


Type prefetch -h and read the help.

0

yeah, sorry, I should have clarified. I already checked the --help section and tested out different commands; my post was just to demonstrate what I'm trying to do. I've tried using "--type" and choosing *.fastq.gz, but none of it worked.

0

prefetch does not return fastq, it returns sra files which require conversion to fastq with fastq-dump.
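
A minimal sketch of that two-step workflow, assuming sra-tools is on the PATH and the accession list holds one run per line. The function name fetch_fastq and the DRY_RUN switch are made up for illustration. -O is the prefetch option from the help text quoted earlier in the thread; --gzip, --split-files and --outdir are fastq-dump options. The sketch also assumes prefetch writes to DIRECTORY/ACCESSION/ACCESSION.sra, which is what recent versions appear to do. By default it only prints the commands so they can be checked first.

```shell
#!/usr/bin/env bash
# Sketch: download each run with prefetch, then convert it to gzipped FASTQ.
# fetch_fastq and DRY_RUN are hypothetical names, not part of sra-tools.

fetch_fastq() {
    local acclist=$1 sradir=$2 fqdir=$3
    local run="" acc
    # DRY_RUN=1 (default) prints the commands; DRY_RUN=0 executes them.
    if [ "${DRY_RUN:-1}" = "1" ]; then run="echo"; fi

    while IFS= read -r acc || [ -n "$acc" ]; do
        [ -z "$acc" ] && continue   # skip blank lines
        # Step 1: fetch the .sra file (assumed to land in $sradir/$acc/$acc.sra)
        $run prefetch -O "$sradir" "$acc" || return 1
        # Step 2: convert to FASTQ, gzip the output, write it to $fqdir
        $run fastq-dump --gzip --split-files --outdir "$fqdir" "$sradir/$acc/$acc.sra" || return 1
    done < "$acclist"
}
```

To actually run it, something like DRY_RUN=0 fetch_fastq SRAacclist.txt sra_files /scratch/eg5/trial2/sns/fqdata should do, where sra_files is a placeholder directory for the intermediate .sra files.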

0

For me (when I did prefetch $(<acclist.txt)), it downloaded fastq files (.fastq) and, for each, a folder that contained .sra files

0

Ah, after years and years of users complaining about that missing feature, they seem to have recently added the functionality to get fastq directly. Hah, only 10 years too late, but hey, why not :) Now gzip is missing. Yeah, that is sra-tools, a collection of mess; that is simply how it is /shrug.

0

What version of prefetch do you have? Mine does not download fastq. Moreover, we usually need to specify how to unpack fastq; does it unpack the files?

0

2.11 seems to offer that now so quite a recent addition

0

I ran the new prefetch, and did not get a FASTQ file:

prefetch SRR14575325


the tool does indeed work differently: it creates a subdirectory for the SRA file rather than putting it under ~/ncbi/public/sra, but I don't get FASTQ files there

0

add --type fastq

0

this is so typical of all sra tools in general

prefetch  SRR1972739


works fine, downloads the SRA file but right after it if I do:

prefetch --type fastq  SRR1972739


prints:

2021-10-14T17:57:47 prefetch.2.11.2 err: name not found while resolving query within virtual file system module - failed to resolve accession 'SRR1972739' - no data ( 404 )

0

"prefetch" version 2.10.9

0

Just leaving this here fyi: sra-explorer : find SRA and FastQ download URLs in a couple of clicks

You do not need sra-tools to get data, there are (better) alternatives.

0

Alas the SRA has introduced changes that broke the Explorer. Only the links to EBI work. For example, this is what the explorer shows:

ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR197/SRR1972739/SRR1972739.sra


the file is not there anymore, you need a different method to find it.
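
Since the EBI links are the ones that still work, here is a rough sketch of how the EBI/ENA FASTQ directory for a run can be guessed from its accession. The helper name ena_fastq_dir is made up, and the directory layout it encodes (first six characters, then a zero-padded subdirectory for accessions with seven or eight digits) is an assumption about how ENA has organised its FTP tree, not something confirmed in this thread. Check the result before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: guess the EBI/ENA FASTQ directory for a run accession.
# ena_fastq_dir is a hypothetical helper; the layout below is an assumption
# about the ENA FTP tree and may change.

ena_fastq_dir() {
    local acc=$1
    local base="ftp://ftp.sra.ebi.ac.uk/vol1/fastq"
    local prefix=${acc:0:6}       # e.g. SRR197
    local digits=${acc:3}         # numeric part, e.g. 1972739
    local sub=""
    case ${#digits} in
        6) sub="" ;;                      # no extra subdirectory
        7) sub="00${digits: -1}/" ;;      # last digit, zero-padded to 3 chars
        8) sub="0${digits: -2}/" ;;       # last two digits, zero-padded
        *) echo "unsupported accession: $acc" >&2; return 1 ;;
    esac
    echo "$base/$prefix/$sub$acc/"
}
```

For example, ena_fastq_dir SRR1972739 prints a directory URL that can be listed or passed to wget; the FASTQ file names inside it are not guessed here.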

0

Yes, this is known. They moved the SRA files to the cloud and Phil Ewels has not yet made the changes to the explorer, but there are issues pointing this out already.

0

This post prompted me to investigate the methods so that I know how to advise people.

I wrote up the results here

What is the best way to obtain FASTQ reads from the Short Read Archive (SRA)

1
11 months ago

efetch takes parameters with single minus

-format
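
One more detail on the command from the question: the runinfo output is a CSV whose first row is a header, so cut -d "," -f 1 alone also writes the column name "Run" into SRR.numbers. A small sketch that keeps only run accessions follows; run_ids is a made-up helper name, and the pattern assumes accessions of the form SRR/ERR/DRR followed by digits.

```shell
#!/usr/bin/env bash
# Sketch: pull run accessions out of a runinfo CSV read from stdin.
# run_ids is a hypothetical helper name.

run_ids() {
    # Column 1 of runinfo is the run accession; the first row is the
    # header ("Run"), so keep only rows that look like SRR/ERR/DRR IDs.
    cut -d "," -f 1 | grep -E '^[SED]RR[0-9]+$'
}

# Intended use (needs Entrez Direct and network access):
#   esearch -db sra -query PRJNA386935 | efetch -format runinfo | run_ids > SRR.numbers
```

The filter also drops blank lines, so the resulting file can be fed straight to a download loop.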


As for speed, the commands work in mysterious ways; the reasons for the speed differences are not properly explained.

0

Sorry, that's what the online source told me to use! Thanks, works now.

0

The command looks very much like what I advocate in the Biostar Handbook. As it turns out, Entrez Direct has been updated: it used to take both types of parameters, but it looks like it now only takes the single-dash form. So I have to update the book.

Edit: the book has been corrected