11 months ago
j_eag ▴ 10

Hi (beginner here so go easy on me). I'm practicing different ways of downloading. I have various questions despite doing a lot of googling on the matter.

1)

I'm trying to run something like this (I know these aren't the exact commands for prefetch):

prefetch $(<SRAacclist.txt) --gzip --outdir /scratch/eg5/trial2/sns/fqdata

When I run prefetch $(<SRAacclist.txt), my files do get downloaded of course, but they're not gzipped, and they're not in the folder I want them in. It also downloads extra sra folders, when all I want is the fastq files. How can I specify this?

2) All my modules are loaded (edirect, sra, etc.), yet I keep getting a "not found" error for "--format":

esearch -db sra -query PRJNA386935 | efetch -format runinfo | cut -d "," -f 1 > SRR.numbers


Any ideas?

3) For downloading from SRA to an HPC cluster folder: prefetch vs parallel vs wget vs fastq-dump. What do you guys think? So far prefetch has been the fastest, but fastq-dump seems to be the most easily 'customizable'.

sra bioinformatics rna-seq fastq sequencing • 1.4k views
prefetch has no gzip option afaik, and one would make no sense because the sra format is already binary and compressed. There is also no --outdir, but there is:

-o|--output-file <FILE>            write file to FILE when downloading a single file
-O|--output-directory <DIRECTORY>  save files to DIRECTORY/


Type prefetch -h and read the help.

Yeah, sorry, I should have clarified. I already checked the --help section and tested out different commands; my post was just to demonstrate what I'm trying to do. I've tried using "--type" and choosing *.fastq.gz, but none of it worked.

prefetch does not return fastq, it returns sra files which require conversion to fastq with fastq-dump.
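
For anyone landing here later, the two-step route described above (prefetch for the .sra file, then fastq-dump for the conversion) could be sketched roughly like this. It uses only `prefetch -O` and `fastq-dump --gzip --outdir --split-3`. The file name, the output folder, the `run` dry-run helper and the `DIR/ACC/ACC.sra` layout are assumptions of mine, not something confirmed in this thread, and `--split-3` is just one possible choice for paired-end data.

```shell
#!/usr/bin/env bash
# Sketch only: download .sra files with prefetch, then convert to gzipped FASTQ.
# ACCLIST, OUTDIR and DRY_RUN are hypothetical names chosen for this example.
ACCLIST="${ACCLIST:-SRAacclist.txt}"   # one accession per line
OUTDIR="${OUTDIR:-fqdata}"             # where the .fastq.gz files should end up
DRY_RUN="${DRY_RUN:-1}"                # 1 = only print the commands, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

fetch_fastq() {
    local acc="$1"
    # Step 1: fetch the compressed .sra file into a known directory.
    run prefetch -O "$OUTDIR/sra" "$acc"
    # Step 2: convert to gzipped FASTQ. Assumes prefetch wrote
    # $OUTDIR/sra/<acc>/<acc>.sra, which may differ between versions.
    run fastq-dump --gzip --split-3 --outdir "$OUTDIR" "$OUTDIR/sra/$acc/$acc.sra"
}

# Loop over the accession list if it exists, skipping empty lines.
if [ -f "$ACCLIST" ]; then
    while read -r acc; do
        [ -n "$acc" ] && fetch_fastq "$acc"
    done < "$ACCLIST"
fi
```

Run it once with the default DRY_RUN=1 to check the printed commands, then set DRY_RUN=0 to actually download. The intermediate sra folder can be deleted once the FASTQ files look right.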

For me, when I ran prefetch $(<acclist.txt), it downloaded .fastq files as well as a folder for each accession containing the .sra file.

Ah, after years and years of users complaining about that missing feature, they seem to have recently added the functionality to get fastq directly. Hah, only 10 years too late, but hey, why not :) Now gzip is missing. Yeah, that is sra-tools, a collection of mess; that is simply how it is /shrug.

What version of prefetch do you have? Mine does not download fastq. More to the point, we usually need to specify how to unpack the fastq; does it unpack the files?

2.11 seems to offer that now, so it is quite a recent addition.

I ran the new prefetch, and did not get a FASTQ file:

prefetch SRR14575325


The tool does indeed work differently: it creates a subdirectory for the SRA file rather than putting it under ~/ncbi/public/sra, but I don't get FASTQ files there.

add --type fastq

this is so typical of all sra tools in general

prefetch SRR1972739


works fine and downloads the SRA file, but if I then run:

prefetch --type fastq SRR1972739


prints:

2021-10-14T17:57:47 prefetch.2.11.2 err: name not found while resolving query within virtual file system module - failed to resolve accession 'SRR1972739' - no data ( 404 )

"prefetch" version 2.10.9

Just leaving this here fyi: sra-explorer : find SRA and FastQ download URLs in a couple of clicks

You do not need sra-tools to get data, there are (better) alternatives.

Alas the SRA has introduced changes that broke the Explorer. Only the links to EBI work. For example, this is what the explorer shows:

ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR197/SRR1972739/SRR1972739.sra


The file is not there anymore, so you need a different method to find it.

Yes, this is known. They moved the SRA files to the cloud and Phil Ewels has not yet made the changes to the explorer, but there are issues pointing this out already.
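
Since the EBI links are the ones that still work, one way to look them up without the Explorer might be the ENA Portal API "filereport" endpoint. Below is a rough sketch, not a tested recipe: the function names are hypothetical, and the assumption that the `fastq_ftp` column holds semicolon-separated paths without a scheme is from my memory of that API, so check it against real output before relying on it.

```shell
#!/usr/bin/env bash
# Sketch only: find FASTQ download links at EBI/ENA for a run or project accession.

# Build the ENA filereport URL for an accession (function name is hypothetical).
ena_filereport_url() {
    printf 'https://www.ebi.ac.uk/ena/portal/api/filereport?accession=%s&result=read_run&fields=run_accession,fastq_ftp&format=tsv\n' "$1"
}

# Turn the TSV report on stdin into one URL per line.
# Assumes column 2 (fastq_ftp) is a ';'-separated list of paths without "ftp://".
fastq_urls_from_report() {
    tail -n +2 | cut -f 2 | tr ';' '\n' | sed -e '/^$/d' -e 's|^|ftp://|'
}

# Intended use (needs network access), left commented out:
# curl -s "$(ena_filereport_url SRR1972739)" | fastq_urls_from_report > fastq_urls.txt
# wget -i fastq_urls.txt
```

The files listed there are already gzipped FASTQ, so no conversion step is needed afterwards.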

This post prompted me to investigate the methods so that I know how to advise people.

I wrote up the results here

What is the best way to obtain FASTQ reads from the Short Read Archive (SRA)

11 months ago

efetch takes parameters with single minus

-format
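
One more thing to watch with the command in the question: the runinfo output is CSV with a header row, so cutting the first column also writes the column name (and any blank lines) into SRR.numbers. A small filter along these lines could keep only the run accessions. The function name is made up for this example, and the pattern assumes the accessions start with SRR, ERR or DRR.

```shell
#!/usr/bin/env bash
# Sketch only: keep run accessions (SRR/ERR/DRR) from runinfo CSV read on stdin.
# The function name is hypothetical.
runs_from_runinfo() {
    # Column 1 of the runinfo CSV is the run accession; drop the header and blanks.
    cut -d ',' -f 1 | grep -E '^[SED]RR[0-9]+$'
}

# Intended use (needs Entrez Direct and network access), left commented out:
# esearch -db sra -query PRJNA386935 | efetch -format runinfo | runs_from_runinfo > SRR.numbers
```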


As for speed, the commands work in mysterious ways; the reasons for the speed differences are not properly explained.

Sorry, that's what the online source told me to use! Thanks, it works now.

The command looks very much like what I advocate in the Biostar Handbook. As it turns out, Entrez Direct has been updated: it used to take both types of parameters, but it looks like it now only takes the short form, so I have to update the book.

Edit: the book has been corrected