Question: Converting a bunch of SRA files using fastq-dump --split-files
0
Mitra wrote (10 months ago):

I have recently downloaded a bunch of SRA files, and I would like to convert them to paired-end FASTQ reads. Converting a single file works:

./fastq-dump --split-files /Users/medsmit/ncbi/public/sra/SRR3501908.sra

But I need a way to convert them all together.

I was trying this:

for i in  `ls /Users/medsmit/ncbi/public/sra/*.sra' ; do ./fastq-dump -- split-files $f; done

But I am definitely making some silly mistake, as it's not working. Can anyone please help me? Thank you, Suparna

Tags: fastq-dump, split-files, sra, ncbi
modified 10 months ago by ATpoint • written 10 months ago by Mitra

Do you have a space between -- and split-files in the loop? I would also use ls -1 so only one file is fed to fastq-dump for each iteration of the loop.

written 10 months ago by genomax

Thanks genomax. Yes, I do have a space between -- and split-files in the loop. I also tried with ls -l. After I run the code below:

 medsmit$ for i in  `ls -l /Users/medsmit/ncbi/public/sra/*.sra' ; do ./fastq-dump -- split-files $f; done

I only see

>

As if it entered some interactive prompt. Not sure what I am doing wrong. Thanks, Suparna

written 10 months ago by Mitra

You can't have a space in --split-files. That was also a 1 (number one), not l (lower-case L), in the ls command.

And two additional mistakes noted by @jean below.

modified 10 months ago • written 10 months ago by genomax

Alternatively, you can always check the ENA for your files, which are typically mirrored there directly as FASTQ, or use parallel-fastq-dump (Python 3) if the SRA files are big (tens of GB).

written 10 months ago by ATpoint
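A hedged sketch (editor's addition) of the parallel-fastq-dump route mentioned above: the flags (--sra-id, --threads, --outdir, --split-files, --gzip) follow that project's README and may differ by version, so check them against your installed copy; SRR3501908 is the accession from the question. The invocation is guarded so the script is harmless where the tool is not installed:

```shell
#!/bin/sh
# Sketch only: flag names are taken from parallel-fastq-dump's README;
# verify them locally with `parallel-fastq-dump --help`.
if command -v parallel-fastq-dump >/dev/null 2>&1; then
    parallel-fastq-dump --sra-id SRR3501908 --threads 4 \
        --outdir fastq/ --split-files --gzip
else
    echo "parallel-fastq-dump not installed; skipping"
fi
```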
2
jean.elbers wrote (10 months ago):
  1. You have ' (a straight single quote) instead of ` (a backtick)
  2. You need $i instead of $f
  3. You need --split-files, not -- split-files

    for i in `ls -1 *.sra` ; do ./fastq-dump --split-files $i; done

modified 10 months ago • written 10 months ago by jean.elbers
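An editor's aside, not from the thread: the corrected loop still word-splits the output of ls, which breaks on paths containing spaces. Letting the shell expand the glob directly avoids parsing ls at all. In this sketch, dummy .sra files in a scratch directory and echo stand in for real data and the real fastq-dump call:

```shell
#!/bin/sh
# Demonstrate looping over a glob instead of parsing `ls`.
# Swap `echo` for `./fastq-dump --split-files "$f"` in real use.
demo=$(mktemp -d)
touch "$demo/SRR0000001.sra" "$demo/SRR0000002.sra"
for f in "$demo"/*.sra; do
    echo "would run: fastq-dump --split-files $f"
done
rm -r "$demo"
```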

jean.elbers, thanks for pointing out these errors, part of which were just introduced when I wrote this post. Now it seems to be creating FASTQ files, but unfortunately it also produces some strange errors:

./fastq-dump : 2.9.0

2018-05-24T12:10:07 fastq-dump.2.9.0 err: item not found while constructing within virtual database module - the path '1' cannot be opened as database or table
2018-05-24T12:10:07 fastq-dump.2.9.0 err: item not found while constructing within virtual database module - the path 'medsmit' cannot be opened as database or table
2018-05-24T12:10:08 fastq-dump.2.9.0 err: item not found while constructing within virtual database module - the path 'staff' cannot be opened as database or table
2018-05-24T12:10:08 fastq-dump.2.9.0 err: item not found while constructing within virtual database module - the path '25754388' cannot be opened as database or table
2018-05-24T12:10:08 fastq-dump.2.9.0 err: item not found while constructing within virtual database module - the path '18' cannot be opened as database or table
2018-05-24T12:10:08 fastq-dump.2.9.0 err: item not found while constructing within virtual database module - the path 'May' cannot be opened as database or table
2018-05-24T12:10:08 fastq-dump.2.9.0 err: item not found while constructing within virtual database module - the path '15:16' cannot be opened as database or table
Read 134818 spots for /Users/medsmit/ncbi/public/sra/SRR3502002.sra
Written 134818 spots for /Users/medsmit/ncbi/public/sra/SRR3502002.sra

Trying to understand why they are there. Thanks,

written 10 months ago by Mitra
1

What command are you using exactly? It seems to me that fastq-dump is treating parts of the listing as individual files. Not sure. It looks like the reads were properly extracted. You can double-check by seeing whether the number of spots matches the SRA Run Browser (https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=run_browser&run=SRR3502002). From what I see, the number of spots is correct.

modified 10 months ago • written 10 months ago by jean.elbers
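To make the diagnosis concrete (an editor's illustration, not part of the thread): word-splitting the output of ls -l hands every column of the long listing (permissions, link count, owner, group, size, date) to the loop as a separate argument, which matches the bogus paths '1', 'medsmit', 'staff', and '25754388' in the errors above:

```shell
#!/bin/sh
# Reproduce the symptom with a dummy file: each field of the
# long listing becomes its own loop argument.
demo=$(mktemp -d)
touch "$demo/x.sra"
for i in `ls -l "$demo"/*.sra`; do
    echo "arg: $i"   # one line per field, not one line per file
done
rm -r "$demo"
```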

Yes, the spots look correct, so I do get the FASTQ results. But I am not sure what all these messages are, though!

written 10 months ago by Mitra

Are you still using an l (lower-case L) instead of a 1 (number one) in your ls command?

modified 10 months ago • written 10 months ago by genomax

Yes, I am now using

    for f in `ls -l /Users/medsmit/ncbi/public/sra/*.sra`; do ./fastq-dump --split-files $f; done

as my command.

modified 10 months ago • written 10 months ago by Mitra
1

You need to use the number 1 instead of the lower-case letter l.

modified 10 months ago • written 10 months ago by genomax

Thank you genomax. Sorry I couldn't reply yesterday, as Biostars restricted my daily comment limit. This time, with your suggestion, it works :) Can you please tell me the exact difference between l and 1? I can see from the man page that -l uses a long listing format and -1 lists one file per line. But what I don't understand is why -l wouldn't work. Sorry for asking all these questions; I am a self-learner. Thank you again.

written 10 months ago by Mitra
1

Not a problem. With the long listing (-l, lower-case L) you get additional information in the listing: unix permissions, group ownership, file size, and so on. You don't want that used as input for fastq-dump, so listing just the file paths one per line with -1 (number one) is the way to do this.

modified 10 months ago • written 10 months ago by genomax
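An editor's illustration of the difference described above: with -1 each output line is a bare path that the loop can consume directly, while -l prefixes it with permissions, ownership, size, and date fields:

```shell
#!/bin/sh
# Compare the two listings on a dummy file.
demo=$(mktemp -d)
touch "$demo/SRR0000001.sra"
echo "ls -1 output:"
ls -1 "$demo"/*.sra    # just the path, one per line
echo "ls -l output:"
ls -l "$demo"/*.sra    # permissions, owner, group, size, date, then the path
rm -r "$demo"
```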

Great, that is really helpful. I understand now: that's why with -l I was getting results but also the extra error messages, since fastq-dump was not dealing well with all that additional information. Thank you very much, S :)

written 10 months ago by Mitra
Powered by Biostar version 2.3.0