Question: Converting a bunch of SRA files using fastq-dump --split-files
0
Mitra wrote (10 months ago):

I have recently downloaded a bunch of SRA files, and I would like to convert them to paired-end FASTQ reads. Converting a single file works:

./fastq-dump --split-files /Users/medsmit/ncbi/public/sra/SRR3501908.sra

But I need a way to convert them all together.

I was trying this:

for i in  `ls /Users/medsmit/ncbi/public/sra/*.sra' ; do ./fastq-dump -- split-files $f; done

But I am definitely making some silly mistake, as it's not working. Can anyone please help me? Thank you, Suparna

Tags: fastq-dump, split-files, sra, ncbi
modified 10 months ago by ATpoint • written 10 months ago by Mitra

Do you have a space between -- and split-files in the loop? I would also use ls -1 so only one file is fed to fastq-dump for each iteration of the loop.

written 10 months ago by genomax

Thanks genomax. Yes, I do have a space between -- and split-files in the loop. I also tried with ls -l. After I run the code below:

 medsmit$ for i in  `ls -l /Users/medsmit/ncbi/public/sra/*.sra' ; do ./fastq-dump -- split-files $f; done

I only see

>

As if it entered some interactive prompt. Not sure what I am doing wrong. Thanks, Suparna

written 10 months ago by Mitra

You can't have a space in --split-files. That was also a 1 (number one), not l (lower-case L), in the ls command.

And two additional mistakes noted by @jean below.

modified 10 months ago • written 10 months ago by genomax

Alternatively, you can always check the ENA for your files, which are typically mirrored there directly as FASTQ, or use parallel-fastq-dump (Python 3) if the SRA files are big (tens of GB).

written 10 months ago by ATpoint
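A hedged sketch (editor's addition) of the parallel-fastq-dump route mentioned above: the flags (--sra-id, --threads, --outdir, --split-files, --gzip) follow that project's README and may differ by version, so check them against your installed copy; SRR3501908 is the accession from the question. The invocation is guarded so the script is harmless where the tool is not installed:

```shell
#!/bin/sh
# Sketch only: flag names are taken from parallel-fastq-dump's README;
# verify them locally with `parallel-fastq-dump --help`.
if command -v parallel-fastq-dump >/dev/null 2>&1; then
    parallel-fastq-dump --sra-id SRR3501908 --threads 4 \
        --outdir fastq/ --split-files --gzip
else
    echo "parallel-fastq-dump not installed; skipping"
fi
```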
2
jean.elbers wrote (10 months ago):
  1. You have ' (a straight single quote) instead of ` (a backtick)
  2. You need $i instead of $f
  3. You need --split-files, not -- split-files

    for i in `ls -1 *.sra` ; do ./fastq-dump --split-files $i; done

modified 10 months ago • written 10 months ago by jean.elbers
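An editor's aside, not from the thread: the corrected loop still word-splits the output of ls, which breaks on paths containing spaces. Letting the shell expand the glob directly avoids parsing ls at all. In this sketch, dummy .sra files in a scratch directory and echo stand in for real data and the real fastq-dump call:

```shell
#!/bin/sh
# Demonstrate looping over a glob instead of parsing `ls`.
# Swap `echo` for `./fastq-dump --split-files "$f"` in real use.
demo=$(mktemp -d)
touch "$demo/SRR0000001.sra" "$demo/SRR0000002.sra"
for f in "$demo"/*.sra; do
    echo "would run: fastq-dump --split-files $f"
done
rm -r "$demo"
```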

jean.elbers, thanks for pointing out these errors, part of which were just introduced when I wrote this post. Now it seems to be creating FASTQ files, but unfortunately it also produces some strange errors:

./fastq-dump : 2.9.0

2018-05-24T12:10:07 fastq-dump.2.9.0 err: item not found while constructing within virtual database module - the path '1' cannot be opened as database or table
2018-05-24T12:10:07 fastq-dump.2.9.0 err: item not found while constructing within virtual database module - the path 'medsmit' cannot be opened as database or table
2018-05-24T12:10:08 fastq-dump.2.9.0 err: item not found while constructing within virtual database module - the path 'staff' cannot be opened as database or table
2018-05-24T12:10:08 fastq-dump.2.9.0 err: item not found while constructing within virtual database module - the path '25754388' cannot be opened as database or table
2018-05-24T12:10:08 fastq-dump.2.9.0 err: item not found while constructing within virtual database module - the path '18' cannot be opened as database or table
2018-05-24T12:10:08 fastq-dump.2.9.0 err: item not found while constructing within virtual database module - the path 'May' cannot be opened as database or table
2018-05-24T12:10:08 fastq-dump.2.9.0 err: item not found while constructing within virtual database module - the path '15:16' cannot be opened as database or table
Read 134818 spots for /Users/medsmit/ncbi/public/sra/SRR3502002.sra
Written 134818 spots for /Users/medsmit/ncbi/public/sra/SRR3502002.sra

Trying to understand why they are there. Thanks,

written 10 months ago by Mitra
1

What command are you using exactly? It seems to me that fastq-dump is treating parts of the listing as individual files. Not sure. It looks like the reads were properly extracted. You can double-check by seeing whether the number of spots matches the SRA Run Browser (https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=run_browser&run=SRR3502002). From what I see, the number of spots is correct.

modified 10 months ago • written 10 months ago by jean.elbers
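To make the diagnosis concrete (an editor's illustration, not part of the thread): word-splitting the output of ls -l hands every column of the long listing (permissions, link count, owner, group, size, date) to the loop as a separate argument, which matches the bogus paths '1', 'medsmit', 'staff', and '25754388' in the errors above:

```shell
#!/bin/sh
# Reproduce the symptom with a dummy file: each field of the
# long listing becomes its own loop argument.
demo=$(mktemp -d)
touch "$demo/x.sra"
for i in `ls -l "$demo"/*.sra`; do
    echo "arg: $i"   # one line per field, not one line per file
done
rm -r "$demo"
```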

Yes, the spots look correct, so I do get the FASTQ results. But I am not sure what all these messages are, though!

written 10 months ago by Mitra

Are you still using an l (lower-case L) instead of a 1 (number one) in your ls command?

modified 10 months ago • written 10 months ago by genomax

Yes, I am now using

    for f in `ls -l /Users/medsmit/ncbi/public/sra/*.sra`; do ./fastq-dump --split-files $f; done

as my command.

modified 10 months ago • written 10 months ago by Mitra
1

You need to use the number 1 instead of the lower-case letter l.

modified 10 months ago • written 10 months ago by genomax

Thank you genomax. Sorry I couldn't reply yesterday, as Biostars restricted my daily comment limit. This time, with your suggestion, it works :) Can you please tell me the exact difference between l and 1? I can see from the man page that -l uses a long listing format and -1 lists one file per line. But what I don't understand is why -l wouldn't work. Sorry for asking all these questions; I am a self-learner. Thank you again.

written 10 months ago by Mitra
1

Not a problem. With the long listing (-l, lower-case L) you get additional information in the listing: unix permissions, group ownership, file size, and so on. You don't want that used as input for fastq-dump, so listing just the file paths one per line with -1 (number one) is the way to do this.

modified 10 months ago • written 10 months ago by genomax
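An editor's illustration of the difference described above: with -1 each output line is a bare path that the loop can consume directly, while -l prefixes it with permissions, ownership, size, and date fields:

```shell
#!/bin/sh
# Compare the two listings on a dummy file.
demo=$(mktemp -d)
touch "$demo/SRR0000001.sra"
echo "ls -1 output:"
ls -1 "$demo"/*.sra    # just the path, one per line
echo "ls -l output:"
ls -l "$demo"/*.sra    # permissions, owner, group, size, date, then the path
rm -r "$demo"
```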

Great, that is really helpful. I understand now: that's why with -l I was getting results but also the extra error messages, since fastq-dump was not dealing well with all that additional information. Thank you very much, S :)

written 10 months ago by Mitra
Powered by Biostar version 2.3.0