Question: Sra File Fastq-Dump Command
Varun Gupta (United States) wrote, 7.1 years ago:

Hi everyone, I am working on SRA files and need to convert them into FASTQ files. When I ran fastq-dump on my SRR786.sra file, it gave me three files: SRR786_1.fastq, SRR786_2.fastq and SRR786.fastq. The SRA documentation says that running fastq-dump --split-3 SRR786.sra produces three files like the ones I got. Additionally, SRR786.fastq is much smaller than the other two files, SRR786_1.fastq and SRR786_2.fastq.

I would be really grateful if you could explain why three files were generated when I did not give the split option, and what the generation of this small FASTQ file means.

Regards Varun


Perhaps _1 and _2 are paired-end and the other is a single-end library.

Reply written 7.1 years ago by Zev.Kronenberg

But the single-end library, as you call it, is far too small compared to _1 and _2. Any suggestions on this?

Regards Varun

Reply written 7.1 years ago by Varun Gupta
Nico (NYC) wrote, 7.1 years ago:

It's documented here: http://www.ncbi.nlm.nih.gov/books/NBK47540/#SRA_Download_Guid_B.5_Converting_SRA_for

section 5.5 Basic Execution of ‘fastq-dump’ Utility

"The file, ‘SRR000001.fastq’, contains fragment read sequences where only a single biological/application read exists (or remained after filtering) for a spot. The first spot in the file will look like this"...

It's not very clear to me what the "small" file is, but I guess that what you want to use are the 2 "large" ones.
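One way to check which files are the "large" paired ones is to count the records in each output file. The following is only a minimal sketch: it assumes plain four-line FASTQ records and the `_1`/`_2`/unsuffixed naming discussed in this thread, and the helper names `count_records` and `summarize` are made up for illustration (they are not part of the SRA toolkit).

```python
from pathlib import Path
from typing import Dict, Iterable


def count_records(lines: Iterable[str]) -> int:
    """Count FASTQ records, assuming exactly four lines per record."""
    n_lines = sum(1 for _ in lines)
    return n_lines // 4


def summarize(prefix: str, directory: str = ".") -> Dict[str, int]:
    """Map each existing output file for `prefix` to its record count."""
    counts: Dict[str, int] = {}
    # Assumed naming: <prefix>_1.fastq, <prefix>_2.fastq, <prefix>.fastq
    for suffix in ("_1", "_2", ""):
        path = Path(directory) / f"{prefix}{suffix}.fastq"
        if path.exists():  # skip files that were not produced
            with path.open() as handle:
                counts[path.name] = count_records(handle)
    return counts
```

Calling `summarize("SRR786")` in the directory that holds the output would list the three counts side by side. If the `_1` and `_2` files really hold mate pairs, one would expect their counts to be equal, with the unsuffixed file holding far fewer records.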


Hi Nico, thanks for your reply. I read that, but I could not understand why the third file (.fastq) is generated at all when I simply used the command fastq-dump and not fastq-dump --split-3. By "small file" I meant that SRR786.fastq was much smaller than the other two files.

Reply written 7.1 years ago by Varun Gupta

The linked documentation suggests that the smaller the third file, the better. It holds the reads that were dumped without a mate: "--split-3

Legacy 3-file splitting for mate-pairs used for the 1000 Genomes fastq files. First 2 biological reads satisfying dumping conditions are placed in files _1.fastq and _2.fastq. If only 1 biological read is dumpable - it is placed in *.fastq Biological reads 3 and above are ignored."

Reply written 7.0 years ago by ALchEmiXt
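The quoted --split-3 rule can be read as a small routing decision per spot. The sketch below is one interpretation of that quoted text, not the toolkit's actual code; the function name `route_spot` and the `(sequence, dumpable)` representation of a read are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

# A read is modelled as (sequence, dumpable), where `dumpable` stands in for
# "satisfies the dumping conditions" in the quoted documentation.
Read = Tuple[str, bool]


def route_spot(reads: List[Read]) -> Dict[str, str]:
    """Return {file suffix: sequence} for the biological reads of one spot."""
    dumpable = [seq for seq, ok in reads if ok]
    if len(dumpable) >= 2:
        # First two dumpable reads go to _1.fastq and _2.fastq;
        # reads 3 and above are ignored.
        return {"_1": dumpable[0], "_2": dumpable[1]}
    if len(dumpable) == 1:
        # A lone dumpable read goes to the unsuffixed *.fastq file.
        return {"": dumpable[0]}
    return {}


# A spot whose second read was filtered out lands in the unsuffixed file.
print(route_spot([("ACGT", True), ("TTGA", False)]))
```

Under this reading, the unsuffixed file only collects spots where a single read survived, which would explain why it is much smaller than the `_1` and `_2` files.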
Madelaine Gogol (Kansas City) wrote, 7.1 years ago:

Maybe you could try -v -v -v -v

‘-v’ or ‘--verbose’ Increases the verbosity level of the program. Use multiple times for more verbosity.

Ha ha.

Really, I don't know why you are seeing what you do. Could you try getting the sra.lite version of the file instead, just to see if it's doing the same thing?

I don't know if this helps at all, but maybe check your version of the sra toolkit and contact the help desk over there if you keep having problems.

http://biostar.stackexchange.com/questions/11134/how-to-convert-sra-lite-paired-end-submission-to-fastq
