What is the difference between --split-files and --split-3 when using fastq-dump?
5.3 years ago
elenajmichel ▴ 90

Hello,

I have been using --split-files when using fastq-dump, but I have seen a lot of posts saying to use --split-3. The manual page is not quite clear about the difference between the two commands (besides the number of files generated), so could someone tell me what the difference is between the two commands, and under what circumstances it may be better to use one over the other?

Thank you.

RNA-Seq fastq-dump paired end reads • 6.0k views

Also, see C: SRA to BAM for more on the --split-files option.


Thank you! The fastq-dump split-3 output link helped a lot.

5.3 years ago

According to the manual, it looks like --split-files creates a file for every read, meaning the output will have a lot of files.

--split-3 is for paired-end reads. So it will produce 2 fastq files only.

If your data is single-end you don't need to use these options.


--split-files actually behaves the same as --split-3 and produces 2 fastq files for each SRA, hence my confusion.


"--split-3 will output 1,2, or 3 files: 1 file means the data is not paired. 2 files means paired data with no low quality reads or reads shorter than 20bp. 3 files means paired data, but asymmetric quality or trimming. in the case of 3 file output, most people ignore <file>.fastq . this is a very old formatting option introduced for phase1 of 1000genomes. before there were many analysis or trimming utilities and SRA submissions always contained all reads from sequencer. back then nobody wanted to throw anything away. you might want to use --split-files instead. that will give only 2 files for paired-end data. or not bother with text output and access the data directly using sra ngs apis." (from Question: fastq-dump split-3 output )
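To make the 1, 2, or 3 file cases concrete, here is a minimal sketch that classifies a --split-3 run purely by which output files exist. The function name `classify_split3_output` and its labels are made up for illustration; the file names follow the `*_1.fastq`, `*_2.fastq` and `*.fastq` pattern from the fastq-dump help text, and it assumes uncompressed output in a single directory.

```shell
# Hypothetical helper: report which --split-3 case applies to an accession,
# judged only by which of the expected output files are present.
classify_split3_output() {
    acc="$1"            # accession prefix, e.g. a placeholder like SRR0000000
    dir="${2:-.}"       # directory holding the fastq files (default: current)
    r1="$dir/${acc}_1.fastq"
    r2="$dir/${acc}_2.fastq"
    unpaired="$dir/${acc}.fastq"

    if [ -f "$r1" ] && [ -f "$r2" ] && [ -f "$unpaired" ]; then
        # 3 files: paired data plus reads whose mate was dropped
        echo "paired with leftover unpaired reads"
    elif [ -f "$r1" ] && [ -f "$r2" ]; then
        # 2 files: every spot kept both mates
        echo "paired"
    elif [ -f "$unpaired" ]; then
        # 1 file: the data is not paired
        echo "single-end"
    else
        echo "unknown"
    fi
}
```

Calling `classify_split3_output SRR0000000 ./fastq` (placeholder accession and path) prints one of the four labels, which is an easy way to decide whether the extra `<file>.fastq` needs any attention.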


Actually, --split-3 outputs 3 files:

two files for the paired reads and another one for singletons. I just ran --split-files on my end and it output only 2 files. I guess the difference is the singleton output. I'm not sure though.


The --split-files option outputs two fastq files in the case of paired-end reads: one file for the forward reads, another for the reverse reads.
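Since --split-files writes forward and reverse reads to separate files, one quick sanity check is whether both files hold the same number of records. The sketch below is only an illustration: `count_fastq_records` and `mates_in_sync` are hypothetical helper names, the accession in the comments is a placeholder, and it assumes plain 4-line FASTQ records with no gzip compression.

```shell
# The two invocations being compared (placeholder accession, not run here):
#   fastq-dump --split-files SRR0000000
#   fastq-dump --split-3     SRR0000000

# Hypothetical helper: number of records in a FASTQ file,
# assuming exactly 4 lines per record.
count_fastq_records() {
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}

# Hypothetical helper: succeeds only if both mate files
# contain the same number of records.
mates_in_sync() {
    [ "$(count_fastq_records "$1")" -eq "$(count_fastq_records "$2")" ]
}
```

For example, `mates_in_sync SRR0000000_1.fastq SRR0000000_2.fastq && echo "counts match"` prints the message only when the two files agree. Equal counts do not prove that read names line up, but unequal counts are a clear sign the pairs are out of step.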

5.3 years ago
GenoMax 104k

Explained in the inline help for fastq-dump.

$ fastq-dump -h

--split-files                    Dump each read into separate file.Files
number
--split-3                        Legacy 3-file splitting for mate-pairs:
conditions are placed in files *_1.fastq and
*_2.fastq If only one biological read is
present it is placed in *.fastq Biological


Hi, as I stated in my question I don't think it is clearly explained because the explanation for --split-files makes it sound as if a separate file will be created for each read, which isn't correct as it just splits them into two files. Based on that, I don't really understand why you would use --split-3 vs --split-files to get the same result.


"fastq-dump" is used with Illumina data, and in Illumina terminology a "read" traditionally refers to Read 1 or Read 2, not to the individual reads in the files. A separate file for each individual read would not make sense if you think about it logically.