fastq-dump split-3 output
7.3 years ago
colonppg ▴ 110

If I run fastq-dump --split-3 on an SRA file, I get:

file_1.fastq
file_2.fastq
file.fastq


My question is how to handle file.fastq. Should I just ignore it?

fastq-dump sra • 17k views

Thanks so much, guys!

7.3 years ago

--split-3 will output 1, 2, or 3 files. 1 file means the data is not paired. 2 files means paired data with no low-quality reads and no reads shorter than 20 bp. 3 files means paired data with asymmetric quality or trimming; the third file holds the reads left without a mate. In the 3-file case, most people ignore <file>.fastq.

This is a very old formatting option, introduced for phase 1 of 1000 Genomes, before there were many analysis or trimming utilities and when SRA submissions always contained all reads from the sequencer. Back then nobody wanted to throw anything away. You might want to use --split-files instead, which gives only 2 files for paired-end data. Or skip the text output entirely and access the data directly through the SRA NGS APIs.
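To make the 1/2/3-file rule concrete, here is a minimal Python sketch that looks at which files exist after a --split-3 run and reports the case. The function name and the returned labels are made up for illustration; the only thing taken from the thread is the file naming (file_1.fastq, file_2.fastq, file.fastq).

```python
from pathlib import Path


def classify_split3_output(outdir, prefix):
    """Describe a --split-3 result from the file names alone.

    Hypothetical helper: assumes uncompressed output named
    <prefix>_1.fastq, <prefix>_2.fastq and <prefix>.fastq.
    """
    outdir = Path(outdir)
    has_1 = (outdir / f"{prefix}_1.fastq").exists()
    has_2 = (outdir / f"{prefix}_2.fastq").exists()
    has_single = (outdir / f"{prefix}.fastq").exists()

    if has_1 and has_2:
        # A third file next to the pair holds reads without a mate.
        return "paired, with unpaired leftovers" if has_single else "paired"
    if has_single:
        return "not paired"
    return "unrecognized"
```

For the listing in the question, `classify_split3_output(".", "file")` would fall into the "paired, with unpaired leftovers" case.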


On some occasions with --split-files, you have to use -M 0, or else you end up with unpaired reads, because fastq-dump discards a short read but keeps its mate.
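One way to see whether this has happened is to walk both files in lockstep and compare read IDs. The sketch below is an illustration under a few assumptions: the files are uncompressed, every record is exactly 4 lines, and the first whitespace-delimited token of the header is the same for both mates (which would not hold if the dump appended per-mate suffixes to the IDs). The function names are hypothetical.

```python
from itertools import zip_longest


def read_ids(path):
    """Yield the first token of each header line (assumes 4-line records)."""
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 0:
                fields = line.split()
                yield fields[0] if fields else ""


def first_mismatch(fastq_1, fastq_2):
    """Return the 1-based record index where the IDs diverge, or None."""
    pairs = zip_longest(read_ids(fastq_1), read_ids(fastq_2))
    for index, (id_1, id_2) in enumerate(pairs, start=1):
        # zip_longest pads the shorter file with None, so a length
        # difference also shows up as a mismatch.
        if id_1 != id_2:
            return index
    return None
```

If `first_mismatch("file_1.fastq", "file_2.fastq")` returns a number rather than `None`, the two files are out of sync from that record on, and most paired-end aligners will either fail or mis-pair the reads.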

7.3 years ago

Generally file.fastq is much smaller than the other two and is the result of reads orphaned by trimming. I would typically ignore that file unless you really need the extra reads. The exception is when file.fastq is larger than either of the other two FASTQ files. In that case, you probably want to ignore the paired-end reads (or just use them for things like the insert size distribution).
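A quick way to apply this rule of thumb is to count reads in each file. The following is a rough sketch, assuming uncompressed 4-line FASTQ records and the file naming from the question; the helper names are invented here, and "larger" is interpreted as "contains more reads".

```python
def count_reads(path):
    """Count records, assuming exactly 4 lines per read."""
    with open(path) as handle:
        return sum(1 for _ in handle) // 4


def orphans_dominate(prefix):
    """True if <prefix>.fastq holds more reads than the paired files."""
    orphans = count_reads(f"{prefix}.fastq")
    # With --split-3 the two mate files should hold the same number of
    # reads; max() is only a guard in case they do not.
    paired = max(count_reads(f"{prefix}_1.fastq"),
                 count_reads(f"{prefix}_2.fastq"))
    return orphans > paired
```

If `orphans_dominate("file")` is true, the run is the exceptional case described above, and treating file.fastq as the main single-end data set is probably the more sensible choice.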