fastq-dump split-3 output
7.3 years ago
colonppg ▴ 110

If I run fastq-dump --split-3 on an SRA file, I get:

file_1.fastq
file_2.fastq
file.fastq

My question is how to handle file.fastq. Should I just ignore it?

fastq-dump sra • 17k views

Thanks so much, guys!

7.3 years ago

--split-3 will output 1, 2, or 3 files. 1 file means the data is not paired. 2 files means paired data with no low-quality reads or reads shorter than 20 bp. 3 files means paired data, but with asymmetric quality or trimming, so some reads lost their mate. In the 3-file case, most people ignore <file>.fastq. This is a very old formatting option introduced for phase 1 of 1000 Genomes, before there were many analysis or trimming utilities, when SRA submissions always contained all reads from the sequencer. Back then nobody wanted to throw anything away. You might want to use --split-files instead, which will give only 2 files for paired-end data. Or don't bother with text output and access the data directly using the SRA NGS APIs.
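
The three cases above can be told apart just by checking which files exist. Here is a minimal Python sketch, assuming the output names follow the pattern from the question (file_1.fastq, file_2.fastq, file.fastq). The function name classify_split3 and the labels it returns are made up for illustration and are not part of the SRA Toolkit.

```python
from pathlib import Path

def classify_split3(outdir, prefix):
    """Guess the layout of fastq-dump --split-3 output from which files exist.

    Hypothetical helper: assumes files are named <prefix>_1.fastq,
    <prefix>_2.fastq and <prefix>.fastq, as in the question.
    """
    outdir = Path(outdir)
    has_r1 = (outdir / f"{prefix}_1.fastq").exists()
    has_r2 = (outdir / f"{prefix}_2.fastq").exists()
    has_single = (outdir / f"{prefix}.fastq").exists()

    if has_r1 and has_r2 and has_single:
        return "paired+orphans"  # 3 files: pairs plus reads without a mate
    if has_r1 and has_r2:
        return "paired"          # 2 files: every read kept its mate
    if has_single and not (has_r1 or has_r2):
        return "single"          # 1 file: unpaired data
    return "unknown"             # anything else, e.g. a partial download
```

For the listing in the question, classify_split3(".", "file") would return "paired+orphans".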


On some occasions with --split-files, you have to use -M 0 or else you end up with unpaired reads, because fastq-dump discards a short read but keeps its mate.
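
One way to see whether you hit this is to walk both files in parallel and look for the first record whose read IDs disagree. This is a rough Python sketch, assuming plain 4-line FASTQ records and that both mates share the same first token in the header line (which may not hold if you dumped with options that append a read number to the ID). The names read_ids and first_mismatch are hypothetical.

```python
from itertools import zip_longest

def read_ids(path):
    """Yield the first header token of each record (assumes 4 lines per record)."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0 and line.strip():
                # Drop the leading '@' and keep the ID before the first space.
                yield line[1:].split()[0]

def first_mismatch(fastq_1, fastq_2):
    """Return the 0-based record index where the files fall out of sync, or None."""
    pairs = zip_longest(read_ids(fastq_1), read_ids(fastq_2))
    for index, (id_1, id_2) in enumerate(pairs):
        # zip_longest pads the shorter file with None, so a length
        # difference also shows up as a mismatch.
        if id_1 != id_2:
            return index
    return None
```

If first_mismatch("file_1.fastq", "file_2.fastq") returns None, the two files list the same reads in the same order.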

7.3 years ago

Generally file.fastq is much smaller than the other two and is the result of reads orphaned by trimming. I would typically ignore that file unless you really need the extra reads. The exception is when file.fastq is larger than either of the other two FASTQ files. In that case, you probably want to ignore the paired-end reads (or just use them to get things like the insert size distribution).
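
To put a number on "much smaller", you can count the records in each file. This is a small Python sketch, assuming uncompressed 4-line FASTQ records. The function names count_reads and orphan_fraction are made up for illustration.

```python
def count_reads(path):
    """Count records in a FASTQ file, assuming exactly 4 lines per record."""
    with open(path) as handle:
        return sum(1 for _ in handle) // 4

def orphan_fraction(paired_fastq, orphan_fastq):
    """Fraction of spots that ended up in the orphan file rather than as pairs.

    paired_fastq is one of the two mate files (e.g. file_1.fastq);
    orphan_fastq is the third file (e.g. file.fastq).
    """
    n_pairs = count_reads(paired_fastq)
    n_orphans = count_reads(orphan_fastq)
    total = n_pairs + n_orphans
    return n_orphans / total if total else 0.0
```

A value close to 0 from orphan_fraction("file_1.fastq", "file.fastq") is the usual case where file.fastq can be ignored. A value above 0.5 means the orphan file holds more reads than either paired file.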
