Understanding the output files of fasterq-dump --split-files
5.7 years ago

I am using fasterq-dump to download from SRA, with --split-files to split paired-end reads. As a result I receive one or two files. When I get two files, one is named *_1.fastq and the other is named *_2.fastq, *_3.fastq, or *_4.fastq. I cannot find what these numbers mean.

The command I am using:

fasterq-dump --split-files -O /media/lab/fastq ERR016705

For example:

ERR016705 has 2 files: _1, _4
ERR015587 has 2 files: _1, _2

I am also confused by this. On the HowTo page, they say you could get 1 and 2.fastq files for paired reads, and a 3.fastq for unmated reads. But in item 8, they list 1 and 2 and a plain .fastq. Is this plain .fastq also for unmated reads, or is it different from the 3.fastq? After reading this post, I'm not sure whether the .fastq file contains the unmated reads, or low-quality reads that should be ignored.

That option was likely used with older data. If you are looking at something recent, the chance of getting a third file should be small, unless the submitters supplied data from an index read as a separate file. For single-cell data, the 10x cellranger software produces a separate file of index reads when it is used for demultiplexing.

For anyone reaching this post by search in the future: ERR016705 now shows just two fastq files at ENA.

5.7 years ago
GenoMax 141k

Have you looked at the headers of the fastq files? Even though the files themselves are named 1 and 4, the headers should tell you that these are the R1 and R2 files.
(Note: Illumina sequencing happens in Read 1 --> Index 1 --> Index 2 --> Read 2 order. Sometimes people dump the index sequences into individual files, and in that case the output files are named File 1 --> File 2 --> File 3 --> File 4.)
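
One way to do the header check suggested above is a small script like the following. It is only a sketch: the function name `peek_fastq` is made up for illustration, and it assumes uncompressed FASTQ files with exactly four lines per record. It prints the first header of each file together with the range of read lengths, which can help tell a biological read from an index read, since index reads are usually much shorter.

```python
import sys
from itertools import islice


def peek_fastq(path, n_records=1000):
    """Summarise the first n_records of an uncompressed FASTQ file.

    Assumes four lines per record (header, sequence, '+', qualities).
    Returns (first_header, shortest_read, longest_read).
    """
    first_header = None
    lengths = []
    with open(path) as handle:
        for i, line in enumerate(islice(handle, 4 * n_records)):
            text = line.rstrip("\r\n")
            if i % 4 == 0 and first_header is None:
                first_header = text        # header line of the first record
            elif i % 4 == 1:
                lengths.append(len(text))  # sequence line of each record
    if not lengths:
        # Empty file: nothing to measure.
        return first_header, None, None
    return first_header, min(lengths), max(lengths)


if __name__ == "__main__":
    # Each command-line argument is treated as a path to a FASTQ file.
    for path in sys.argv[1:]:
        header, shortest, longest = peek_fastq(path)
        print(f"{path}\t{header}\tread length {shortest}-{longest}")
```

Saved under a name of your choice (here hypothetically `peek_fastq.py`), it could be run as `python peek_fastq.py ERR016705_1.fastq ERR016705_4.fastq`. If both files show similar, long read lengths, they are most likely the two mates of the pair despite the _1/_4 naming.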
