Why Are There 3 Fastq Files In This Paired-End Data?
Also on NCBI:
I downloaded them and the first 4 lines look like the following:
SRR346373$ head -4 S*fastq
==> SRR346373_1.fastq <==
==> SRR346373_2.fastq <==
==> SRR346373.fastq <==
It seems obvious that the _1 and _2 fastq files are the two ends of the paired-end data. But what does SRR346373.fastq stand for? It is much smaller than the other two fastq files (1/20 as many lines). Does anyone know what it means?
I'd guess it is a file of the remaining unpaired reads.
The _1 and _2 files should have the same sequence IDs in the same order. The third file contains reads for which a paired sequence was not generated, and it may contain reads labeled either /1 or /2.
Structuring the data this way saves having to do an uneven traversal of the two files: you can always assume that the 200th read in the _1 file corresponds to the 200th read in the _2 file.
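If you want to check that lockstep assumption rather than trust it, something like the following might do. It is only a minimal sketch: it assumes plain 4-line FASTQ records and that mates share an ID apart from an optional trailing /1 or /2. The helper names (read_fastq, base_id, iter_pairs) are made up for illustration and do not come from any library.

```python
import io
from itertools import zip_longest


def read_fastq(handle):
    """Yield (read_id, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header.strip():
            # End of file (or a blank line, which this sketch treats as the end).
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        # Drop the leading '@' and anything after the first whitespace.
        yield header[1:].split()[0], seq, qual


def base_id(read_id):
    """Strip a trailing /1 or /2 mate label, if present (assumed naming)."""
    if read_id.endswith(("/1", "/2")):
        return read_id[:-2]
    return read_id


def iter_pairs(handle_1, handle_2):
    """Walk both mate files in lockstep, checking that the IDs agree."""
    records = zip_longest(read_fastq(handle_1), read_fastq(handle_2))
    for n, (r1, r2) in enumerate(records, start=1):
        if r1 is None or r2 is None:
            raise ValueError(f"files differ in length at record {n}")
        if base_id(r1[0]) != base_id(r2[0]):
            raise ValueError(f"ID mismatch at record {n}: {r1[0]} vs {r2[0]}")
        yield r1, r2


# Tiny in-memory example; with real data, pass open("SRR346373_1.fastq")
# and open("SRR346373_2.fastq") instead.
mates_1 = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nTTGA\n+\nIIII\n")
mates_2 = io.StringIO("@r1/2\nCCGT\n+\nIIII\n@r2/2\nGGAA\n+\nIIII\n")
pairs = list(iter_pairs(mates_1, mates_2))
print(len(pairs))
```

If the two files really are synchronized, iter_pairs runs to the end without raising. The unpaired third file would simply be read on its own with read_fastq.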
Being AB_SOLiD data, the _1 file is the Forward [F3] read (T prefix), the _2 file is the Reverse [R3] read (G prefix).
The third file is the barcode; the other two are the paired-end reads.