Question

fastq file question

1


9.7 years ago

bitjunkie ▴ 40

Are FASTQ sequence identifiers unique within a FASTQ file, and between paired-end FASTQ files? Does each read get its own unique identifier? I assume so, since each read corresponds to one discrete spot on the flow cell, but I want to make sure. I'm less sure that they are unique between paired-end FASTQ files.

If true, this means that two reads with the same nucleotide sequence should have different identifiers.

Cheers

sequencing next-gen • 3.6k views

updated 2.4 years ago by Ram 43k • written 9.7 years ago by bitjunkie ▴ 40

Ram · Answer 1 · 2014-09-10

1


9.7 years ago

Devon Ryan 104k

Generally the identifiers are unique, but they don't have to be.
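Since nothing in the format itself enforces uniqueness, it can be worth checking a given file. Below is a minimal Python sketch, assuming plain four-line FASTQ records (no wrapped sequence lines) and taking the identifier to be the first whitespace-delimited token after the `@`. The function names `read_ids` and `duplicated_ids` are made up for this illustration.

```python
import io
from collections import Counter


def read_ids(handle):
    """Yield the identifier from each header line of a 4-line-per-record FASTQ."""
    for i, line in enumerate(handle):
        # Header lines are lines 0, 4, 8, ...; skip stray blank lines.
        if i % 4 == 0 and line.strip():
            # Drop the leading '@' and anything after the first whitespace.
            yield line[1:].split()[0]


def duplicated_ids(handle):
    """Return {identifier: count} for identifiers seen more than once."""
    counts = Counter(read_ids(handle))
    return {name: n for name, n in counts.items() if n > 1}


# Tiny in-memory example; 'read1' is deliberately repeated.
example = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nACGT\n+\nIIII\n"
    "@read1\nTTTT\n+\nIIII\n"
)
print(duplicated_ids(example))  # {'read1': 2}
```

For a real file, pass an open file handle instead of the `io.StringIO` object (for example via `gzip.open(path, "rt")` if the file is compressed).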

updated 2.4 years ago by Ram 43k • written 9.7 years ago by Devon Ryan 104k

score 1 · Answer 2 · 2014-09-10

In general, each read (entry) in a FASTQ file should have a unique identifier, and this is what you get in output taken directly from the sequencing machine. When reads are paired, there are two files (F and R) with corresponding entries: the first read in the F file comes from the same fragment of DNA/RNA as the first read in the R file. The two mates therefore have the same name, or their identifiers end with /1 and /2 for the F and R reads, respectively (or something similar). A good sanity check is to count the reads in both files; the counts should match (at least before trimming and quality filtering).
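The sanity check described here can be sketched in Python roughly as follows. This is only an illustration: it assumes four-line FASTQ records, that mates appear in the same order in both files, and that mate names either match exactly or differ only by a trailing `/1` or `/2`. The names `read_names` and `check_pairs` are hypothetical.

```python
import io
from itertools import zip_longest


def read_names(handle):
    """Yield read names with any trailing /1 or /2 mate suffix removed."""
    for i, line in enumerate(handle):
        if i % 4 == 0 and line.strip():
            name = line[1:].split()[0]
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def check_pairs(forward, reverse):
    """Return (records_compared, mismatches) for two paired FASTQ handles.

    Each mismatch is (record_number, forward_name, reverse_name); a name of
    None means that file ran out of records first, i.e. the counts differ.
    """
    n = 0
    mismatches = []
    pairs = zip_longest(read_names(forward), read_names(reverse))
    for n, (f, r) in enumerate(pairs, start=1):
        if f != r:
            mismatches.append((n, f, r))
    return n, mismatches


fwd = "@frag1/1\nACGT\n+\nIIII\n@frag2/1\nGGCC\n+\nIIII\n"
rev = "@frag1/2\nTTGA\n+\nIIII\n@frag2/2\nCCAA\n+\nIIII\n"
print(check_pairs(io.StringIO(fwd), io.StringIO(rev)))  # (2, [])
```

An empty mismatch list means both files have the same number of reads and the names line up record by record.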

However, two different reads (hence with different identifiers) can have the same nucleotide sequence, for example when the genome is small and sequenced at very high coverage: you get redundancy simply because you oversampled.
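To see this in a file, one can group identifiers by sequence. The following is a small Python sketch under the same assumption of well-formed four-line FASTQ records; `ids_by_sequence` is a name invented for this example, and holding every sequence in memory will not scale to very large files.

```python
import io
from collections import defaultdict


def ids_by_sequence(handle):
    """Map each sequence that occurs more than once to the identifiers of its reads."""
    groups = defaultdict(list)
    name = None
    for i, line in enumerate(handle):
        if i % 4 == 0 and line.strip():
            # Header line: remember the identifier for the sequence that follows.
            name = line[1:].split()[0]
        elif i % 4 == 1:
            # Sequence line: file the current identifier under this sequence.
            groups[line.strip()].append(name)
    return {seq: ids for seq, ids in groups.items() if len(ids) > 1}


example = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nACGT\n+\nIIII\n"
    "@r3\nTTTT\n+\nIIII\n"
)
print(ids_by_sequence(example))  # {'ACGT': ['r1', 'r2']}
```

Here `r1` and `r2` are distinct reads with distinct identifiers that happen to carry the same sequence.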