fastq file question
9.7 years ago
bitjunkie ▴ 40

Are FASTQ sequence identifiers unique within a FASTQ file, and between paired-end FASTQ files? Does each read get its own unique identifier? I assume so, since each read corresponds to one discrete spot on the flow cell, but I want to make sure. I'm less sure that they are unique between paired-end FASTQ files, however.

If true, this means that two reads with the same nucleotide sequence should have different identifiers.

Cheers

sequencing next-gen • 3.6k views
9.7 years ago

Generally the identifiers are unique, but they don't have to be.
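
Since uniqueness is not guaranteed, it may be worth checking a file rather than assuming. Here is a minimal sketch, assuming strict 4-line FASTQ records (no wrapped sequence lines) and treating the identifier as the header text between `@` and the first whitespace. The function names `read_ids` and `duplicated_ids` are made up for this example.

```python
from collections import Counter


def read_ids(lines):
    """Yield the identifier of each record, assuming strict 4-line FASTQ records."""
    for i, line in enumerate(lines):
        if i % 4 == 0:  # header line of each record
            # Drop the leading '@' and anything after the first whitespace.
            yield line.rstrip("\n")[1:].split()[0]


def duplicated_ids(lines):
    """Return {identifier: count} for identifiers that occur more than once."""
    counts = Counter(read_ids(lines))
    return {rid: n for rid, n in counts.items() if n > 1}


# Tiny in-memory example; for a real file, pass an open file handle instead.
example = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "ACGT", "+", "IIII",
    "@read1", "TTTT", "+", "IIII",
]
print(duplicated_ids(example))  # {'read1': 2}
```

An empty result means every identifier in the file occurred exactly once.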

9.7 years ago

In general, each read (entry) in a FASTQ file should have a unique identifier, and this is what you get in output taken directly from the sequencing machine. When reads are paired, there are two files (F and R) with corresponding entries: the first read in the F file comes from the same fragment of DNA/RNA as the first read in the R file. Therefore, the two reads either have the same name _or_ their header lines end with /1 and /2 for F and R reads, respectively (or something similar). A good sanity check is to always count the number of reads in both files; they should match (at least before trimming and quality check).
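
The sanity check described above could be sketched roughly like this. It assumes strict 4-line records, that paired reads appear in the same order in both files, and that the names differ at most by a trailing `/1` or `/2` or by text after the first whitespace. The helper names (`normalize`, `headers`, `check_pairs`) are made up for this example.

```python
from itertools import zip_longest


def normalize(header):
    """Strip '@', drop text after the first whitespace, remove a trailing /1 or /2."""
    name = header.rstrip("\n")[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def headers(lines):
    """Return the header line of every record, assuming 4 lines per record."""
    return [line for i, line in enumerate(lines) if i % 4 == 0]


def check_pairs(forward_lines, reverse_lines):
    """Return (n_forward, n_reverse, indices of records whose names disagree)."""
    f = headers(forward_lines)
    r = headers(reverse_lines)
    mismatches = [
        i
        for i, (a, b) in enumerate(zip_longest(f, r))
        if a is None or b is None or normalize(a) != normalize(b)
    ]
    return len(f), len(r), mismatches


# Tiny in-memory example; the second pair is deliberately mismatched.
forward = ["@frag1/1", "ACGT", "+", "IIII", "@frag2/1", "GGCC", "+", "IIII"]
reverse = ["@frag1/2", "ACGT", "+", "IIII", "@frag3/2", "GGCC", "+", "IIII"]
print(check_pairs(forward, reverse))  # (2, 2, [1])
```

Equal counts and an empty mismatch list suggest the two files are still in sync; after trimming or filtering, either check may legitimately fail.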

However, you can have two different reads (hence different identifiers) with the same nucleotide sequence, for example when the genome is small and sequenced at very high coverage: you get redundancy simply because you oversampled.
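
To see this in your own data, you could group identifiers by sequence. This is only a sketch, assuming complete 4-line records and a file small enough to hold in memory; `ids_by_sequence` is a made-up name for this example.

```python
from collections import defaultdict


def ids_by_sequence(lines):
    """Map each sequence seen more than once to the identifiers that carry it."""
    records = [line.rstrip("\n") for line in lines]
    groups = defaultdict(list)
    # Assumes complete 4-line records: header, sequence, '+', qualities.
    for i in range(0, len(records), 4):
        identifier = records[i][1:].split()[0]
        sequence = records[i + 1]
        groups[sequence].append(identifier)
    return {seq: ids for seq, ids in groups.items() if len(ids) > 1}


# Tiny in-memory example: two distinct reads share the sequence ACGT.
example_reads = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "ACGT", "+", "IIII",
    "@read3", "TTTT", "+", "IIII",
]
print(ids_by_sequence(example_reads))  # {'ACGT': ['read1', 'read2']}
```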
