"Valid" names for paired-end reads
9.5 years ago
Rob 6.5k

I realize the answer to this question might be "there is no standard; anything can happen." Still, I'm curious what the valid ways of naming paired-end reads are. For example, I know it's possible for both mates of a pair to have exactly the same name. Sometimes they are named X/1 and X/2, where X is an identical prefix shared by both reads, and sometimes we get the lovely X_1 and X_2. What other variants are possible? Are there any restrictions on exactly what prefix must be shared and how the two reads in a pair must be named?

next-gen-sequencing RNA-Seq
9.5 years ago
SES 8.6k

The most common variants you will see come from the Illumina identifiers, which are explained on the FASTQ format Wikipedia page. I have also seen people use "a" and "b" to denote forward and reverse, or simply leave off the identifier. Things get complicated when people start using their own identifiers, because then standard tools aren't guaranteed to work properly.
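To make the variants above concrete, here is a minimal sketch of a heuristic that splits a FASTQ header into a shared base name and a mate number. The function name `split_mate_suffix` is hypothetical, and the patterns it recognizes (a trailing /1 or /2, a trailing _1 or _2, and the newer Illumina style with a space followed by 1:N:0:... or 2:N:0:...) are an assumption about what you are likely to meet, not an exhaustive or authoritative list. It will misfire on names that legitimately end in _1 or _2.

```python
import re

# Hypothetical helper, not part of any library: a heuristic only.
_SUFFIX = re.compile(r"[/_]([12])$")          # X/1, X/2, X_1, X_2
_CASAVA = re.compile(r"^([12]):[YN]:\d+")     # "<name> 1:N:0:ATCACG"


def split_mate_suffix(header):
    """Return (base_name, mate), where mate is 1, 2, or None if unknown."""
    name = header[1:] if header.startswith("@") else header
    fields = name.split(None, 1)
    if not fields:
        return "", None
    base = fields[0]
    comment = fields[1] if len(fields) > 1 else ""

    # Newer Illumina style: mate number is the first field of the comment.
    m = _CASAVA.match(comment)
    if m:
        return base, int(m.group(1))

    # Older style: mate number is appended to the read name itself.
    m = _SUFFIX.search(base)
    if m:
        return base[:m.start()], int(m.group(1))

    # Both mates share the same name, or an unrecognized convention is in use.
    return base, None
```

If both mates of a pair give the same base name under this split, they can be matched up regardless of which of these conventions the file uses. Anything the heuristic does not recognize comes back with a mate of None and should be checked by hand.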


As a follow-up: is it true that, when writing out SAM/BAM files, read mappers uniformly remove these extra identifiers? Specifically, is it valid to assume that in a BAM file, read1 and read2 will always have exactly the same QNAME?


If I understand correctly, that is part of the SAM format specification, which would explain why alignment programs write their output this way. Hopefully, someone who knows more than I do will comment to clarify this.


I encountered the same problem. After STAR alignment, both mates of a paired-end read have the same name (identifier) in the BAM file. Removing the extra identifiers that differentiate the mates is perplexing; I see no purpose in removing them.
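For what it's worth, the mate information is not lost when the names are made identical. In the SAM format, records with the same QNAME are treated as coming from the same template, and the FLAG field carries bits 0x40 (first segment) and 0x80 (last segment) to say which mate a record is. Below is a minimal sketch of reading that back from a SAM text line. The helper names `mate_from_flag` and `parse_sam_line` are hypothetical, and the snippet assumes a plain tab-separated alignment line rather than a header line or binary BAM.

```python
# Bit values as defined in the SAM specification.
PAIRED = 0x1    # template has multiple segments
READ1 = 0x40    # first segment in the template
READ2 = 0x80    # last segment in the template


def mate_from_flag(flag):
    """Return 1 or 2 for a paired read, or None if the flag does not say."""
    if not flag & PAIRED:
        return None
    is_first = bool(flag & READ1)
    is_last = bool(flag & READ2)
    if is_first and not is_last:
        return 1
    if is_last and not is_first:
        return 2
    # Both or neither bit set: the mate cannot be determined from the flag.
    return None


def parse_sam_line(line):
    """Return (qname, mate) from one tab-separated SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    qname, flag = fields[0], int(fields[1])
    return qname, mate_from_flag(flag)
```

For example, a properly paired forward read1 typically has flag 99 and its reverse mate has flag 147, so the two records share a QNAME but decode to mates 1 and 2 respectively.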
