Inconsistent read pair?
8.9 years ago
nenewell • 0

I have come across the following read pair in a .sam file:

read1 163 ref 100 255 10M =   101 11  AAAAAAAAAA AAAAAAAAAA zz:Z:fr
read2 83  ref 101 255 10M =   100 -11 AAAAAAAAAA AAAAAAAAAA zz:Z:fr

I'm new to SAM/BAM files, and this pair looks inconsistent to me. The start positions, the CIGAR lengths, and the fact that FLAG 163 marks a forward-strand read while FLAG 83 marks a reverse-strand read suggest this layout:

+++++++++>-    read1  163
-<+++++++++    read2  83

The negative sign on TLEN (-11) is also consistent with read2 being the rightmost read of the pair, per the SAM spec's rule for TLEN signs (leftmost segment positive, rightmost segment negative).
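For reference, the TLEN values in both records can be reproduced from POS and CIGAR alone. This is a minimal sketch of the spec's definition (number of bases from the leftmost mapped base to the rightmost mapped base, positive for the leftmost read, negative for the rightmost); the helper names ref_end and tlen_pair are made up for illustration, ties in POS are broken arbitrarily, and real aligners can differ in edge cases.

```python
import re

def ref_end(pos, cigar):
    """Return the 1-based last reference position covered by an alignment."""
    consumed = 0
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        if op in "MDN=X":  # only these CIGAR operations consume reference bases
            consumed += int(length)
    return pos + consumed - 1

def tlen_pair(pos_a, cigar_a, pos_b, cigar_b):
    """Return (tlen_a, tlen_b) for two mates mapped to the same reference."""
    left = min(pos_a, pos_b)
    right = max(ref_end(pos_a, cigar_a), ref_end(pos_b, cigar_b))
    span = right - left + 1
    # Leftmost mate gets +span, the other -span (tie goes to mate a, an arbitrary choice).
    if pos_a <= pos_b:
        return span, -span
    return -span, span

print(tlen_pair(100, "10M", 101, "10M"))  # (11, -11)
```

Note that nothing in this calculation looks at the first/second-in-pair bits: the sign depends only on position.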

However, FLAG 163 also says that read1 is the second read in the pair, and FLAG 83 says that read2 is the first, so there seems to be an inconsistency. What am I missing? Judging by the positions, isn't read1 clearly the first read? The pair apparently comes from an FR library, so read1 should be first and read2 second.
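Decoding the two FLAG values bit by bit shows what each one actually asserts. The bit values below are from the SAM specification; the short labels are my own paraphrases of the spec's wording. Strand (0x10, 0x20) and first/second in pair (0x40, 0x80) are separate bits, so the FLAG field by itself does not tie "first in pair" to the leftmost position.

```python
# SAM FLAG bits (values per the SAM spec; labels paraphrased).
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary",
    0x200: "QC fail",
    0x400: "duplicate",
    0x800: "supplementary",
}

def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value, in bit order."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]

for flag in (163, 83):
    print(flag, decode_flag(flag))
# 163 ['paired', 'proper pair', 'mate reverse strand', 'second in pair']
# 83 ['paired', 'proper pair', 'read reverse strand', 'first in pair']
```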


Thanks! I think the wording in the SAM spec is misleading here: "first segment in the template" and "last segment in the template". Something like "first segment sequenced from a fragment" and "last segment sequenced from a fragment" would be much clearer.

8.9 years ago

"First read" doesn't mean the read that maps first (furthest 5') on the genome. It means first read in pair: in paired-end sequencing, the sequencer reads both ends of each DNA fragment and writes the two reads to two separate FASTQ files, one read per fragment in each file.

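"First read in pair" (FLAG bit 0x40) means the read comes from the first FASTQ file; "second read in pair" (FLAG bit 0x80) means it comes from the second FASTQ file. In your pair, read2 (FLAG 83) is therefore the read from the first FASTQ file; it simply aligned to the reverse strand, downstream of its mate.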
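Going the other way, the FLAG of each mate can be built from which FASTQ file the read came from plus the strands the two mates aligned to. This is a sketch with a hypothetical helper, pair_flag, that assumes both mates are mapped and properly paired and ignores the secondary/supplementary/duplicate bits. In an FR library a fragment can be read from either end, so the read from the first FASTQ file lands leftmost on the forward strand for some pairs (99/147) and rightmost on the reverse strand for others (83/163, as in the question).

```python
def pair_flag(from_r1_file, read_reverse, mate_reverse):
    """Build the FLAG for one mate of a mapped, properly paired read pair.

    Hypothetical helper for illustration only.
    """
    flag = 0x1 | 0x2                          # paired, proper pair
    if read_reverse:
        flag |= 0x10                          # this read aligned to the reverse strand
    if mate_reverse:
        flag |= 0x20                          # its mate aligned to the reverse strand
    flag |= 0x40 if from_r1_file else 0x80    # which FASTQ file the read came from
    return flag

# The pair from the question: the R1-file read is on the reverse strand (rightmost),
# the R2-file read is on the forward strand (leftmost).
print(pair_flag(True, True, False))    # 83
print(pair_flag(False, False, True))   # 163

# The opposite orientation of the same kind of fragment.
print(pair_flag(True, False, True))    # 99
print(pair_flag(False, True, False))   # 147
```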
