Same strand, same QNAME different FLAG?
2
0
Entering edit mode
7 weeks ago
compuTE ▴ 80

In an attempt to split an original BAM file into two forward and reverse BAM files following this tutorial I bumped into a question. I am using paired end data.

This is an example of the alignments I have in the reverse strand BAM file:

A00681:387:HCVJTDRXY:1:2245:2419:34898 99 chr3 85295832 255 139M = 85295832 139 ACGAGTTAGTGGGTGCAACACACCAGCATGGCACATGTATACATATGTAACTAACCTGCACAATGTGCACAAGTACCCTAAAACTTAAAGTATAATAAAAAAAAGAAAAAAAGAAGAAGAAAAAAAAAGTAGTAAACCT FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:F:FFF:F:::FFFFFF:F,FFF,F:FFFFFF:,FFFF:F,FF NH:i:1 HI:i:1 AS:i:274 nM:i:1 NM:i:0 MD:Z:139 jM:B:c,-1 jI:B:i,-1 rB:B:i,1,139,85295832,85295970 MC:Z:139M

A00681:387:HCVJTDRXY:1:2245:2419:34898 147 chr3 85295832 255 139M = 85295832 -139 ACGAGTTAGTGGGTGCAACACACCAGCATGGCACATGTATACATATGTAACTAACCTGCACAATGTGCACAAGTACCCTAAAACTTAAAGTATAATAAAAAAAAGAAAAAAAGAAGAAGAAAAAAAAAGTACTAAACCT FFF,FFFFFFFFF:FFFFF::,,FFF,FF,F,F,F:FFFFFFFFFFFFFFFFFFFF,FF:FFFF:F:,FFFF,:FFF:FFFFFF:FFFF,:FFFFFFF,FFFFF:FFFFFFFFFF,,FFFFFFFFFFFFFF,FFF,FFF NH:i:1 HI:i:1 AS:i:274 nM:i:1 NM:i:1 MD:Z:131G7 jM:B:c,-1 jI:B:i,-1 rB:B:i,141,279,85295832,85295970 MC:Z:139M

They have the same QNAME. Different flags:

147:
read mapped in proper pair (0x2)
second in pair (0x80)

99:
read mapped in proper pair (0x2)
mate reverse strand (0x20)
first in pair (0x40)


Maybe I have not understand correctly the meaning of QNAME, but are these the same read?? If so why do they have different flags and 1 nucleotide of a difference (in bold)??

From the SAM format manual: "Reads/segments having identical QNAME are regarded to come from the same template." What exactly they mean by template?

In my quest to understand this, I ran featureCounts two times one with -s 2 (it's a reversely stranded library) and -s 0. -s 0 gives me double counts. I am not sure how to interpret this... Any help?

Thank you!!

strand featureCounts SAM RNAseq • 384 views
1
Entering edit mode
7 weeks ago
compuTE ▴ 80

For anyone that comes here wondering: As swbarnes2 said, these two reads were mates of a paired-end fragment.

A couple of people helped me figuring out how to correctly quantify things with featureCounts, if you happen to split your BAM file by strand as me :-). I summarized some of my tests and thoughts here. Hope it's helpful for anyone with similar questions!

0
Entering edit mode
7 weeks ago

No, of course they aren't the same read. The first is R2, the second is R1. That's what first and second in pair means. And of course they come from the same template molecule; they are sequenced from opposite ends of that template molecule.

I have no idea at all what you mean by a "reverse strand bam file".

0
Entering edit mode

Hi, thank you for the answer. As you can see, I have trouble with the very basics here, please bare with me :-). Sorry if i was not clear enough with what I meant with "reverse strand bam file" - From the tutorial I linked "Here is our script that splits a bam file into two fwd.bam and rev.bam each containing the alignments that are in the transcriptional direction." This is exactly what I am after, a BAM file containing the reads on a reverse transcriptional direction. It kinda hit me right now that this has nothing to do with R1/2. Both mates should come from the same direction (right?). My question is then, does the correct way of quantifying features from this file would be with -s 2, right? Otherwise I would be quantifying each mate and that's why I get double counts with -s 0. And the mismatch between the two sequences could be explained by a lack of precision towards the end of the reads (right?). Again, thank you. It's easier to understand by simply bouncing the idea.

1
Entering edit mode

Both mates should come from the same direction (right?).

See: Insert Size And Fragment Size ? Top answer should clarify this. They come from the same fragment and sample the two strands in 5'-->3' direction.

0
Entering edit mode

Yes, so it's the same fragment (which was transcribed in either + or - direction), read from two directions, one of the mates reading the complementary. A super oversimplification of this - let's forget for a second about PCR duplicates - if we would have a single molecule of a transcript, the correct quantification would be 1, as featureCounts would understand from the flags in the BAM file that this is 1 pair of reads, correct? If so, then how does -s play a role here? why do I see a difference between using -s 0 and -s 2?