How can both pairs in paired-end sequencing data be aligned to the forward strand without reverse complemented.
2
0
Entering edit mode
18 months ago
kunh ▴ 10

SRR1972739.9835 131     AF086833        127     60      101M    =       285     159    ...
SRR1972739.9835 67      AF086833        285     60      85M16S  =       127     -159    ...


The one read has FLAG 131 and other read has 67. (FLAG 131 means that one pair is 'PAIRED,PROPER_PAIR,READ2' and FLAG 67 means that 'PAIRED,PROPER_PAIR,READ1' another is.)

I know that if one read is on the forward strand, the other should be on the reverse strand.

r1 ------->         forward
===========================
reverse        <-------- r2


So I think one read should have reverse-complement flag(0x10) and it is impossible for both pairs to have 0x10 (REVERSE) or 0x20 (MREVERSE) flag. Therefore, I think the alignment with FLAG

       0x10  0x20


and

       0x10  0x20


are possible and

       0x10  0x20


and

       0x10  0x20


are impossible. Am I wrong? Or is the data wrong? As I know, sometimes the mate sequence is attached to another chromosome, but I don't know how. I would appreciate it if you could give me the answer.

alignment next-gen sequencing sequence • 474 views
0
Entering edit mode

There's a few cases where you'll definitely want to check the alignments by hand. That said, I tend to regard "properly" with a bit of suspicion.

67/131 only trip read_is_paired/in_proper_pair and R1 | R2 with neither being "reverse strand", which has interesting implications about orientation. You're correct in that one or the other should be on the reverse strand but this clearly isn't marked here.

Also in addition to flag 67, do you see any flag 83 (read reverse) or 99 (mate reverse) ?

0
Entering edit mode

For my own data, lots of 83/163 (read paired, mapped in proper, reverse strand, first && read paired, mapped in proper, mate_reverse_second in pair) and 99/147 (paired/proper/reverse/second && paired/proper/mate_reverse/first); assembly with megahit, mapping with BWA.

I'm just parsing out the sam flags via

samtools view Sample.sorted.bam | awk '{print 2}' | sort | uniq -c ; which will let you look at the full distribution of flags detected in the bam file along with counts. May want to check which sets of flags are the majority in your set? 15954684 147 / 99 15945099 83/ 163 922556 141 / 77 365195 133 / 73 364682 89/ 165 345488 97 / 145 339474 161 / 81 96041 137 / 69 95644 129 / 65 95480 101 / 153 #read unmapped, mate-reverse, 95031 113 / 177 # read_reverse_strand and mate_reverse_strand & paired, but not proper ADD REPLY 0 Entering edit mode I have calculated the counts of each FLAG pairs (supplementary and secondary alignment are not included) using python. @flag (83, 163) count: 2957 83: PAIRED,PROPER_PAIR,REVERSE,READ1 163: PAIRED,PROPER_PAIR,MREVERSE,READ2 @flag (99, 147) count: 2865 99: PAIRED,PROPER_PAIR,MREVERSE,READ1 147: PAIRED,PROPER_PAIR,REVERSE,READ2 @flag (77, 141) count: 2725 77: PAIRED,UNMAP,MUNMAP,READ1 141: PAIRED,UNMAP,MUNMAP,READ2 @flag (67, 131) count: 1367 67: PAIRED,PROPER_PAIR,READ1 131: PAIRED,PROPER_PAIR,READ2 @flag (115, 179) count: 51 115: PAIRED,PROPER_PAIR,REVERSE,MREVERSE,READ1 179: PAIRED,PROPER_PAIR,REVERSE,MREVERSE,READ2 @flag (81, 161) count: 12 81: PAIRED,REVERSE,READ1 161: PAIRED,MREVERSE,READ2 @flag (97, 145) count: 10 97: PAIRED,MREVERSE,READ1 145: PAIRED,REVERSE,READ2 @flag (73, 133) count: 6 73: PAIRED,MUNMAP,READ1 133: PAIRED,UNMAP,READ2 @flag (121, 181) count: 4 121: PAIRED,MUNMAP,REVERSE,MREVERSE,READ1 181: PAIRED,UNMAP,REVERSE,MREVERSE,READ2 @flag (65, 129) count: 2 65: PAIRED,READ1 129: PAIRED,READ2 @flag (69, 137) count: 1 69: PAIRED,UNMAP,READ1 137: PAIRED,MUNMAP,READ2 I can confirm that the mates which are proper' and having only one of the 'reverse' or 'mreverse' (flag 83/163, 99/147) are dominant but also see that there are still too many 'proper' but neither 'reverse' nor 'mreverse' pairs (flag 67/131). .+ I used BWA-mem aligner with no optional parameters bwa memref $r1$r2 > res.sam

2
Entering edit mode
18 months ago

This could be be caused by a genomic rearrangement, or a miss-assembly of the reference sequence. For me, this would mean that the "PROPERLY_PAIRED" flag should not be set, but as @ctseto pointed out, the definition of "PROPERLY_PAIRED" is so different from one aligner to another that it is not much use unless you know the specific definition for the specific aligner.

0
Entering edit mode

I would assume 67 should be 83 or 99 (read reverse or mate read reverse) and 131 would be 147 or 163 (read reverse or mate reverse)

0
Entering edit mode

Thank you for reply. I have to understand the concepts of miss-assembly and genomic rearrangement. I think the genomic rearrangement can be the reason because the reference genome I used is viral genome (ebolavirus).

1
Entering edit mode
18 months ago
ctseto ▴ 280

I suspect a lot of this boils down to how the sam flags are assigned and spec'd by the aligner. Even the whole "properly" thing involves some handwavium.

1
Entering edit mode

I used BWA-MEM aligner with default option. I should inspect the bwa tool manual conscientiously. Thank you.

1
Entering edit mode

Same on my end, though my use case was mapping PE reads to the contigs which were generated from their de novo assembly. Definitely very strange...

1
Entering edit mode

Indeed, sometimes the mappings are a bit perplexing. One option if you have downstream plans is to pick out only the reads that have a particular combination of flags for your downstream analysis? It is tempting to just pick out the ones that are "mapped" or "unmapped" (mapped to your desired contigs, or not mapped to a contaminant when screening) and use the appropriate flags, but there's also a lot of granularity in exactly which flags you can pull out. I imagine some of it may be down to parameter choice...too many mismatches, or gaps, or whatnot may change a read from unmap but not map? Or if the insert length is exceptionally far, it may not be properly paired? Or if a sequence is present in its forward and reverse forms, you may get a mapping to the unexpected form even if it breaks expected pairing?