Question: Alignments not labelled as proper pair on bwa mem
0
gravatar for lastenigma1805
3 months ago by
lastenigma18050 wrote:

I used bwa mem to perform a paired alignment of two fastq files. The resulting bam file would have been used to generate an mpileup and finally a consensus sequence. I noticed that for this alignment, many paired and seemingly properly mated pairs were not given the proper pair flag on the resultant sam file. Would anyone know why something like this might happen and is there a way to fix this?

The following alignments illustrate the problem. Both reads are paired, and each individually map onto the exact same location on the reference, with the exact same cigar string. However, neither read has the properly paired flag (2)

M02670:237:000000000-CWY49:1:1101:18506:10667   97  H3N2_NA 1   60  27S235M =   1   235 GCGTGATCAGCAAAAGCAGGAGTAAAGATGAATCCAAATCAAAAGATAATAACGATTGGCTCTGTTTCTCTCACCATTTCCACAATATGCTTCTTCATGCAAATTGCCATCCTGATGACTACTGTAACATTGCATTTCAAGCAATATGAATTCAACTCCCCCCCAAACAACCAAGTGATGCTGTGTGAACCAACAATGATAGAAAGAAACATAACAGAGATAGTGTATTTGACCAACACCACCATAGAGAGGGAAATATGCC  CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGFFGFGGGGFGFFFADEFFFFFFFFFF  NM:i:6  MD:Z:47A41A50T29A52A4G6MC:Z:27S235M AS:i:217    XS:i:0  RG:Z:27-A_Macedonia_648_2020_S27
M02670:237:000000000-CWY49:1:1101:18506:10667   145 H3N2_NA 1   60  27S235M =   1   -235    GCGTGATCAGCAAAAGCAGGAGTAAAGATGAATCCAAATCAAAAGATAATAACGATTGGCTCTGTTTCTCTCACCATTTCCACAATATGCTTCTTCATGCAAATTGCCATCCTGATGACTACTGTAACATTGCATTTCAAGCAATATGAATTCAACTCCCCCCCAAACAACCAAGTGATGCTGTGTGAACCAACAATGATAGAAAGAAACATAACAGAGATAGTGTATTTGACCAACACCACCATAGAGAGGGAAATATGCC  ?FFFFFD>FFFFFFF?BA:FFFFFFFFFFFFFFFFFFFDFFFFFFBFFFFFFFFFFFGF>DGGGGGGGGGGGGGFGGF>GFGGGGDFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCCCCC  NM:i:6  MD:Z:47A41A50T29A52A4G6MC:Z:27S235M AS:i:217    XS:i:0  RG:Z:27-A_Macedonia_648_2020_S27

I am aware that in the generation of the mpileup, the -A option can be used to ignore this flag, but I am worried about resultant false positives from doing this. Furthermore, I would like a more general solution as this is alignment was output as part of a processing pipeline that processes many more samples.

software error • 182 views
ADD COMMENTlink written 3 months ago by lastenigma18050

samtools fixmate?

ADD REPLYlink written 3 months ago by genomax89k

Not sure what the right answer is here though, both reads have the same sequence, both have 27 soft clipped bases, the actual fragment length is not 235 for that reason.

In general the precise rationale of what is called proper pair is not well defined. It is a "proper pair" only according to bwa and for reasons that are never explained. Of course, that knowledge is of little help if the next tool makes use of proper pair information.

I would recommend avoiding the use of this flag, it is an outdated, obsolete and plain wrong concept rooted in a time where sequencing was terrible and one needed crutches like this to move ahead.

Instead you should filter your bam file for alignments that fulfill certain properties, primary alignment, above a certain quality etc then use the all flag. Otherewise the many hidden assumptions that mpileup makes will never be clear.

ADD REPLYlink written 3 months ago by Istvan Albert ♦♦ 84k

also likely that this problem manifests itself because the reads are completely identical and overlapping

ADD REPLYlink written 3 months ago by Istvan Albert ♦♦ 84k

I don't think they are identical, because one is forward and one is reverse. Looks like the read length > fragment length.

ADD REPLYlink written 3 months ago by swbarnes28.6k

what I meant completely overlapping (the fragment is shorter than the read lenght) hence both reads in the pair align to the exact same interval on different strands.

upon more discussion on slack we believe that the cause of not having the proper pair set is that in this case bwa can't quite decide whether the reads are in the correct orientation --> <-- or in <-- --> hence it won't mark them as proper pair.

the simplest workaround is to use a different aligner like bowtie2 or bbmap

perhaps samtools fixmate also works - would be good to know.

ADD REPLYlink modified 3 months ago • written 3 months ago by Istvan Albert ♦♦ 84k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1698 users visited in the last hour