Hi, I have recently been aligning WGBS data. After mapping the reads to the genome with BSMAP, the samtools flagstat output shows that almost none of the aligned reads are properly paired:
578132580 + 0 in total (QC-passed reads + QC-failed reads)
30679544 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
578132580 + 0 mapped (100.00% : N/A)
547453036 + 0 paired in sequencing
285250897 + 0 read1
262202139 + 0 read2
61623 + 0 properly paired (0.01% : N/A)
418641102 + 0 with itself and mate mapped
128811934 + 0 singletons (23.53% : N/A)
396746190 + 0 with mate mapped to a different chr
396746190 + 0 with mate mapped to a different chr (mapQ>=5)
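To put these counts in proportion, here is a minimal Python sketch that parses the flagstat text and computes a few ratios. The function name `parse_flagstat` and the variable names are my own, chosen for illustration, and are not part of samtools. The sketch assumes the `N + M description` line layout shown in the output above.

```python
import re

# Flagstat lines copied from the output in this post.
FLAGSTAT_TEXT = """\
578132580 + 0 in total (QC-passed reads + QC-failed reads)
30679544 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
578132580 + 0 mapped (100.00% : N/A)
547453036 + 0 paired in sequencing
285250897 + 0 read1
262202139 + 0 read2
61623 + 0 properly paired (0.01% : N/A)
418641102 + 0 with itself and mate mapped
128811934 + 0 singletons (23.53% : N/A)
396746190 + 0 with mate mapped to a different chr
396746190 + 0 with mate mapped to a different chr (mapQ>=5)
"""


def parse_flagstat(text):
    """Hypothetical helper: map each description to its QC-passed count."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"(\d+) \+ (\d+) (.+)", line.strip())
        if match is None:
            continue  # skip anything that is not a count line
        # Drop only a trailing percentage such as "(0.01% : N/A)";
        # other parentheticals like "(mapQ>=5)" stay part of the label.
        label = re.sub(r"\s*\([^()]*%[^()]*\)\s*$", "", match.group(3))
        counts[label] = int(match.group(1))
    return counts


counts = parse_flagstat(FLAGSTAT_TEXT)

paired = counts["paired in sequencing"]
both_mapped = counts["with itself and mate mapped"]
diff_chr = counts["with mate mapped to a different chr"]
diff_chr_mapq5 = counts["with mate mapped to a different chr (mapQ>=5)"]

# Share of reads with both mates mapped whose mate is on another chromosome.
frac_diff_chr = diff_chr / both_mapped
# Share of all paired reads flagged as properly paired.
frac_proper = counts["properly paired"] / paired

print(f"mate on different chr, of pairs with both mapped: {frac_diff_chr:.2%}")
print(f"properly paired, of all paired reads: {frac_proper:.4%}")
print(f"different-chr count unchanged at mapQ>=5: {diff_chr == diff_chr_mapq5}")
```

The ratios are computed against "with itself and mate mapped" rather than the total, because only reads whose mate is also mapped can be reported as having a mate on a different chromosome.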
According to the last two lines of the flagstat output, a very large number of reads have a mate mapped to a different chromosome, and this worries me. As far as I know, and as discussed in the post "filtering paired end mapped reads form SAM/BAM file", this can happen because of chromosomal rearrangements (e.g. in cancer samples), artifacts introduced during library preparation, or poor mapping quality. However, my samples are not from cancer cells or tissues, and the last line of the flagstat output suggests that poor mapping quality is not the cause either, since the count is unchanged when restricted to mapQ>=5.

So my questions are: should I remove or keep these improperly paired reads? And why are there so many of them?

Looking forward to your help. Thank you so much!