Question

Should I keep or remove aligned reads whose mates mapped to a different chr?

0

6.5 years ago

dingailuma ▴ 20

Hi, I've recently been aligning WGBS data. After mapping to the genome with BSMAP, the samtools flagstat results show that most of the aligned reads are not properly paired:

578132580 + 0 in total (QC-passed reads + QC-failed reads)
30679544 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
578132580 + 0 mapped (100.00% : N/A)
547453036 + 0 paired in sequencing
285250897 + 0 read1
262202139 + 0 read2
61623 + 0 properly paired (0.01% : N/A)
418641102 + 0 with itself and mate mapped
128811934 + 0 singletons (23.53% : N/A)
396746190 + 0 with mate mapped to a different chr
396746190 + 0 with mate mapped to a different chr (mapQ>=5)

According to the last two lines above, a large number of my reads have a mate mapped to a different chromosome, and I'm quite worried about this. As far as I know, and according to this post: filtering paired end mapped reads form SAM/BAM file, this can happen because of chromosomal rearrangements (e.g. in cancer samples), artifacts introduced during library prep, or poor mapping quality. But my samples are not from cancer cells or tissues, and the last line of the flagstat output (mapQ>=5) suggests it is not due to poor mapping quality. So my question is: should I remove or keep those improperly paired reads? And why are there so many of them? Looking forward to your help. Thank you so much!
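To put the counts above in proportion, here is a minimal Python sketch that parses a few of the flagstat lines and computes what share of the paired reads have a mate on a different chromosome, plus the difference between the read1 and read2 counts. The regular expression and the names `parse_flagstat`, `diff_chr_pct` and `read_imbalance` are illustrative choices, not part of samtools.

```python
import re

# QC-passed counts copied from the flagstat output in the question.
FLAGSTAT = """\
578132580 + 0 in total (QC-passed reads + QC-failed reads)
30679544 + 0 secondary
547453036 + 0 paired in sequencing
285250897 + 0 read1
262202139 + 0 read2
61623 + 0 properly paired (0.01% : N/A)
128811934 + 0 singletons (23.53% : N/A)
396746190 + 0 with mate mapped to a different chr
"""

# "<passed> + <failed> <label>", with an optional trailing "(x% : y)" group.
LINE = re.compile(r"(\d+) \+ \d+ (.+?)(?: \([^()]* : [^()]*\))?$")


def parse_flagstat(text):
    """Map each flagstat label to its QC-passed count."""
    counts = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            counts[m.group(2)] = int(m.group(1))
    return counts


counts = parse_flagstat(FLAGSTAT)
paired = counts["paired in sequencing"]
diff_chr_pct = 100 * counts["with mate mapped to a different chr"] / paired
read_imbalance = counts["read1"] - counts["read2"]

print(f"mate on a different chr: {diff_chr_pct:.1f}% of paired reads")
print(f"read1 minus read2: {read_imbalance}")
```

The read1 and read2 counts are not equal here, which may be worth looking into alongside the discordant pairs.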

alignment properly mapped reads bsmap • 4.0k views

ADD COMMENT • link updated 6.5 years ago by fishka ▴ 30 • written 6.5 years ago by dingailuma ▴ 20

0

Is it mate-pair dataset?

ADD REPLY • link 6.5 years ago by e.rempel ★ 1.1k

0

As you can see from this line:

547453036 + 0 paired in sequencing

I think it's a mate-pair dataset. :)

ADD REPLY • link 6.5 years ago by dingailuma ▴ 20

1

Not necessarily; paired-end and mate-pair are two different techniques. Your problem could be caused by library preparation or by wrong aligner settings. I'm not sure what flagstat counts as "different chr". If you check a pair of reads with the same ID, are they really mapped to two different chromosomes? If the chromosome is the same, what distance do you get between the two reads?
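One way to run that check on a handful of records is sketched below in Python. It assumes tab-separated SAM text (for example the output of samtools view) and uses only the standard QNAME, FLAG, RNAME and POS columns. The function name `summarize_pairs` and the demo records are hypothetical, and the distance is simply the difference between leftmost mapping positions, so it ignores read length.

```python
from collections import defaultdict


def summarize_pairs(sam_lines):
    """Group primary alignments by read ID and report where the mates landed."""
    by_name = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag, chrom, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
        if flag & 0x4 or flag & 0x900:
            continue
        by_name[name].append((chrom, pos))

    report = {}
    for name, hits in by_name.items():
        if len(hits) != 2:
            report[name] = ("unpaired", None)
        elif hits[0][0] != hits[1][0]:
            report[name] = ("different_chr", None)
        else:
            report[name] = ("same_chr", abs(hits[0][1] - hits[1][1]))
    return report


# Hypothetical records, for illustration only (QNAME, FLAG, RNAME, POS).
demo = [
    "\t".join(["readA", "99", "chr1", "1000"]),
    "\t".join(["readA", "147", "chr1", "1250"]),
    "\t".join(["readB", "65", "chr1", "5000"]),
    "\t".join(["readB", "129", "chr7", "800"]),
    "\t".join(["readC", "73", "chr2", "300"]),
]
for read_id, result in summarize_pairs(demo).items():
    print(read_id, result)
```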

ADD REPLY • link 6.5 years ago by fishka ▴ 30

0

What is the quality of your reference genome? Is it human, or some other?

ADD REPLY • link 6.5 years ago by h.mon 35k

0

The reference genome is hg19, downloaded from UCSC.

ADD REPLY • link 6.5 years ago by dingailuma ▴ 20

0

Honestly, it looks as if something unpaired the R1 and R2 reads. What were your processing steps before mapping? How did you remove adapters, do quality trimming, etc.?
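A quick way to test that suspicion is to confirm that the R1 and R2 FASTQ files still list the same read IDs in the same order after trimming. The Python sketch below assumes well-formed four-line FASTQ records and IDs that differ at most by a trailing /1 or /2. The function names and the toy records are hypothetical.

```python
import itertools


def read_ids(fastq_lines):
    """Yield read IDs from 4-line FASTQ records, without '@', comments or /1, /2."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:  # header line of each record
            rid = line.strip().split()[0][1:]
            if rid.endswith(("/1", "/2")):
                rid = rid[:-2]
            yield rid


def first_desync(r1_lines, r2_lines):
    """Return the 0-based index of the first record whose IDs differ, else None."""
    pairs = itertools.zip_longest(read_ids(r1_lines), read_ids(r2_lines))
    for idx, (id1, id2) in enumerate(pairs):
        if id1 != id2:
            return idx
    return None


# Toy example: read "b" was dropped from R2 only, so the files fall out of sync.
r1 = ["@a/1", "ACGT", "+", "IIII", "@b/1", "ACGT", "+", "IIII", "@c/1", "ACGT", "+", "IIII"]
r2 = ["@a/2", "TTGA", "+", "IIII", "@c/2", "TTGA", "+", "IIII"]
print(first_desync(r1, r2))
```

On real data the two line iterables would come from the opened (possibly gzip-compressed) FASTQ files. A result other than None means the mates are no longer in step, and the aligner would be pairing unrelated reads.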

ADD REPLY • link 6.5 years ago by h.mon 35k

score 0 · Answer 1 · 2017-10-12

0

6.5 years ago

fishka ▴ 30

Try changing the insert size parameter to, for example, 1000 bp, and check the alignment rate. It would also be reasonable to check the read orientation. Most likely this will solve the problem.

ADD COMMENT • link 6.5 years ago by fishka ▴ 30

0

Thanks for your suggestions. If the DNA fragments used for sequencing are no longer than 500 bp (without adapters), is it still necessary to change the insert size? BSMAP's default setting for insert size is 500 bp.

ADD REPLY • link 6.5 years ago by dingailuma ▴ 20

0

I do not understand the question. You have a good alignment rate, but the pairs are aligned discordantly. That most likely means your alignment parameters are wrong. Run

samtools view ...bam | head

and then, for 10 read IDs,

samtools view ...bam | grep ...

From the pair coordinates you will see the distance between the two reads of a pair and their orientation. Sometimes a quick look at the actual mapped pairs helps to clarify things.
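For the orientation part, the strand bits in the SAM FLAG column (0x10 for the read, 0x20 for its mate) together with POS and PNEXT are enough for a rough classification. The Python sketch below is an approximation under stated assumptions: it applies only to pairs on the same chromosome, it compares leftmost coordinates and ignores read length, and the labels FR, RF and tandem and the function name `orientation` are illustrative.

```python
from collections import Counter


def orientation(flag, pos, pnext):
    """Classify a same-chromosome pair from one record's FLAG, POS and PNEXT."""
    self_rev = bool(flag & 0x10)  # this read is on the reverse strand
    mate_rev = bool(flag & 0x20)  # its mate is on the reverse strand
    if self_rev == mate_rev:
        return "tandem"  # both reads on the same strand
    # Leftmost coordinate of the forward-strand read and of the reverse-strand read.
    fwd_pos, rev_pos = (pnext, pos) if self_rev else (pos, pnext)
    return "FR" if fwd_pos <= rev_pos else "RF"


# Hypothetical (FLAG, POS, PNEXT) triples, for illustration only.
records = [(99, 100, 300), (147, 300, 100), (65, 100, 300), (81, 100, 300)]
tally = Counter(orientation(flag, pos, pnext) for flag, pos, pnext in records)
print(tally)
```

If most pairs fall outside the orientation the aligner expects, that points to a settings problem and not to the biology of the sample.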

ADD REPLY • link 6.5 years ago by fishka ▴ 30