Question: Should I keep or remove aligned reads whose mates mapped to a different chromosome?
dingailuma10 wrote:

Hi, I have recently been aligning WGBS data. After mapping to the genome with BSMAP, the samtools flagstat results show that most of the aligned reads are not properly paired:

578132580 + 0 in total (QC-passed reads + QC-failed reads)
30679544 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
578132580 + 0 mapped (100.00% : N/A)
547453036 + 0 paired in sequencing
285250897 + 0 read1
262202139 + 0 read2
61623 + 0 properly paired (0.01% : N/A)
418641102 + 0 with itself and mate mapped
128811934 + 0 singletons (23.53% : N/A)
396746190 + 0 with mate mapped to a different chr
396746190 + 0 with mate mapped to a different chr (mapQ>=5)

According to the last two lines above, a very large number of my reads have a mate mapped to a different chromosome, and I'm quite worried about what this means. As far as I know, and according to this post: filtering paired end mapped reads form SAM/BAM file, this can happen because of chromosomal rearrangements (e.g. in cancer samples), artifacts introduced during library preparation, or poor mapping quality. But my samples are not from cancer cells or tissues, and the last line of the flagstat output (mapQ>=5) suggests to me that poor mapping quality is not the cause. So my questions are: should I remove or keep these improperly paired reads, and why are there so many of them? I look forward to your help. Thank you so much!

ADD COMMENTlink modified 7 days ago by fishka20 • written 7 days ago by dingailuma10
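
To put the flagstat numbers from the question into proportion, here is a minimal Python sketch that parses the pasted output and computes two ratios. The function name `parse_flagstat` and the dictionary keys are just illustrative choices, not part of samtools; the only input is the text quoted above.

```python
import re

# flagstat output copied verbatim from the question
FLAGSTAT = """\
578132580 + 0 in total (QC-passed reads + QC-failed reads)
30679544 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
578132580 + 0 mapped (100.00% : N/A)
547453036 + 0 paired in sequencing
285250897 + 0 read1
262202139 + 0 read2
61623 + 0 properly paired (0.01% : N/A)
418641102 + 0 with itself and mate mapped
128811934 + 0 singletons (23.53% : N/A)
396746190 + 0 with mate mapped to a different chr
396746190 + 0 with mate mapped to a different chr (mapQ>=5)
"""

def parse_flagstat(text):
    """Map each flagstat label to its QC-passed count (first number on the line)."""
    counts = {}
    for line in text.splitlines():
        m = re.match(r"(\d+) \+ (\d+) (.+)", line.strip())
        if not m:
            continue
        # Drop only the trailing percentage annotation, e.g. "(0.01% : N/A)"
        label = re.sub(r"\s*\([^)]*%[^)]*\)$", "", m.group(3))
        counts[label] = int(m.group(1))
    return counts

counts = parse_flagstat(FLAGSTAT)
paired = counts["paired in sequencing"]
both_mapped = counts["with itself and mate mapped"]
diff_chr = counts["with mate mapped to a different chr"]

frac_of_paired = diff_chr / paired            # share of all paired reads
frac_of_both_mapped = diff_chr / both_mapped  # share of reads whose mate is also mapped

print(f"mate on another chr, of paired reads:      {frac_of_paired:.2%}")
print(f"mate on another chr, of both-mapped reads: {frac_of_both_mapped:.2%}")
print(f"read1 minus read2: {counts['read1'] - counts['read2']}")
```

With these numbers, roughly 72% of all paired reads, and about 95% of the reads whose mate is also mapped, have the mate on another chromosome. The read1 and read2 counts also differ by about 23 million. A pattern this widespread would be unusual for real rearrangements and may instead point to a systematic problem such as mismatched pairing or aligner settings, though the sketch alone cannot tell which.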

Is it a mate-pair dataset?

ADD REPLYlink written 7 days ago by e.rempel510

As you can see from this line:

547453036 + 0 paired in sequencing

I think it's a mate-pair dataset. :)

ADD REPLYlink written 7 days ago by dingailuma10

Not necessarily; paired-end and mate-pair are two different techniques. Your problem could be caused by library preparation or by wrong aligner settings. I am not sure about the flagstat "different chr" count. If you check a pair of reads with the same ID, are they really mapped to two different chromosomes? If the chromosome is the same, what distance do you get between the two reads?

ADD REPLYlink written 7 days ago by fishka20

What is the quality of your reference genome? Is it human, or some other?

ADD REPLYlink written 7 days ago by h.mon9.0k

The reference genome is hg19, downloaded from UCSC.

ADD REPLYlink written 6 days ago by dingailuma10

Honestly, it looks as if something put the R1 and R2 reads out of sync. What were your processing steps before mapping? How did you remove adapters, trim by quality, etc.?

ADD REPLYlink written 6 days ago by h.mon9.0k
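
One way to test the idea that R1 and R2 got out of sync before mapping is to compare the read names in the two FASTQ files record by record. The sketch below is only an illustration under some assumptions: the helper names `read_names` and `first_desync` are made up, records are assumed to be plain 4-line FASTQ, and mates are assumed to share a name apart from an optional `/1` or `/2` suffix or a comment after whitespace.

```python
from itertools import zip_longest

def read_names(fastq_lines):
    """Yield the read name of every 4-line FASTQ record (blank lines ignored)."""
    non_blank = (line for line in fastq_lines if line.strip())
    for i, line in enumerate(non_blank):
        if i % 4 == 0:                      # header line of each record
            name = line.strip().split()[0]  # drop any comment after whitespace
            name = name[1:]                 # drop the leading '@'
            if name.endswith(("/1", "/2")):
                name = name[:-2]            # drop the mate suffix
            yield name

def first_desync(r1_lines, r2_lines):
    """Return the 0-based index of the first record whose names differ, else None.

    A file that ends early also counts as a mismatch.
    """
    pairs = zip_longest(read_names(r1_lines), read_names(r2_lines))
    for idx, (name1, name2) in enumerate(pairs):
        if name1 != name2:
            return idx
    return None

# Tiny made-up example: the second R2 record is missing.
r1 = ["@a/1", "ACGT", "+", "IIII", "@b/1", "ACGT", "+", "IIII"]
r2 = ["@a/2", "ACGT", "+", "IIII"]
print(first_desync(r1, r2))
```

On real data the two arguments could be open file handles, for example from `open(path)` or `gzip.open(path, "rt")` for compressed files. If trimming was run on R1 and R2 separately and dropped reads from only one file, a check like this would be expected to report a mismatch early on.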
fishka20 wrote:

Try changing the insert size parameter to, for example, 1000 bp and check the alignment rate. It would also be reasonable to check the read orientation. Most likely this will solve the problem.

ADD COMMENTlink written 7 days ago by fishka20

Thanks for your suggestions. I wonder: if the DNA fragments used for sequencing are no longer than 500 bp (without adapters), is it still necessary to change the insert size? BSMAP's default setting for insert size is 500 bp.

ADD REPLYlink written 6 days ago by dingailuma10

I do not understand the question. You have a good alignment rate, but the pairs are aligned discordantly. That most likely means your alignment parameters are wrong. Run samtools view ...bam | head, and then samtools view ...bam | grep ... for 10 IDs. From the pair coordinates you will see the distance between the two reads of each pair and their orientation. Sometimes just a quick look at the actual mapped pairs helps to clarify things.

ADD REPLYlink written 6 days ago by fishka20
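
The manual inspection suggested above can also be done programmatically on the text that `samtools view` prints. The following is a rough sketch, not a definitive tool: `describe_pair` is a made-up helper, the two example records are invented, and only the standard SAM columns (QNAME, FLAG, RNAME, POS, RNEXT, PNEXT) and flag bits 0x2, 0x10 and 0x20 are used.

```python
def describe_pair(sam_line):
    """Summarise where a read and its mate map, from one SAM alignment line."""
    f = sam_line.rstrip("\n").split("\t")
    qname, flag, rname, pos = f[0], int(f[1]), f[2], int(f[3])
    rnext, pnext = f[6], int(f[7])

    # RNEXT is "=" when the mate is on the same reference, "*" when unknown
    if rnext == "=":
        mate_chr = rname
    elif rnext == "*":
        mate_chr = None
    else:
        mate_chr = rnext
    same_chr = mate_chr is not None and mate_chr == rname

    return {
        "read": qname,
        "chr": rname,
        "mate_chr": mate_chr,
        "same_chr": same_chr,
        # distance between the leftmost mapped positions, only if same chr
        "distance": abs(pnext - pos) if same_chr else None,
        "strand": "-" if flag & 0x10 else "+",       # 0x10: read on reverse strand
        "mate_strand": "-" if flag & 0x20 else "+",  # 0x20: mate on reverse strand
        "proper_pair": bool(flag & 0x2),             # 0x2: aligner says proper pair
    }

# Two invented records: one concordant pair, one with the mate on another chr.
examples = [
    "readA\t99\tchr1\t1000\t60\t50M\t=\t1200\t250\t*\t*",
    "readB\t65\tchr1\t5000\t60\t50M\tchr7\t800\t0\t*\t*",
]
for line in examples:
    print(describe_pair(line))
```

For a real check, the lines could come from the first few thousand records of the BAM file, skipping header lines that start with `@`. If most pairs turn out to be on the same chromosome but slightly further apart than the insert size limit, that would support the aligner-settings explanation; if the mates are scattered across unrelated chromosomes, a pairing problem upstream looks more likely.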