Question: Should We Dump Illumina Pair-End Mapping Results In Sam With Mapq=0, But Good Template Length
0
gravatar for PeterPan
6.7 years ago by
PeterPan30
ShangHai
PeterPan30 wrote:

hi, everyone! I am working on illumina pair-end sequencing

After mapping by bwa, I got a pair of reads with MAPQ=0, with both reads mapped to more than one place. But the Template Length is OK, and I find this pair of reads in this sam file only once.

So, should we dump this mapping pair?

SPECIFIC_ID 83 Y 58900709 0 92M = 58900679 -122 AGTGCATTCCATTCCAGTCTCTTCAGTTCGATTCCATTCCATTCGTTTCGATTCCTTTCCATTCCAGCCCATTCCATTCCATTGCATTCCTT DDDCDCDDD DDDDDDDDDDDDEEC?FFFFFHHHHHJJIJIHJIJJJJIJJJJJJIJHGJJIHFJJJIGIJJIHIJIJHJJIJJIJJIFHFFH XT:A:R NM:i:1 SM:i:0 AM:i:0 X0 :i:9 X1:i:16 XM:i:1 XO:i:0 XG:i:0 MD:Z:83C8 :i:9 X1:i:16 XM:i:1 XO:i:0 XG:i:0 MD:Z:83C8

SPECIFIC_ID 163 Y 58900679 0 92M = 58900709 122 AGAACCTTCCATTACACTCCCTTCCATTCCAGTGCATTCCATTCCAGTCTCTTCAGTTCGATTCCATTCCATTCGTTTCGATTCCTTTCCAT FHGHHIIJJ JGJIIJIJJIIJJIIJEIGEGHIIJJIJIGIIIF<fgiijjhjieejijiiijhehehigggghegghjc=ehf@deef?>CD XT:A:R NM:i:0 SM:i:0 AM:i:0 X0 :i:2 X1:i:18 XM:i:0 XO:i:0 XG:i:0 MD:Z:92

Thanks very much!

illumina paired-end length sam • 2.0k views
ADD COMMENTlink modified 6.7 years ago by Gabriel R.2.6k • written 6.7 years ago by PeterPan30
0
gravatar for Gabriel R.
6.7 years ago by
Gabriel R.2.6k
Center for Geogenetik Københavns Universitet
Gabriel R.2.6k wrote:

What do you mean dump ? If you do genotyping, the genotyper will consider the mapping quality anyway. People shouldn't "dump" data, leave everything in there and flag it as QC failed.

ADD COMMENTlink written 6.7 years ago by Gabriel R.2.6k

Thanks Samuel! Usually I will filter sam file by myself, often by MAPQ and Template Length. So you mean if I leave it there, the calling quality would contain mapping quality information?

ADD REPLYlink written 6.7 years ago by PeterPan30

Yes, in GATK for example you have the RMS of the mapping quality supporting that site in the MQ field. You should subsequently filter your sites according to this (and other metrics :-) )

ADD REPLYlink written 6.7 years ago by Gabriel R.2.6k

OK, I got it. Thanks!

ADD REPLYlink written 6.7 years ago by PeterPan30

I had the same confusion when I started with NGS analysis. I used to filter out unmapped and ambiguosly mapped reads. But then I realised that most of the downstream tools including samtools and GATK make use of mapping quality and other features like concordancy in the mapping of paired reads to make sure that the SNPs and Indels generated are from reads that are confidently placed on the genome. Moreover, these anomalous reads will be important as they are used by some programs to identify large indels. For example, Pindel uses read pairs where one read was confidently aligned but the other read couldnt be aligned. It then takes the other unaligned read from the BAM file and try to look for insertion and deletion. So at any point you should have all these information in the bam file. I dont filter out anything from the BAM file.

ADD REPLYlink modified 6.7 years ago • written 6.7 years ago by Ashutosh Pandey11k

Thanks ashutoshmits! Base on your statement, we should not filter bam by MAPQ. But how about Template Length? We know that reads length should be within some range based on Gel electrophoresis.

ADD REPLYlink written 6.7 years ago by PeterPan30

I have never seen someone taking care of he template length. I dont mean that what you are trying to do is wrong but I assume if you are using standard platform for sequencing including SOLiD , Illumina then you normally get reads with specified size. Even if there are junk reads they wont get aligned with higher confidence or at all. BTW, I just read your original question. You can throw the reads in case both the reads from the same pair are not aligned. They wont be useful at all but there is no problem in keeping them unless you want to reduce your BAM size little bit.

ADD REPLYlink written 6.7 years ago by Ashutosh Pandey11k

I see. Thanks a lot!

ADD REPLYlink written 6.7 years ago by PeterPan30
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 777 users visited in the last hour