Should We Dump Illumina Pair-End Mapping Results In Sam With Mapq=0, But Good Template Length
1
0
Entering edit mode
11.3 years ago
PeterPan ▴ 30

hi, everyone! I am working on illumina pair-end sequencing

After mapping by bwa, I got a pair of reads with MAPQ=0, with both reads mapped to more than one place. But the Template Length is OK, and I find this pair of reads in this sam file only once.

So, should we dump this mapping pair?

SPECIFIC_ID 83 Y 58900709 0 92M = 58900679 -122 AGTGCATTCCATTCCAGTCTCTTCAGTTCGATTCCATTCCATTCGTTTCGATTCCTTTCCATTCCAGCCCATTCCATTCCATTGCATTCCTT DDDCDCDDD DDDDDDDDDDDDEEC?FFFFFHHHHHJJIJIHJIJJJJIJJJJJJIJHGJJIHFJJJIGIJJIHIJIJHJJIJJIJJIFHFFH XT:A:R NM:i:1 SM:i:0 AM:i:0 X0 :i:9 X1:i:16 XM:i:1 XO:i:0 XG:i:0 MD:Z:83C8 :i:9 X1:i:16 XM:i:1 XO:i:0 XG:i:0 MD:Z:83C8

SPECIFIC_ID 163 Y 58900679 0 92M = 58900709 122 AGAACCTTCCATTACACTCCCTTCCATTCCAGTGCATTCCATTCCAGTCTCTTCAGTTCGATTCCATTCCATTCGTTTCGATTCCTTTCCAT FHGHHIIJJ JGJIIJIJJIIJJIIJEIGEGHIIJJIJIGIIIF<fgiijjhjieejijiiijhehehigggghegghjc=ehf@deef?>CD XT:A:R NM:i:0 SM:i:0 AM:i:0 X0 :i:2 X1:i:18 XM:i:0 XO:i:0 XG:i:0 MD:Z:92

Thanks very much!

illumina paired-end sam length • 3.3k views
ADD COMMENT
0
Entering edit mode
11.3 years ago
Gabriel R. ★ 2.9k

What do you mean dump ? If you do genotyping, the genotyper will consider the mapping quality anyway. People shouldn't "dump" data, leave everything in there and flag it as QC failed.

ADD COMMENT
0
Entering edit mode

Thanks Samuel! Usually I will filter sam file by myself, often by MAPQ and Template Length. So you mean if I leave it there, the calling quality would contain mapping quality information?

ADD REPLY
0
Entering edit mode

Yes, in GATK for example you have the RMS of the mapping quality supporting that site in the MQ field. You should subsequently filter your sites according to this (and other metrics :-) )

ADD REPLY
0
Entering edit mode

OK, I got it. Thanks!

ADD REPLY
0
Entering edit mode

I had the same confusion when I started with NGS analysis. I used to filter out unmapped and ambiguosly mapped reads. But then I realised that most of the downstream tools including samtools and GATK make use of mapping quality and other features like concordancy in the mapping of paired reads to make sure that the SNPs and Indels generated are from reads that are confidently placed on the genome. Moreover, these anomalous reads will be important as they are used by some programs to identify large indels. For example, Pindel uses read pairs where one read was confidently aligned but the other read couldnt be aligned. It then takes the other unaligned read from the BAM file and try to look for insertion and deletion. So at any point you should have all these information in the bam file. I dont filter out anything from the BAM file.

ADD REPLY
0
Entering edit mode

Thanks ashutoshmits! Base on your statement, we should not filter bam by MAPQ. But how about Template Length? We know that reads length should be within some range based on Gel electrophoresis.

ADD REPLY
0
Entering edit mode

I have never seen someone taking care of he template length. I dont mean that what you are trying to do is wrong but I assume if you are using standard platform for sequencing including SOLiD , Illumina then you normally get reads with specified size. Even if there are junk reads they wont get aligned with higher confidence or at all. BTW, I just read your original question. You can throw the reads in case both the reads from the same pair are not aligned. They wont be useful at all but there is no problem in keeping them unless you want to reduce your BAM size little bit.

ADD REPLY
0
Entering edit mode

I see. Thanks a lot!

ADD REPLY

Login before adding your answer.

Traffic: 1532 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6