Why do some lines of a BAM file not contain the NH tag?
8.5 years ago
biolab ★ 1.4k

Hi, everyone,

I want to get uniquely mapped reads, so I need to select reads whose NH (number of hits) tag value is equal to 1. However, why do some lines of the BAM file not contain the NH tag?

I appreciate any of your comments. Thank you very much!

bam NH-tag • 4.8k views
8.5 years ago

NH is an optional tag; not all aligners will include it (and those that do will often omit it for reads with only a single hit). Anyway, filter by MAPQ instead.
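To illustrate the point, here is a minimal sketch in plain Python that reads the MAPQ column and the optional tags from a SAM-format line, so a missing NH tag simply shows up as an absent key. The function names, the example records, and the threshold of 10 are assumptions made for illustration; sensible MAPQ cutoffs depend on the aligner.

```python
def parse_sam_line(line):
    """Return (MAPQ, optional-tag dict) for one tab-separated SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    mapq = int(fields[4])  # MAPQ is the 5th mandatory SAM column
    tags = {}
    for field in fields[11:]:  # optional TAG:TYPE:VALUE fields start at column 12
        tag, typ, value = field.split(":", 2)
        tags[tag] = int(value) if typ == "i" else value
    return mapq, tags


def keep_read(line, min_mapq=10):
    """Keep a read on MAPQ alone; min_mapq=10 is an arbitrary illustrative cutoff."""
    mapq, _tags = parse_sam_line(line)
    return mapq >= min_mapq


# Two made-up records: one carries NH:i:1, the other has no NH tag at all.
with_nh = "r1\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1"
without_nh = "r2\t0\tchr1\t200\t3\t4M\t*\t0\t0\tACGT\tIIII"

for record in (with_nh, without_nh):
    mapq, tags = parse_sam_line(record)
    # tags.get("NH") is None when the aligner did not write the tag
    print(mapq, tags.get("NH"), keep_read(record))
```

The MAPQ column is mandatory in every alignment line, which is why it is a more portable filter than an optional tag.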


Hi Devon, thanks a lot for your answer. I have one further question: MAPQ is a parameter that indicates the quality of a read's mapping. Are there publications that use the MAPQ filtering method? To my knowledge, the unique mapping strategy has been well documented, for instance, http://www.sciencedirect.com/science/article/pii/S1534580715000362. This is the reason I want to get uniquely mapped reads rather than use MAPQ filtering. I appreciate any of your comments. THANK YOU!


"Uniquely mapped" is far from well documented; it doesn't even have a single definition.


Hi, Devon, Thanks for your comment. I think it's useful, although I am a bit confused. I need to learn more. THANKS.


Maybe worth keeping in mind that you can align any read anywhere in the genome; you just need to allow enough mismatches. So there is no such thing as uniquely mapped, strictly speaking. However, you can ask whether the best hit has an alignment score sufficiently higher than the second best to say that the read is "uniquely mapped". Effectively this is what you would do if you filter by MAPQ, as Devon suggests.


Thanks, dariober. My understanding is: if mapping quality is low, "uniquely mapped" reads become useless. So, setting a MAPQ threshold selects high-quality alignments. Whether a read is "multiply mapped" or "uniquely mapped" is not a good criterion for assessing gene expression. That is my understanding. I hope I have caught your idea. THANKS anyway.


An important nuance is that MAPQ is heavily affected not just by the correctness of a read's sequence but also by how well the sample you sequenced happens to match the reference you're aligning against. For example, if you sequenced a human you should expect a fair number of lowish-MAPQ but nonetheless absolutely correct alignments due to SNPs.

8.5 years ago
Ian 6.0k

Hi, we recently discussed the uniquely mapped reads question in "Bowtie 2 - is there a way to discard reads mapping to multiple locations?"

The XS:i tag is sometimes used; in Bowtie 2 it gives the alignment score of the next-best matching position. So if it is present, the read is not a uniquely mapped read. But like Devon says, it is probably best to get off the uniquely mapped reads train.
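As a rough sketch of that check, the snippet below looks for XS:i in a SAM line and, when it is present, reports how far the reported alignment's score (AS:i) sits above the runner-up. It assumes Bowtie 2-style tags, where XS:i is written only when a second alignment was found; other aligners define or emit these tags differently. The function names and example records are hypothetical.

```python
def optional_tags(line):
    """Collect the optional TAG:TYPE:VALUE fields of a SAM line into a dict."""
    tags = {}
    for field in line.rstrip("\n").split("\t")[11:]:
        tag, typ, value = field.split(":", 2)
        tags[tag] = int(value) if typ == "i" else value
    return tags


def score_margin(line):
    """Return AS - XS, or None when no second-best alignment (XS:i) was reported.

    Assumes Bowtie 2-style tags: AS:i is the score of the reported alignment and
    XS:i is the score of the best other alignment found for the same read.
    """
    tags = optional_tags(line)
    if "XS" not in tags or "AS" not in tags:
        return None
    return tags["AS"] - tags["XS"]


# Made-up records: equally good second hit, clearly worse second hit, no second hit.
tied = "r1\t0\tchr1\t100\t1\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:-5\tXS:i:-5"
clear = "r2\t0\tchr1\t200\t40\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:0\tXS:i:-12"
single = "r3\t0\tchr1\t300\t42\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:0"

for record in (tied, clear, single):
    print(record.split("\t")[0], score_margin(record))
```

A margin of zero means the aligner found another placement that is just as good, while a large margin is the situation a high MAPQ is meant to summarize.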


Thanks, Ian, your previous post and answer are very helpful!

