All reads with mapq value <= 1, htseq does not attribute counts to any feature
1
0
Entering edit mode
8.1 years ago
fhsantanna ▴ 610

I am doing an experiment of rnaseq. I have mapped Iontorrent reads (single end, 2 million reads, length average = 50) to a bacterial genome using bowtie2 and bbmap. Problem is that htseq does not attribute mapped reads to any gene, because all alignments have mapq scores =<1. Only if a utilize the option "-a 0" in htseq, I can see that the genes have mapped reads. I was reading throughout biostars that a mapq threshold should be set to 5, at least. Visually, I perceived that most of the alignments were on rRNA gene, demonstrating that the ribosomal reduction was not efficient. But, anyway, most of the genes contained aligned reads. So, is it acceptable to set a mapq score to 0? If not, what are my possibilities? How could I improve mapq scores?

rnaseq mapq • 3.5k views
ADD COMMENT
1
Entering edit mode

What is the reason for the low MAPQ scores? I think these scores tell you about how reliable the reads mapped to a certain position, and scores of 1 (or lower) means that the change is 80% (or more) that the read is wrongly positioned!!!

I don't know if it's worth continuing with these odds???

Read more about MAPQ here: http://www.acgt.me/blog/2014/12/16/understanding-mapq-scores-in-sam-files-does-37-42

ADD REPLY
0
Entering edit mode

I believe the mapq scores are low because reads are aligning to more than one point in the genome. However I do not understand why these scores are too low and how I can solve this problem.

ADD REPLY
2
Entering edit mode

The MAPQ score is defined by the PHRED scale:

Q=-10log10(P)

...where Q is the quality score (MAPQ) and P is the probability of being correct. So, if the mapper thinks a site has a 99% chance of being correct, it gets a mapq of 20. It the aligner thinks it has a 90% chance of being correct, it gets a mapq of 10. By definition, a mapq of 3 or lower has at most a 50% chance of being correct; and therefore, no read that aligns equally well to more than 1 location can ever have a mapq>3. A mapq of 0 is the minimum and it is very low, indicating that either the read aligned to many locations equally well, or that it did not match any location very well.

It would be great if you could easily see a number indicating how well a read aligns to a reference... but the SAM spec does not provide that in the required fields. Which is too bad. If you use BBMap's "idtag" flag, it will produce an optional tag (prefixed by "YI:") indicating the percent identity of the alignment. You will also need to use "ambig=all" to make it report multiple ambiguous alignments instead of just the first.

I don't know how htseq operates internally, but hopefully this will help.

ADD REPLY
1
Entering edit mode

I assume you have RNAseq data? Check this answer, and do the same: use BBDuk to check for rRNA contamination on your samples. You may also use SortMeRNA for this, I think it is slightly more sensitive, but a lot slower. Either way, you will know how bad your rRNA contamination was, and both programs output a "dirty" and a "clean" fastq files - you may then remap after removing the rRNA.

ADD REPLY
0
Entering edit mode

I filtered my reads using bbduk, at least ~50% of the reads are rRNA. However, this was not the problem, as you can see in my answer.

ADD REPLY
1
Entering edit mode

Dr. Simon Andrews just posted this blog article on mapq values as implemented by various aligners.

ADD REPLY
0
Entering edit mode
8.1 years ago
fhsantanna ▴ 610

I found a way to have better mapq scores using ion data. Here is the command for bowtie2: bowtie2 --local --very-sensitive-local -p 8 --mm -x example

Now I am able to count in htseq!

Source: https://ioncommunity.thermofisher.com/docs/DOC-7062

ADD COMMENT
0
Entering edit mode

Fair warning: The link above requires an account/login to view.

ADD REPLY
0
Entering edit mode

Yes, we have to subscribe to get access to the protocol.

ADD REPLY
0
Entering edit mode

What was your previous bowtie2 command? The command you posted seems pretty standard, I wonder what you were trying before.

ADD REPLY
0
Entering edit mode

Here is the previous command: bowtie2 -x example -U reads_1.fq -S eg1.sam

ADD REPLY

Login before adding your answer.

Traffic: 2607 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6