Question: All reads with mapq value <= 1, htseq does not attribute counts to any feature
0
gravatar for fhsantanna
2.1 years ago by
fhsantanna390
Brazil
fhsantanna390 wrote:

I am doing an experiment of rnaseq. I have mapped Iontorrent reads (single end, 2 million reads, length average = 50) to a bacterial genome using bowtie2 and bbmap. Problem is that htseq does not attribute mapped reads to any gene, because all alignments have mapq scores =<1. Only if a utilize the option "-a 0" in htseq, I can see that the genes have mapped reads. I was reading throughout biostars that a mapq threshold should be set to 5, at least. Visually, I perceived that most of the alignments were on rRNA gene, demonstrating that the ribosomal reduction was not efficient. But, anyway, most of the genes contained aligned reads. So, is it acceptable to set a mapq score to 0? If not, what are my possibilities? How could I improve mapq scores?

mapq rnaseq • 1.0k views
ADD COMMENTlink modified 2.1 years ago • written 2.1 years ago by fhsantanna390
1

What is the reason for the low MAPQ scores? I think these scores tell you about how reliable the reads mapped to a certain position, and scores of 1 (or lower) means that the change is 80% (or more) that the read is wrongly positioned!!!

I don't know if it's worth continuing with these odds???

Read more about MAPQ here: http://www.acgt.me/blog/2014/12/16/understanding-mapq-scores-in-sam-files-does-37-42

ADD REPLYlink written 2.1 years ago by b.nota3.7k

I believe the mapq scores are low because reads are aligning to more than one point in the genome. However I do not understand why these scores are too low and how I can solve this problem.

ADD REPLYlink written 2.1 years ago by fhsantanna390
2

The MAPQ score is defined by the PHRED scale:

Q=-10log10(P)

...where Q is the quality score (MAPQ) and P is the probability of being correct. So, if the mapper thinks a site has a 99% chance of being correct, it gets a mapq of 20. It the aligner thinks it has a 90% chance of being correct, it gets a mapq of 10. By definition, a mapq of 3 or lower has at most a 50% chance of being correct; and therefore, no read that aligns equally well to more than 1 location can ever have a mapq>3. A mapq of 0 is the minimum and it is very low, indicating that either the read aligned to many locations equally well, or that it did not match any location very well.

It would be great if you could easily see a number indicating how well a read aligns to a reference... but the SAM spec does not provide that in the required fields. Which is too bad. If you use BBMap's "idtag" flag, it will produce an optional tag (prefixed by "YI:") indicating the percent identity of the alignment. You will also need to use "ambig=all" to make it report multiple ambiguous alignments instead of just the first.

I don't know how htseq operates internally, but hopefully this will help.

ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by Brian Bushnell15k
1

I assume you have RNAseq data? Check this answer, and do the same: use BBDuk to check for rRNA contamination on your samples. You may also use SortMeRNA for this, I think it is slightly more sensitive, but a lot slower. Either way, you will know how bad your rRNA contamination was, and both programs output a "dirty" and a "clean" fastq files - you may then remap after removing the rRNA.

ADD REPLYlink written 2.1 years ago by h.mon13k

I filtered my reads using bbduk, at least ~50% of the reads are rRNA. However, this was not the problem, as you can see in my answer.

ADD REPLYlink written 2.1 years ago by fhsantanna390
1

Dr. Simon Andrews just posted this blog article on mapq values as implemented by various aligners.

ADD REPLYlink written 2.1 years ago by genomax46k
0
gravatar for fhsantanna
2.1 years ago by
fhsantanna390
Brazil
fhsantanna390 wrote:

I found a way to have better mapq scores using ion data. Here is the command for bowtie2: bowtie2 --local --very-sensitive-local -p 8 --mm -x example

Now I am able to count in htseq!

Source: https://ioncommunity.thermofisher.com/docs/DOC-7062

ADD COMMENTlink modified 2.1 years ago • written 2.1 years ago by fhsantanna390

Fair warning: The link above requires an account/login to view.

ADD REPLYlink written 2.1 years ago by genomax46k

Yes, we have to subscribe to get access to the protocol.

ADD REPLYlink written 2.1 years ago by fhsantanna390

What was your previous bowtie2 command? The command you posted seems pretty standard, I wonder what you were trying before.

ADD REPLYlink written 2.1 years ago by h.mon13k

Here is the previous command: bowtie2 -x example -U reads_1.fq -S eg1.sam

ADD REPLYlink written 2.1 years ago by fhsantanna390
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 970 users visited in the last hour