Hi,
I am dealing with RNA-seq PE data. Some of my samples have poor-quality bases at the end of the reverse reads. How does TopHat deal with such reads? Are these bases clipped while mapping? And how does that affect the mapping quality?
Thank you
As Arum puts it (and that should be an answer rather than a comment ;-) ), very few tools use qualities directly. Rightly so, I might add, since the way base quality values are generated lacks a proper foundation, at least with respect to the numerical probabilities they stand for.
Note that a good base quality is 40, which means a one in 10,000 chance of being wrong; yet just about every sequencing platform introduces roughly 1 miscall per 100 bases. Trimming reads back from their ends before processing is the most common approach.
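To make the arithmetic concrete: a Phred score Q corresponds to an error probability of 10^(-Q/10), so Q40 is 1 in 10,000 and Q20 is 1 in 100 (the miscall rate mentioned above). The sketch below assumes FASTQ qualities use the Phred+33 (Sanger / Illumina 1.8+) ASCII encoding; older Illumina files used an offset of 64, so check your data before relying on the default.

```python
def phred_to_error_prob(q: int) -> float:
    """Convert a Phred quality score Q to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def ascii_to_phred(ch: str, offset: int = 33) -> int:
    """Decode one FASTQ quality character (assumes Phred+33 encoding by default)."""
    return ord(ch) - offset


if __name__ == "__main__":
    # 'I' is Q40 and '5' is Q20 under Phred+33.
    for ch in "I5":
        q = ascii_to_phred(ch)
        print(ch, q, phred_to_error_prob(q))
```

Running it shows that the nominal error rate for a Q40 base is 100 times lower than the roughly 1-in-100 miscall rate observed in practice, which is the mismatch described above.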
If I don't trim the poor-quality bases, how does TopHat deal with them?
I think (but you should check this with the developers) that it ignores the qualities during the alignment itself but does use them when computing the mapping quality; at least this is what maq/bwa does: http://maq.sourceforge.net/qual.shtml. And now just a personal opinion: in general, don't read too much into the qualities. As numbers, they are rough approximations whose real accuracy is far lower than the values imply.
AFAIK TopHat does not perform soft or hard clipping, so you'll get better read mapping if you clip the low-quality bases yourself. The way I do it is to clip from the end of the read and keep the read if the clipped length is >= 50 bases; otherwise I remove it.
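The clip-then-filter approach described above could look roughly like the sketch below. It is a minimal illustration, not the poster's actual script: the quality cutoff `min_q=20`, the Phred+33 encoding, and the helper names are assumptions; only the 50-base minimum length comes from the answer. For paired-end data, remember that TopHat expects both mate files in the same order, so if one mate is dropped its partner must be dropped (or moved to an unpaired file) too; this sketch handles a single file only.

```python
import io


def read_fastq(handle):
    """Yield (header, seq, qual) tuples from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip()
        yield header, seq, qual


def trim_3prime(seq, qual, min_q=20, offset=33):
    """Remove bases from the 3' end while their Phred quality is below min_q (hypothetical cutoff)."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def clip_and_filter(seq, qual, min_q=20, min_len=50, offset=33):
    """Trim the 3' end, then return the read only if it is still >= min_len bases, else None."""
    s, q = trim_3prime(seq, qual, min_q, offset)
    return (s, q) if len(s) >= min_len else None


if __name__ == "__main__":
    # Two toy reads: the first keeps 55 bases after trimming, the second only 40 and is dropped.
    fastq = (
        "@read1\n" + "A" * 60 + "\n+\n" + "I" * 55 + "#" * 5 + "\n"
        "@read2\n" + "C" * 60 + "\n+\n" + "I" * 40 + "#" * 20 + "\n"
    )
    for header, seq, qual in read_fastq(io.StringIO(fastq)):
        kept = clip_and_filter(seq, qual)
        if kept is not None:
            print(header, len(kept[0]))
```

Trimming only from the 3' end matches the problem in the question (quality decaying toward the end of the reverse reads); dedicated trimmers offer more robust variants such as sliding-window trimming.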