Question: Would Tophat place 3-5 mismatches in a row if I raised default mismatches allowed?
3.7 years ago by
james.lloyd80
United States
james.lloyd80 wrote:

I have some very long RNA-seq reads (250 nt) and am thinking of raising the number of allowed mismatches. I used the default of 2 for 100 nt reads, but I am considering 5 for the 250 nt reads (1 mismatch per 50 nt). I will be using Tophat to map these reads.

I am concerned that Tophat would place 3 or more mismatches in a row (at adjacent nucleotides), and I would like to stop that from happening. So what I would like to know is whether Tophat (and Bowtie) would map such a read and, if so, whether any settings can be changed to prevent it (other than keeping the mismatches at the default of 2).

If I cannot stop this, is there an easy way to filter such reads out of the BAM/SAM file?

Many thanks,

James

rna-seq tophat • 969 views
modified 3.7 years ago by Istvan Albert ♦♦ 79k • written 3.7 years ago by james.lloyd80
3.7 years ago by
Istvan Albert ♦♦ 79k
University Park, USA
Istvan Albert ♦♦ 79k wrote:

The proper way to think about this is to understand that alignments are chosen to maximize a numerical score built from positive values for matches and negative values (penalties) for mismatches, gap opens and gap extensions. The number of matches or mismatches that you allow is not relevant here. Those limits may be used to filter the alignments but do not factor into generating them.

If you don't want two or more mismatches in a row, then all you need to do is instruct the aligner that two mismatches should score worse than gap open + extension, which is probably the default setting anyway.
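The comparison described above can be sketched in a few lines of Python. This is only an illustration: the penalty values, the function names, and the convention that a gap of length n costs gap open plus n extensions are assumptions, not values taken from the Tophat or Bowtie manuals. Whether a gapped alternative exists for a given read also depends on the sequences themselves.

```python
# Illustrative penalties only; they are assumptions, not documented aligner defaults.
MISMATCH = 6
GAP_OPEN = 5
GAP_EXTEND = 3


def mismatch_run_penalty(k, mismatch=MISMATCH):
    """Total penalty for k mismatches (adjacent or not)."""
    return k * mismatch


def gap_penalty(length, gap_open=GAP_OPEN, gap_extend=GAP_EXTEND):
    """Penalty for one gap of the given length (assumed: open + length * extend)."""
    return gap_open + length * gap_extend


# Compare k mismatches against a single one-base gap.
for k in range(1, 6):
    mm = mismatch_run_penalty(k)
    gap = gap_penalty(1)
    verdict = "mismatches cost more" if mm > gap else "gap costs more"
    print(f"{k} mismatch(es): {mm}  vs  1-base gap: {gap}  ({verdict})")
```

With these assumed numbers a single mismatch is cheaper than a gap, while two or more mismatches are more expensive than one gap open + extension, which is the condition the answer refers to.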

written 3.7 years ago by Istvan Albert ♦♦ 79k

Thank you for your reply. I am trying to compare what you said with the options in the Tophat manual. I have selected the options below for my run, allowing for 5 mismatches. Do you know if these settings will prohibit more than 2 mismatches in a row? I think read-gap-length will do this, but I am not sure whether read-edit-dist interferes with that.

--read-mismatches 5 (default 2)

--read-gap-length (left as default 2)

--read-edit-dist 5 (default 2; I had to change it to 5 when I increased read-mismatches)

Thanks again,

James

written 3.7 years ago by james.lloyd80

I think you are overly hung up on the fear of two mismatches in a row. Imagine that your data comes from a sample that really does have two mismatches in a row: why would you not want that to be reported correctly? It would be scientifically inappropriate to forbid this a priori. In general it is rare to get multiple mismatches in a row by accident, since the combined mismatch penalties are typically higher than gap open + extension, so some other alternative alignment will be found. But I would recommend moving on and no longer worrying about something that rarely happens and, when it does happen, is probably correct anyhow.

modified 3.7 years ago • written 3.7 years ago by Istvan Albert ♦♦ 79k

I am fine with 2 mismatches in a row. It is more than 2 mismatches in a row that troubled my lab, so I am trying to see whether the mapping approach described above would stop 3 or more mismatches in a row. I also do not think it would be bad to have some rare cases with multiple mismatches in a row; I wanted to better understand what I had done and to find out whether my lab's fear is even real, which I am still having trouble working out.

written 3.7 years ago by james.lloyd80
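One way to check how often such alignments occur at all is to count them rather than remove them. The sketch below is a minimal example that assumes the alignments carry the optional SAM `MD:Z:` tag and are available as SAM text lines (for example, piped from `samtools view`). The function names are hypothetical. In an MD value, a `0` between two mismatch letters means no matching base separates them. Insertions in the read are not recorded in MD, so two mismatches flanking an insertion would also look adjacent here.

```python
import re

# MD tokens: a match count, a deletion (^ followed by bases), or one mismatched base.
_MD_TOKEN = re.compile(r"(\d+)|(\^[A-Za-z]+)|([A-Za-z])")


def longest_mismatch_run(md):
    """Return the longest run of adjacent mismatches encoded in an MD tag value."""
    longest = current = 0
    for number, deletion, mismatch in _MD_TOKEN.findall(md):
        if mismatch:
            current += 1
            longest = max(longest, current)
        elif deletion or int(number) > 0:
            # A deletion or at least one matching base ends the current run.
            current = 0
        # A "0" count separates adjacent mismatches and leaves the run intact.
    return longest


def reads_with_mismatch_run(sam_lines, min_run=3):
    """Yield names of reads whose MD tag shows at least min_run adjacent mismatches."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        for field in fields[11:]:  # optional fields start at column 12
            if field.startswith("MD:Z:"):
                if longest_mismatch_run(field[5:]) >= min_run:
                    yield fields[0]
                break


# Small usage example with two made-up alignment records.
example = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t1\t50\t18M\t*\t0\t0\t" + "A" * 18 + "\t" + "I" * 18 + "\tNM:i:3\tMD:Z:10A0C0G5",
    "r2\t0\tchr1\t1\t50\t18M\t*\t0\t0\t" + "A" * 18 + "\t" + "I" * 18 + "\tNM:i:2\tMD:Z:5A5C6",
]
print(list(reads_with_mismatch_run(example)))
```

Counting the flagged reads against the total gives a direct measure of how common the pattern is in a particular data set.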

2 or 3 or 4 or 5 makes no difference. Like I said, the way is not to filter it out or forbid it from happening: if your aligner reports three mismatches in a row, then that is the most likely alignment given the parameters you have set. And that's that; the way around it is not to filter out just this one thing while allowing all others. It would be pretty absurd (and bad science) to remove three mismatches in a row but allow three mismatches as long as there is one base of separation between each. The latter is a far more suspicious alignment IMO.

written 3.7 years ago by Istvan Albert ♦♦ 79k

It is a very good point and I would not want to bias my analysis in an unfair way. Thanks for the advice. 

written 3.7 years ago by james.lloyd80