5.1 years ago
XBria ▴ 90
I am trying to change STAR parameters to decrease the mismatch rate.
The parameters to be changed are:
--outFilterMismatchNmax --outFilterMismatchNoverLmax --outFilterMismatchNoverReadLmax
One parameter will be changed while the rest stay at their defaults. Once I find the best value for the first one, I will fix it at that value, then vary the second one, and so on. Am I right?
Then I will visualize the results. Which software do you recommend?
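The one-at-a-time sweep described above can be sketched as a small script that builds one STAR command line per candidate value of --outFilterMismatchNmax, leaving the other two filters at their defaults. This is only a sketch: the index path, FASTQ name, and candidate values below are hypothetical placeholders, not from the thread.

```python
# Sketch of a one-parameter-at-a-time sweep over --outFilterMismatchNmax.
# "star_index", "reads.fastq.gz", and the values (2, 5, 10) are placeholders.
def star_command(n_mismatch, genome_dir="star_index", fastq="reads.fastq.gz",
                 out_prefix=None):
    """Build a STAR command line with one mismatch filter varied."""
    if out_prefix is None:
        out_prefix = f"run_Nmax{n_mismatch}/"
    return [
        "STAR",
        "--genomeDir", genome_dir,
        "--readFilesIn", fastq,
        "--readFilesCommand", "zcat",
        "--outFilterMismatchNmax", str(n_mismatch),
        "--outFileNamePrefix", out_prefix,
    ]

# One run per candidate value; the other two filters stay at their defaults,
# so any change in the results is attributable to this one parameter.
for n in (2, 5, 10):
    print(" ".join(star_command(n)))
```

Writing each run to its own --outFileNamePrefix keeps the per-run Log.final.out files separate, so they can be compared afterwards.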
With those three you will need to try all combinations (which may keep you busy for a while). Unless this is a new dataset, you had over 95% unique alignments in your last dataset.
For reference, past threads:
Do I have to change all three for each mapping?
Why do you think this will give you a better alignment?
The results show a higher rate of mismatches. I have to adjust the parameters to decrease it.
Don't you expect mismatches due to naturally occurring SNPs? After all, an aligner will search for the best possible match of your read to the reference. A few mismatches aren't something to worry about I'd say.
But that will likely affect overall alignment percentage. Are you ok with that? There is always going to be a trade-off between accuracy/sensitivity/precision.
Dear Genomax, can you please clarify how I should do that? What I am going to do is map with --outFilterMismatchNmax set to different values; the results will be written to a table. As I understand it, the smallest value may not be the best option. There should be a cut-off that balances sensitivity and precision. Do the mismatch rate and overall alignment rate determine the cut-off? Please let me know.
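To build the table of results per run, the relevant numbers can be pulled out of each run's Log.final.out, which STAR writes as "name | value" lines. A minimal parsing sketch, assuming the standard field names (the numbers in the sample below are made up for illustration):

```python
# Hypothetical excerpt of a STAR Log.final.out; the values are invented.
SAMPLE_LOG = """\
        Uniquely mapped reads % |  92.50%
      Mismatch rate per base, % |  0.35%
"""

def parse_star_log(text):
    """Collect 'name | value' pairs from a STAR Log.final.out into a dict."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue
        name, value = line.split("|", 1)
        stats[name.strip()] = value.strip()
    return stats

stats = parse_star_log(SAMPLE_LOG)
# One row per run: the mismatch rate and the mapping rate side by side
print(stats["Mismatch rate per base, %"], stats["Uniquely mapped reads %"])
```

Running this over each run's log and tabulating mismatch rate against uniquely mapped % is one way to see where tightening --outFilterMismatchNmax starts costing alignment rate.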
The trade-off that @genomax is talking about is between correct mapping location (precision and accuracy) and the ability to find a good mapping location at all (sensitivity). The more mismatches you allow, the more spurious the alignments become, but the more reads you will be able to map. The stricter you get with mismatches, the fewer reads you will map, but the higher the chance that you map them in the correct place.

The trade-off is not a point you can calculate; I'd say it's more of a mixture of experience, common sense, and voodoo. If you manage to map only 10% of your reads and they come from the same organism as the reference, you are too strict. If you map banana mRNA reads on the human transcriptome and get a 98% mapping rate, you are too relaxed with the parameters.

A "quick" way around this problem? Estimate how many mismatches you expect on a sequence of your read length by looking at the evolutionary divergence between the species and their generation times (e.g. human generation time ~25 years, plants ~1 year). If that is too far from what you need, then it really comes down to common sense.
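The divergence-based estimate suggested above amounts to simple arithmetic: multiply the read length by an assumed per-base divergence between your sample and the reference. The 1% figure in the example is a placeholder, not a measured value; the actual divergence depends on the species pair.

```python
# Back-of-the-envelope estimate of mismatches expected per read, given an
# assumed per-base divergence between sample and reference.
def expected_mismatches(read_length, divergence_per_base):
    """Expected mismatch count for a read of the given length."""
    return read_length * divergence_per_base

# e.g. 100 bp reads against a reference ~1% diverged from the sample:
print(expected_mismatches(100, 0.01))  # → 1.0
```

A result like this would suggest that a very small --outFilterMismatchNmax could discard genuinely well-placed reads that simply carry natural variants.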
Many many thanks Macspider,
Your answers are always easy-to-understand. I am trying to do what you said.
Again thanks :)
Since --outFilterMismatchNoverLmax and --outFilterMismatchNoverReadLmax do not improve the mismatch rate, I will leave them at their defaults. The only parameter I will change is --outFilterMismatchNmax. I would like to know your thoughts, please.