I am confronted with the following problem: at work, a large number of RNA-seq libraries have been analysed by an automated pipeline that uses TopHat for read mapping. Unfortunately, the mate inner distance parameter (`-r`/`--mate-inner-dist`) has been set inconsistently between samples, and its value varies considerably.
My question is: how much influence does this parameter actually have? I know what it is and why TopHat wants it, but my guess is that it only really comes into play when a read pair has multiple high-scoring alignments and the expected fragment size helps pick the correct one. Is that the case, or should I expect many perfectly good alignments to be discarded when the parameter is set incorrectly? For example, in some samples it is set to 400 when it should really be closer to zero.
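To put a number on how far off a given setting is, the arithmetic is simple: the inner distance is the fragment length minus the lengths of both mates. The sketch below assumes equal-length mates and uses made-up fragment lengths for a hypothetical library of 100 bp mates on roughly 200 bp fragments; real values would have to come from library QC or from unspliced, properly paired alignments. It compares the configured `-r` against the observed mean in units of `--mate-std-dev` (20 is TopHat's default). The helper names are illustrative only, and this quantifies the mismatch without saying how TopHat's scoring reacts to it.

```python
from statistics import mean, stdev

def inner_distances(fragment_lengths, read_length):
    # Inner distance = fragment length minus both mates (assumes equal mate lengths).
    return [f - 2 * read_length for f in fragment_lengths]

def summarize(fragment_lengths, read_length):
    # Mean and sample standard deviation of the observed inner distances.
    d = inner_distances(fragment_lengths, read_length)
    return mean(d), stdev(d)

def offset_in_sds(configured_r, observed_mean, mate_std_dev=20):
    # How many --mate-std-dev units the configured -r sits from the observed mean.
    return (configured_r - observed_mean) / mate_std_dev

# Hypothetical library: 100 bp mates on ~200 bp fragments (illustrative numbers).
mu, sd = summarize([190, 200, 210, 195, 205], read_length=100)
print(f"observed inner distance: {mu:.1f} +/- {sd:.1f}")
print(f"-r 400 is {offset_in_sds(400, mu):.0f} SDs from the observed mean")
```

With these numbers the observed inner distance is about 0, so a setting of 400 lies 20 standard deviations away from it.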
I have to decide whether to write my own pipeline and remap everything, or to use the alignments that are already available. Any help would be much appreciated. Thanks!