Question: tophat: importance of mate-inner-distance parameter
tospo (United Kingdom) wrote, 3.3 years ago:

I am confronted with the following problem: at work, a large number of RNA-seq libraries have been analysed by an automated pipeline that runs TopHat to do the mapping. Unfortunately, the mate inner distance parameter (-r/--mate-inner-dist) has been set inconsistently between samples, and its value varies considerably.

My question now is: just how much influence does this parameter really have? I know what it is and why TopHat wants it, but my guess is that it only comes into play when there are multiple high-scoring alignments, where the fragment size can help to pick the correct one. Is that the case, or should I expect lots of perfectly good alignments to be discarded if the parameter is set incorrectly? For example, in some cases the parameter is set to 400 when it should really be close to zero.
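The hypothesis in the question can be made concrete with a toy model. The sketch below is not TopHat's algorithm (whether TopHat behaves this way is exactly what is being asked); it assumes a hypothetical rule in which the alignment score decides first and a normal-distribution term on the inner distance only breaks ties. The function names, locus labels, scores and the 20 bp standard deviation are made up for illustration.

```python
def inner_distance_log_density(inner_dist: float, expected_inner: float, sd: float) -> float:
    """Log-density (up to a constant) of inner_dist under Normal(expected_inner, sd)."""
    z = (inner_dist - expected_inner) / sd
    return -0.5 * z * z


def pick_best_placement(candidates, expected_inner, sd=20.0):
    """Hypothetical tie-breaking rule for paired-end placements.

    candidates: list of (label, alignment_score, inner_dist) tuples.
    The alignment score decides first; the expected inner distance only
    chooses among placements that share the best alignment score.
    """
    best_score = max(score for _, score, _ in candidates)
    tied = [c for c in candidates if c[1] == best_score]
    return max(tied, key=lambda c: inner_distance_log_density(c[2], expected_inner, sd))[0]


# Two equally good placements: the expected inner distance decides.
ambiguous = [("locus_A", 0, 10), ("locus_B", 0, 380)]
print(pick_best_placement(ambiguous, expected_inner=0))    # locus_A
print(pick_best_placement(ambiguous, expected_inner=400))  # locus_B

# One clearly better placement: the setting makes no difference.
clear = [("locus_A", -2, 10), ("locus_B", 0, 380)]
print(pick_best_placement(clear, expected_inner=0))        # locus_B
```

Under this toy rule a wrong setting only changes the outcome for ambiguous reads. A real aligner may also use the expected distance when deciding which pairs count as concordant, in which case the effect could be larger.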

The decision I have to make is whether I need to write my own pipeline or can use the data that is already available. Any help would be much appreciated. Thanks!

Tags: tophat, rnaseq

The question is how willing you are to accept incorrect alignments, such as spliced alignments linking genes to pseudogenes or to repeated elements (assuming these lie close together on the same chromosome). If you are doing annotation, this might get annoying; for quantification or differential analysis it should be negligible, because incorrect reads are a small minority.

Actually, I don't think it is a problem even for annotation, except for very short introns or insertions/deletions of a few hundred bases (if they occur). Usually either a read's mate lies within roughly 1000 bp, or there has been some event such as splicing.

Also, what is the nature of your data? Are you looking at cancer samples, which may behave more wildly than normal cells?

Reply written 3.3 years ago by cyril-cros

Just a couple of comments, not an answer: BBMap, which also does RNA-seq mapping, automatically calibrates the mate-pair distance on the fly, so you don't need to supply that parameter. It is also packaged with BBMerge, which, if your reads overlap, can report the insert size mean, mode, median, etc. in just a couple of seconds, so you can plug that into TopHat if you want.

Reply written 3.3 years ago by Brian Bushnell
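An insert size reported by a tool such as BBMerge is the full fragment length, whereas the TopHat manual describes -r/--mate-inner-dist as the distance between the mates: for 300 bp fragments with 50 bp ends it says -r should be 200. A minimal sketch of that conversion, plus a check that flags samples whose configured value is far from the estimate. The function names, sample names, values and the 50 bp tolerance are hypothetical.

```python
def tophat_inner_distance(insert_size: float, read1_len: int, read2_len: int) -> int:
    """Convert a mean insert (fragment) size to a TopHat -r value.

    The result is negative when the two mates overlap.
    """
    return round(insert_size - read1_len - read2_len)


def flag_mismatched_samples(configured, estimated, tolerance=50):
    """Return samples whose configured -r differs from the estimate by more than tolerance bp."""
    return sorted(
        sample for sample, r in configured.items()
        if abs(r - estimated[sample]) > tolerance
    )


print(tophat_inner_distance(300, 50, 50))  # 200, the TopHat manual's example
estimated = {
    "sample_1": tophat_inner_distance(210, 100, 100),  # 10
    "sample_2": tophat_inner_distance(400, 100, 100),  # 200
}
configured = {"sample_1": 400, "sample_2": 200}
print(flag_mismatched_samples(configured, estimated))  # ['sample_1']
```

Here sample_1 mirrors the case in the question: -r was set to 400, but the estimated inner distance is close to zero.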