Question

tophat: importance of mate-inner-distance parameter

Asked 9.0 years ago by tospo

I am confronted with the following problem: at work, a large number of RNA-seq libraries have been analysed by an automated pipeline. The pipeline runs TopHat to do the mapping. Unfortunately, the mate inner distance parameter (-r/--mate-inner-dist) has been set inconsistently between samples and varies significantly.

My question now is: just how much influence does this parameter really have? I know what it is and why TopHat wants it, but my guess is that it only comes into play when there are multiple high-scoring alignments and the fragment size can help pick the correct one. Is that the case, or should I expect lots of perfectly good alignments to be discarded if this parameter is set incorrectly? For example, in some cases the parameter is set to 400 when it should really be closer to zero.

The decision I have to make is whether I need to write my own pipeline or use the data that is available. Any help would be much appreciated. Thanks!
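One way to check which value each library should have used is to estimate the inner distance from a sample of its alignments. The following is a minimal sketch, assuming both mates have the same length and that the TLEN values (column 9 of the SAM records) come from properly paired alignments, e.g. extracted with samtools view -f 2 accepted_hits.bam | cut -f 9. Because genomic TLEN in RNA-seq also spans any introns between the mates, the function skips long fragments; the function name and the 1000 bp default of its max_fragment parameter are illustrative assumptions, not TopHat settings.

```python
import statistics


def estimate_inner_distance(tlens, read_length, max_fragment=1000):
    """Estimate the mean and standard deviation of the mate inner distance.

    tlens: TLEN values (SAM column 9) from properly paired alignments.
    read_length: length of each mate (assumed equal for both mates).
    max_fragment: hypothetical cutoff; longer fragments are assumed to
        span an intron and are skipped.
    """
    # Each pair appears twice with opposite signs; keep the positive copy only.
    fragments = [t for t in tlens if 0 < t <= max_fragment]
    if len(fragments) < 2:
        raise ValueError("need at least two usable pairs")
    # Inner distance = fragment length minus both mates (negative if they overlap).
    inner = [f - 2 * read_length for f in fragments]
    return statistics.mean(inner), statistics.stdev(inner)


if __name__ == "__main__":
    mean, sd = estimate_inner_distance([180, -180, 200, -200, 220, -220], 100)
    print(f"--mate-inner-dist {mean:.0f} --mate-std-dev {sd:.0f}")
    # prints: --mate-inner-dist 0 --mate-std-dev 20
```

With 100 bp mates and fragments of about 200 bp, the estimate is close to zero, which is the kind of library where a setting of 400 would be far off.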

Tags: tophat, RNA-seq


The question is how willing you are to accept incorrect alignments, for example splice junctions linking a gene to a pseudogene or to a repeated element (assuming the two are close together on the same chromosome). If you are doing annotation, this might get annoying; for quantification or differential expression analysis, it will be utterly negligible because incorrect reads are a small minority.

I don't think it would be a problem even for annotation, except for very short introns or insertions/deletions of a few hundred bases (if they occur). Usually a read's mate is either less than roughly 1000 bp away, or there has been some genetic event such as splicing between them.

Also, what is the nature of your data? Are you looking at cancer samples which may act a bit wilder than usual cells?

Reply by cyril-cros, written 9.0 years ago

Just a couple of comments, not an answer... BBMap, which also does RNA-seq mapping, automatically calibrates the mate-pair distance on the fly, so you don't need to supply that parameter. It's also packaged with BBMerge, which, if your reads overlap, can report the insert size mean, mode, median, etc., in just a couple of seconds, so you can plug those values into TopHat if you want.
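Note that BBMerge reports the insert (fragment) size, whereas TopHat's -r/--mate-inner-dist expects the gap between the two mates, so the read lengths have to be subtracted first. A minimal conversion sketch (the function name is illustrative, not part of either tool):

```python
def insert_to_inner_distance(insert_size, read1_length, read2_length=None):
    """Convert a fragment (insert) size into a mate inner distance.

    The inner distance is the insert size minus the lengths of both mates;
    it is negative when the mates overlap. If read2_length is omitted,
    both mates are assumed to have the same length.
    """
    if read2_length is None:
        read2_length = read1_length
    return insert_size - read1_length - read2_length


if __name__ == "__main__":
    # 2 x 100 bp reads from a 180 bp fragment overlap by 20 bp.
    print(insert_to_inner_distance(180, 100))  # prints: -20
```

A negative result is expected for libraries whose reads overlap enough for BBMerge to merge them.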

Reply by Brian Bushnell, written 9.0 years ago