I ran tophat2 with default parameters and got a number of junctions whose introns are far too long to be plausible for my reference genome, Arabidopsis. The image illustrates this: notice the features with tophat-style junction names.
There do not seem to be many of these oversized introns, but they are very noticeable in a genome browser.
- Have other people seen this and, if so, how do you handle it?
- Is there a good maximum intron size that allows tophat2 to find the real introns but keeps it from finding these clearly wrong ones?
- Could the "wrong" introns be biologically interesting?
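
To make the question more concrete, here is a minimal sketch of how the oversized junctions could be listed. It assumes the junctions are in the BED12 file that tophat2 writes (junctions.bed), where each record has two blocks flanking the intron. The function names, the example records, and the 5000 bp cutoff are all made up for illustration; the cutoff in particular is a placeholder, not a recommended value.

```python
def intron_length(bed_line):
    """Return the intron length implied by one two-block BED12 junction record."""
    fields = bed_line.rstrip("\n").split("\t")
    chrom_start = int(fields[1])
    chrom_end = int(fields[2])
    # Column 11 holds comma-separated block sizes (a trailing comma is allowed).
    block_sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
    # The intron runs from the end of the first block to the start of the last block.
    return (chrom_end - block_sizes[-1]) - (chrom_start + block_sizes[0])


def long_junctions(lines, max_intron):
    """Collect (chrom, name, intron_length) for junctions longer than max_intron."""
    hits = []
    for line in lines:
        # Skip blank lines and the "track ..." header line.
        if not line.strip() or line.startswith("track"):
            continue
        length = intron_length(line)
        if length > max_intron:
            fields = line.rstrip("\n").split("\t")
            hits.append((fields[0], fields[3], length))
    return hits


# Made-up records for illustration only; real input would come from junctions.bed.
example = [
    'track name=junctions description="TopHat junctions"',
    "Chr1\t100\t5300\tJUNC00000001\t12\t+\t100\t5300\t255,0,0\t2\t50,40\t0,5160",
    "Chr1\t1000\t1170\tJUNC00000002\t30\t-\t1000\t1170\t255,0,0\t2\t45,45\t0,125",
]

# 5000 is an arbitrary placeholder cutoff, not a suggested value for Arabidopsis.
for chrom, name, length in long_junctions(example, 5000):
    print(chrom, name, length)
```

Once a cutoff is chosen, it could be given to tophat2 through its `-I/--max-intron-length` option so that longer junctions are not reported in the first place. What I am unsure about is which value to pick.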