When using the TopHat/Cufflinks pipeline, I noticed that some gene models contained non-canonical intron boundaries. I used a reference-guided approach, but I don't think that matters... Is there a way to forbid Cufflinks or TopHat from looking for non-canonical junctions? I want TopHat/Cufflinks to find new transcripts/gene models, but I don't want the pipeline to find non-conventional junctions.
Written 10.3 years ago by Wchang; updated 3.0 years ago by Ram.
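You can use STAR (RNA STAR) for mapping and add --outFilterIntronMotifs RemoveNoncanonicalUnannotated to the command line. This should take care of your problem. Note that this option still keeps non-canonical junctions that are present in the annotation; --outFilterIntronMotifs RemoveNoncanonical drops all of them.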
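A minimal sketch, in Python, of assembling that STAR call. The index directory and FASTQ names are hypothetical placeholders, and --outSAMstrandField intronMotif is an addition not in the answer above: the STAR manual recommends it for unstranded data going into Cufflinks, which expects the XS strand attribute on spliced reads. The command is only printed here; the subprocess call is left commented out.

```python
import shlex
import subprocess


def star_command(genome_dir, reads, threads=4, out_prefix="sample_"):
    """Build a STAR argument list that discards unannotated non-canonical junctions."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--outFileNamePrefix", out_prefix,
        # Drop alignments containing non-canonical junctions that are not
        # annotated; annotated non-canonical junctions are still kept.
        "--outFilterIntronMotifs", "RemoveNoncanonicalUnannotated",
        # Not in the original answer: adds the XS strand attribute that
        # Cufflinks expects for unstranded reads.
        "--outSAMstrandField", "intronMotif",
    ]


# Hypothetical index and read paths, for illustration only.
cmd = star_command("/path/to/star_index", ["reads_1.fq", "reads_2.fq"])
print(shlex.join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to actually run STAR
```

Building an argument list rather than a single string avoids shell-quoting problems if it is later passed to subprocess.run; shlex.join needs Python 3.8 or newer.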
TopHat is not able to map reads to novel non-canonical introns (those without GT-AG, GC-AG or AT-AC boundaries). If you are detecting non-canonical introns, they must come from the transcriptome annotation you are using. So there are two easy solutions:
1. Pre-filter the transcriptome annotation file to exclude all transcripts that have non-canonical introns.
2. Do not use the reference-guided approach, and let TopHat detect splice junctions ab initio.
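The pre-filtering step above could be sketched in Python roughly as follows. This is an illustrative assumption, not the answerer's method: it groups GTF exon rows by transcript_id, reads each intron's donor and acceptor dinucleotides from the genome on the transcript's strand, and keeps only transcripts whose introns are all GT-AG, GC-AG or AT-AC. All function names are hypothetical, and the toy genome and GTF rows exist only to exercise the code.

```python
import re

# Donor/acceptor dinucleotides accepted as canonical or semi-canonical
# (GT-AG, GC-AG, AT-AC), read 5'->3' on the transcript's own strand.
ALLOWED_MOTIFS = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
TID_RE = re.compile(r'transcript_id "([^"]+)"')


def revcomp(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def read_fasta(path):
    """Minimal FASTA reader: chromosome name -> sequence, all held in memory."""
    genome, name, chunks = {}, None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    genome[name] = "".join(chunks)
                name, chunks = line[1:].split()[0], []
            elif line:
                chunks.append(line)
    if name is not None:
        genome[name] = "".join(chunks)
    return genome


def parse_exons(gtf_lines):
    """Map transcript_id -> (chrom, strand, [(start, end), ...]) from exon rows."""
    transcripts = {}
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9 or fields[2] != "exon":
            continue
        match = TID_RE.search(fields[8])
        if match is None:
            continue
        entry = transcripts.setdefault(match.group(1), (fields[0], fields[6], []))
        entry[2].append((int(fields[3]), int(fields[4])))
    return transcripts


def intron_motifs(chrom_seq, strand, exons):
    """Yield (donor, acceptor) for each intron; GTF coordinates are 1-based, inclusive."""
    exons = sorted(exons)
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        # Intron spans prev_end+1 .. next_start-1 (1-based) = 0-based slice below.
        intron = chrom_seq[prev_end:next_start - 1].upper()
        if strand == "-":
            intron = revcomp(intron)
        yield intron[:2], intron[-2:]


def canonical_transcripts(gtf_lines, genome):
    """Transcript IDs whose introns all use allowed motifs (single-exon ones pass)."""
    return {
        tid
        for tid, (chrom, strand, exons) in parse_exons(gtf_lines).items()
        if all(m in ALLOWED_MOTIFS for m in intron_motifs(genome[chrom], strand, exons))
    }


def filter_gtf(gtf_lines, genome):
    """Drop rows of transcripts with non-canonical introns; keep rows without a transcript_id."""
    keep = canonical_transcripts(gtf_lines, genome)
    return [l for l in gtf_lines if (m := TID_RE.search(l)) is None or m.group(1) in keep]


# Toy data: T1 (+, GT-AG), T2 (+, CT-AC: non-canonical on this strand),
# T3 (-, CT-AC on the plus strand = GT-AG on the minus strand), T4 single exon.
genome = {"chr1": "AAAAA" "GTCCAG" "CCCCC" "TTTT" "CTCCAC" "TTTT" "GGGG" "CTTTAC" "GGGG"}


def exon_row(tid, start, end, strand):
    return (f"chr1\ttoy\texon\t{start}\t{end}\t.\t{strand}\t.\t"
            f'gene_id "g_{tid}"; transcript_id "{tid}";\n')


toy_gtf = [
    exon_row("T1", 1, 5, "+"), exon_row("T1", 12, 16, "+"),
    exon_row("T2", 17, 20, "+"), exon_row("T2", 27, 30, "+"),
    exon_row("T3", 41, 44, "-"), exon_row("T3", 31, 34, "-"),
    exon_row("T4", 1, 10, "+"),
]
kept = canonical_transcripts(toy_gtf, genome)
filtered = filter_gtf(toy_gtf, genome)
print(sorted(kept))  # ['T1', 'T3', 'T4']
```

On real data this would be called along the lines of filter_gtf(open("genes.gtf").readlines(), read_fasta("genome.fa")), with hypothetical file names; read_fasta loads the whole genome into memory, so a human genome needs a few gigabytes of RAM. Gene-level rows carry no transcript_id and are kept as-is, so a gene whose transcripts are all removed can leave an orphan gene row. The same check could plausibly be run on an assembled transcripts.gtf to flag gene models with non-canonical introns after the fact. The walrus operator in filter_gtf needs Python 3.8 or newer.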
Now, let me tell you that we have done a comprehensive analysis of non-canonical splice sites in the human transcriptome. You probably want to exclude non-canonical introns because you think they are a source of alignment artifacts. In fact, most of the non-canonical introns that we initially detected were artifacts, but through a set of very stringent filters we built a high-confidence catalog of non-canonical introns present in the human transcriptome, found interesting features associated with them, and even validated some of them by RT-PCR. If you are going to do a genome-wide analysis you can ignore them, but if you want to properly annotate a transcriptome you should consider them. Our work is under revision, but if anybody wants to know more, they can email us.