I have a problem running the tuxedo protocol.
I ran tophat on my paired-end RNA-seq data with novel junction discovery enabled (the default) and supplied the reference annotation GTF file with
-G. I then ran cufflinks transcriptome assembly sample by sample on each
accepted_hits.bam file, again supplying the same reference annotation GTF file.
Next, I merged the transcripts from all my samples with cuffmerge, supplying the reference annotation one more time.
So in the end I supplied it three times along the protocol: in tophat, cufflinks, and cuffmerge.
However, the output of cuffmerge is not as expected.
I understand that it repeats exons shared among different isoforms, whether those isoforms were assembled by cufflinks or came from the reference annotation.
But I don't understand two things:
- Why does cuffmerge report the source of every exon as "cufflinks", even though some exons surely come from the reference annotation?
- Is there a way to remove the reference exons and keep only the novel exons, or at least to remove the repeated exons so that only one copy of each annotated and novel exon remains?
Did I make a mistake by supplying the reference annotation too many times in the protocol? Should I modify one of those steps to get what I want, i.e. a single file listing annotated and novel exons with no repetition, with the novel exons identified as novel?
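To make the desired output concrete, here is a minimal Python sketch of the kind of post-processing I have in mind. It is not part of the tuxedo tools; the function names are my own, and it assumes that an exon counts as "annotated" only when its chromosome, start, end, and strand match a reference exon exactly, so an exon with a shifted boundary would be labelled novel.

```python
def exon_keys(gtf_lines):
    """Yield (seqname, start, end, strand) for each exon record in GTF lines."""
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF record has 9 tab-separated columns; column 3 is the feature type.
        if len(fields) < 9 or fields[2] != "exon":
            continue
        yield (fields[0], int(fields[3]), int(fields[4]), fields[6])


def classify_exons(merged_lines, reference_lines):
    """Return unique exons from the merged GTF, each labelled annotated or novel.

    Assumption: exact coordinate and strand identity decides the label.
    """
    reference = set(exon_keys(reference_lines))
    seen = set()
    result = []
    for key in exon_keys(merged_lines):
        if key in seen:
            continue  # the same exon shared by another isoform; keep one copy
        seen.add(key)
        label = "annotated" if key in reference else "novel"
        result.append((key, label))
    return result
```

It could be called on the two files opened as text, for example `classify_exons(open("merged.gtf"), open("reference.gtf"))`, where both file names are placeholders for the cuffmerge output and the reference annotation.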
Thanks so much