I am working on a fungal transcriptome. The existing annotation for my species is not reliable. To improve it, we generated RNA-seq data under different stress conditions so that as many transcripts as possible are expressed in at least one condition. From the transcripts expressed across these conditions, I am trying to build one comprehensive transcriptome using the Cufflinks protocol.
I ran cufflinks and cuffmerge with and without a reference annotation in the following combinations:
1) cufflinks with reference, cuffmerge with reference
2) cufflinks with reference, cuffmerge without reference
3) cufflinks without reference, cuffmerge with reference
4) cufflinks without reference, cuffmerge without reference
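To make the four combinations above concrete, here is a small sketch that only prints the corresponding command lines. It assumes "with reference" means cufflinks `-g/--GTF-guide` (reference-guided assembly that still reports novel transcripts) and cuffmerge `-g/--ref-gtf` plus `-s/--ref-sequence`; the file names (`reference.gtf`, `genome.fa`, `<sample>.bam`) and sample names are hypothetical placeholders.

```python
from itertools import product

# Hypothetical inputs: adjust paths and sample names to your own data.
REF_GTF = "reference.gtf"
GENOME_FA = "genome.fa"
SAMPLES = ["control", "heat", "osmotic"]


def cufflinks_cmd(sample: str, use_ref: bool) -> list[str]:
    """cufflinks command for one sample's sorted BAM (<sample>.bam)."""
    tag = "ref" if use_ref else "noref"
    cmd = ["cufflinks", "-p", "8", "-o", f"cl_{tag}_{sample}"]
    if use_ref:
        # -g/--GTF-guide: reference-guided assembly that still reports novel transcripts
        cmd += ["-g", REF_GTF]
    return cmd + [f"{sample}.bam"]


def assembly_list(use_ref: bool) -> str:
    """Contents of the cuffmerge input list: one transcripts.gtf path per line."""
    tag = "ref" if use_ref else "noref"
    return "".join(f"cl_{tag}_{s}/transcripts.gtf\n" for s in SAMPLES)


def cuffmerge_cmd(out_dir: str, use_ref: bool, list_file: str) -> list[str]:
    """cuffmerge command over a list file of per-sample assemblies."""
    cmd = ["cuffmerge", "-p", "8", "-o", out_dir]
    if use_ref:
        # -g/--ref-gtf merges the reference annotation in; -s/--ref-sequence gives the genome
        cmd += ["-g", REF_GTF, "-s", GENOME_FA]
    return cmd + [list_file]


def combinations():
    """Yield (number, commands) in the same order as the numbered list."""
    # product yields (T,T), (T,F), (F,T), (F,F), i.e. combinations 1 to 4
    for i, (cl_ref, cm_ref) in enumerate(product([True, False], repeat=2), start=1):
        tag = "ref" if cl_ref else "noref"
        cmds = [cufflinks_cmd(s, cl_ref) for s in SAMPLES]
        cmds.append(cuffmerge_cmd(f"merged_{i}", cm_ref, f"assemblies_{tag}.txt"))
        yield i, cmds


if __name__ == "__main__":
    for i, cmds in combinations():
        for cmd in cmds:
            print(f"[{i}]", " ".join(cmd))
```

In practice cufflinks only has to run twice per sample (with and without `-g`); the generator repeats those commands per combination just so each printed set is complete. The list files `assemblies_ref.txt` / `assemblies_noref.txt` would hold the output of `assembly_list(True)` / `assembly_list(False)`.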
Each of them has its pros and cons. After some manual inspection in IGV, I decided to go with the 4th combination (cufflinks without reference, cuffmerge without reference). Still, I am not satisfied with the way it annotated the transcripts.
For example, consider the following scenario (attached image). Although I clearly have 3 transcripts expressed in all my conditions (shown in the green box), none of the above combinations of cufflinks and cuffmerge detected the true transcript structure.
Can anyone explain why this happens? Please also suggest a better alternative to cufflinks/cuffmerge, if one exists.
Hi Vijay,
Good to know about this new transcriptome assembler (though it was published in 2014). It seems to be actively updated, unlike cufflinks (last release in 2014). I will try it and share the results.
Thanks a lot.
Chirag, you may also be interested in the "new" Tuxedo protocol. The old Tuxedo protocol uses TopHat, Cufflinks, Cuffmerge and CummeRbund, while the new protocol is described in the image below.
Here is the paper link.
HISAT2 is a very efficient mapper in terms of memory and time, and its sensitivity is also good.
Additionally, both pipelines come from the same team, Steven Salzberg's lab.
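As a rough outline of the new Tuxedo steps (HISAT2 alignment, sorting, per-sample StringTie assembly, `stringtie --merge`, then re-estimation for Ballgown), the sketch below builds the command lines without running them. The file naming, the HISAT2 index name `genome_idx`, and the `mergelist.txt` file are hypothetical; the strand flags (`--rna-strandness RF`, `--rf`) are included on the assumption of a dUTP-style stranded library. The Ballgown analysis itself happens in R and is not shown.

```python
# Hypothetical file naming: <sample>_1.fq.gz / <sample>_2.fq.gz, HISAT2 index "genome_idx".
def per_sample_cmds(sample: str, index: str = "genome_idx", ref_gtf: str = "reference.gtf",
                    threads: int = 8, rf_stranded: bool = True) -> list[list[str]]:
    """Align, sort and assemble one sample."""
    t = str(threads)
    # --dta reports alignments tailored for transcript assemblers such as StringTie
    hisat2 = ["hisat2", "-p", t, "--dta", "-x", index,
              "-1", f"{sample}_1.fq.gz", "-2", f"{sample}_2.fq.gz", "-S", f"{sample}.sam"]
    sort = ["samtools", "sort", "-@", t, "-o", f"{sample}.bam", f"{sample}.sam"]
    # -l sets the name prefix of the assembled transcripts
    assemble = ["stringtie", "-p", t, "-G", ref_gtf, "-o", f"{sample}.gtf",
                "-l", sample, f"{sample}.bam"]
    if rf_stranded:
        # fr-firststrand library: HISAT2 "RF", StringTie --rf
        hisat2[1:1] = ["--rna-strandness", "RF"]
        assemble[1:1] = ["--rf"]
    return [hisat2, sort, assemble]


def merge_and_quant_cmds(samples: list[str], ref_gtf: str = "reference.gtf",
                         threads: int = 8, rf_stranded: bool = True) -> list[list[str]]:
    """Merge per-sample GTFs, then re-estimate abundances for Ballgown."""
    t = str(threads)
    # mergelist.txt is expected to hold one per-sample GTF path per line
    cmds = [["stringtie", "--merge", "-p", t, "-G", ref_gtf,
             "-o", "stringtie_merged.gtf", "mergelist.txt"]]
    for s in samples:
        # -e: only estimate transcripts given in -G; -B: write Ballgown input tables
        quant = ["stringtie", "-e", "-B", "-p", t, "-G", "stringtie_merged.gtf",
                 "-o", f"ballgown/{s}/{s}.gtf", f"{s}.bam"]
        if rf_stranded:
            quant[1:1] = ["--rf"]
        cmds.append(quant)
    return cmds


if __name__ == "__main__":
    for cmd in per_sample_cmds("heat") + merge_and_quant_cmds(["heat", "control"]):
        print(" ".join(cmd))
```

The merge step is optional: if merging closely spaced fungal genes is a concern, the per-sample GTFs from `per_sample_cmds` can be kept and filtered individually instead.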
Hi Vijay,
I have used StringTie. The results are more or less the same as with cufflinks. I decided not to merge transcripts from different samples, and instead to use the individual assemblies and filter them manually by user-specified criteria. Merging assemblies in a fungal genome may not be a good idea because genes are very close to each other. One question I have: why is the orientation of most transcripts reported by StringTie wrong? To check the strand, I colored alignments by "first-of-pair strand" in IGV. Most transcripts have an orientation that does not agree with the alignment color. I wonder how StringTie assigns strand information to the transcripts it generates.
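Filtering individual StringTie assemblies "by user-specified criteria" could look roughly like the sketch below. It relies on the `cov` and `TPM` attributes that StringTie writes on its `transcript` lines and counts `exon` lines per `transcript_id`; the threshold values (`min_tpm`, `min_cov`, `min_exons`) are hypothetical defaults, not recommendations.

```python
import re

# GTF attribute pairs such as: transcript_id "STRG.1.1"; TPM "25.0";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def parse_attrs(field: str) -> dict[str, str]:
    """Parse the 9th GTF column into a dict of attribute name -> value."""
    return dict(ATTR_RE.findall(field))


def filter_transcripts(gtf_lines, min_tpm=1.0, min_cov=2.0, min_exons=1) -> list[str]:
    """Return transcript_ids passing hypothetical TPM, coverage and exon-count thresholds."""
    tpm, cov, exons = {}, {}, {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and StringTie's header comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        feature, attrs = cols[2], parse_attrs(cols[8])
        tid = attrs.get("transcript_id")
        if tid is None:
            continue
        if feature == "transcript":
            tpm[tid] = float(attrs.get("TPM", 0))
            cov[tid] = float(attrs.get("cov", 0))
        elif feature == "exon":
            exons[tid] = exons.get(tid, 0) + 1
    return [t for t in tpm
            if tpm[t] >= min_tpm and cov[t] >= min_cov and exons.get(t, 0) >= min_exons]


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as fh:  # path to one sample's StringTie GTF
        print("\n".join(filter_transcripts(fh)))
```

Keeping one assembly per sample and filtering it this way avoids the merge step entirely; a stricter `min_exons` or coverage cutoff is one knob for dropping weak fragments between closely spaced genes.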
Hi Chirag,
There was indeed a bug reported in an earlier version (v1.0.4, released 5/18/2015); however, you should be using the latest version, so that should no longer be an issue.
Is your library strand-specific? According to the StringTie paper (page 4):
"We considered multi-exon transcripts to be correctly assembled only if their strand was also correctly identified, and when strand-specific RNA-seq data were used, we also required that single-exon transcripts were assigned to the correct strand"
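The strand clause of that criterion can be written as a tiny rule. This sketch covers only the strand part; it assumes the structural match between the assembled and reference transcript has already been checked by some other means (not shown), and the function name is hypothetical.

```python
def strand_requirement_met(n_exons: int, assembled_strand: str,
                           reference_strand: str, stranded_library: bool) -> bool:
    """Strand clause of the quoted criterion (structure match assumed checked elsewhere)."""
    if n_exons > 1 or stranded_library:
        # Multi-exon transcripts always; single-exon ones only with stranded data
        return assembled_strand == reference_strand
    # Single-exon transcript from unstranded data: strand is not scored
    return True
```

In other words, with a strand-specific library every assembled transcript, including single-exon ones, has to land on the correct strand to count as correct.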
Did you use the --fr and --rf options?
Yes, I know these options. My data is strand-specific (--rf), so I ran StringTie accordingly. I also tried the other option (--fr) to confirm that my library type was correct. There is still an issue.
EDIT: More precisely, most of them are in the correct orientation, but there are still cases assigned to the opposite strand from what the read colors in IGV show. In most of those cases I saw a lot of antisense transcription, which could be one reason the assembler is not able to assign the correct strand.
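One way to audit individual strand calls is to count which transcript strand the reads at a locus actually support. With an --rf (fr-firststrand, e.g. dUTP) library, read 1 is antisense to the transcript, so under IGV's "first-of-pair strand" coloring the color shows the strand opposite to the transcript; reading the colors the other way round makes correct calls look wrong. The sketch below assumes each read is given as an `(is_read1, is_reverse)` pair of flags, which could for example be taken from pysam's `AlignedSegment.is_read1` and `is_reverse` attributes (BAM parsing not shown), and that strands are written as `"+"`/`"-"`.

```python
from collections import Counter


def flip(strand: str) -> str:
    """Opposite strand; assumes '+' or '-'."""
    return "-" if strand == "+" else "+"


def transcript_strand(is_read1: bool, is_reverse: bool, library: str = "rf") -> str:
    """Transcript strand implied by one mate of a properly paired read.

    "rf" = fr-firststrand (StringTie --rf): read 1 is antisense to the transcript.
    "fr" = fr-secondstrand (StringTie --fr): read 1 is sense to the transcript.
    """
    read_strand = "-" if is_reverse else "+"
    # Mate 2 maps opposite to mate 1, so express everything as read 1's strand
    read1_strand = read_strand if is_read1 else flip(read_strand)
    if library == "rf":
        return flip(read1_strand)
    if library == "fr":
        return read1_strand
    raise ValueError(f"unknown library type: {library!r}")


def strand_support(reads, library: str = "rf") -> Counter:
    """Count implied transcript strands over (is_read1, is_reverse) pairs at one locus."""
    return Counter(transcript_strand(r1, rev, library) for r1, rev in reads)


def antisense_fraction(reads, assigned_strand: str, library: str = "rf") -> float:
    """Fraction of reads supporting the strand opposite to the assembler's call."""
    counts = strand_support(reads, library)
    total = sum(counts.values())
    return counts[flip(assigned_strand)] / total if total else 0.0
```

A locus where `antisense_fraction` is close to 0.5 has strong support on both strands, which fits the observation that the questionable calls sit in regions with a lot of antisense transcription.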