Question: How to improve an existing transcriptome annotation using the Cufflinks protocol?
Chirag Parsania (University of Macau) wrote, 18 months ago:

I am working on a fungal transcriptome. The existing annotation for my species is not reliable. To improve it, we generated RNA-seq data under different stress conditions so that as many transcripts as possible would be expressed, and I am trying to build one comprehensive transcriptome from those expressed transcripts using the Cufflinks protocol.

I ran Cufflinks and Cuffmerge with and without the reference annotation in the following combinations:

1) Cufflinks with reference, Cuffmerge with reference
2) Cufflinks with reference, Cuffmerge without reference
3) Cufflinks without reference, Cuffmerge with reference
4) Cufflinks without reference, Cuffmerge without reference
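
For readers who want to reproduce the four runs, here is a minimal sketch that only builds the command lines; it does not execute anything. It assumes "with reference" means passing the annotation as a guide via `-g` for both tools (the post does not say whether `-g` or `-G` was used for Cufflinks), and all file and directory names are hypothetical placeholders.

```python
def cufflinks_cmd(bam, out_dir, ref_gtf=None):
    """Build a cufflinks call; with ref_gtf, -g uses the annotation as a guide only."""
    cmd = ["cufflinks", "-o", out_dir]
    if ref_gtf is not None:
        cmd += ["-g", ref_gtf]  # assumption: guided assembly, not -G (quantify-only)
    return cmd + [bam]


def cuffmerge_cmd(gtf_list, out_dir, genome_fa, ref_gtf=None):
    """Build a cuffmerge call; gtf_list is a text file with one assembly GTF per line."""
    cmd = ["cuffmerge", "-o", out_dir, "-s", genome_fa]
    if ref_gtf is not None:
        cmd += ["-g", ref_gtf]
    return cmd + [gtf_list]


def four_combinations(bam, gtf_list, genome_fa, ref_gtf):
    """Return {combination number: (cufflinks command, cuffmerge command)}."""
    settings = [(True, True), (True, False), (False, True), (False, False)]
    combos = {}
    for n, (cufflinks_ref, cuffmerge_ref) in enumerate(settings, start=1):
        combos[n] = (
            cufflinks_cmd(bam, f"cufflinks_{n}", ref_gtf if cufflinks_ref else None),
            cuffmerge_cmd(gtf_list, f"merged_{n}", genome_fa,
                          ref_gtf if cuffmerge_ref else None),
        )
    return combos


if __name__ == "__main__":
    # Hypothetical file names, printed for inspection only.
    runs = four_combinations("sample.bam", "assemblies.txt", "genome.fa", "reference.gtf")
    for n, (assemble, merge) in runs.items():
        print(n, " ".join(assemble), "&&", " ".join(merge))
```

In a real run, Cufflinks would be called once per sample and the resulting `transcripts.gtf` paths collected into the list file given to Cuffmerge.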

Each of them has its pros and cons. After inspecting the results manually in IGV, I decided to go with the fourth combination (Cufflinks without reference, Cuffmerge without reference). Still, I am not satisfied with the resulting annotation.

For example, consider the following scenario (attached image). Although three transcripts are clearly expressed in all my conditions (shown in the green box), none of the above combinations of Cufflinks and Cuffmerge recovered the true transcript structure.

Can anyone explain why this happens? Also, please suggest any better option than Cufflinks and Cuffmerge.

Modified 9 months ago by colindaven • written 18 months ago by Chirag Parsania
Vijay Lakhujani wrote, 18 months ago:

Dear Chirag,

If there is no specific reason to use Cufflinks, I strongly recommend using StringTie. Check out the link below for a comparison of StringTie vs. Cufflinks. Although this comparison is for human RNA-seq data sets, I am sure StringTie will also outperform Cufflinks on fungal data.

Here is the link:

Look under the heading "Comparisons to Cufflinks".

Please do share your findings. Would love to hear back!

Regards,
Vijay


Hi Vijay,

Good to know about this new transcriptome assembler (though it was published in 2014). It seems to be actively updated, unlike Cufflinks (last release in 2014). I will try it and share the results.

Thanks a lot.

Reply written 18 months ago by Chirag Parsania

Chirag, you may also be interested in checking out the "new" Tuxedo protocol. The old Tuxedo protocol uses TopHat, Cufflinks, Cuffmerge and cummeRbund, while the new protocol is described in the image below.

[Image: New Tuxedo protocol]

Here is the paper link.

HISAT2 is a very efficient mapper in terms of memory and time. Its sensitivity is also good.
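
As a rough sketch of the assembly part of that workflow (alignment with HISAT2, then per-sample assembly and an optional merge with StringTie), the helper below builds the command lines without running them. The sample names, index prefix and file paths are hypothetical, and the default strandness `RF` is an assumption that should be replaced by whatever matches your library.

```python
def new_tuxedo_cmds(sample, index, reads1, reads2, strand="RF", ref_gtf=None):
    """Build align, sort and assemble commands for one paired-end sample."""
    if strand not in ("RF", "FR"):
        raise ValueError("strand must be 'RF' or 'FR' for a stranded paired-end library")
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    gtf = f"{sample}.gtf"
    # --dta makes HISAT2 report alignments tailored for transcript assemblers.
    align = ["hisat2", "--dta", "--rna-strandness", strand,
             "-x", index, "-1", reads1, "-2", reads2, "-S", sam]
    # StringTie expects coordinate-sorted alignments.
    sort = ["samtools", "sort", "-o", bam, sam]
    assemble = ["stringtie", bam, "-o", gtf, "--" + strand.lower()]
    if ref_gtf is not None:
        assemble += ["-G", ref_gtf]  # optional reference-guided assembly
    return [align, sort, assemble]


def stringtie_merge_cmd(gtf_list, out_gtf, ref_gtf=None):
    """Build the merge step; gtf_list is a text file with one per-sample GTF per line."""
    cmd = ["stringtie", "--merge", "-o", out_gtf]
    if ref_gtf is not None:
        cmd += ["-G", ref_gtf]
    return cmd + [gtf_list]


if __name__ == "__main__":
    # Hypothetical names, printed for inspection only.
    for cmd in new_tuxedo_cmds("stress1", "genome_index", "s1_R1.fq.gz", "s1_R2.fq.gz"):
        print(" ".join(cmd))
    print(" ".join(stringtie_merge_cmd("gtf_list.txt", "merged.gtf")))
```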

Reply written 18 months ago by Vijay Lakhujani

Additionally, both pipelines come from the same team, Steven Salzberg's lab.

Reply written 18 months ago by Vijay Lakhujani

Hi Vijay,

I have used StringTie. The results are more or less the same as with Cufflinks. I decided not to merge transcripts from different samples, but rather to use the individual assemblies and filter them manually by user-specified criteria. Merging assemblies in a fungal genome may not be a good idea because the genes are very close to each other. One question I have is why the orientation of most transcripts reported by StringTie is wrong. To check the strand, I colored the alignments by "first of paired strand" in IGV. Most transcripts are assigned an orientation regardless of the alignment color. I wonder how StringTie assigns strand information to the transcripts it generates.
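
A minimal sketch of what such per-sample filtering could look like, assuming StringTie GTF output in which each transcript line carries `cov` and `TPM` attributes. The thresholds and the span criterion are hypothetical examples, not the criteria actually used here.

```python
import re
from collections import Counter

# Matches GTF attribute pairs such as: transcript_id "STRG.1.1";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_transcripts(gtf_lines):
    """Yield one dict per 'transcript' feature line of a GTF."""
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "transcript":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        yield {
            "id": attrs.get("transcript_id"),
            "chrom": fields[0],
            "start": int(fields[3]),
            "end": int(fields[4]),
            "strand": fields[6],  # "+", "-" or "." when no strand was assigned
            "cov": float(attrs.get("cov", 0)),
            "tpm": float(attrs.get("TPM", 0)),
        }


def filter_transcripts(gtf_lines, min_tpm=1.0, min_cov=2.0, min_span=150):
    """Keep transcripts passing all thresholds (hypothetical example values)."""
    kept = []
    for t in parse_transcripts(gtf_lines):
        span = t["end"] - t["start"] + 1  # genomic span, introns included
        if t["tpm"] >= min_tpm and t["cov"] >= min_cov and span >= min_span:
            kept.append(t)
    return kept


def strand_counts(transcripts):
    """Count how many transcripts were assigned to each strand."""
    return Counter(t["strand"] for t in transcripts)
```

Typical use would be `filter_transcripts(open("sample.gtf"))` for each sample, followed by `strand_counts(...)` to see how many transcripts ended up on each strand or without one.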

Reply written 18 months ago by Chirag Parsania

Hi Chirag,

There was indeed a bug reported in an earlier version (5/18/2015, v1.0.4 release). However, you should be using the latest version, so that should not be an issue now.

Is your library strand-specific? According to the StringTie paper (page 4):

"We considered multi-exon transcripts to be correctly assembled only if their strand was also correctly identified, and when strand-specific RNA-seq data were used, we also required that single-exon transcripts were assigned to the correct strand"

Did you use the --fr or --rf option?

Reply written 18 months ago by Vijay Lakhujani

Yes, I know these options. My data is strand-specific (--rf), so I ran StringTie accordingly. I also tried the other option (--fr) to confirm that my library setting is correct. There is still an issue.

EDIT: More precisely, most of them are in the correct orientation. But there are still cases where the assigned strand is opposite to what the read colors in IGV show. Most of those cases have a lot of antisense transcription going on. This could be one reason the assembler is not able to assign the correct strand.
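
One way to cross-check the strand outside IGV is to derive the implied transcript strand from the SAM FLAG of each read. The sketch below assumes a paired-end fr-firststrand library (the `--rf` setting), where read 1 aligns antisense to the transcript and read 2 aligns in the sense direction; under that assumption the strand of read 1 is the reverse of the transcript strand. The function names are illustrative only.

```python
from collections import Counter

PAIRED = 0x1      # read is part of a pair
UNMAPPED = 0x4    # read itself is unmapped
REVERSE = 0x10    # read aligned to the reverse strand
READ1 = 0x40      # first read of the pair


def transcript_strand(flag, library="RF"):
    """Return the transcript strand implied by one mapped, paired read."""
    if library not in ("RF", "FR"):
        raise ValueError("library must be 'RF' or 'FR'")
    is_reverse = bool(flag & REVERSE)
    is_read1 = bool(flag & READ1)
    # RF: read 1 reverse or read 2 forward implies a plus-strand transcript.
    strand_if_rf = "+" if is_read1 == is_reverse else "-"
    if library == "RF":
        return strand_if_rf
    return "-" if strand_if_rf == "+" else "+"


def strand_support(flags, library="RF"):
    """Count reads supporting each transcript strand, skipping unusable reads."""
    usable = (f for f in flags if f & PAIRED and not f & UNMAPPED)
    return Counter(transcript_strand(f, library) for f in usable)
```

Running `strand_support` on the FLAG column of the reads overlapping one locus (for example, extracted with `samtools view`) gives a count per strand; a locus with substantial support on both strands would be consistent with the antisense transcription described above.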

Reply modified 18 months ago • written 18 months ago by Chirag Parsania
Petr Ponomarenko (United States / Los Angeles) wrote, 18 months ago:

You might be able to improve the annotation using dedicated software that can use transcriptome support as one line of evidence, and add information from the genomes and genes of evolutionarily related species, Pfam domains, and so on. You can look at a variety of such tools, like MAKER, AUGUSTUS, FGENESH, GeneMark and many others. You can combine the predictions of these tools. In any case, you will need to filter your results extensively based on many different statistics and data; for the genes of main interest, you had better go deeper and assess the results visually yourself, as you did. I am not sure you can reliably rebuild all gene models with their alternative splicing from RNA-seq data. Maybe it is possible at high coverage, but I do not remember the numbers.

kristoffer.vittingseerup (European Union) wrote, 18 months ago:

There are tools out there even better than StringTie. Take a look at figure 2 in this article:

Those might be more interesting.

h.botond wrote, 9 months ago:

Dear Chirag,

I am struggling with similar problems. I would like to improve my fungal genome annotation, especially the UTRs, using my RNA-seq data. Can you tell me about your experience with this problem? I have tried the classic and new Tuxedo workflows as well as Trinity, but none of them could detect the true transcript structure, or they did so only partially.

Thanks for any suggestions.

colindaven (Hannover Medical School) wrote, 9 months ago:

Fungi have quite tightly packed genomes. You could try this specialist option, SnowyOwl.
