Question: How to improve an existing transcriptome annotation using the Cufflinks protocol?
9 months ago by
University of Macau
Chirag Parsania430 wrote:

I am working on a fungal transcriptome. The existing annotation for my species is not reliable. To improve it, we generated RNA-seq data under different stress conditions so that as many transcripts as possible are expressed, and I am trying to build one comprehensive transcriptome from those expressed transcripts using the Cufflinks protocol.

I ran Cufflinks and Cuffmerge with and without a reference annotation in the following combinations:

1) Cufflinks with reference, Cuffmerge with reference
2) Cufflinks with reference, Cuffmerge without reference
3) Cufflinks without reference, Cuffmerge with reference
4) Cufflinks without reference, Cuffmerge without reference
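
For readers who want to reproduce the four combinations, here is a minimal Python sketch that only builds and prints the command lines; it does not run anything. It assumes that "with reference" means reference-guided assembly through the `-g` option of both Cufflinks (`--GTF-guide`) and Cuffmerge (`--ref-gtf`). All file names (`sample.bam`, `reference.gtf`, `genome.fa`, `assemblies.txt`) and output directory names are placeholders, not files from this thread.

```python
from itertools import product

# Placeholder inputs (assumptions for illustration only).
REF_GTF = "reference.gtf"
GENOME_FA = "genome.fa"
BAM = "sample.bam"
ASSEMBLY_LIST = "assemblies.txt"  # text file listing per-sample transcripts.gtf paths


def cufflinks_cmd(use_ref, out_dir):
    """Build a Cufflinks command; -g enables reference-guided assembly."""
    cmd = ["cufflinks", "-o", out_dir]
    if use_ref:
        cmd += ["-g", REF_GTF]
    return cmd + [BAM]


def cuffmerge_cmd(use_ref, out_dir):
    """Build a Cuffmerge command; -g adds the reference annotation to the merge."""
    cmd = ["cuffmerge", "-o", out_dir, "-s", GENOME_FA]
    if use_ref:
        cmd += ["-g", REF_GTF]
    return cmd + [ASSEMBLY_LIST]


def all_combinations():
    """Return (number, cufflinks command, cuffmerge command) in the order 1) to 4)."""
    combos = []
    for n, (cl_ref, cm_ref) in enumerate(product([True, False], repeat=2), start=1):
        tag = f"combo{n}"
        combos.append(
            (n, cufflinks_cmd(cl_ref, f"{tag}/cufflinks"), cuffmerge_cmd(cm_ref, f"{tag}/merged"))
        )
    return combos


if __name__ == "__main__":
    for n, cl, cm in all_combinations():
        print(f"{n}) {' '.join(cl)}")
        print(f"   {' '.join(cm)}")
```

In a real run, Cufflinks would be called once per sample and the resulting GTF paths written to the assembly list before Cuffmerge is called.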

All of them have their pros and cons. After some manual inspection in IGV, I decided to go with the 4th combination (Cufflinks without reference, Cuffmerge without reference). Still, I am not satisfied with the resulting annotation.

For example, consider the following scenario (attached image). Although three transcripts are clearly expressed in all my conditions (shown in the green box), none of the above combinations of Cufflinks and Cuffmerge detected the true transcript structure.

Can anyone explain why this happens? Also, please suggest any better alternative to Cufflinks and Cuffmerge.

modified 19 days ago by colindaven550 • written 9 months ago by Chirag Parsania430
9 months ago by
Vijay Lakhujani1.8k wrote:

Dear Chirag,

If there is no specific reason to use Cufflinks, I strongly recommend using StringTie. Check out the link below for a comparison of StringTie vs. Cufflinks. Although this comparison is for human RNA-seq data sets, I am sure StringTie will outperform Cufflinks on fungal data too.

Here is the link:

Look under the heading "Comparisons to Cufflinks".

Please do share your findings. Would love to hear back!

Regards, Vijay

written 9 months ago by Vijay Lakhujani1.8k

Hi Vijay,

Good to know about this newer transcriptome assembler (though it was published in 2014). It seems to be actively updated, unlike Cufflinks (last release 2014). I will try it and share the results.

Thanks a lot.

written 9 months ago by Chirag Parsania430

Chirag, you may also be interested in checking out the "new" Tuxedo protocol. The old Tuxedo protocol uses TopHat, Cufflinks, Cuffmerge and cummeRbund, while the new protocol is described in the image below (New tuxedo protocol).

Here is the paper link.

HISAT2 is a very efficient mapper in terms of memory and time. Its sensitivity is also good.
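
As a rough sketch of the new protocol (HISAT2 for alignment, StringTie for per-sample assembly, then `stringtie --merge`), the Python snippet below builds the command lines for a list of paired-end samples without executing them. The sample names, FASTQ naming scheme, index prefix and `merged.gtf` output name are assumptions made for illustration; the final Ballgown quantification step is left out.

```python
def new_tuxedo_commands(samples, index="genome_index", threads=4, ref_gtf=None):
    """Return the HISAT2 / samtools / StringTie command lines as lists of arguments.

    samples : hypothetical sample prefixes; reads are assumed to be named
              <sample>_1.fastq.gz and <sample>_2.fastq.gz.
    ref_gtf : optional reference annotation passed to StringTie with -G.
    """
    cmds = []
    gtfs = []
    for s in samples:
        # --dta tailors the alignments for downstream transcript assemblers.
        cmds.append([
            "hisat2", "--dta", "-p", str(threads), "-x", index,
            "-1", f"{s}_1.fastq.gz", "-2", f"{s}_2.fastq.gz", "-S", f"{s}.sam",
        ])
        # StringTie expects coordinate-sorted alignments.
        cmds.append(["samtools", "sort", "-@", str(threads), "-o", f"{s}.bam", f"{s}.sam"])
        assemble = ["stringtie", f"{s}.bam", "-p", str(threads), "-o", f"{s}.gtf"]
        if ref_gtf:
            assemble += ["-G", ref_gtf]
        cmds.append(assemble)
        gtfs.append(f"{s}.gtf")

    # Merge the per-sample assemblies into one non-redundant transcript set.
    merge = ["stringtie", "--merge", "-o", "merged.gtf"]
    if ref_gtf:
        merge += ["-G", ref_gtf]
    cmds.append(merge + gtfs)
    return cmds


if __name__ == "__main__":
    for cmd in new_tuxedo_commands(["control", "stress1"]):
        print(" ".join(cmd))
```

For a strand-specific library, the matching options (for example `--rna-strandness RF` for HISAT2 and `--rf` for StringTie) would need to be added to the alignment and assembly commands.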

written 9 months ago by Vijay Lakhujani1.8k

Additionally, both pipelines come from the same team, Steven Salzberg's lab.

written 9 months ago by Vijay Lakhujani1.8k

Hi Vijay,

I have used StringTie. The results are more or less the same as with Cufflinks. I decided not to merge transcripts from different samples, but rather to use the individual assemblies and filter them manually by user-specified criteria. Merging assemblies in a fungal genome may not be a good idea because the genes are very close to each other. One question I have: why is the orientation of most transcripts reported by StringTie wrong? To check the strand, I colored the alignments by "first of pair strand" in IGV. Most transcripts are assigned an orientation regardless of the alignment color. How does StringTie assign strand information to the transcripts it generates?

written 9 months ago by Chirag Parsania430

Hi Chirag,

There was indeed a bug reported in an earlier version (5/18/2015 - v1.0.4 release); however, you should be using the latest version, so that should not be an issue now.

Is your library strand-specific? According to the StringTie paper (page 4):

"We considered multi-exon transcripts to be correctly assembled only if their strand was also correctly identified, and when strand-specific RNA-seq data were used, we also required that single-exon transcripts were assigned to the correct strand"

Did you use the --fr or --rf option?

written 9 months ago by Vijay Lakhujani1.8k

Yes, I know these options. My data is strand-specific (--rf), so I ran StringTie accordingly. I also tried the other option (--fr) to confirm that my library setting is correct. The issue is still there.

EDIT: More precisely, most of them are in the correct orientation. But there are still cases where the transcript strand is opposite to the read color we see in IGV. Most of these loci show a lot of antisense transcription. This could be one reason the assembler is not able to assign the correct strand.
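
One way to cross-check the IGV colors against the library setting is to work out, read by read, which strand the originating transcript should be on. The sketch below assumes the usual convention that in an fr-firststrand library (StringTie's `--rf`) read 1 aligns antisense to the transcript and read 2 aligns sense, and the reverse for fr-secondstrand (`--fr`). It relies only on the standard SAM flag bits (0x10 reverse strand, 0x80 second in pair). The function name is made up for illustration, and this is not a description of StringTie's internal logic.

```python
REVERSE = 0x10         # read aligned to the reverse strand
SECOND_IN_PAIR = 0x80  # read is the second mate of its pair


def implied_transcript_strand(flag, library="rf"):
    """Return '+' or '-' for the transcript strand implied by one aligned read.

    flag    : integer SAM FLAG of the read.
    library : 'rf' (fr-firststrand, read 2 is sense) or
              'fr' (fr-secondstrand, read 1 is sense).
    Reads without the second-in-pair bit (including single-end reads)
    are treated as read 1.
    """
    if library not in ("rf", "fr"):
        raise ValueError("library must be 'rf' or 'fr'")

    read_on_plus = not (flag & REVERSE)
    is_read2 = bool(flag & SECOND_IN_PAIR)

    # Which mate carries the sense orientation depends on the protocol.
    sense_mate_is_read2 = library == "rf"
    read_is_sense = is_read2 == sense_mate_is_read2

    transcript_on_plus = read_on_plus if read_is_sense else not read_on_plus
    return "+" if transcript_on_plus else "-"


if __name__ == "__main__":
    # Typical proper-pair flags: 83/163 are one pair, 99/147 the other.
    for flag in (83, 163, 99, 147):
        print(flag, implied_transcript_strand(flag, "rf"), implied_transcript_strand(flag, "fr"))
```

If this convention holds for the library in question, IGV's "first-of-pair strand" coloring shows the strand of read 1, which for an `--rf` library would be the strand opposite to the transcript. That is worth ruling out before concluding that the assembler assigned the wrong orientation.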

modified 9 months ago • written 9 months ago by Chirag Parsania430
9 months ago by
United States / Los Angeles /
Petr Ponomarenko2.5k wrote:

You might be able to improve the annotation using dedicated software that takes transcriptome support as one line of evidence and adds information from the genomes and genes of evolutionarily related species, Pfam domains and so on. You can look at a variety of such tools, like MAKER, AUGUSTUS, FGENESH, GeneMark and many others, and you can combine their predictions. In any case, you will need to filter your results extensively based on many different statistics and data; for the genes of main interest, you had better go deeper and assess the results visually yourself, as you did. I am not sure whether you can reliably rebuild all gene models with their alternative splicing from RNA-seq data. It may be possible at high coverage, but I do not remember the numbers.

written 9 months ago by Petr Ponomarenko2.5k
9 months ago by
European Union
kristoffer.vittingseerup430 wrote:

There are tools out there that are even better than StringTie. Take a look at figure 2 in this article:

Those might be more interesting.

written 9 months ago by kristoffer.vittingseerup430
22 days ago by
h.botond40 wrote:

Dear Chirag,

I am struggling with similar problems. I would like to improve my fungal genome annotation, especially the UTRs, using my RNA-seq data. Can you share your experience with this problem? I have tried the classic and new Tuxedo workflows, and Trinity as well, but none of them detected the true transcript structure, or they did so only partially.

Thanks for any suggestions.

written 22 days ago by h.botond40
19 days ago by
colindaven550 wrote:

Fungi have quite tightly packed genomes. You could try this specialist option, SnowyOwl.

written 19 days ago by colindaven550
