How to improve an existing transcriptome annotation using the Cufflinks protocol?
5
0
5.2 years ago
Chirag Parsania ★ 1.9k

I am working on a fungal transcriptome. The existing annotation for my species is not reliable. To improve it, we generated RNA-seq data under different stress conditions so that as many transcripts as possible are expressed in at least one condition. Using those expressed transcripts, I am trying to build one comprehensive transcriptome with the Cufflinks protocol.

I ran Cufflinks and Cuffmerge with and without a reference annotation in the following combinations.

1) Cufflinks with reference, Cuffmerge with reference
2) Cufflinks with reference, Cuffmerge without reference
3) Cufflinks without reference, Cuffmerge with reference
4) Cufflinks without reference, Cuffmerge without reference
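For anyone wanting to reproduce the four combinations above, here is a minimal sketch that only builds the command lines and does not run anything. It assumes that "with reference" means reference-guided assembly through the `-g` option of Cufflinks and the `-g` option of Cuffmerge; the post does not say which options were actually used. The file names (`sample.bam`, `reference.gtf`, `assemblies.txt`) and output directory names are hypothetical.

```python
from itertools import product


def build_commands(bam, ref_gtf, assembly_list, threads=4):
    """Return {combo_name: (cufflinks_cmd, cuffmerge_cmd)} for the 4 combinations.

    Assumption: "with reference" is modelled as passing -g <reference GTF>.
    """
    combos = {}
    # product() yields (True, True), (True, False), (False, True), (False, False),
    # i.e. the same order as combinations 1) to 4).
    for cufflinks_ref, cuffmerge_ref in product([True, False], repeat=2):
        name = "cufflinks_{}_cuffmerge_{}".format(
            "ref" if cufflinks_ref else "noref",
            "ref" if cuffmerge_ref else "noref",
        )
        cufflinks = ["cufflinks", "-p", str(threads), "-o", name + "/cufflinks"]
        if cufflinks_ref:
            cufflinks += ["-g", ref_gtf]  # reference-guided assembly
        cufflinks.append(bam)

        cuffmerge = ["cuffmerge", "-p", str(threads), "-o", name + "/cuffmerge"]
        if cuffmerge_ref:
            cuffmerge += ["-g", ref_gtf]  # merge together with the reference
        cuffmerge.append(assembly_list)

        combos[name] = (cufflinks, cuffmerge)
    return combos


# Hypothetical input files, for illustration only.
commands = build_commands("sample.bam", "reference.gtf", "assemblies.txt")
for combo_name, (cufflinks_cmd, cuffmerge_cmd) in commands.items():
    print(combo_name)
    print("  " + " ".join(cufflinks_cmd))
    print("  " + " ".join(cuffmerge_cmd))
```

Keeping each combination in its own output directory makes it easier to load the four merged GTFs side by side in IGV.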

All of them have their pros and cons. After some manual inspection in IGV, I decided to go with the 4th combination, i.e. Cufflinks without reference and Cuffmerge without reference. Still, I am not satisfied with the annotation it produced.

For example, consider the following scenario (attached image). Although I clearly have 3 transcripts expressed in all my conditions (shown in the green box), none of the above combinations of Cufflinks and Cuffmerge could detect the true transcript structure.

Can anyone tell me why this happens? Also, please suggest any better option than Cufflinks and Cuffmerge.

Assembly transcriptome cufflink cuffmerge • 2.2k views
2
5.2 years ago

Dear Chirag,

If there is no specific reason to use Cufflinks, I strongly recommend using StringTie. Check out the link below for a comparison of StringTie vs. Cufflinks. Though this comparison is for human RNA-seq data sets, I am sure StringTie will also outperform Cufflinks on fungal data.

Regards, Vijay

0

Hi Vijay,

Good to know about this new transcriptome assembler (though published in 2014). It seems to be actively updated compared to Cufflinks (last release 2014). I will try it and share the results.

Thanks a lot.

0

Chirag, you may also be interested in checking out the "new" Tuxedo protocol. The old Tuxedo protocol uses TopHat, Cufflinks, Cuffmerge and cummeRbund, while the new protocol is described in the image below.

HISAT2 is a very efficient mapper in terms of memory and time. Its sensitivity is also good.
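As a rough sketch of what a HISAT2-based run could look like, the snippet below builds (but does not execute) the commands for alignment, sorting and assembly. It assumes the "new" protocol means HISAT2 followed by StringTie, with `samtools sort` in between, and paired-end reads; the index name, read files and sample name are hypothetical. The `--dta` option asks HISAT2 to report alignments tailored for transcript assemblers.

```python
import shlex


def new_tuxedo_commands(index, reads_1, reads_2, sample,
                        strandness="RF", threads=4, ref_gtf=None):
    """Build HISAT2 -> samtools sort -> StringTie commands for one sample.

    strandness: "RF", "FR", or None for an unstranded library.
    """
    sam = sample + ".sam"
    bam = sample + ".sorted.bam"
    gtf = sample + ".gtf"

    hisat2 = ["hisat2", "-p", str(threads), "--dta"]
    if strandness:
        hisat2 += ["--rna-strandness", strandness]
    hisat2 += ["-x", index, "-1", reads_1, "-2", reads_2, "-S", sam]

    # StringTie expects a coordinate-sorted BAM file.
    sort = ["samtools", "sort", "-@", str(threads), "-o", bam, sam]

    stringtie = ["stringtie", bam, "-p", str(threads), "-o", gtf]
    if ref_gtf:
        stringtie += ["-G", ref_gtf]  # optional reference-guided assembly
    strand_flag = {"RF": "--rf", "FR": "--fr"}.get(strandness)
    if strand_flag:
        stringtie.append(strand_flag)

    return [hisat2, sort, stringtie]


# Hypothetical inputs, for illustration only.
commands = new_tuxedo_commands("genome_index", "s1_R1.fastq.gz",
                               "s1_R2.fastq.gz", "s1")
for cmd in commands:
    print(shlex.join(cmd))
```

Passing the same strandness to both the aligner and the assembler keeps the two steps consistent.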

0

Additionally, both pipelines come from the same team, Steven Salzberg's lab.

0

Hi Vijay,

I have used StringTie. The results are more or less the same as with Cufflinks. I decided not to merge transcripts from different samples, but rather to use the individual assemblies and filter them manually by user-specified criteria. Merging assemblies in a fungal genome may not be a good idea because the genes are very close to each other. One question I have is why the orientation of most of the transcripts given by StringTie is wrong. To check the strand, I colored the alignments by "first-of-pair strand" in IGV. Most transcripts have an orientation assigned regardless of the alignment color. I wonder how StringTie assigns strand information to the transcripts it generates?
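Filtering each per-sample assembly by user-specified criteria could look roughly like the sketch below. It assumes the GTF has `transcript` lines carrying a `cov` attribute, as StringTie output does; the thresholds (`min_cov`, `min_span`) and the example records are made up for illustration and are not the criteria used in this thread.

```python
import re

# Matches GTF attributes of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_transcripts(gtf_text):
    """Collect the 'transcript' records of a GTF as dictionaries."""
    transcripts = []
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9 or fields[2] != "transcript":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        transcripts.append({
            "id": attrs.get("transcript_id"),
            "chrom": fields[0],
            "start": int(fields[3]),
            "end": int(fields[4]),
            "strand": fields[6],
            "cov": float(attrs.get("cov", 0.0)),
        })
    return transcripts


def filter_transcripts(transcripts, min_cov=5.0, min_span=200):
    """Keep transcripts with enough coverage and a long enough genomic span."""
    return [
        t for t in transcripts
        if t["cov"] >= min_cov and t["end"] - t["start"] + 1 >= min_span
    ]


def gtf_line(chrom, feature, start, end, strand, attributes):
    """Helper to build a tab-separated GTF line for the toy example."""
    return "\t".join([chrom, "StringTie", feature, str(start), str(end),
                      "1000", strand, ".", attributes])


# Toy records, invented for illustration.
EXAMPLE_GTF = "\n".join([
    gtf_line("chr1", "transcript", 100, 900, "+",
             'gene_id "STRG.1"; transcript_id "STRG.1.1"; cov "12.5";'),
    gtf_line("chr1", "exon", 100, 900, "+",
             'gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "1";'),
    gtf_line("chr1", "transcript", 1000, 1500, "-",
             'gene_id "STRG.2"; transcript_id "STRG.2.1"; cov "1.2";'),
    gtf_line("chr1", "transcript", 2000, 2100, "+",
             'gene_id "STRG.3"; transcript_id "STRG.3.1"; cov "30.0";'),
])

all_transcripts = parse_transcripts(EXAMPLE_GTF)
kept = filter_transcripts(all_transcripts)
print([t["id"] for t in kept])
```

In the toy data only `STRG.1.1` passes: `STRG.2.1` falls below the coverage cutoff and `STRG.3.1` is too short.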

0

Hi Chirag,

There was indeed a bug reported in an earlier version (5/18/2015 - v1.0.4 release); however, you should be using the latest version, so that should not be an issue now.

Is your library strand-specific? According to the StringTie paper (page 4):

"We considered multi-exon transcripts to be correctly assembled only if their strand was also correctly identified, and when strand-specific RNA-seq data were used, we also required that single-exon transcripts were assigned to the correct strand"

Did you use the --fr or --rf option?

0

Yes, I know these options. My data is strand-specific (--rf), so I ran StringTie accordingly. I also tried the other option (--fr) to confirm that my library setting is correct. There is still an issue.

EDIT: More precisely, most of them are in the correct orientation. But there are still cases where the transcript strand is opposite to what the read colors show in IGV. In most of those cases I saw a lot of antisense transcription going on. This could be one reason the assembler is not able to assign the correct strand.
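One way to shortlist loci where antisense transcription might be confusing the strand assignment is to look for assembled transcripts that overlap on opposite strands. Below is a minimal sketch, assuming the transcripts have already been read into `(id, chrom, start, end, strand)` tuples; that layout and the example coordinates are hypothetical. It compares all pairs, which is fine for a quick check but slow for very large assemblies.

```python
from itertools import combinations


def antisense_pairs(transcripts):
    """Return ID pairs of transcripts that overlap on opposite strands.

    transcripts: iterable of (id, chrom, start, end, strand) tuples with
    1-based inclusive coordinates.
    """
    ordered = sorted(transcripts, key=lambda t: (t[1], t[2]))
    pairs = []
    for a, b in combinations(ordered, 2):
        same_chrom = a[1] == b[1]
        overlap = a[2] <= b[3] and b[2] <= a[3]
        opposite = {a[4], b[4]} == {"+", "-"}
        if same_chrom and overlap and opposite:
            pairs.append((a[0], b[0]))
    return pairs


# Toy coordinates, invented for illustration.
example = [
    ("t1", "chr1", 100, 900, "+"),
    ("t2", "chr1", 800, 1500, "-"),   # overlaps t1 on the opposite strand
    ("t3", "chr1", 2000, 2500, "-"),  # no overlap with t1 or t2
    ("t4", "chr2", 100, 900, "-"),    # different chromosome
]
candidates = antisense_pairs(example)
print(candidates)
```

The resulting pairs are only candidates for visual inspection in IGV; an overlap by itself does not show that either strand call is wrong.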

0
5.2 years ago

You might be able to improve the annotation using dedicated software that can use transcriptome support as one line of evidence, plus information about the genomes and genes of evolutionarily related species, Pfam domains and so on. You can look at a variety of such tools, like MAKER, AUGUSTUS, FGENESH, GeneMark and many others. You can combine the predictions of these tools. In any case, you will need to filter your results extensively based on many different statistics and data; for the genes of main interest, you had better go deeper and assess the results visually yourself, as you did. I am not sure you can reliably rebuild all gene models with their alternative splicing from RNA-seq data. Maybe it is possible at high coverage, but I do not remember the numbers.

0
5.2 years ago

There are tools out there even better than StringTie. Take a look at Figure 2 in this article: http://www.ncbi.nlm.nih.gov/pubmed/27760567.

Those might be more interesting.

0
4.5 years ago
h.botond ▴ 50

Dear Chirag,

I am struggling with similar problems. I would like to improve my fungal genome annotation, especially the UTRs, with my RNA-seq data. Can you tell me about your experience with this problem? I have tried the classic and new Tuxedo workflows, and Trinity as well, but none of them could detect the true transcript structure, or only partially.

Thanks for any suggestions.

0
4.5 years ago
colindaven ★ 4.0k

Fungi have quite tightly packed genomes. You could try this specialist option, SnowyOwl: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-229