How to select the best isoform for a differentially expressed gene in Trinity?
2.1 years ago

Hi,

I followed the Trinity guidelines to build a de novo assembly, used that assembly as a reference to quantify the reads with RSEM, and finally ran a differential expression analysis with DESeq2. For the differential expression analysis, I used RSEM.genes.results rather than RSEM.isoforms.results, because I was not sure whether isoform-level expression is as accurate as gene-level expression.

The problem now is how to select the best transcript/isoform for each differentially expressed gene. Without that, I cannot extract the sequence from the Trinity assembly, because the assembly contains sequences for isoforms, not genes.

I have thought of a few ways to do this, such as selecting the longest isoform, or clustering all the isoforms and then selecting the longest one. However, I was wondering whether I can get a consensus of all the isoforms for the gene of interest.
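
For reference, the longest-isoform approach can be sketched in a few lines of Python. This is a minimal sketch, not a Trinity utility: it assumes the standard Trinity ID convention, where a transcript such as TRINITY_DN1000_c115_g5_i1 belongs to the gene TRINITY_DN1000_c115_g5 (the ID minus its `_i<N>` suffix). The function names and the toy sequences are hypothetical.

```python
import re

# Assumption: Trinity transcript IDs end in "_i<N>"; stripping that suffix
# gives the Trinity gene ID (e.g. TRINITY_DN1000_c115_g5_i1 -> ..._g5).
ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def gene_id_of(transcript_id: str) -> str:
    """Strip the isoform suffix to get the Trinity gene ID."""
    return ISOFORM_SUFFIX.sub("", transcript_id)


def read_fasta(lines):
    """Yield (transcript_id, sequence) pairs from FASTA text lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first token (the ID); Trinity headers also carry
            # extra fields such as len=... after the ID.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def longest_isoform_per_gene(lines):
    """Return {gene_id: (transcript_id, sequence)}, keeping the longest isoform."""
    best = {}
    for tid, seq in read_fasta(lines):
        gid = gene_id_of(tid)
        if gid not in best or len(seq) > len(best[gid][1]):
            best[gid] = (tid, seq)
    return best


# Hypothetical toy assembly for illustration only.
example = """>TRINITY_DN1_c0_g1_i1 len=6
ACGTAC
>TRINITY_DN1_c0_g1_i2 len=9
ACGTACGTA
>TRINITY_DN2_c0_g1_i1 len=4
GGCC
""".splitlines()

for gid, (tid, seq) in longest_isoform_per_gene(example).items():
    print(gid, tid, len(seq))
# TRINITY_DN1_c0_g1 TRINITY_DN1_c0_g1_i2 9
# TRINITY_DN2_c0_g1 TRINITY_DN2_c0_g1_i1 4
```

Length is only one possible criterion; the same loop works with any other per-isoform score in place of `len(seq)`.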

Trinity DESeq2
2.1 years ago
h.mon 34k

Using isoforms instead of genes is not incorrect per se, but it is noisier (particularly for a de novo assembled transcriptome) and requires larger sample sizes and deeper sequencing per sample. So, as ATpoint already said, I would recommend gene-level expression analysis.

There is no need to use tximport, as its method has been implemented in Trinity: the RSEM.genes.results file should be identical to importing the counts with tximport.

Which isoform is the "best" is open to debate, but I would argue that the longest is not the best. I remember seeing the Trinity authors recommend selecting the most highly expressed isoform, but I can't find the link.
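
Picking the most highly expressed isoform per gene can be sketched from the RSEM isoform tables. This is a minimal Python sketch, assuming the standard RSEM `*.isoforms.results` columns (`transcript_id`, `gene_id`, ..., `TPM`, ...) and one table per sample. Averaging TPM across samples is my own assumption for "most expressed", not something the Trinity authors prescribe; the function names and numbers are made up.

```python
import csv
import io
from collections import defaultdict


def mean_tpm_per_isoform(tables):
    """Average TPM per transcript across samples.

    `tables` is a list of RSEM *.isoforms.results contents (tab-separated,
    with transcript_id, gene_id and TPM columns), one string per sample.
    Returns {transcript_id: (gene_id, mean_tpm)}.
    """
    totals = defaultdict(float)
    gene_of = {}
    for text in tables:
        for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
            tid = row["transcript_id"]
            gene_of[tid] = row["gene_id"]
            totals[tid] += float(row["TPM"])
    n = len(tables)
    return {tid: (gene_of[tid], total / n) for tid, total in totals.items()}


def most_expressed_isoform_per_gene(tables):
    """Return {gene_id: transcript_id}, picking the isoform with the highest mean TPM."""
    best = {}
    for tid, (gid, tpm) in mean_tpm_per_isoform(tables).items():
        if gid not in best or tpm > best[gid][1]:
            best[gid] = (tid, tpm)
    return {gid: tid for gid, (tid, _) in best.items()}


# Made-up values: isoform i1 is longer, but i2 is more highly expressed.
HEADER = "transcript_id\tgene_id\tlength\teffective_length\texpected_count\tTPM\tFPKM\tIsoPct\n"
sample1 = HEADER + (
    "TRINITY_DN1_c0_g1_i1\tTRINITY_DN1_c0_g1\t900\t750.0\t10.0\t5.0\t4.0\t20.0\n"
    "TRINITY_DN1_c0_g1_i2\tTRINITY_DN1_c0_g1\t600\t450.0\t80.0\t20.0\t16.0\t80.0\n"
)
sample2 = HEADER + (
    "TRINITY_DN1_c0_g1_i1\tTRINITY_DN1_c0_g1\t900\t750.0\t12.0\t6.0\t5.0\t25.0\n"
    "TRINITY_DN1_c0_g1_i2\tTRINITY_DN1_c0_g1\t600\t450.0\t70.0\t18.0\t15.0\t75.0\n"
)
print(most_expressed_isoform_per_gene([sample1, sample2]))
# {'TRINITY_DN1_c0_g1': 'TRINITY_DN1_c0_g1_i2'}
```

In this toy case the most expressed isoform (i2) is not the longest one (i1), which is exactly where the two criteria disagree.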

"...but I was wondering if I can get a consensus of all the isoforms for the gene of interest."

It seems you want a "super-transcripts" representation of the transcripts. Trinity has implemented this as well; see the SuperTranscripts wiki page.

Thank you so much, you saved my day! Super-transcripts are what I was looking for :)

2.1 years ago
ATpoint 62k

You should aggregate the transcript-level abundance estimates to the gene level, e.g. with the tximport tool from Bioconductor. Gene-level differential analysis is much more robust than differential transcript analysis, and DESeq2 is in fact not intended for the latter. There is no "best" isoform. tximport summarizes the transcript-level estimates into a single count per gene, which is then analyzed with DESeq2.
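
The core of that aggregation step is simple to illustrate. tximport itself is an R/Bioconductor package; the Python below is only a minimal sketch of the plain summation of estimated counts per gene (what tximport does for counts by default), and it does not reproduce tximport's transcript-length offsets. The transcript-to-gene map assumes Trinity's `_i<N>` isoform suffix; names and values are hypothetical.

```python
from collections import defaultdict


def aggregate_to_genes(isoform_counts, tx2gene):
    """Sum transcript-level estimated counts into gene-level counts.

    isoform_counts: {transcript_id: expected_count} for one sample
    tx2gene:        {transcript_id: gene_id}
    """
    gene_counts = defaultdict(float)
    for tid, count in isoform_counts.items():
        gene_counts[tx2gene[tid]] += count
    return dict(gene_counts)


# Toy values for illustration only.
counts = {
    "TRINITY_DN1_c0_g1_i1": 10.0,
    "TRINITY_DN1_c0_g1_i2": 80.5,
    "TRINITY_DN2_c0_g1_i1": 3.0,
}
# Assumption: dropping the trailing "_i<N>" gives the Trinity gene ID.
tx2gene = {tid: tid.rsplit("_i", 1)[0] for tid in counts}
print(aggregate_to_genes(counts, tx2gene))
# {'TRINITY_DN1_c0_g1': 90.5, 'TRINITY_DN2_c0_g1': 3.0}
```

Estimated counts are generally fractional, and DESeq2 expects integer counts, so they are usually rounded (or imported through tximport) before building the DESeq2 dataset.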

Thanks for your comment. I have already done that. But the question is: once I have the list of differentially expressed genes (not isoforms), how do I go back and extract the specific sequence for each differentially expressed gene from the Trinity assembly? The Trinity assembly consists of isoforms, not genes.
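
Going back from gene IDs to sequences can be sketched in Python as a filter over the assembly FASTA. This is a minimal sketch, assuming the DE gene IDs are Trinity gene IDs (transcript IDs minus the `_i<N>` suffix), as used in RSEM.genes.results when the gene-to-transcript map follows Trinity's naming. It writes out every isoform of each DE gene, which could then be reduced to one representative or collapsed into a SuperTranscript. The function names and the toy assembly are hypothetical.

```python
import re

# Assumption: Trinity transcript IDs end in "_i<N>"; the rest is the gene ID.
ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def read_fasta(lines):
    """Yield (transcript_id, sequence) pairs from FASTA text lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []  # keep only the ID token
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def extract_gene_isoforms(fasta_lines, de_genes, width=60):
    """Return FASTA text containing every isoform whose gene ID is in de_genes."""
    wanted = set(de_genes)
    out = []
    for tid, seq in read_fasta(fasta_lines):
        if ISOFORM_SUFFIX.sub("", tid) in wanted:
            out.append(">" + tid)
            # Wrap the sequence into lines of at most `width` characters.
            out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n" if out else ""


# Hypothetical toy assembly and DE gene list.
assembly = """>TRINITY_DN1_c0_g1_i1 len=6
ACGTAC
>TRINITY_DN1_c0_g1_i2 len=9
ACGTACGTA
>TRINITY_DN2_c0_g1_i1 len=4
GGCC
""".splitlines()

print(extract_gene_isoforms(assembly, ["TRINITY_DN1_c0_g1"]), end="")
# >TRINITY_DN1_c0_g1_i1
# ACGTAC
# >TRINITY_DN1_c0_g1_i2
# ACGTACGTA
```

Keeping all isoforms here, rather than choosing one inside the filter, leaves the "which isoform" decision as a separate, explicit step.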