
Summarizing transcript-level annotation from a Trinity de novo assembly to gene-level annotation


22 months ago by MB

Dear community,

I have a general question about differential expression analysis in non-model organisms. When genomic resources are absent, one of the first RNAseq analysis steps is generating a de novo transcriptome, which is then used as the reference for mapping reads and estimating the abundances of the assembled transcripts. When these abundances are fed into differential expression tools (e.g. DESeq2), it is recommended to use gene-level estimates rather than transcript/isoform abundances. For RNAseq data I normally use Trinity -> RSEM -> tximport/DESeq2, relying on Trinity's gene-isoform assignments to summarize abundances to the gene level with tximport. This is of course not perfect, but it is a starting point. The problem comes with functional enrichment: the transcriptome is annotated at the transcript level, whereas the differential expression results are at the gene level, and functional enrichment (e.g. GO term enrichment) needs both annotation data and information about what is differentially expressed. For isoforms that are annotated to the same gene product (as in most cases), this is not a problem. But how should one deal with isoforms of the same (Trinity) 'gene' that have different annotations (and would therefore get different GO terms)?

Go back to transcript-level expression analysis (to avoid the annotation problem)?
Annotate one representative isoform per gene instead of every transcript? If so, how should it be chosen: by clustering, by taking the longest or most highly expressed isoform, or by some other criterion?
Summarize all isoform annotations (and therefore all GO terms) for each gene?
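The "representative isoform" option can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not an established tool: the function name `pick_representatives` is hypothetical, and it assumes the transcript-to-gene mapping and a per-transcript score (for example the transcript length, or an abundance such as the TPM reported by RSEM) have already been loaded into plain dictionaries. The clustering option is not covered.

```python
from collections import defaultdict


def pick_representatives(tx2gene, score):
    """Pick one representative transcript per gene (hypothetical helper).

    tx2gene: dict mapping transcript ID -> gene ID.
    score:   dict mapping transcript ID -> number to maximize, e.g. the
             transcript length ("longest isoform") or its TPM ("most
             expressed isoform").
    Ties are broken by the lexicographically smallest transcript ID so the
    choice is reproducible.
    """
    isoforms = defaultdict(list)
    for tx, gene in tx2gene.items():
        isoforms[gene].append(tx)
    # Highest score wins; negating it lets min() also order ties by ID.
    return {
        gene: min(txs, key=lambda tx: (-score[tx], tx))
        for gene, txs in isoforms.items()
    }


if __name__ == "__main__":
    # Toy data, made up for illustration.
    tx2gene = {"g1_i1": "g1", "g1_i2": "g1", "g2_i1": "g2"}
    lengths = {"g1_i1": 500, "g1_i2": 1200, "g2_i1": 300}
    tpm = {"g1_i1": 40.0, "g1_i2": 2.5, "g2_i1": 10.0}
    print(pick_representatives(tx2gene, lengths))  # longest isoform per gene
    print(pick_representatives(tx2gene, tpm))      # most expressed isoform per gene
```

With the toy data the two criteria disagree for `g1` (the longest isoform is `g1_i2`, the most expressed is `g1_i1`), which illustrates why the choice of representative can change which annotation a gene ends up with.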

Are there any experiences, thoughts, or recommendations about this? Links to posts or literature I may have overlooked would also be highly appreciated. Many thanks in advance; I look forward to your opinions.


(15 months ago by MB)


Hello, I am wondering the exact same thing. I have been mulling over very similar options and have partly tried each of them, but I would love to hear from someone experienced with this situation.

Reply by cdsparks, 21 months ago


MB, did you find a resolution elsewhere on how to approach this?

Reply by fiddlemethis, 19 months ago


Hey, sorry for the late response. Unfortunately, not really. I prefer to work with Trinity 'genes' instead of transcripts because I know the assemblies contain a lot of contig oversplitting, which distorts transcript-level abundance estimates. Using gene-level counts is therefore, IMO, a more flexible way of 'clustering' than traditional clustering at a fixed threshold.
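As a rough illustration of what summing isoforms into Trinity 'genes' means, here is a Python sketch; it stands in for the R/tximport step and is not a replacement for it. It assumes Trinity's usual ID format, in which the gene ID is the transcript ID without its trailing `_i<N>` isoform suffix (e.g. `TRINITY_DN1000_c115_g5_i1` belongs to gene `TRINITY_DN1000_c115_g5`); the function names are hypothetical. Unlike tximport, it only sums estimated counts and does not compute the abundance-weighted average transcript lengths that tximport hands on to DESeq2.

```python
import re
from collections import defaultdict

# Trinity transcript IDs end in an isoform suffix "_i<N>";
# removing it yields the Trinity "gene" ID.
_ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def trinity_gene_id(transcript_id):
    """Return the Trinity gene ID of a Trinity transcript ID."""
    return _ISOFORM_SUFFIX.sub("", transcript_id)


def sum_counts_to_genes(tx_counts):
    """Sum one sample's per-transcript estimated counts into per-gene counts.

    All isoforms that Trinity assigned to the same gene (including
    fragments of an oversplit transcript) are pooled into a single count.
    """
    gene_counts = defaultdict(float)
    for tx, count in tx_counts.items():
        gene_counts[trinity_gene_id(tx)] += count
    return dict(gene_counts)


if __name__ == "__main__":
    # Made-up expected counts for one sample.
    counts = {
        "TRINITY_DN1000_c115_g5_i1": 10.5,
        "TRINITY_DN1000_c115_g5_i2": 4.5,
        "TRINITY_DN2_c0_g1_i1": 7.0,
    }
    print(sum_counts_to_genes(counts))
```

Pooling this way means that how reads are split between oversplit fragments no longer matters, as long as Trinity placed those fragments in the same gene.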

I found that in most cases the annotations are the same for all isoforms. Since it can be biologically valid for one gene to have different gene products, I now use a summarizing approach. But I think it would also be fine to exclude ambiguous annotations. In most data sets, ambiguously annotated genes will not have enough impact to really change the outcome of a functional enrichment analysis. If they do, that means you cannot trust your annotation/enrichment data anyway. Since annotation data for most non-model organisms is too scarce for in-depth analysis anyway, I would recommend making the DE analysis as robust as possible and using annotation/functional enrichment only as a first insight into possibly regulated pathways.
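Both strategies mentioned here, summarizing (taking the union of the isoforms' GO terms) and excluding ambiguously annotated genes, are straightforward once the transcript-level GO terms are in a dictionary. The Python sketch below assumes such a mapping has already been parsed from an annotation table (parsing not shown) together with a transcript-to-gene mapping. The function name and the definition of "ambiguous" as "the annotated isoforms carry non-identical GO term sets" are assumptions for illustration; a looser criterion, such as comparing the assigned gene products, may be more appropriate.

```python
from collections import defaultdict


def gene_go_terms(tx_go, tx2gene, drop_ambiguous=False):
    """Collapse transcript-level GO annotations to the gene level (hypothetical helper).

    tx_go:   dict transcript ID -> set of GO IDs; unannotated transcripts
             are simply absent and are ignored.
    tx2gene: dict transcript ID -> gene ID.
    drop_ambiguous=False: each gene gets the union of its isoforms' terms.
    drop_ambiguous=True:  genes whose annotated isoforms carry different
                          term sets are left out entirely.
    """
    term_sets = defaultdict(list)
    for tx, gene in tx2gene.items():
        if tx in tx_go:
            term_sets[gene].append(frozenset(tx_go[tx]))

    gene_terms = {}
    for gene, sets in term_sets.items():
        # "Ambiguous" here means at least two isoforms with different term sets.
        ambiguous = len(set(sets)) > 1
        if drop_ambiguous and ambiguous:
            continue
        gene_terms[gene] = set().union(*sets)
    return gene_terms


if __name__ == "__main__":
    # Toy data: g1 isoforms agree, g2 isoforms disagree, g3 is unannotated.
    tx2gene = {"g1_i1": "g1", "g1_i2": "g1", "g2_i1": "g2", "g2_i2": "g2", "g3_i1": "g3"}
    tx_go = {
        "g1_i1": {"GO:0005524"},
        "g1_i2": {"GO:0005524"},
        "g2_i1": {"GO:0003677"},
        "g2_i2": {"GO:0016020"},
    }
    print(gene_go_terms(tx_go, tx2gene))                       # union per gene
    print(gene_go_terms(tx_go, tx2gene, drop_ambiguous=True))  # only g1 kept
```

Running the enrichment once with each setting is a cheap way to check the point made above: if the results differ substantially, the ambiguous genes are driving the outcome.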

All the best.

Reply by MB, 15 months ago