Isoform level differential expression
3.2 years ago
Hi all,

Apologies for any double-posting or gaps in my knowledge. I have been reading the forums and papers on differential expression at the isoform level and am getting quite confused. For reference, I followed the tutorial below for reference-guided transcript assembly:

https://github.com/griffithlab/rnaseq_tutorial/wiki/Differential-Expression

In short, I wanted to construct a more comprehensive transcriptome using RNA-seq from samples treated with and without stimuli. I aligned the reads with STAR, assembled transcripts for each sample with StringTie, merged the GTFs from all samples, and then re-ran StringTie using the merged GTF as the reference guide. The resulting files contain FPKM and TPM values. I then used Ballgown for DE, which accepts only the 'FPKM' or 'cov' arguments.

My main conclusions from reading around are that there are many flaws with using FPKM for DE, and that TPM is the preferred route. However, as mentioned above, Ballgown does not currently support TPM. I have also read that count-based methods are not ideal for isoform-level analyses (and are better suited to gene-level analyses), though I am not entirely clear on this; if anyone has links to clear explanations, I would greatly appreciate the direction. I am now struggling to decide how best to proceed to get the final differential expression for each transcript between my treatment and control samples. DESeq2, for example, is count-based and apparently not appropriate? Could anyone advise, or suggest packages that work well with StringTie output?
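For reference, FPKM and TPM within a single sample differ only by a rescaling: each transcript's TPM is its FPKM divided by the sum of all FPKM values in that sample, times one million. Below is a minimal Python sketch of that conversion. The function name, transcript IDs, and values are hypothetical and chosen for illustration; this is not part of StringTie or Ballgown, and it does not by itself address whether either unit is suitable for DE testing.

```python
def fpkm_to_tpm(fpkm):
    """Rescale one sample's FPKM values so that they sum to one million.

    `fpkm` maps transcript ID -> FPKM for a single sample.
    """
    total = sum(fpkm.values())
    if total <= 0:
        raise ValueError("FPKM values must sum to a positive number")
    # TPM_i = FPKM_i / sum_j(FPKM_j) * 1e6
    return {tx: value / total * 1e6 for tx, value in fpkm.items()}


# Hypothetical transcript IDs and FPKM values, for illustration only.
sample_fpkm = {"tx_A": 10.0, "tx_B": 30.0, "tx_C": 60.0}
sample_tpm = fpkm_to_tpm(sample_fpkm)
```

The rescaling is done per sample, so TPM values always sum to the same total in every sample, whereas the sum of FPKM values can differ from one sample to the next.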

Many thanks to all and I greatly appreciate your help.

RNA-Seq StringTie Ballgown
I would start by reading this paper/workflow from Michael Love (developer/maintainer of tximport and DESeq2) on differential analysis of transcript usage. Without being an expert on the matter, I would guess that it makes more sense to focus on the isoforms/transcripts already annotated in the genome/transcriptome (I assume you work on human) rather than assembling the transcriptome yourself. The isoforms you are interested in are probably already annotated, since the genes one is interested in are often not mystical new forms of transcripts but simply what is already there. References from GENCODE etc. are well curated, so I doubt you will gain much by assembling yourself beyond accumulating non-validated and potentially false-positive transcripts.

Thanks very much for your reply! Indeed, since I am working with hg38, the genome is already very well annotated, and not much may be gained. However, my project in fact involves trying to assemble de novo (non-coding) transcripts, so I may still carry on. I really appreciate your suggestions.