Question

Why does cuffdiff test differential isoform expression for genes with only one annotated transcript?

8.8 years ago

unsupervised_learner ▴ 30

Experimental background: Mice underwent three drug treatment conditions, with three biological replicates per condition. 100 bp paired-end RNA-seq was carried out on a specific brain region and cell type.

I mapped reads with STAR, using the settings recommended for Cufflinks-compatible output. I then ran cuffdiff on the .sam files, masking non-coding RNAs with an Ensembl .gtf, which returned results for differential gene and isoform testing (gene_exp.diff, isoform_exp.diff). I counted the annotated transcripts per gene using the isoforms.fpkm_tracking file and checked how many of the genes with differentially expressed isoforms had only one annotated transcript that could have been tested. Surprisingly, a substantial number of significantly (q_value < 0.05) differentially expressed isoforms came from genes with only one annotated transcript, and 100% of these were also identified as differentially expressed at the gene level.

Thus, my question is: why does cuffdiff test transcripts from genes that have only one annotated transcript, when such a transcript will almost surely also come back as differentially expressed at the gene level, as my results indicate? Has anyone else noticed this? My understanding of "differential isoform expression" or "alternative isoform usage" is that a gene must have potential alternative isoforms; otherwise there is no "alternative" and the test simply becomes differential gene expression. I could not find anything about this in the cuffdiff documentation (http://cole-trapnell-lab.github.io/cufflinks/cuffdiff/index.html#differential-expression-tests).

My inclination is to exclude all isoforms originating from genes with only one annotated isoform that could possibly be tested, then generate new corrected p-values (i.e. q-values) based on the exclusion of these genes. I would appreciate any input on this. Thanks!
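The exclusion and re-correction described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the column names (`test_id`, `gene_id`, `sample_1`, `sample_2`, `status`, `p_value`) are assumed to match the isoform_exp.diff header, the function names and the `q_value_multi` field are hypothetical, and the Benjamini-Hochberg step is a plain re-implementation applied separately to each pairwise comparison, which may not match how cuffdiff itself derives its q-values.

```python
import csv
import io
from collections import defaultdict


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def requantify(handle):
    """Drop single-transcript genes, then recompute q-values per comparison."""
    rows = list(csv.DictReader(handle, delimiter="\t"))
    # Count distinct transcripts per gene across the whole file.
    transcripts = defaultdict(set)
    for r in rows:
        transcripts[r["gene_id"]].add(r["test_id"])
    # Keep successfully tested rows from genes with at least two transcripts.
    kept = [r for r in rows
            if r["status"] == "OK" and len(transcripts[r["gene_id"]]) > 1]
    # Assumption: correct within each (sample_1, sample_2) comparison.
    groups = defaultdict(list)
    for r in kept:
        groups[(r["sample_1"], r["sample_2"])].append(r)
    for members in groups.values():
        qvals = bh_adjust([float(r["p_value"]) for r in members])
        for r, q in zip(members, qvals):
            r["q_value_multi"] = q  # hypothetical field name
    return kept


# Tiny made-up table standing in for isoform_exp.diff (not real data).
EXAMPLE = "\n".join("\t".join(fields) for fields in [
    ("test_id", "gene_id", "sample_1", "sample_2", "status", "p_value"),
    ("T1", "G1", "A", "B", "OK", "0.01"),
    ("T2", "G1", "A", "B", "OK", "0.04"),
    ("T3", "G2", "A", "B", "OK", "0.001"),
    ("T4", "G3", "A", "B", "OK", "0.03"),
    ("T5", "G3", "A", "B", "NOTEST", "1"),
])

for row in requantify(io.StringIO(EXAMPLE)):
    print(row["test_id"], row["gene_id"], row["q_value_multi"])
```

In the toy table, G2 has a single transcript and is removed before correction, so the remaining tests are adjusted against a smaller number of hypotheses.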

RNA-Seq Cuffdiff • 3.0k views

score 0 · Answer 1 · 2016-09-08

8.8 years ago

Satyajeet Khare ★ 1.6k

I guess isoform_exp.diff is a misnomer; transcript_exp.diff would be more accurate. The isoform_exp.diff file can be handy if one wants to study the features of overexpressed vs. underexpressed transcripts, irrespective of whether they come from the same gene.

In your scenario, if you are interested in the expression of genes with a specific number of isoforms, isoform_exp.diff can be used with a count function.
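One possible shape for such a count function is sketched below. It assumes the isoform_exp.diff columns are named `test_id`, `gene_id`, `status`, and `q_value`; the function name and the toy table are hypothetical. It tallies significant test rows by how many transcripts their gene has, so a transcript that is significant in several pairwise comparisons is counted once per comparison.

```python
import csv
import io
from collections import Counter, defaultdict


def significant_by_isoform_count(handle, alpha=0.05):
    """Tally significant rows by the transcript count of their gene."""
    rows = list(csv.DictReader(handle, delimiter="\t"))
    # Distinct transcripts per gene, regardless of test status.
    per_gene = defaultdict(set)
    for r in rows:
        per_gene[r["gene_id"]].add(r["test_id"])
    tally = Counter()
    for r in rows:
        if r["status"] == "OK" and float(r["q_value"]) < alpha:
            tally[len(per_gene[r["gene_id"]])] += 1
    return dict(tally)


# Made-up rows for illustration only.
TOY = "\n".join("\t".join(fields) for fields in [
    ("test_id", "gene_id", "status", "q_value"),
    ("T1", "G1", "OK", "0.01"),
    ("T2", "G1", "OK", "0.5"),
    ("T3", "G2", "OK", "0.001"),
])

print(significant_by_isoform_count(io.StringIO(TOY)))
```

A key of 1 in the result collects the significant transcripts that come from single-transcript genes, which is the group the question is about.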

Generally, the study of genes that express multiple transcripts is associated with alternative RNA processing, for which the splicing.diff and promoters.diff files are also generated.

Thank you for your reply. In this scenario I am interested in identifying isoform switching as a drug response, which requires genes with more than one known isoform. I did in fact use Python to categorize genes by number of annotated transcripts, and I am continuing this analysis knowing that this is a feature of cuffdiff. Thanks again.

Reply • 8.8 years ago by unsupervised_learner ▴ 30