Question: Why does cuffdiff test differential isoform expression for genes with only one annotated transcript?
Asked 2.5 years ago by unsupervised_learner (United States):

Experimental background: Mice underwent three drug treatment conditions, with three biological replicates per condition. 100 bp paired-end RNA-seq was carried out on a specific brain region and cell type.

I mapped reads with STAR using the settings recommended for downstream use with Cufflinks. I then ran cuffdiff on the resulting .sam files, masking non-coding RNAs with an Ensembl .gtf, which produced both gene-level and isoform-level differential expression results (gene_exp.diff, isoform_exp.diff). Using the isoforms.fpkm_tracking file, I counted the annotated transcripts per gene and checked how many of the genes with differentially expressed isoforms had only one annotated transcript that could have been tested. Surprisingly, a substantial number of the significantly differentially expressed isoforms (q_value < 0.05) came from genes with only one annotated transcript, and every one of them was also called differentially expressed at the gene level.
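A minimal Python sketch of this check, assuming the standard tab-separated cuffdiff outputs with a header row: the gene_id column of isoforms.fpkm_tracking, and the test_id, gene_id, sample_1, sample_2 and q_value columns of the .diff files. The function names are hypothetical. Because cuffdiff writes one row per pairwise comparison when there are more than two conditions, the sketch only matches an isoform call to a gene call from the same comparison.

```python
import csv
import io
from collections import Counter


def transcripts_per_gene(fpkm_tracking):
    # isoforms.fpkm_tracking has one row per transcript, so counting gene_id
    # values gives the number of transcripts per gene.
    return Counter(row["gene_id"] for row in csv.DictReader(fpkm_tracking, delimiter="\t"))


def significant_tests(diff_table, alpha=0.05):
    # Collect (sample_1, sample_2, test_id, gene_id) for every row with q_value < alpha.
    # The comparison is part of the key because each test appears once per pair of conditions.
    return {
        (row["sample_1"], row["sample_2"], row["test_id"], row["gene_id"])
        for row in csv.DictReader(diff_table, delimiter="\t")
        if float(row["q_value"]) < alpha
    }


def single_transcript_overlap(fpkm_tracking, isoform_diff, gene_diff, alpha=0.05):
    """Return (significant isoform tests, those from single-transcript genes,
    those also significant at the gene level in the same comparison)."""
    counts = transcripts_per_gene(fpkm_tracking)
    sig_isoforms = significant_tests(isoform_diff, alpha)
    sig_genes = {(s1, s2, gene) for s1, s2, _, gene in significant_tests(gene_diff, alpha)}
    single = [t for t in sig_isoforms if counts[t[3]] == 1]
    also_gene = [t for t in single if (t[0], t[1], t[3]) in sig_genes]
    return len(sig_isoforms), len(single), len(also_gene)


# Tiny in-memory example standing in for the real files.
fpkm = "tracking_id\tgene_id\nT1\tG1\nT2\tG2\nT3\tG2\n"
iso = ("test_id\tgene_id\tsample_1\tsample_2\tq_value\n"
       "T1\tG1\tA\tB\t0.01\nT2\tG2\tA\tB\t0.02\nT3\tG2\tA\tB\t0.5\n")
gene = ("test_id\tgene_id\tsample_1\tsample_2\tq_value\n"
        "G1\tG1\tA\tB\t0.01\nG2\tG2\tA\tB\t0.3\n")
print(single_transcript_overlap(io.StringIO(fpkm), io.StringIO(iso), io.StringIO(gene)))
```

With real output, pass open file handles (e.g. `open("isoforms.fpkm_tracking")`) instead of the StringIO objects.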

So my question is: why would cuffdiff test transcripts from genes that have only one annotated transcript, when such a test will almost certainly agree with the gene-level result, as mine did? Has anyone else noticed this? My understanding of "differential isoform expression" or "alternative isoform usage" is that a gene must have potential alternative isoforms; otherwise there is no "alternative", and the test simply becomes differential gene expression. I could not find anything about this in the cuffdiff documentation (http://cole-trapnell-lab.github.io/cufflinks/cuffdiff/index.html#differential-expression-tests).

My inclination is to exclude all isoforms from genes with only one annotated isoform, and then recompute the multiple-testing-corrected p-values (q-values) on the remaining tests. I would appreciate any input on this. Thanks!
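One way to sketch that re-correction in Python, assuming a Benjamini-Hochberg FDR correction and assuming that only rows with status OK should enter it. `rows` are dicts as produced by `csv.DictReader` on isoform_exp.diff for a single sample_1/sample_2 comparison (test_ids repeat across comparisons), and `counts` maps gene_id to the number of annotated transcripts; both inputs and the function names are hypothetical.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    q = [0.0] * n
    running_min = 1.0  # q-values are capped at 1
    # Walk from the largest p-value down, so each q-value is the minimum of
    # p * n / rank over its own rank and every larger rank.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        q[i] = running_min
    return q


def requantify_multi_isoform(rows, counts, min_transcripts=2):
    """Drop tests from genes with fewer than min_transcripts annotated transcripts
    (and untested rows), then recompute q-values on what remains."""
    kept = [
        r for r in rows
        if counts.get(r["gene_id"], 0) >= min_transcripts and r["status"] == "OK"
    ]
    q = benjamini_hochberg([float(r["p_value"]) for r in kept])
    return {r["test_id"]: qv for r, qv in zip(kept, q)}


rows = [
    {"test_id": "T1", "gene_id": "G1", "p_value": "0.001", "status": "OK"},
    {"test_id": "T2", "gene_id": "G2", "p_value": "0.01", "status": "OK"},
    {"test_id": "T3", "gene_id": "G2", "p_value": "0.04", "status": "OK"},
    {"test_id": "T4", "gene_id": "G3", "p_value": "0.5", "status": "NOTEST"},
]
print(requantify_multi_isoform(rows, {"G1": 1, "G2": 2, "G3": 2}))
```

Removing tests shrinks the number of hypotheses, so the remaining q-values can only stay the same or get smaller; whether that is appropriate depends on deciding the filter before looking at the results, which is the case here since it uses annotation alone.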

Tags: rna-seq, cuffdiff
Answer by Satyajeet Khare (Pune, India), 2.5 years ago:

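I guess isoform_exp.diff is a misnomer; transcript_exp.diff would be more accurate. Each row tests one transcript's expression between two conditions, not its share of the gene's output, so a gene with a single annotated transcript still gets a row. For such a gene the transcript FPKM equals the gene FPKM, so the isoform-level and gene-level tests are testing essentially the same quantity, which is why those calls match. The isoform_exp.diff file can still be handy if you want to study the features of over- vs under-expressed transcripts, irrespective of whether they come from the same gene.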

In your case, if you are interested in genes with a specific number of isoforms, you can count the transcripts per gene_id in isoform_exp.diff and filter on that count.
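A short Python sketch of such a count, working only from isoform_exp.diff (tab-separated, with test_id and gene_id columns); the function name is hypothetical. It counts the transcripts cuffdiff actually tested, which can differ from the number annotated in the .gtf if some were masked. Test IDs are collected in a set because each transcript appears once per pairwise comparison.

```python
import csv
import io
from collections import defaultdict


def genes_by_isoform_count(isoform_diff):
    """Map number of tested transcripts -> set of gene_ids with that many."""
    per_gene = defaultdict(set)
    for row in csv.DictReader(isoform_diff, delimiter="\t"):
        # A set de-duplicates the same test_id repeated across comparisons.
        per_gene[row["gene_id"]].add(row["test_id"])
    groups = defaultdict(set)
    for gene_id, transcripts in per_gene.items():
        groups[len(transcripts)].add(gene_id)
    return dict(groups)


table = "test_id\tgene_id\nT1\tG1\nT2\tG2\nT3\tG2\nT4\tG3\nT2\tG2\n"
groups = genes_by_isoform_count(io.StringIO(table))
multi_isoform_genes = set().union(*(g for n, g in groups.items() if n > 1))
print(groups, multi_isoform_genes)
```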

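Genes that express multiple transcripts are generally studied in the context of alternative RNA processing, for which cuffdiff also generates splicing.diff and promoters.diff. These test for shifts in the relative abundance of isoforms (or of TSS groups) within a gene, so they only apply to genes with more than one isoform.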
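To see why single-transcript genes drop out of these tests, here is an illustrative Python sketch of the kind of statistic involved, assuming the square root of the Jensen-Shannon divergence between isoform-proportion vectors from two conditions. It computes the statistic only, not cuffdiff's significance test or variance model, and the function names are hypothetical.

```python
import math


def entropy(p):
    # Shannon entropy in bits; zero proportions contribute nothing.
    return -sum(x * math.log2(x) for x in p if x > 0)


def isoform_proportions(fpkms):
    # Convert per-isoform FPKMs of one gene into relative abundances.
    total = sum(fpkms)
    return [f / total for f in fpkms]


def js_distance(p, q):
    """Square root of the Jensen-Shannon divergence between two proportion vectors."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    jsd = entropy(m) - (entropy(p) + entropy(q)) / 2
    return math.sqrt(max(jsd, 0.0))  # guard against tiny negative rounding error


single = js_distance(isoform_proportions([10.0]), isoform_proportions([40.0]))
uniform_change = js_distance(isoform_proportions([10.0, 10.0]), isoform_proportions([40.0, 40.0]))
switch = js_distance(isoform_proportions([9.0, 1.0]), isoform_proportions([1.0, 9.0]))
print(single, uniform_change, switch)
```

A gene with one isoform always has the proportion vector [1.0], so the distance is 0 however much its total expression changes; the same is true of a multi-isoform gene whose isoforms all change by the same factor. Only a change in the isoform mix, as in the third case, moves the statistic.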


Reply by unsupervised_learner, 2.5 years ago:

Thank you for your reply. Here I am interested in identifying isoform switching as a drug response, which requires more than one known isoform. I did in fact use Python to categorize genes by their number of annotated transcripts, and I am continuing the analysis with this cuffdiff behavior in mind. Thanks again.