Question

Summary of alternative splicing from Cufflinks

9.6 years ago

Daniel Standage 4.1k

tl;dr How can I get summary information about alternative splicing from a Cufflinks analysis?

A typical RNA-seq workflow using the Tuxedo suite involves mapping reads with Tophat (sample by sample), assembling the aligned reads with Cufflinks (sample by sample), and merging sample-specific assemblies into a single consensus assembly with Cuffmerge. If the samples are from 2 or more conditions, you can then use Cuffdiff to detect genes that are differentially expressed or differentially spliced between the different conditions.

Regarding splicing: some genes will have alternative splicing that is not related to the contrast you are analyzing. For example, we might be comparing lung tissue to brain tissue, and both of gene A's isoforms (A1 and A2) are expressed in each tissue. If the levels of A1 are higher in lung and the levels of A2 are higher in brain, this is designated differential splicing and reported in the splicing.diff file produced by Cuffdiff. However, even if the levels of the two isoforms are the same across the two tissues, this is still alternative splicing since the gene is expressed in multiple isoforms.

So my question is this: how can I summarize alternative splicing from a Cufflinks analysis? For example, I want to report that X genes are alternatively spliced, there are Y alternative isoforms, there are Z cases of exon skipping, W cases of retained introns, etc. I'm not interested in differential splicing; I'm interested in all alternative splicing, whether or not there is differential usage across my contrast of interest.
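As a starting point for the X and Y numbers, here is a minimal sketch that counts multi-isoform genes in a Cufflinks/Cuffmerge GTF. It assumes the usual `gene_id` and `transcript_id` attributes on `exon` lines; the function names are made up for illustration. Note that two transcripts of the same gene can differ only in their start or end sites, so this gives an upper bound on alternatively spliced genes, not an exact count.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

def isoforms_per_gene(gtf_lines):
    """Map each gene_id to the set of transcript_ids seen on exon lines."""
    genes = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs and "transcript_id" in attrs:
            genes[attrs["gene_id"]].add(attrs["transcript_id"])
    return genes

def summarize(genes):
    """Count genes with more than one assembled isoform."""
    multi = {g: t for g, t in genes.items() if len(t) > 1}
    return {
        "genes": len(genes),
        "multi_isoform_genes": len(multi),
        "isoforms_in_multi_isoform_genes": sum(len(t) for t in multi.values()),
    }
```

To run it on real data, pass an open file handle, for example `summarize(isoforms_per_gene(open("merged.gtf")))`, where `merged.gtf` stands in for whatever your assembly file is called.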

PS The reason I've gone to so much trouble describing this is that the terms alternative splicing and differential splicing seem to have become conflated, both in the literature and online. I've been searching for this information for quite a while, but whenever I search for info on alternative splicing I invariably find tools that report alternative splicing across a contrast, that is, differential splicing. That's fine, but not what I'm looking for here.

alternative-splicing cufflinks • 9.6k views

I too am looking for tools to assess differences in isoform prevalence/ratios between two groups (e.g. samples from affected and unaffected human subjects). We've used a slightly different pipeline, but it might give you some ideas and identify some common goals.

Read Mapping in Tophat2 => BAM files imported into Seqmonk => Quantitate Raw Reads => Normalization (e.g. Limma VOOM) for linear analyses.

Seqmonk will estimate read counts for the individual mRNA isoforms in the supplied annotation file, and I'm fairly comfortable with this. What I'd like to do now is test the following hypothesis:

For a given gene's k isoforms, do we see an effect of [diagnostic status] on the relative abundances of the respective isoforms? Furthermore, I'd like to do this while controlling for one or more co-variate effects that might be confounded with [diagnostic status], such as continuously-measured [age] or categorical [sex or ethnicity]. To me, this sounds a bit like running a MAN(C)OVA per gene/isoform set. Wondering if anyone has come across / developed tools for accomplishing this sort of task.
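To make "relative abundances" concrete: the quantity such a per-gene test would work on is the fraction each isoform contributes to its gene's total in a given sample. A minimal sketch follows; the function name and input shape are my own, and this only prepares the response variable. It does not do the covariate-adjusted test itself.

```python
def isoform_fractions(abundances):
    """Convert isoform abundances for one gene in one sample to fractions.

    abundances: dict mapping isoform_id -> non-negative abundance
    (read counts or FPKM). Returns a dict of isoform_id -> fraction of the
    gene total; all zeros if the gene is not expressed in this sample.
    """
    total = sum(abundances.values())
    if total == 0:
        return {iso: 0.0 for iso in abundances}
    return {iso: value / total for iso, value in abundances.items()}
```

For a gene with k isoforms the fractions sum to 1, so only k-1 of them are free to vary. That constraint matters when choosing a multivariate model.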

updated 15 months ago by Ram 43k • written 8.9 years ago by dantylee ▴ 40

I am also looking for a tool that can identify alternative splicing from RNA-seq output such as a BAM file or a transcript GTF file. One tool that does this is AS_profile, but it has very little documentation, so it is quite hard to use.
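For what it's worth, the simplest event type (a single skipped exon) can be read straight off the exon coordinates in a transcript GTF. Below is a rough sketch under my own assumptions: the exons of one gene have already been grouped by transcript, and an exon counts as skipped when another isoform has one intron running from the upstream exon's end to the downstream exon's start. It is not how any of the tools mentioned in this thread work, and it ignores multi-exon skipping, retained introns and alternative splice sites.

```python
def introns(exons):
    """Introns of one transcript as (upstream_exon_end, downstream_exon_start)."""
    exons = sorted(exons)
    return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(exons, exons[1:])]

def skipped_exons(transcripts):
    """Find single-exon skipping events within one gene.

    transcripts: dict mapping transcript_id -> list of (start, end) exons.
    Returns a set of (skipped_exon, including_transcript, skipping_transcript).
    """
    intron_sets = {t: set(introns(ex)) for t, ex in transcripts.items()}
    events = set()
    for t_inc, exons in transcripts.items():
        exons = sorted(exons)
        # Slide over every run of three consecutive exons in this isoform.
        for left, mid, right in zip(exons, exons[1:], exons[2:]):
            spanning = (left[1], right[0])  # intron that would skip `mid`
            for t_skip, ints in intron_sets.items():
                if t_skip != t_inc and spanning in ints:
                    events.add((mid, t_inc, t_skip))
    return events
```

Counting distinct skipped exons across all genes would then give a rough figure for the number of exon-skipping cases.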

updated 18 months ago by Ram 43k • written 8.7 years ago by Wan Fahmi • 0

Ram · Answer 1 · 2015-07-10

Isoforms can also arise from alternative cleavage and polyadenylation sites, especially in the brain, where some transcripts can have very long 3' UTRs (see http://www.ncbi.nlm.nih.gov/pubmed/23520388).

A useful tool is IsoSCM, which can annotate 3' UTRs and find isoforms with different 3' UTRs that Cufflinks might have missed. IsoSCM does de novo assembly, and its output is not as detailed as that of Cufflinks (yet). However, it can be used to estimate differential usage.

There is also a key point to consider: you can always find a large number of isoforms for a given gene. A better question is which of these isoforms are expressed at a significant level according to Cufflinks. This is not a question of differential expression but of making sure that you are not seeing artifacts. You should also run Northern blots or RT-qPCR on a few of the isoforms you find.
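One way to act on this is to filter on the per-isoform estimates that Cufflinks writes to `isoforms.fpkm_tracking`. The sketch below is only that, a sketch: the column names (`tracking_id`, `gene_id`, `FPKM`, `FPKM_conf_lo`, `FPKM_status`) are what I understand the single-sample Cufflinks output to contain, so check them against your file's header. The `min_fpkm` cutoff of 1.0 is an arbitrary assumption, not a recommendation.

```python
import csv
from collections import defaultdict

def supported_isoforms(tracking_lines, min_fpkm=1.0):
    """Group isoforms that pass simple expression filters by gene.

    tracking_lines: iterable of tab-separated lines, header first
    (for example an open isoforms.fpkm_tracking file).
    An isoform is kept if its status is OK, its FPKM is at least min_fpkm,
    and the lower bound of its FPKM confidence interval is above zero.
    """
    reader = csv.DictReader(tracking_lines, delimiter="\t")
    by_gene = defaultdict(list)
    for row in reader:
        if row["FPKM_status"] != "OK":
            continue
        if float(row["FPKM"]) >= min_fpkm and float(row["FPKM_conf_lo"]) > 0:
            by_gene[row["gene_id"]].append(row["tracking_id"])
    return by_gene
```

Genes that still have two or more isoforms after this filter are the ones I would count as alternatively spliced, and then pick a few of those for validation.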

Ram · Answer 2 · 2015-07-10

8.8 years ago

daxue ▴ 20

I'm not sure if you have found the answer yet. I had the same problem and found the paper helpful. For a single sample, you may want to try SpliceTrap or SpliceSeq.

SpliceTrap may be useful, and you can also run it through this website: https://bioinformatics.cineca.it/rap/index.php

updated 18 months ago by Ram 43k • written 8.7 years ago by Wan Fahmi • 0

score 0 · Answer 3 · 2018-04-05

I just pushed an update of my R package IsoformSwitchAnalyzeR to Bioconductor which introduces a module for alternative splicing.

It directly supports import of Cufflinks/Cuffdiff data and allows both visualisation of individual genes (see example here) and genome-wide summary and analysis of alternative splicing (see example here).

As a bonus, it also allows you to identify and analyze isoform switches with predicted functional consequences.