Question: Recommended Tools For Alternative Splicing Detection From Rna-Seq Data
22
4.6 years ago by
Belgium, Brussels, Université Libre de Bruxelles / Université de Liège
Nicolas Rosewick5.3k wrote:

Hi,

I'm working on RNA-Seq data and want to start looking at alternative splicing events. Does anyone have good advice or ideas on how to do that? I read that DEXSeq works well.

Edit > I'm working on human and bovine.

Thanks in advance,

N.

ADD COMMENTlink modified 4 weeks ago by Biostar ♦♦ 20 • written 4.6 years ago by Nicolas Rosewick5.3k
73
4.6 years ago by
Washington University School of Medicine, St. Louis, USA
Malachi Griffith15k wrote:

Such a big question. There are many tools of several categories that might be relevant to this problem.

  1. Aligners capable of identifying splice sites from sequence reads (aka splice aware aligners). These include: TopHat, MapSplice, SpliceMap, HMMsplicer, GSNAP, STAR, RUM, SoapSplice, HISAT, etc. I saw a nice poster at AGBT13 that indicated that STAR performs very well compared to several competitors.
  2. Transcriptome assemblers that either perform de novo assembly of transcripts from sequence reads or do so with the help of a reference assembly (and perhaps even guided by known transcript annotations). These include: Cufflinks, Scripture, Trinity, Trans-ABySS, GRIT, etc. Of these, Cufflinks is probably the easiest to use, while Trinity and Trans-ABySS seem to yield impressive results in the hands of certain groups (particularly those that developed them...).
  3. Alternative expression tools that seek to identify isoform expression differences between two or more conditions. These include: Cuffdiff, ALEXA-seq, MISO, SplicingCompass, Flux Capacitor, JuncBASE, DEXSeq, MATS, SpliceR, FineSplice, ARH-seq, etc.
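
To make category 1 a little more concrete, here is a minimal Python sketch of how a spliced alignment exposes a junction. It is an illustration only and is not taken from any of the tools listed above. In SAM/BAM output, an intron spanned by a read is recorded as an N operation in the CIGAR string, so junction coordinates can be recovered from the POS and CIGAR fields. The function name junctions_from_cigar is hypothetical, and a real pipeline would normally read BAM files through a library rather than parse CIGAR strings by hand.

```python
import re
from typing import List, Tuple

# One CIGAR operation: a length followed by an operation code (SAM specification).
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference.
_REF_CONSUMING = set("MDN=X")


def junctions_from_cigar(pos: int, cigar: str) -> List[Tuple[int, int]]:
    """Return (intron_start, intron_end) for every N operation.

    pos is the 1-based leftmost mapping position (the SAM POS field).
    Coordinates in the result are 1-based and inclusive.
    """
    junctions = []
    ref = pos  # reference position of the next aligned base
    for length, op in _CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            # The skipped region covers reference bases ref .. ref + n - 1.
            junctions.append((ref, ref + n - 1))
        if op in _REF_CONSUMING:
            ref += n
    return junctions


# A 75 bp read starting at position 100 that spans a 1000 bp intron.
print(junctions_from_cigar(100, "50M1000N25M"))  # [(150, 1149)]
```

Counting how many reads support each (start, end) pair is the raw material that junction-based tools in category 3 build on.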

There are also many tools that are usually considered for straight differential expression but, if run the right way, might still yield results informative about alternative expression of isoforms. These include: edgeR, DESeq, HTSeq, DEGseq, sSeq, etc.

Note that placing each tool in one of three categories is an over-simplification. Some span across the three activities and some are components of a workflow generated by a single research group. Overall the area is a bit of a wild west. More tools are being developed constantly and you will find aspects of all of them that leave you wanting something better. The problem is not a simple one and is an area of active research.

The intro section of the ALEXA-seq website has a summary of some relevant background reading and also contains a now out-of-date review of RNA splicing tools.

The RNA-Seq Blog has a great list of relevant resources here: http://www.rna-seqblog.com/tag/alternative-splicing/

Here is a recent review: Integrative analysis of many RNA-seq datasets to study alternative splicing

We recently published a paper "Informatics for RNA Sequencing: A Web Resource for Analysis on the Cloud" that covers this topic in some detail and described many relevant tools in the Supplementary Tables. This resource is maintained in GitHub here and has a corresponding hands on tutorial: RNA-seq analysis tutorial.  The list of tools can be found here.

Finally here are some relevant posts from BioStar and SeqAnswers:

ADD COMMENTlink modified 2.0 years ago • written 4.6 years ago by Malachi Griffith15k
2

Thanks for the info. I would also like to add the DREAM6 alternative splicing challenge website (http://www.the-dream-project.org/challenges/dream6-alternative-splicing-challenge). It's a little old but provides good background, some standard files, and a scoring metric. However, does anyone know the winning results of this challenge? I couldn't find them on the web. Thanks in advance.

ADD REPLYlink written 4.6 years ago by henryvuong700
1

Concerning Trinity, there is a recent paper in Nature Protocols that might help people trying to use this tool for their project.

ADD REPLYlink modified 4.1 years ago • written 4.1 years ago by Leonor Palmeira3.5k

mark for later use~_~

ADD REPLYlink written 2.7 years ago by hanguangchun120
3
4.6 years ago by
henryvuong700
USA
henryvuong700 wrote:

I just came across this tool in Bioinformatics but haven't tried it yet: "SplicingCompass: differential splicing detection using RNA-Seq data" (link: http://bioinformatics.oxfordjournals.org/content/early/2013/02/28/bioinformatics.btt101.short). Hope it helps.

ADD COMMENTlink written 4.6 years ago by henryvuong700
3
3.5 years ago by
Charles Warden4.9k
Duarte, CA
Charles Warden4.9k wrote:

MATS is my favorite tool for splicing events (works with or without replicates):

http://rnaseq-mats.sourceforge.net/

MISO is another popular option:

http://genes.mit.edu/burgelab/miso/

ADD COMMENTlink written 3.5 years ago by Charles Warden4.9k

Hi Charles, I am trying to use MATS for the first time. It seems that you have used it quite a bit. What do they mean when they say that the read length, or length of each read, should be the same? I have two conditions and multiple replicates for each condition. I trimmed low-quality bases and removed reads shorter than 40 nt. My original reads were 75 nt, and right now I have reads whose lengths range from 40 to 75 bp. Can they be used, or do I need to crop them to a fixed length before I can feed them to MATS?

Also, can it accept BAM files from the latest version of TopHat? Or does it have to be the older version, which they ask you to download if you also want to align your reads?

ADD REPLYlink modified 3.1 years ago • written 3.1 years ago by Ashutosh Pandey11k
1

I think they mean that you would want to trim everything to 40 bp. If that is the case, you should receive an error if you try to use the mixed reads.

However, more importantly, I have found longer reads to be necessary to get good splicing event results. For example, I have not found it useful for 40 bp single-end reads, and I have only gotten good results with 100 bp paired-end reads. It is possible that the paired-end requirement is more important than the length. Hopefully that is the case. Keep in mind that very poor quality reads will negatively affect your TopHat alignment: I had a HiSeq dataset that tried to push for 140 bp reads, but the last 40 bp had so many problems that I needed to trim them to 100 bp to get good results.

The latest version should be OK - I know it works with TopHat2, but I don't remember the exact version number that I have tried.
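
If a single read length is indeed required, one way to get there is to drop reads shorter than the target and crop the rest to that length. Below is a minimal sketch in plain Python, assuming standard 4-line FASTQ records. The function names are made up for illustration, and this is not part of MATS or TopHat; a dedicated trimming tool will be faster on real data.

```python
from typing import Iterable, Iterator, Tuple

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    while True:
        header = next(it, None)
        if header is None:
            return  # end of input
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it, None)
        next(it, None)  # '+' separator line, ignored
        qual = next(it, None)
        if seq is None or qual is None:
            raise ValueError("truncated FASTQ record: " + header)
        yield header, seq.rstrip("\n"), qual.rstrip("\n")


def crop_to_fixed_length(records: Iterable[FastqRecord],
                         length: int) -> Iterator[FastqRecord]:
    """Keep reads with at least `length` bases and crop them to `length`."""
    for header, seq, qual in records:
        if len(seq) < length:
            continue  # too short to reach the fixed length, so discard
        # Crop from the 3' end, where base quality is usually lowest.
        yield header, seq[:length], qual[:length]


# Tiny in-memory example: one 75 bp read and one 35 bp read, target 40 bp.
example = ["@read1\n", "A" * 75 + "\n", "+\n", "I" * 75 + "\n",
           "@read2\n", "C" * 35 + "\n", "+\n", "I" * 35 + "\n"]
for header, seq, qual in crop_to_fixed_length(parse_fastq(example), 40):
    print(header, len(seq), len(qual))
```

For paired-end data both mates would need to be filtered together so that the two files stay in sync, which this sketch does not handle.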

ADD REPLYlink written 3.1 years ago by Charles Warden4.9k

Thanks Charles. I have single-end read data, which I assume won't be as helpful as paired-end. Also, my reads were 75 bp and I trimmed them to 60 bp. I hope I can get some decent results. The good thing is that I have at least 6 replicates for each of the two samples that I am comparing, but I am not sure how much having replicates helps for MATS analysis. Thanks a lot again.

ADD REPLYlink written 3.1 years ago by Ashutosh Pandey11k

I also have a question regarding read lengths in MATS. I aligned adapter-trimmed paired-end reads using STAR, and used the sorted BAM files as input to MATS for the analysis of splicing events. Originally, each mate was 76 bp long, but after trimming a mate can be of any length. Given that I provide the average insert lengths (r1 and r2) and the corresponding sd1 and sd2 values to MATS, does the read length still need to be the same across samples and replicates? If so, why does MATS require r1, r2 and sd1, sd2?

 

ADD REPLYlink written 3.0 years ago by Anil Kesarwani60

I would recommend checking with the developer.

You could also just try specifying 76 bp and see what happens - I think you might get an error, but it has been a while.

ADD REPLYlink written 3.0 years ago by Charles Warden4.9k

Are you sure that MATS works without replicates?

ADD REPLYlink written 12 weeks ago by xd_d60

Yes - MATS will work without replicates.

If you do have replicates, JunctionSeq is also a relatively new option that I currently rank as my top choice for splicing analysis.

ADD REPLYlink written 11 weeks ago by Charles Warden4.9k
2
4.6 years ago by
Biojl1.5k
Barcelona
Biojl1.5k wrote:

In which species are you working? Do you have a reference genome for it?

You could give the Flux Capacitor package a try: http://flux.sammeth.net/capacitor.html

ADD COMMENTlink written 4.6 years ago by Biojl1.5k
2
4.6 years ago by
Belgium, Brussels, Université Libre de Bruxelles / Université de Liège
Nicolas Rosewick5.3k wrote:

Do you think a workflow like this is good?

  1. Alignment with STAR
  2. Reference-based assembly: Cufflinks
  3. De novo assembly: Trinity
  4. Merge assemblies
  5. Re-align reads to the merged assembly
  6. Infer isoform expression (Cuffdiff or RSEM)
  7. Alternative splicing analysis: DEXSeq

Any advice or ideas?

ADD COMMENTlink written 4.6 years ago by Nicolas Rosewick5.3k
1

I think you need to read a little bit more on the topic because you are mixing things up. For instance, it doesn't make any sense to perform a de novo assembly when you have a reference genome (human and cow) and can use a reference-based assembly (Cufflinks) instead.

ADD REPLYlink written 4.6 years ago by Biojl1.5k
1

De novo assembly will be useful for another aspect of my project (fusions).

ADD REPLYlink written 4.6 years ago by Nicolas Rosewick5.3k

Hi NicoBxl,

I am planning a workflow similar to yours, but on an animal without an established reference. I performed de novo assembly with both Trinity and Velvet/Oases. I also tried a Cufflinks assembly based on the draft genome of this animal that we just obtained.

I just want to know which software you used for merging the assemblies (step 4) and re-aligning reads to the merged assembly (step 5).

Thank you in advance!

Phuong.

ADD REPLYlink written 21 months ago by pbigbig180
1
4.6 years ago by
London, UK
Giovanni M Dall'Olio25k wrote:

Have you tried TopHat? It works on RNA-Seq data and can identify splicing events. The only drawback is that you need a reference genome. Which species are you studying?

ADD COMMENTlink written 4.6 years ago by Giovanni M Dall'Olio25k
1
17 months ago by
Lalit10
Jodhpur
Lalit10 wrote:

Hi, you can try OLego as a splice-aware aligner and Quantas for isoform prediction. Here is the link: http://zhanglab.c2b2.columbia.edu/index.php/Quantas_Documentation

ADD COMMENTlink written 17 months ago by Lalit10
0
19 months ago by
garyhokawai0 wrote:
Very informative post. I just wonder if anyone has done alternative splicing analysis with a Nextera kit-derived library, which has a broad fragment size distribution. Would such a diverse library affect the algorithms? Any recommendations for analyzing this kind of library?
ADD COMMENTlink written 19 months ago by garyhokawai0

I have done RNA-seq of Nextera data with BBMap on prokaryotes (which do not generally have differential splicing), which worked well, and RNA-seq of non-Nextera data from human (and various other organisms), which also worked well. It's pretty robust to most noise-inducing factors like error rate, intron length, insert size, and so forth. Note that I am BBMap's author. But it's pretty easy to use, as it autodetects the insert size in a splice-aware manner.

Please note that there are two kinds of Nextera libraries - normal (fragment) and LMP.  LMP libraries require completely different processing from normal libraries, and they are the ones that would be expected to have a very broad size distribution.  Normal Nextera libraries are not really very different from randomly-fragmented libraries, aside from a frequency bias in the first ~10 bases.

ADD REPLYlink written 19 months ago by Brian Bushnell14k