Best Spliced Aligner To Human Genome With Limited RNA-Seq Reads
13.8 years ago
Wjeck ▴ 490

I'm trying to perform a spliced alignment on a limited (<100,000) set of RNA-seq reads, and I'd like to get the best spliced alignments possible. I have previously used Blat for this purpose, but I am wondering if anyone has more up-to-date solutions to recommend.

Good, just bear in mind that BFAST is not a spliced aligner.

I am currently trying out TopHat, which was easier to set up and run than I expected. It appears to work well, although I was expecting more splice-site calls than I ended up seeing. I'll have to compare it against tools like BLAT and BFAST. I should note that speed is much less of a consideration than quality in this application, since my number of reads is relatively low.

13.8 years ago
Michael 54k

I hope this doesn't raise more questions than it answers; this is just my personal experience. I must say, Blat gave me quite a good impression in the past. It supports large inserts (introns), controlled via -maxIntron and other fine-tuning parameters; I am not aware of another tool that gives you such easy control over that. It also seems able to align more reads to my reference than many of the dedicated short-read tools, and there is no need to build and store an index. What's not so good: no SAM/CIGAR output format (it writes PSL), it is not multithreaded, the licence, etc. And it can be beaten in sensitivity and speed.

I made a small comparison: I took a random sample of my reads (about 10,000, for speed) in FASTQ and FASTA format and simply tested how many reads could be mapped by different aligners. Caveats: I didn't check alignment quality (validity should be checked, though), there were no introns involved, and the whole thing may be completely flawed. Here is my list; your mileage may vary:

  1. BFAST (~85%) (BLAT-like fast accurate search tool)
  2. LastZ (~80%)
  3. Novoalign (~80%)
  4. Blat (~74%)
  5. Bowtie (~60%)
  6. bwa (~54%)
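
A minimal sketch (Python) of how a mapped fraction like the ones above could be computed from an aligner's SAM output. This is an assumption about method, not necessarily how the numbers above were produced: it counts each read once via its primary record, treats the mates of paired reads as separate reads, and, like the list above, ignores alignment quality. Blat's PSL output would need converting to SAM first.

```python
def mapping_rate(sam_lines):
    """Fraction of reads whose primary SAM record is mapped.

    Secondary (0x100) and supplementary (0x800) records are skipped so each
    read is counted once; a read counts as mapped unless flag 0x4 is set.
    """
    total = mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x100 or flag & 0x800:
            continue  # not the primary record for this read
        total += 1
        if not (flag & 0x4):
            mapped += 1
    return mapped / total if total else 0.0


# Toy input: r1 is mapped (plus one secondary hit), r2 is unmapped.
example = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r1\t256\tchr2\t500\t0\t50M\t*\t0\t0\t*\t*",
]
print(mapping_rate(example))  # 0.5
```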

I recommend: Bowtie & TopHat, because TopHat was developed specifically for this; LastZ, because it has a huge number of parameters, SAM output, etc.; and BFAST, because it gave me the most alignments, although it was the slowest.

You will have to figure out how to set the parameters to allow for large inserts/introns, though; try playing with the gap costs. It would be very nice if you posted your experiences!
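
To see why the gap costs matter, here is a back-of-the-envelope sketch with affine gap scoring (one opening penalty plus a per-base extension penalty). The scoring values are hypothetical, not the defaults of LastZ or any other tool: a 100-base read whose bases all match, split by a 2,000-base intron, scores far below zero, and the extension penalty has to fall to a small fraction of a match score before such an alignment can pass a score cutoff.

```python
def affine_gap_cost(length, gap_open, gap_extend):
    """Affine gap penalty: one opening cost plus a per-base extension cost."""
    return gap_open + gap_extend * length if length > 0 else 0


def spliced_read_score(read_len, intron_len, match=1, gap_open=5, gap_extend=2):
    """Score of a fully matching read aligned across one reference gap (the intron).

    The scoring values are hypothetical, not any particular aligner's defaults.
    """
    return read_len * match - affine_gap_cost(intron_len, gap_open, gap_extend)


def max_extend_penalty(read_len, intron_len, min_score, match=1, gap_open=5):
    """Largest per-base extension penalty that still lets that alignment reach min_score."""
    return (read_len * match - gap_open - min_score) / intron_len


print(spliced_read_score(100, 2000))      # -3905
print(max_extend_penalty(100, 2000, 50))  # 0.0225
```

With these numbers the extension penalty must be at most about 1/44 of a match score per intron base, which is one way to see why tuning gap costs in an unspliced aligner has limits for long introns.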

Just out of curiosity, how does one get BFAST to do spliced alignment? (I know it does gapped alignment for small indel sizes, but that is a different problem)

Good question! Actually, after consulting the BFAST documentation, I guess it doesn't! I was tricked by its name ('BLAT-like...'). I recommended it simply because it gave me the most alignments; it might be useful in a staged hybrid approach: align most reads with it, then align only the left-over reads in spliced mode with a different tool. LastZ doesn't do spliced alignment either, but one could tweak its gap costs to allow for larger gaps; I didn't find such an option for BFAST. So I would wish for a BFAST with the right option set of LastZ.

Thanks, that clarifies things for me :-)

Bowtie/tophat worked beautifully, but I think BFAST is probably the ideal tool for this limited data set, where speed is less critical than excellent alignment and splice finding. Many thanks!!

After checking my results more closely, I found that many of the extra alignments from BFAST are spurious, with CIGAR strings like 1M2I1M2... So it is important to post-filter rigorously for minimum sequence identity and coverage.
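
A minimal post-filtering sketch along those lines, working from each SAM record's CIGAR string and NM tag. Assumptions: NM holds the edit distance (mismatches plus inserted and deleted bases, as the SAM specification defines it), intron skips (N) and clipping do not count as alignment columns, and the 90% identity / 80% coverage thresholds are hypothetical defaults to tune.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def passes_filter(cigar, nm, min_identity=0.90, min_coverage=0.80):
    """Keep an alignment only if it is similar enough and covers enough of the read.

    identity = (alignment columns - NM) / alignment columns,
               where columns are M/=/X, I and D bases (N and clips excluded)
    coverage = aligned read bases (M/=/X and I) / full read length
               (read length includes soft and hard clips)
    """
    counts = {}
    for length, op in CIGAR_OP.findall(cigar):
        counts[op] = counts.get(op, 0) + int(length)
    aligned = counts.get("M", 0) + counts.get("=", 0) + counts.get("X", 0)
    inserted = counts.get("I", 0)
    deleted = counts.get("D", 0)
    columns = aligned + inserted + deleted
    read_len = aligned + inserted + counts.get("S", 0) + counts.get("H", 0)
    if columns == 0 or read_len == 0:
        return False  # unmapped ("*") or empty CIGAR
    identity = (columns - nm) / columns
    coverage = (aligned + inserted) / read_len
    return identity >= min_identity and coverage >= min_coverage


print(passes_filter("50M", nm=2))            # True: 96% identity, full coverage
print(passes_filter("30M1000N20M", nm=1))    # True: the intron (N) is not penalised
print(passes_filter("1M2I1M2I1M43S", nm=4))  # False: 3/7 identity, 7/50 coverage
```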

As far as I am aware, TopHat is the most widely used. It would be good for someone to write a benchmark paper...

Another addition to the mix, from a KU and UNC collaboration: MapSplice. It has proven extremely useful on top of the above for its fusion-detection capabilities.

Long-term followup on the issue of splice-aware aligners:

http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btr427v1

Hi Sean, I think this paper supports my initial assumption well.

13.8 years ago
Bioinfosm ▴ 620

I just heard of this SpliceMap tool from Stanford, which isn't mentioned in any of the above lists: http://www.stanford.edu/group/wonglab/SpliceMap/index.html

Some discussion of the tool: http://seqanswers.com/forums/showthread.php?t=5507

Very convenient, thoroughly annotated, and contains lots of useful tools.

13.8 years ago
brentp 24k

I've had good luck with TopHat (and its paper) and have recently been trying out GSNAP (and its paper).

Both can find novel splice sites and/or take a list of known splice sites.

GSNAP seems to find more splice sites and is more tolerant of indels, especially if you have longer reads.
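
If you want to hand TopHat a list of known junctions, a sketch like the following derives them from a transcript's exon coordinates. It assumes GTF-style 1-based, inclusive exon coordinates and writes TopHat's raw-junctions layout as I understand it from the TopHat manual (tab-separated chrom, left, right, strand; zero-based; left is the last base of the upstream exon, right the first base of the downstream exon). GSNAP uses its own splice-site format, so check each tool's manual before relying on this.

```python
def junctions_from_exons(chrom, strand, exons):
    """Return raw-junction lines (chrom, left, right, strand) for one transcript.

    exons: (start, end) pairs, 1-based and inclusive as in GTF files.
    left  = 0-based position of the last base of the upstream exon
    right = 0-based position of the first base of the downstream exon
    """
    ordered = sorted(exons)  # genomic order, regardless of strand
    lines = []
    for (_, up_end), (down_start, _) in zip(ordered, ordered[1:]):
        lines.append(f"{chrom}\t{up_end - 1}\t{down_start - 1}\t{strand}")
    return lines


# Example: one plus-strand transcript with three exons (made-up coordinates).
transcript_exons = [(1500, 1600), (1000, 1100), (2000, 2100)]
for line in junctions_from_exons("chr1", "+", transcript_exons):
    print(line)
```

When collecting junctions from many transcripts, putting the lines in a set before writing removes the duplicates shared by alternative isoforms.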

Can gsnap find novel splice sites?

Yes, it can. It has a number of parameters to fine-tune this. Also check the title of the linked paper :-)

13.8 years ago
Greg Grant ▴ 50

We've created a pipeline using Bowtie and BLAT which maps against both the genome and the transcriptome and merges the results. I'll be presenting this pipeline in a poster at the MGED conference in Boston. Using simulated data, we are achieving 10% more accurate alignments than any of the other methods we compared against: TopHat, GSNAP, SpliceMap, BWA, Novoalign, and SOAP. TopHat has a good false-positive rate for junctions because it basically calls only the really easy ones, so it has a very high false-negative rate: something like 1% false positives and 50% false negatives, compared with 5% false positives and 10% false negatives. We were also going to use BFAST, but its author, Nils Homer, told us the following: "BFAST is meant for whole genome resequencing, so it will not be able to find splice forms unless those splice forms are found in the reference. Software like cufflinks or tophat would be more appropriate." If anybody wants to use our pipeline, email me: ggrant@pcbi.upenn.edu
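
For anyone who wants to run this kind of evaluation on their own simulated data, a minimal sketch that compares called junctions against the simulated truth set. The definitions are assumptions, since the post doesn't spell them out: the "false-positive rate" here is the fraction of called junctions absent from the truth set (strictly a false-discovery proportion), and the "false-negative rate" is the fraction of true junctions that were not called.

```python
def junction_error_rates(called, truth):
    """Return (false-positive rate, false-negative rate) for junction calls.

    Junctions are hashable keys, e.g. (chrom, left, right, strand) tuples.
    false-positive rate = called junctions not in truth / all called junctions
    false-negative rate = true junctions not called / all true junctions
    """
    called, truth = set(called), set(truth)
    fp_rate = len(called - truth) / len(called) if called else 0.0
    fn_rate = len(truth - called) / len(truth) if truth else 0.0
    return fp_rate, fn_rate


# Toy example: 3 calls, 2 of them correct; 4 true junctions, 2 of them found.
truth = {("chr1", 100, 200, "+"), ("chr1", 300, 400, "+"),
         ("chr2", 50, 90, "-"), ("chr2", 500, 800, "-")}
called = {("chr1", 100, 200, "+"), ("chr2", 50, 90, "-"), ("chr3", 10, 20, "+")}
print(junction_error_rates(called, truth))  # (0.3333333333333333, 0.5)
```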

Has anyone experimented with MapSplice? We've gotten surprisingly good results with its algorithm, with the caveat that it takes about 0.5 TB of hard-drive space just to store the temporary files it generates.

@Wjeck We are going to try MapSplice; any suggestions or comments (relevant flags or anything)?

13.8 years ago
Darked89 4.6k

The first thing to look at is the quality of your RNA-Seq reads. Check the thread:

"What is the best pipeline for human whole exome sequencing?"

You may also consider a hybrid approach:

  • map everything in unspliced mode with LAST (it gives more hits than BLAT for the same set, but some are unrealistic, with too many indels)
  • check a few different cutoffs for what counts as a reliable mapping (3 indels?)
  • map all unmapped and unreliable hits with a few spliced mappers, then compare and combine the results. Incomplete list
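
A minimal sketch of the hand-off between the stages: pull out the reads that are unmapped, or whose primary alignment fails a reliability check, from the first-pass SAM output and write them back as FASTQ for the spliced mappers. The check shown (at most 3 indel events in the CIGAR) is a hypothetical stand-in for whichever cutoff you settle on; hard-clipped records would carry truncated sequences and are not handled here.

```python
import re

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def at_most_3_indels(fields):
    """Hypothetical reliability cutoff: no more than 3 insertion/deletion events."""
    return len(re.findall(r"\d+[ID]", fields[5])) <= 3


def reads_for_second_pass(sam_lines, is_reliable=at_most_3_indels):
    """Return FASTQ records for unmapped or unreliable reads from a SAM file."""
    records = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # header or blank line
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x900:
            continue  # secondary (0x100) or supplementary (0x800) record
        if not (flag & 0x4) and is_reliable(fields):
            continue  # reliable first-pass mapping: keep it, don't re-map
        name, seq, qual = fields[0], fields[9], fields[10]
        if flag & 0x10:
            # SAM stores reverse-strand reads reverse-complemented; undo that.
            seq = seq.translate(COMPLEMENT)[::-1]
            qual = qual[::-1]
        if qual == "*":
            qual = "I" * len(seq)  # assumption: placeholder when no qualities stored
        records.append(f"@{name}\n{seq}\n+\n{qual}\n")
    return records


sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tchr1\t500\t10\t1M1I1M1I1M1I1M1I1M\t*\t0\t0\tAAAACCCCG\tABCDEFGHI",
]
print("".join(reads_for_second_pass(sam)), end="")
```

Here r1 is kept as mapped, r2 is unmapped, and r3 has four indel events, so r2 and r3 (restored to their original orientation) go to the second pass.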

Voted up, especially for the hybrid approach.

13.5 years ago
Zee ▴ 40

I think there is a lot of potential in this area, and it's worth trying out different approaches. We have been trying to gain more traction with RNA-Seq data and Novoalign (we're the developers), and it seems there is room to combine our mapping strategy with an approach like TopHat's or SplitSeek's.

I will be happy to share more once I start seeing some promising results. From what I can already see, there are going to be many cases where you need to consider gapped alignments in short reads.

And I will be happy if your Novoalign isn't for sale...

And I will be happy if I can use your Novoalign for free...
