Best Spliced Aligner To Human Genome With Limited RNA-Seq Reads
6
6
12.3 years ago
Wjeck ▴ 480

I'm trying to perform a spliced alignment on a limited (<100,000) set of RNA-seq reads, and I'd like to get the best spliced alignments possible. I previously used BLAT for this purpose, but I am wondering whether anyone has more up-to-date solutions to recommend.

splicing alignment rna • 8.1k views
1

Good, just bear in mind that BFAST is not a spliced aligner.

0

I am currently trying out tophat, which was easier to set up and run than I was expecting. It appears to work well, although I was expecting more splice site calls than I ended up seeing. I'll have to compare it against tools like BLAT and BFAST. I should note that speed is much less of a consideration than quality in this application, since my number of reads is relatively low.

8
12.3 years ago

I hope this doesn't raise more questions than it answers; this is just my personal experience. I must say, Blat gave me quite a good impression in the past. It supports large inserts (introns), controlled via the maximum intron size parameter (-maxIntron) and other fine-tuning parameters. I am not aware of another tool that gives you control over that so easily. It also seems to align more reads to my reference than many of the dedicated short-read tools, and there is no need to build and store an index. What's not so good: no SAM/CIGAR output format, no multithreading, the licence, etc. And it can be beaten in both sensitivity and speed.

I made a little comparison: I took a random sample of my reads (maybe some 10,000, for speed) in FASTQ and FASTA format and simply checked how many reads could be mapped with different aligners. Caveats: I didn't check alignment quality (validity should be checked, though), there were no introns involved, and maybe it's totally screwed. Here is my list; your mileage may vary:

1. BFAST (~85%) (BLAT-like fast accurate search tool)
2. LastZ (~80%)
3. Novoalign (~80%)
4. Blat (~74%)
5. Bowtie (~60%)
6. bwa (~54%)

I recommend: Bowtie & Tophat, because it is developed specifically for this; LastZ, because it has a hell of a lot of parameters, SAM output, etc.; and BFAST, because it gave me the most alignments, although it was the slowest.

You will have to find out how to set the parameters to allow for large inserts/introns, though; try playing with the gap costs. It would be very nice if you posted your experiences!
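
For anyone who wants to repeat this kind of comparison, here is a minimal sketch of the sampling step, assuming plain 4-line FASTQ records. The function names (read_fastq, reservoir_sample, mapped_fraction) are made up for illustration and are not part of any aligner; reservoir sampling is just one convenient way to draw a fixed-size random sample without loading the whole file.

```python
import random
from typing import Iterable, Iterator, List, Tuple

# One FASTQ record: (header, sequence, separator, quality). Assumes 4-line records.
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records from an iterable of lines."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            yield tuple(record)
            record = []


def reservoir_sample(records: Iterable[FastqRecord], k: int, seed: int = 0) -> List[FastqRecord]:
    """Draw up to k records uniformly at random in a single pass."""
    rng = random.Random(seed)
    sample: List[FastqRecord] = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = rec
    return sample


def mapped_fraction(n_mapped: int, n_total: int) -> float:
    """Percentage of sampled reads that an aligner managed to map."""
    return 100.0 * n_mapped / n_total if n_total else 0.0
```

The same sample can then be fed to each aligner, and mapped_fraction gives percentages comparable to the list above. Note that this only counts mapped reads; it says nothing about whether the alignments are correct.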

0

Just out of curiosity, how does one get BFAST to do spliced alignment? (I know it does gapped alignment for small indel sizes, but that is a different problem)

0

Good question! Actually, after consulting the BFAST documentation, I guess it doesn't! I was tricked by its name ('BLAT-like...') there. I recommend it simply because it gave me the most alignments; it might be useful in a staged hybrid approach, aligning most reads first and then aligning only the left-over reads in spliced mode with a different tool. LastZ doesn't do spliced alignment either, but one could tweak the gap costs to allow for larger gaps; I didn't find such an option for BFAST. So I would wish for a BFAST with the rich option set of LastZ.

0

Thanks, that clarifies things for me :-)

0

Bowtie/tophat worked beautifully, but I think BFAST is probably the ideal tool for this limited data set, where speed is less critical than excellent alignment and splice finding. Many thanks!!

0

After checking my results more closely, I found that many of the extra alignments from BFAST are spurious, with CIGAR strings like 1M2I1M2... So it is important to post-filter rigorously for minimum sequence identity and coverage.
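
A rough sketch of such a post-filter, assuming you have the CIGAR string and the edit distance (the NM tag in SAM) for each alignment. The function names, the 0.9 thresholds and the exact definitions of identity and coverage are my own assumptions for illustration, not anything prescribed by BFAST or the SAM specification; pick cutoffs that suit your data.

```python
import re

# Matches one CIGAR operation, e.g. "25M" or "1000N".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_stats(cigar):
    """Sum the lengths of each CIGAR operation type."""
    counts = {}
    for length, op in CIGAR_OP.findall(cigar):
        counts[op] = counts.get(op, 0) + int(length)
    return counts


def passes_filter(cigar, edit_distance, min_identity=0.9, min_coverage=0.9):
    """Return True if the alignment meets both (assumed) thresholds.

    identity = (alignment columns - edit distance) / alignment columns,
               where columns are M/=/X plus inserted and deleted bases
    coverage = bases in M/=/X columns / full read length
               (read length includes insertions and clipped bases)
    Skipped reference bases (N, i.e. introns) count towards neither.
    """
    c = cigar_stats(cigar)
    matched = c.get("M", 0) + c.get("=", 0) + c.get("X", 0)
    inserted = c.get("I", 0)
    deleted = c.get("D", 0)
    read_len = matched + inserted + c.get("S", 0) + c.get("H", 0)
    columns = matched + inserted + deleted
    if read_len == 0 or columns == 0:
        return False
    identity = (columns - edit_distance) / columns
    coverage = matched / read_len
    return identity >= min_identity and coverage >= min_coverage
```

With these definitions, an alignment made up of alternating 1M and 2I operations fails on both counts, while a clean spliced alignment such as 25M1000N25M with one mismatch passes, because the intron (N) is not penalised.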

0

As far as I am aware, tophat is the most widely used. It would be good for someone to write a benchmark paper...

0

Added to the mix, from a KU and UNC collaboration: MapSplice. This has proven extremely useful on top of the above for its fusion detection capabilities.

0

Long-term followup on the issue of splice-aware aligners:

http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btr427v1

0

Hi Sean, I think this paper supports well my initial assumption.

7
12.2 years ago
Bioinfosm ▴ 620

Just heard of this SpliceMap tool from Stanford, not mentioned in any of the above lists: http://www.stanford.edu/group/wonglab/SpliceMap/index.html

0

very convenient, thoroughly annotated and contains lots of useful tools

6
12.3 years ago
brentp 24k

I've had good luck with tophat (and paper) and have recently been trying out gsnap (and paper).

Both can find novel splice sites and/or take a list of known splice sites.

gsnap seems to find more splice sites and is more tolerant of indels especially if you have longer reads.

0

Can gsnap find novel splice sites?

0

Yes, it can. It has a number of parameters to fine-tune this. Also check the title of the linked paper :-)

5
12.2 years ago
Greg Grant ▴ 50

We've created a pipeline using bowtie and blat which maps against both the genome and the transcriptome and merges the results. I'll be presenting this pipeline in a poster at the MGED conference in Boston. Using simulated data, we are achieving 10% more accurate alignments than any of the other methods we compared to: TopHat, GSNAP, SpliceMap, BWA, NOVOALIGN and SOAP. TopHat has a good false positive rate for junctions, basically by calling just the really easy stuff, so it has a very high false negative rate: something like 1% false positive and 50% false negative, compared to 5% false positive and 10% false negative. We were also going to use BFAST, but were told by the author, Nils Homer, the following: "BFAST is meant for whole genome resequencing, so it will not be able to find splice forms unless those splice forms are found in the reference. Software like cufflinks or tophat would be more appropriate." If anybody wants to use our pipeline, email me: ggrant@pcbi.upenn.edu
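
For readers who want to compute comparable junction numbers on their own simulated data, here is a small sketch. It is not the pipeline's own evaluation code, and the definitions are assumptions on my part: the false positive rate is taken as the fraction of called junctions absent from the simulated truth, and the false negative rate as the fraction of true junctions that were not called. Junctions are represented as hypothetical (chromosome, donor, acceptor) tuples.

```python
def junction_error_rates(called, truth):
    """Return (false_positive_rate, false_negative_rate) for junction calls.

    called, truth: iterables of hashable junctions, e.g. (chrom, donor, acceptor).
    Assumed definitions:
      false positive rate = called junctions not in truth / all called junctions
      false negative rate = true junctions not called   / all true junctions
    """
    called, truth = set(called), set(truth)
    false_pos = len(called - truth)
    false_neg = len(truth - called)
    fp_rate = false_pos / len(called) if called else 0.0
    fn_rate = false_neg / len(truth) if truth else 0.0
    return fp_rate, fn_rate
```

A conservative caller that reports only half of the true junctions and nothing else gets a false positive rate of 0 and a false negative rate of 0.5 under these definitions, which is the kind of trade-off described above.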

1

Any experimentation with MapSplice? We've gotten surprisingly good results with their algorithm, with the caveat that it takes about 0.5 TB of hard drive space just to save the temporary files it generates.

0

@Wjeck We are going to try MapSplice. Any suggestions or comments (relevant flags, for example)?

3
12.3 years ago
Darked89 4.2k

"What is the best pipeline for human whole exome sequencing?"

You may also think about using some hybrid approach:

• map everything in unspliced mode with last (gives more hits than blat for the same set, but some are unrealistic -> too many indels)
• check a few different cutoffs for what counts as a reliable mapping (3 indels?)
• map all unmapped & unreliable hits with a few spliced mappers, then compare & combine the results. Incomplete list
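
The triage between the unspliced and spliced passes could be sketched roughly as follows, assuming the unspliced hits are available as a mapping from read ID to CIGAR string (None or "*" for unmapped reads). The function names and the default cutoff of 3 indels are illustrative assumptions taken from the "3 indels?" guess above, not a tested recommendation.

```python
import re

# One insertion or deletion event in a CIGAR string, e.g. "2I" or "1D".
INDEL_OP = re.compile(r"\d+[ID]")


def count_indels(cigar):
    """Count indel events (not indel bases) in a CIGAR string."""
    return len(INDEL_OP.findall(cigar))


def split_for_spliced_pass(unspliced_hits, max_indels=3):
    """Split reads into (reliable, retry) lists of read IDs.

    unspliced_hits: dict of read ID -> CIGAR string, or None / "*" if unmapped.
    Reads that are unmapped or have more than max_indels indel events go to
    the retry list, to be handed to the spliced mappers.
    """
    reliable, retry = [], []
    for read_id, cigar in unspliced_hits.items():
        if cigar is None or cigar == "*" or count_indels(cigar) > max_indels:
            retry.append(read_id)
        else:
            reliable.append(read_id)
    return reliable, retry
```

Running this with a few different values of max_indels is a cheap way to see how sensitive the reliable set is to the cutoff before committing to one.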
0

vote up: especially for the hybrid approach

0
11.9 years ago
Zee ▴ 40

I think there is a lot of potential in this area, and it is worth trying out different approaches. We have been trying to gain more traction with RNA-Seq data and Novoalign (we're the developers), and it seems there is room to combine our mapping strategy with an approach like Tophat's or SplitSeek's.

I will be happy to share more once I start seeing some promising results. From what I can already see there are going to be many cases where you need to consider gapped alignments in short reads.

0

and I would be happy if your Novoalign weren't for sale...
