Which Programs Are You Relying On For Solid Data Analysis?
6
8
14.0 years ago

although this question may sound very similar to one posted months ago, I was wondering whether there is any news on the topic. we are currently setting up our brand new cluster and installing the software required for data analysis. although we already know what we are going to install, it is always useful to know how other groups are solving this task. for that reason, I am sharing here our production ideas, as well as some we would like to test, so that you can give us your opinion or suggest other tools.

  1. we have been trying to install BioScope for a long time. it was our first software choice, since it is the vendor's own software and it is free of charge (right now). the problem is that we did not buy the suggested cluster: although we met all the software requirements, setting up our "custom" cluster has taken Life Technologies almost 2 months, and we still do not have it up and running. we will see if we are able to have it ready by next month ;)

  2. our second option has been to read all the papers we could find and to ask other laboratories, trying to find a consensus on which software to use, at least for mapping and SNP calling. after quite a few weeks of discussion, we have finally decided to create a custom pipeline based on BFAST (we have been told that it is currently the best performer with SOLiD data) and SAMtools' pileup. we are currently testing this pipeline and we are quite happy with it, although getting it to work exactly as we would like needs further work.

  3. although I have not done anything deep with Galaxy, I have found it very useful in the past for basic data manipulation. recently, I was glad to find out that it has integrated NGS functionality that would allow us to handle our SOLiD data by mapping it with Bowtie (I have read that it is a nice BWT implementation, and that it works fine with SOLiD data) and doing SNP calling with SAMtools. since working with large datasets forces us to install Galaxy locally, we are carefully evaluating this possibility, because it looks useful enough to try, especially considering that everything would be nicely integrated in a single user interface.

EDIT: we are now installing Galaxy locally on our cluster, and we have found that the NGS toolbox from the usegalaxy.org website is no longer in beta. its mapping section now includes more options, such as BFAST (the aligner we wanted to build our pipeline with).

so, summarizing:

  1. are there any groups out there working with BioScope only? are you happy enough not to try other options? is it as stable and powerful as advertised?

  2. which programs are you using for SOLiD data analysis? why would you select them?

  3. does anyone currently rely on Galaxy only for processing SOLiD data? is the local installation clear and stable enough to go through it? would you recommend

solid snp mapping analysis software • 8.0k views
ADD COMMENT
6
14.0 years ago

I use mostly Bioscope, but I've done a great deal of analysis on the side using BFAST, samtools/Picard, BEDTools, and GATK.

I found BWA and MAQ don't play well with SOLiD reads: they ignore the leading base and omit the CS and CQ fields from the BAM files, which prevents downstream variant callers from using the colorspace data.

ADD COMMENT
0

thanks jmanning2k. this was in fact the kind of answer I was hoping to receive. the fact that BWA and some other well-known aligners do not perform as well as they should with SOLiD data is becoming common knowledge in the community. it would be great if you could give a brief opinion here on the tools you just mentioned, since going for them is in fact our best option (aside from BioScope).

ADD REPLY
0

bwa does not work with paired-end SOLID data, either.

ADD REPLY
4
14.0 years ago

SHRiMP is an aligner designed for color-space. We've had very good experiences with it.

Note: a local installation of Galaxy would still need access to computational resources, so if you don't have your own cluster you may want to look into running it via cloud computing services.

ADD COMMENT
0

thanks Istvan. we have not come across many surrounding groups using SHRiMP, so maybe that was one of the reasons to leave it aside, favouring BFAST or any BWT algorithm implementation.

regarding the Galaxy installation, we do have our own cluster with plenty of resources in it (well, at least for now, who knows in 6 months!) that we would like to make the most of. we will see if we are good at it ;)

ADD REPLY
4
14.0 years ago
Rm 8.3k

BWA+Samtools: http://genome.sph.umich.edu/wiki/Examples

Nesoni (it uses SHRiMP) is a high-throughput sequencing data analysis toolset, which the VBC has developed to cope with the flood of Illumina, 454, and SOLiD data now being produced.

http://bioinformatics.net.au/software.nesoni.shtml

Good idea to install Bowtie, BWA, MaQ, Bfast, FASTX-Toolkit, hscopy in the cluster.

Software packages for next gen sequence analysis

ADD COMMENT
1

we already had the seqtools wiki as a reference for keeping track of the available programs out there, but it is great to know there are pipelines already being published. I guess this is the kind of thing NGS newbies like us would like to hear.

ADD REPLY
4
14.0 years ago
Ian 6.1k

I have been using Corona-Lite for a while now, but have been trying out other software as well. I had success using PerM http://code.google.com/p/perm/ (there is a sister project called ComB for SNP detection), but i am now investigating SHRIMP, as its gapped alignment strategy allows INDEL variations to be detected.

We currently have Bioscope 1.2 installed, but i have not had any experience with this mapper yet.

ADD COMMENT
1

My experience of SHRIMP so far boils down to: i) big genomes really need to be run on multiple machines; ii) extra effort is required (compared to Corona) to extract uniquely mapping reads, for example. I am still trying to get a feel for how the Smith-Waterman based params affect the running of SHRIMP. Hopefully i will have a better idea after using it for a large yeast mapping project that i now have (comparing results to Corona-Lite).
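
For anyone facing the same "uniquely mapping reads" step, here is a rough Python sketch of one way to do it on SAM-format output. It is not SHRiMP's own method, and it rests on assumptions: the reads are single-end (mates share a read name, so paired data would need the pair flags too), and the aligner writes one SAM line per reported hit. The function name is made up for this example.

```python
from collections import Counter


def unique_alignments(sam_lines):
    """Return the SAM alignment lines of reads reported exactly once.

    Header lines (starting with '@') and unmapped records
    (FLAG bit 0x4 set) are ignored.
    """
    mapped = []
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # a SAM record has 11 mandatory fields
            continue
        if int(fields[1]) & 0x4:  # read is unmapped
            continue
        mapped.append((fields[0], line))
    counts = Counter(name for name, _ in mapped)
    return [line for name, line in mapped if counts[name] == 1]


demo = [
    "@HD\tVN:1.0\n",
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t0\tchr1\t200\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t16\tchr2\t300\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
]
for line in unique_alignments(demo):
    print(line.split("\t")[0])
```

Note that this only counts what the aligner chose to report: if it was told to output a single hit per read, every mapped read will look unique, so the filter is only meaningful when multiple hits are reported.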

ADD REPLY
0

I went through the ComB webpage a couple of weeks ago, but since I didn't find any PubMed reference for it I was reluctant to try it. now that you mention the PerM mapping tool I searched for it, and in fact there's a Bioinformatics paper on it that I will surely evaluate: http://bioinformatics.oxfordjournals.org/content/25/19/2514

for the record: if you are still going for SHRiMP instead, I guess that you haven't found PerM to perform much better than Corona-Lite. am I right? for your information, BioScope 1.2 has a better mapping algorithm, and it seems that version 1.3 will even include BFAST.

ADD REPLY
0

PerM has gone through a lot of development and has performed better than Corona-Lite using the F4 (sensitive) seed setting. My main reason for wanting to try SHRIMP is so i can detect INDEL variants, as well as have a good mapper.

ADD REPLY
0

Re SHRIMP: be sure to check out the --strata flag (apparently new) which causes only the best scoring read to be reported. Caveat - it is still being monitored/tested

ADD REPLY
0

ComB has not been published in JCB. The basic idea is good, but the evaluation has serious problems. Until they present a more reasonable evaluation, it is not recommended.

ADD REPLY
0

ComB has been published in JCB recently. The basic idea is good, but the evaluation has serious problems. Until they present a more reasonable evaluation, it is not recommended.

ADD REPLY
4
14.0 years ago
Alonso ▴ 40

For performance PerM is far and away the choice for fastest ungapped alignment. I have used PerM and ComB iteratively to call SNPs very accurately, however they don't support any indels.

I asked someone who knows about the project, and the strategy they recommend is to use PerM and ComB for mapping and SNP calling while selecting an option in PerM to return the unmapped reads. The unmapped reads can then be realigned with a slower aligner.
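
The shape of that two-pass strategy can be sketched in a few lines of Python. This is only an outline under assumptions: the two aligners are stand-in callables that take a dict of reads and return a dict of hits for the reads they could place. They are hypothetical and do not correspond to actual PerM, ComB, or other command-line options.

```python
def two_pass_align(reads, fast_aligner, slow_aligner):
    """Map reads with a fast aligner, then retry the leftovers with a slow one.

    reads:        dict of read name -> read sequence
    fast_aligner: callable(dict) -> dict of read name -> hit (placeholder)
    slow_aligner: callable(dict) -> dict of read name -> hit (placeholder)
    Returns (hits, names of reads neither aligner could place).
    """
    hits = dict(fast_aligner(reads))
    # Only reads the fast pass could not place go to the slow aligner.
    leftover = {name: seq for name, seq in reads.items() if name not in hits}
    hits.update(slow_aligner(leftover))
    still_unmapped = sorted(name for name in reads if name not in hits)
    return hits, still_unmapped


# Toy stand-ins: the "fast" aligner only places exact matches,
# the "slow" one tolerates more but still gives up on r3.
reads = {"r1": "T0000", "r2": "T0120", "r3": "T3333"}
fast = lambda batch: {n: "fast" for n, s in batch.items() if s == "T0000"}
slow = lambda batch: {n: "slow" for n, s in batch.items() if s != "T3333"}

hits, unmapped = two_pass_align(reads, fast, slow)
print(hits, unmapped)
```

The point of the design is that the expensive aligner only ever sees the fraction of reads the fast ungapped pass failed on, which is where reads spanning indels would be expected to end up.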

Supposedly they have an indel thing coming out soon but it seems to have been in development forever.

ADD COMMENT
0

thanks Alonso. it's not at all a bad idea to combine multiple programs, although we were trying to build up a simple pipeline. we will give them a try to check their performance, and then decide.

ADD REPLY
3
14.0 years ago
Mary 11k

You may know this already, but Galaxy offers a couple of "quickie" movies on SOLiD from their homepage: http://main.g2.bx.psu.edu/ Scroll through the screencasts there and you'll see them.

But also they recently had a developers conference with people who are actively using Galaxy for their local analysis. You might find some useful insights in the slides, or some people who might be worth exchanging some emails with on specifics:

http://bitbucket.org/galaxy/galaxy-central/wiki/DevConf2010

ADD COMMENT
0

I think I have seen all the screencasts available at the Galaxy website. in fact, a few of them covered almost everything we were planning to do with Galaxy, which is why we are seriously considering installing it on our cluster. but I had not heard before of the slide resource you have just mentioned, and I have found it a great place to check how people are dealing with their own issues, so thank you for pointing it out here.

ADD REPLY
