Question: Which Programs Are You Relying On For Solid Data Analysis?
Jorge Amigo (11k), Santiago de Compostela, Spain, wrote 9.7 years ago:

although this question may sound very similar to a previous one posted months ago, I was wondering if there would be any news on that. we are currently setting up our brand new cluster by installing the required software for data analysis, and although we already know what we are going to install, it's always useful to know how other groups are solving this task. for that reason, I am sharing here our production ideas, as well as some testing ones that we would like to try, so that you can give us your opinion or suggest other tools.

  1. we have been installing BioScope for a long time. it was our first software choice, since it is the corporate one, plus it is free of charge (right now). the problem is that, as we did not buy the suggested cluster, although we followed all the software requirements, setting up our "custom" cluster has taken Life Technologies almost 2 months so far, and we still do not have it up and running. we will see if we are able to have it ready by next month ;)

  2. our second option has been reading all the papers we could, as well as asking some other laboratories, trying to find a consensus on which software to use, at least for mapping and SNP calling. after quite a few weeks of discussion, we have finally decided to create a custom pipeline based on BFAST (we have been told that it is currently the best performing one with SOLiD data) and SAMtools' Pileup. we are currently testing this pipeline, and we are quite happy with it, although getting it to work exactly as we would like needs further work.

  3. although I have not done anything deep with Galaxy, I have found it very useful in the past for basic data manipulation. recently, I was glad to find out that it has integrated NGS functionality that would allow us to deal with our SOLiD data by mapping it with Bowtie (I have read that it is a nice BWT implementation, and that it works fine with SOLiD data) and doing SNP calling with SAMtools. since working with large datasets forces us to install Galaxy locally, we are carefully evaluating this possibility, because it looks useful enough to try, especially considering that everything would be nicely integrated in a single user interface.

EDIT: it turns out that we are currently installing Galaxy locally on our cluster, and we have found that the NGS toolbox from the usegalaxy.org website is no longer in beta stage, and that its mapping section now includes more options, such as BFAST (the aligner we wanted to build our pipeline with).

so, summarizing:

  1. are there any groups out there working with BioScope only? are you happy enough not to try other options? is it as stable and powerful as advertised?

  2. which programs are you using for SOLiD data analysis? why would you select them?

  3. does anyone currently rely on Galaxy only for processing SOLiD data? is the local installation clear and stable enough to go through it? would you recommend

software solid analysis mapping snp • 5.6k views
modified 9.0 years ago by Alonso (40) • written 9.7 years ago by Jorge Amigo (11k)
Jonathan Manning (640), near Boston, MA, wrote 9.7 years ago:

I use mostly Bioscope, but I've done a great deal of analysis on the side using BFAST, samtools/Picard, BEDTools, and GATK.

I found BWA and MAQ don't play well with SOLiD reads - they ignore the leading base and omit the CS and CQ fields from the BAM files, which prevents any downstream variant callers from using the colorspace data.

written 9.7 years ago by Jonathan Manning (640)

thanks jmanning2k. this was in fact the kind of answer I was expecting to receive. the fact that BWA and some other well known aligners do not perform as well as they should with SOLiD data is becoming common knowledge in the community. it would be great if you could give a brief opinion here of the tools you just mentioned, since going for them is in fact our best option (aside from BioScope).

written 9.7 years ago by Jorge Amigo (11k)

bwa does not work with paired-end SOLiD data, either.

written 9.1 years ago by Sophia (300)
Istvan Albert (♦♦ 84k), University Park, USA, wrote 9.7 years ago:

SHRiMP is an aligner designed for color-space. We've had very good experiences with it.

Note: a local installation of Galaxy would still need access to computational resources, so if you don't have your own cluster you may want to look into running it via cloud computing services.

written 9.7 years ago by Istvan Albert (♦♦ 84k)

thanks Istvan. we have not come across many surrounding groups using SHRiMP, so maybe that was one of the reasons to leave it aside, favouring BFAST or any BWT algorithm implementation.

regarding the Galaxy installation, we do indeed have our own cluster with plenty of resources in it (well, at least for now, who knows in 6 months!) that we would like to squeeze. we will see if we are good at it ;)

written 9.7 years ago by Jorge Amigo (11k)
Rm (8.0k), Danville, PA, wrote 9.7 years ago:

BWA+Samtools: http://genome.sph.umich.edu/wiki/Examples

Nesoni (it uses SHRiMP) is a high-throughput sequencing data analysis toolset, which the VBC has developed to cope with the flood of Illumina, 454, and SOLiD data now being produced.

http://bioinformatics.net.au/software.nesoni.shtml

Good idea to install Bowtie, BWA, MAQ, BFAST, FASTX-Toolkit, hscopy on the cluster.

Software packages for next gen sequence analysis

written 9.7 years ago by Rm (8.0k)

we already had the seqtools wiki as a reference for following up the available programs out there, but it is great to know there are pipelines already being published. I guess this is the kind of thing NGS newbies like us would like to hear.

written 9.7 years ago by Jorge Amigo (11k)
Ian (5.6k), University of Manchester, UK, wrote 9.7 years ago:

I have been using Corona-Lite for a while now, but have been trying out other software as well. I had success using PerM http://code.google.com/p/perm/ (there is a sister project called ComB for SNP detection), but i am now investigating SHRiMP as its gapped alignment strategy allows INDEL variations to be detected.

We currently have Bioscope 1.2 installed, but i have not had experience of this mapper yet.

written 9.7 years ago by Ian (5.6k)

My experience of SHRiMP so far boils down to: i) it really requires running big genomes on multiple machines; ii) extra effort is required (compared to Corona) to extract uniquely mapping reads, for example. I am still trying to get a feel for how the Smith-Waterman based params affect the running of SHRiMP. Hopefully i will have a better idea after using it for a large yeast mapping project that i now have (comparing results to Corona-Lite).
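
As a rough illustration of the "uniquely mapping reads" step, here is a minimal Python sketch. It assumes the aligner output is available as SAM text, that reads are single-end, and that every reported hit is written as its own SAM line; the function name `unique_mappers` is made up for this example and is not part of SHRiMP or Corona-Lite.

```python
from collections import Counter


def unique_mappers(sam_lines):
    """Return the SAM lines of reads reported at exactly one location.

    Assumes single-end reads and one SAM line per reported hit, so a
    read name that appears more than once is a multi-mapper.
    """
    records = []
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:  # FLAG bit 0x4 means the read is unmapped
            continue
        records.append((fields[0], line))

    hits_per_read = Counter(name for name, _ in records)
    return [line for name, line in records if hits_per_read[name] == 1]
```

Called on an open SAM file (`unique_mappers(open("hits.sam"))`, with a hypothetical file name), this keeps only reads with a single reported alignment. Paired-end data or aligners that encode multiple hits in one line would need a different rule.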

written 9.7 years ago by Ian (5.6k)

I went through the ComB webpage a couple of weeks ago, but since I didn't find any pubmed reference of it I was reluctant to try it. now that you mention the PerM mapping tool I searched for it, and in fact there's a Bioinformatics paper of it that I will surely evaluate: http://bioinformatics.oxfordjournals.org/content/25/19/2514

for the record: if you are still going for SHRiMP instead, I guess that you haven't found PerM to perform much better than Corona-Lite. am I right? for your information, BioScope 1.2 has a better mapping algorithm, and it seems that version 1.3 will even include BFAST.

written 9.7 years ago by Jorge Amigo (11k)

PerM has gone through a lot of development and has performed better than Corona-Lite using the F4 (sensitive) seed setting. My main reason for wanting to try SHRIMP is so i can detect INDEL variants, as well as have a good mapper.

written 9.7 years ago by Ian (5.6k)

Re SHRIMP: be sure to check out the --strata flag (apparently new) which causes only the best scoring read to be reported. Caveat - it is still being monitored/tested

written 9.7 years ago by Ian (5.6k)

ComB has not been published in JCB. The basic idea is good, but the evaluation has serious problems. Until they present a more reasonable evaluation, it is not recommended.

written 9.0 years ago by lh3 (32k)

ComB has recently been published in JCB. The basic idea is good, but the evaluation has serious problems. Until they present a more reasonable evaluation, it is not recommended.

written 9.0 years ago by lh3 (32k)
Alonso (40) wrote 9.7 years ago:

For performance PerM is far and away the choice for fastest ungapped alignment. I have used PerM and ComB iteratively to call SNPs very accurately, however they don't support any indels.

I asked someone who knows about the project, and the strategy they recommend is to use PerM and ComB for mapping and SNP calling while selecting an option in PerM to return the unmapped reads. Then the unmapped reads can be realigned with a slower aligner.
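
The two-pass idea can be sketched in a few lines of Python. This is only an outline of the control flow under stated assumptions: `fast_align` and `slow_align` are hypothetical callables standing in for wrappers around a fast ungapped aligner and a slower gapped one, each taking a list of reads and returning `(mapped, unmapped)`. Neither name corresponds to a real PerM or ComB interface.

```python
def two_pass_align(reads, fast_align, slow_align):
    """Map reads with a fast aligner, then retry the leftovers with a slower one.

    fast_align and slow_align are hypothetical callables. Each takes a
    list of reads and returns a pair (mapped, unmapped), where mapped is
    a dict of read -> alignment and unmapped is a list of reads.
    """
    mapped, unmapped = fast_align(reads)

    # Only the reads the fast aligner gave up on reach the slow aligner.
    rescued, still_unmapped = slow_align(unmapped)

    combined = dict(mapped)
    combined.update(rescued)
    return combined, still_unmapped
```

The point of the design is that the slow aligner only sees the (hopefully small) fraction of reads that the fast pass could not place, for example reads spanning indels.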

Supposedly they have an indel thing coming out soon but it seems to have been in development forever.

written 9.7 years ago by Alonso (40)

thanks Alonso. it's not at all a bad idea to combine multiple programs, although we were trying to build up a simple pipeline. we will give them a try to check their performance, and then decide.

written 9.7 years ago by Jorge Amigo (11k)
Mary (11k), Boston MA area, wrote 9.7 years ago:

You may know this already, but Galaxy offers a couple of "quickie" movies on SOLiD from their homepage: http://main.g2.bx.psu.edu/ Scroll through the screencasts there and you'll see them.

But also they recently had a developers conference with people who are actively using Galaxy for their local analysis. You might find some useful insights in the slides, or some people who might be worth exchanging some emails with on specifics:

http://bitbucket.org/galaxy/galaxy-central/wiki/DevConf2010

modified 9 months ago by RamRS (27k) • written 9.7 years ago by Mary (11k)

I think I have seen all the screencasts available at the Galaxy website. in fact, a few of them covered almost everything we were planning to do with Galaxy, which is why we are seriously considering installing it on our cluster. but I had not heard before of the slide resource you have just mentioned, and I have found it a great place to check how people are dealing with their own issues, so thank you for pointing it out here.

written 9.7 years ago by Jorge Amigo (11k)