Question: Thoughts On Galaxy For Next Generation Sequence Data Analysis
5
5.7 years ago by
Travis2.7k
USA
Travis2.7k wrote:

Hi,

We would like to be able to save workflows for use by non-bioinformatics users in-house. Whilst going through the laborious task of stringing together workflows into individual scripts I began to wonder if Galaxy might represent a less labor-intensive option...

Therefore I am trying to get a feel for who uses Galaxy for their NGS analyses (or who doesn't) and why.

How are upload/download speeds? I can only assume that uploading large datasets would be a nightmare and we plan to have some very large data sets. Data processing speeds would also be important - what kind of power does the public version offer?

Do people use the public instance or create local versions? I assume to have custom workflows immediately available it would be necessary to have a local instance. This would eliminate the uploads and may mean faster processing too.

I would like to get people's opinions before I dedicate real time to it.

Thanks in advance!

next-gen galaxy sequencing • 6.9k views
written 5.7 years ago by Travis2.7k
8
5.7 years ago by
Casey Bergman17k
Manchester, UK
Casey Bergman17k wrote:

Your question is very much in the spirit of the times, e.g. from Peter Cock's ISMB twitter feed:

Do you want to #usegalaxy too? RT @passDan Getting the feeling that Galaxy is the cool kid and everyone wants to be his friend. #bosc2011

Notwithstanding Farhat's legitimate space considerations (which are also true to some extent for non-Galaxy based workflows), I would say that the answer to your query is a definitive "yes": Galaxy is a very serious contender for remote and local NGS analyses. I would even venture to say that it was the implementation of NGS tools and Cloud images by Galaxy in 2010 that has led to the explosive growth in users of Galaxy over the last 12 months, as evidenced by posts to the Galaxy users mailing list:

[Figure: posts to the Galaxy users mailing list (image not preserved)]

I know half a dozen wet-lab biologists who use Galaxy to do their own NGS analyses because: 1) they don't have to install code, grok UNIX or program; 2) they get free storage and compute; and 3) they can share their results with supervisors/collaborators, etc.

There is also a big push from bioinformaticians to use Galaxy. You can get a real sense for this on the Galaxy developers mailing list. We are currently rolling out a local Galaxy installation in our bioinformatics core facility so we can provide NGS results to users via an interface they understand. The aim is to use Galaxy to cut down on the time required to help explain results/protocols and to allow users to perform their own follow-up analyses. We are just trialling this now, and though local versions on a desktop are easy to get going and customize for NGS work, we have not launched a production instance so I can't report on this yet.

Lastly, there is going to be really productive interaction in the future between Galaxy and Taverna, the two major players in the bioinformatics workflows market. I predict a synergistic co-evolution between Galaxy and Taverna, similar to what was observed between the UCSC and Ensembl Browsers, that will generate a lot of new functionality specifically in the area of NGS.

written 5.7 years ago by Casey Bergman17k
2

I forgot to add one major advantage of Galaxy, the automatic recording of all the steps in a workflow along with the parameters. It is quite easy to forget to record how a one-off analysis was performed if one is not used to doing that.
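
For anyone doing one-off analyses outside Galaxy, the same habit can be approximated by hand. The snippet below is only a minimal sketch of the idea and is not how Galaxy itself records histories; `record_step`, `load_steps` and the JSON-lines log format are hypothetical names made up for this illustration.

```python
import json
import time


def record_step(log_path, tool, params):
    """Append one analysis step (tool name plus parameters) to a JSON-lines log.

    Hypothetical helper for illustration only; not part of Galaxy.
    """
    entry = {
        "tool": tool,
        "params": params,
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
    }
    # One JSON object per line, so the log can be appended to safely.
    with open(log_path, "a") as handle:
        handle.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry


def load_steps(log_path):
    """Read the log back as a list of dicts, in the order the steps were run."""
    with open(log_path) as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Calling `record_step("analysis.jsonl", "bwa samse", {"reference": "hg19.fa"})` before each command leaves a replayable trail, which is roughly what Galaxy gives you without any extra effort.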

written 5.7 years ago by Farhat2.7k

+1 for the answer but I wish I could do another +1 for quoting my tweet from ISMB!

written 5.5 years ago by Daniel3.3k

7
5.7 years ago by
Leonor Palmeira3.5k
Liège, Belgium
Leonor Palmeira3.5k wrote:

I would say that Galaxy is definitely a very good option for this! If you want to use their online version, it might be impossible for large datasets (they have a 1Gb limit on the upload size, so for many NGS datasets it's not an option).

But I have been using some of their Python scripts for my own pipelines and they are very useful! The whole Galaxy project is open source and easily installable. I use it in combination with my own scripts and build my pipelines this way.

If you have a webserver, I would recommend you install your own version of Galaxy on it and then add your own scripts to it. You would then be able to build the complete pipeline and have it ready online for your users.

written 5.7 years ago by Leonor Palmeira3.5k
3

Yes, actually, any language should work as long as it's installed on your server. Galaxy is "just" a front-end to a lot of scripts, with the appropriate documentation in XML files.

Here are some details on how to incorporate a new script: http://wiki.g2.bx.psu.edu/Admin/Tools/Add%20Tool%20Tutorial
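
To give an idea of the kind of script that can sit behind a tool, here is a minimal sketch in Python (Perl would do just as well). It only shows the general shape: a command-line program that takes an input path and an output path. The XML wiring is covered in the tutorial linked above and is not reproduced here; the read-counting task and the names `count_fastq_reads` and `main` are made up for this example.

```python
import argparse


def count_fastq_reads(in_path):
    """Count records in an uncompressed FASTQ file, assuming 4 lines per record."""
    with open(in_path) as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def main(argv=None):
    """Parse 'input output' from the command line and write the read count."""
    parser = argparse.ArgumentParser(description="Count reads in a FASTQ file.")
    parser.add_argument("input", help="path to the input FASTQ file")
    parser.add_argument("output", help="path to write the read count to")
    args = parser.parse_args(argv)
    with open(args.output, "w") as out:
        out.write("%d\n" % count_fastq_reads(args.input))
```

From Python this can be tried as `main(["reads.fastq", "count.txt"])`; to run it from the command line you would call `main()` under the usual `if __name__ == "__main__":` guard.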

written 5.7 years ago by Leonor Palmeira3.5k

Can custom scripts be written in Perl?

written 5.7 years ago by Travis2.7k

The file size limit is imposed by what can be uploaded through a browser. You can also upload to Galaxy main (and to a local instance after configuration) by FTP: http://wiki.g2.bx.psu.edu/Learn/Upload%20via%20FTP
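
Any FTP client will do for this. If you would rather script it, a minimal sketch using Python's standard `ftplib` could look like the following. The host name and credentials are placeholders you would have to fill in (see the wiki page for the real details), and `upload_via_ftp` and `connect_and_upload` are hypothetical helper names, not anything Galaxy provides.

```python
import os
from ftplib import FTP


def upload_via_ftp(ftp, local_path, remote_name=None):
    """Send one local file over an already-connected ftplib.FTP session."""
    remote_name = remote_name or os.path.basename(local_path)
    with open(local_path, "rb") as handle:
        # storbinary streams the file in blocks, so it need not fit in memory.
        ftp.storbinary("STOR " + remote_name, handle)
    return remote_name


def connect_and_upload(host, user, password, local_path):
    """Open a session, log in and upload a single file.

    host, user and password are placeholders supplied by the caller.
    """
    with FTP(host) as ftp:
        ftp.login(user, password)
        return upload_via_ftp(ftp, local_path)
```

Splitting the connection from the transfer keeps the upload step easy to test and to reuse for several files in one session.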

written 5.7 years ago by Brad Chapman9.1k
6
5.7 years ago by
Mnkyboy60
PNW
Mnkyboy60 wrote:

I am a big fan of using Galaxy on the cloud for RNA-seq. I use it often to offload from our local servers and to test out data sets. Easy to use and very cost effective.

http://wiki.g2.bx.psu.edu/Admin/Cloud

written 5.7 years ago by Mnkyboy60
4
5.7 years ago by
Farhat2.7k
Pune, India
Farhat2.7k wrote:

Galaxy is a good option for workflows, especially for non-programmers. I have a local install and occasionally use the main one as well. I do not use the main one for NGS as data transfer is a huge bottleneck. Also, occasionally you may have to wait before your analysis starts if they are busy. The local install is not very difficult to set up, but one serious issue I faced with Galaxy is that it stores the results of every step in uncompressed format. Thus, for example, a command like

bwa samse ~/genomes/hsap/hg19.fa sampleTF8.sai sampleTF8.de.fastq.gz |samtools view -bS -|samtools sort - sampleTF8

in Galaxy will end up storing the uncompressed fastq file, the sam file resulting from alignment, the bam file and the sorted bam file. This can lead to heavy disk activity, which can slow down the analysis unless you have lots of fast storage. Another thing I noticed with Galaxy (though it may be my install) was that simple tasks like uploading a file would peg one core of the CPU at 100%.
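
To put rough numbers on this for your own data, a back-of-the-envelope calculation is enough. The sketch below simply compares keeping every intermediate file with keeping only the final sorted bam, as the piped command does. `disk_footprint` is a hypothetical helper, and all sizes must come from your own files; nothing here is measured from Galaxy.

```python
def disk_footprint(intermediate_sizes_gb, final_size_gb):
    """Compare disk use when every step's output is kept versus only the final file.

    All sizes are in gigabytes and are supplied by the caller.
    """
    keep_all = sum(intermediate_sizes_gb) + final_size_gb
    piped = final_size_gb  # a piped command writes only the last output to disk
    return {
        "keep_all_gb": keep_all,
        "piped_gb": piped,
        "extra_gb": keep_all - piped,
    }
```

For instance, `disk_footprint([10, 12, 3], 3)` uses made-up sizes for the uncompressed fastq, the sam and the unsorted bam; the `extra_gb` entry is the space the intermediates would cost per sample.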

written 5.7 years ago by Farhat2.7k