Question

Automatic Analysis Pipeline For Raw Sequenced Data


12.2 years ago

Stevelor ▴ 310

Hey all,

we are trying to establish a fully automated standard analysis pipeline for the samples sequenced on our HiSeq 2000. I would like to hear about your experiences: how far can automation go, which steps are unavoidable, and which cannot be automated? Have you also worked on a "genome content management system" like this?

Here is what we have implemented so far. For every sequencing run, several SQL tables hold the information for each sample: for example the sample sheet that CASAVA needs, the sample characteristics, the type of sequencing (strand-specific or not), the path to the sample data, the insert size (for paired-end runs), whether the sample is a metatranscriptome or metagenome, and a lot more. We use this information to build a workflow: as a first step CASAVA is started, then the samples are moved to the corresponding project folder, and FastQC is run for quality control.
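To make the idea concrete, here is a minimal sketch of how per-sample rows in an SQL table could drive the choice of workflow steps. Everything specific in it is an assumption for illustration: the `samples` table, its column names, the demo rows, and the step labels are hypothetical, and the steps are plain labels rather than real CASAVA or FastQC invocations.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE samples (
    sample_id       TEXT PRIMARY KEY,
    project         TEXT NOT NULL,
    strand_specific INTEGER NOT NULL,  -- 1 = strand-specific library
    paired_end      INTEGER NOT NULL,  -- 1 = paired-end run
    insert_size     INTEGER,           -- only meaningful for paired-end
    data_path       TEXT NOT NULL
)
"""


def make_demo_db():
    """Create an in-memory database with two made-up samples."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # allows access to columns by name
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("S1", "projA", 1, 1, 300, "/data/run1/S1"),
            ("S2", "projB", 0, 0, None, "/data/run1/S2"),
        ],
    )
    return conn


def build_plan(row):
    """Turn one sample row into an ordered list of step labels."""
    steps = ["demultiplex", "move_to_project_folder", "fastqc"]
    steps.append("map_paired" if row["paired_end"] else "map_single")
    steps.append(
        "quantify_stranded" if row["strand_specific"] else "quantify_unstranded"
    )
    return steps


def plans(conn):
    """Return {sample_id: [step, ...]} for every sample in the table."""
    rows = conn.execute("SELECT * FROM samples ORDER BY sample_id")
    return {row["sample_id"]: build_plan(row) for row in rows}


if __name__ == "__main__":
    for sample_id, steps in plans(make_demo_db()).items():
        print(sample_id, "->", ", ".join(steps))
```

The point of the design is that the sample metadata, not the pipeline code, decides which variant of each step runs, so a new sample type mostly means a new row or column rather than a new script.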

The next steps would be mapping and quantification. All of these steps are traceable in a CMS: every major event creates an automatic post in the CMS.
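The traceability part could look roughly like the sketch below: steps run in order, each outcome is recorded as a post, and the run stops at the first failure so that a person can take over. The `EventLog` class is a hypothetical stand-in for the CMS, and the step functions are dummies, not real tool wrappers.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class EventLog:
    """Hypothetical stand-in for the CMS: one entry per pipeline event."""
    posts: List[str] = field(default_factory=list)

    def post(self, message: str) -> None:
        self.posts.append(message)


# A step is a (name, action) pair; the action receives the sample ID.
Step = Tuple[str, Callable[[str], None]]


def run_pipeline(sample_id: str, steps: List[Step], log: EventLog) -> bool:
    """Run steps in order and stop at the first failure."""
    for name, action in steps:
        try:
            action(sample_id)
        except Exception as exc:
            log.post(f"{sample_id}: {name} FAILED ({exc})")
            return False
        log.post(f"{sample_id}: {name} done")
    return True


def ok(sample_id: str) -> None:
    """Dummy step that always succeeds."""


def failing_qc(sample_id: str) -> None:
    """Dummy step that simulates a sample failing quality control."""
    raise RuntimeError("low quality")


if __name__ == "__main__":
    log = EventLog()
    steps = [("demultiplex", ok), ("fastqc", failing_qc), ("mapping", ok)]
    finished = run_pipeline("S1", steps, log)
    print("finished:", finished)
    print("\n".join(log.posts))
```

Stopping on the first failed step is one possible answer to the "every sample is a bit different" concern: the automatic part handles the routine cases and hands unusual samples back to a human with a record of what already ran.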

It would be nice to get some feedback on this. Is it possible AND useful to create such a pipeline? After all, every sample is a bit different. What do you think?

Thanks!

Steve

pipeline next-gen sequencing data • 2.6k views

updated 12.2 years ago by Roman Valls Guimerà ▴ 620 • written 12.2 years ago by Stevelor ▴ 310


You're describing a LIMS. These are not trivial to develop, and not cheap to buy!

12.2 years ago by User 59 13k


I think the first step would be to see what's out there (BioTeam MiniLIMS, Galaxy, Taverna, tools built with Ruffus or Paver) and report your findings in a blog post or on the SEQanswers wiki.

12.2 years ago by Jeremy Leipzig 22k

Ram · Answer 1 · 2012-02-02

Hello SteveLor,

To some extent your question reminds me of a previous post on Biostar; you might want to have a look at it.

In our lab we're using and extending Brad's pipeline:

https://github.com/chapmanb/bcbb/blob/master/nextgen/README.md

http://bcbio.wordpress.com/

There are a few issues that need to be addressed, but overall it does the job for us. The Galaxy side, which would greatly help with sample management, is still in the works due to IT security concerns.

Hope that helps!