Automatic Analysis Pipeline For Raw Sequenced Data
12.2 years ago
Stevelor ▴ 310

Hey all,

We are trying to establish a fully automatic standard analysis pipeline for our sequenced samples from a HiSeq 2000 machine. I would like to hear about your experiences: how far can we go, which steps are unavoidable, and which cannot be automated? Have you also worked on a "genome content management system" like this, and what was your experience?

What we have already implemented: for every sequencing run, several SQL tables hold the information for each sample, for example the sample sheet that CASAVA needs, the sample characteristics, the kind of sequencing (strand-specific or not), the path to the sample data, the insert size (for paired-end runs), whether it is a metatranscriptome or metagenome, and a lot more. We use this information to build a workflow: as a first step CASAVA is started, afterwards the samples are moved to the corresponding project folder, and then FastQC is run as quality control.
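
To make the idea concrete, here is a rough sketch of how per-sample metadata could drive the list of steps. Everything named here (the `samples` table, its columns, the step labels, the demo rows) is a made-up stand-in and not our real schema, and it only builds a plan without calling CASAVA or FastQC. It uses an in-memory SQLite database so it runs on its own.

```python
import sqlite3


def build_plan(conn, run_id):
    """Return an ordered list of (step, sample_id, detail) tuples for one run."""
    rows = conn.execute(
        "SELECT sample_id, project, paired_end FROM samples "
        "WHERE run_id = ? ORDER BY sample_id",
        (run_id,),
    ).fetchall()
    # Demultiplexing happens once per run, before any per-sample step.
    plan = [("demultiplex", None, f"run={run_id}")]
    for sample_id, project, paired_end in rows:
        # Per-sample steps follow in the order described above.
        plan.append(("move_to_project", sample_id, f"project={project}"))
        plan.append(("fastqc", sample_id, "paired" if paired_end else "single"))
    return plan


def demo_connection():
    """Create a throwaway database with a hypothetical samples table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE samples ("
        "run_id TEXT, sample_id TEXT, project TEXT, paired_end INTEGER)"
    )
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?, ?, ?)",
        [
            ("run1", "S1", "projA", 1),
            ("run1", "S2", "projB", 0),
            ("run2", "S3", "projA", 1),
        ],
    )
    return conn


if __name__ == "__main__":
    for step in build_plan(demo_connection(), "run1"):
        print(step)
```

Keeping the plan as plain data, separate from the code that executes it, would make it easy to review or log before anything is started.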

The next steps would be mapping and quantification. All of these steps are traceable in a CMS: every major event creates an automatic post in this CMS.
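
The event posting could look roughly like the sketch below. `run_step` and `post_event` are hypothetical names for illustration; in practice `post_event` would be whatever function writes a post to the CMS, while here a plain list stands in for it.

```python
def run_step(name, func, post_event):
    """Run one pipeline step and report its outcome through post_event."""
    post_event(f"{name}: started")
    try:
        result = func()
    except Exception as exc:
        # Record the failure, then re-raise so the pipeline stops here.
        post_event(f"{name}: failed ({exc})")
        raise
    post_event(f"{name}: finished")
    return result


if __name__ == "__main__":
    posts = []  # stand-in for the CMS
    run_step("fastqc", lambda: None, posts.append)
    print(posts)
```

Wrapping each step this way means a failure is recorded before the exception propagates, so the trace in the CMS shows where a run stopped.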

It would be nice to get some feedback about this. Is it possible AND useful to create such a pipeline? After all, every sample is a bit different. What do you think?

Thanks!

Steve

pipeline next-gen sequencing data • 2.6k views

You're describing a LIMS system. These are not trivial to develop, and not cheap to buy!

I think the first step would be to see what's out there (BioTeam MiniLIMS, Galaxy, Taverna, stuff built with Ruffus or Paver) and report your findings in a blog post or on the SEQanswers wiki.

12.2 years ago

Hello SteveLor,

To some extent your question reminds me of a previous post @biostar, you might want to have a look at it.

In our lab we're using and extending Brad's pipeline:

https://github.com/chapmanb/bcbb/blob/master/nextgen/README.md

http://bcbio.wordpress.com/

There are a few issues that need to be addressed, but overall it does the job for us. The Galaxy side, which would greatly help with sample management, is still in the works due to IT security concerns.

Hope that helps!
