Question

Fully Automated And Integrated Sequence Analysis

5

13.6 years ago

Eric Fournier ★ 1.4k

Greetings,

Often, wet lab biologists come to me with DNA/cDNA sequences and no idea which tools (besides BLAST) they can use to analyse them. I've found that for them, and oftentimes for me too, the limiting factors in analysis are (1) the number of specialized tools we know exist and (2) the time needed to run our sequence through all of these tools. Interpreting the results themselves is often trivial: homologies are either present or absent, SNPs are there or aren't, etc.

Also, the first steps of the decision tree for choosing which analyses to perform are often relatively simple. Nucleotide sequence: is it DNA or RNA? If it is DNA: where does it map? Are there SNPs or indels? If it is RNA: is it a known splice variant or a novel one? If it does not map anywhere, does it have an ORF? Etc.
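To make that decision tree concrete, here is a minimal Python sketch. It is an illustration only, not an existing tool: the function names `has_orf` and `plan_analyses`, the step labels, and the 30-codon threshold are all assumptions, and the molecule type and mapping status are supplied by the caller rather than computed (a real pipeline would run an aligner for that).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_orf(seq, min_codons=30):
    """Crude ORF check on the three forward frames only.

    Returns True if some frame contains ATG followed, in frame, by a stop
    codon at least `min_codons` codons later (start counted, stop excluded).
    The threshold is an arbitrary assumption.
    """
    s = seq.upper().replace("U", "T")
    for frame in range(3):
        start = None
        for i in range(frame, len(s) - 2, 3):
            codon = s[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    return True
                start = None  # ORF too short; look for the next ATG
    return False


def plan_analyses(seq, molecule, maps_to_reference):
    """Return a list of (hypothetical) analysis steps for one sequence.

    `molecule` is "DNA" or "RNA"; `maps_to_reference` is a boolean the
    caller obtains elsewhere, e.g. from an alignment step.
    """
    if molecule not in ("DNA", "RNA"):
        raise ValueError("molecule must be 'DNA' or 'RNA'")
    if not maps_to_reference:
        # Unmapped sequence: fall back to asking whether it has an ORF.
        if has_orf(seq):
            return ["analyse predicted ORF"]
        return ["no ORF found: flag for manual review"]
    if molecule == "DNA":
        return ["report mapping location", "call SNPs and indels"]
    return ["report mapping location", "compare with known splice variants"]
```

For example, `plan_analyses(my_seq, "DNA", True)` returns the mapping and variant-calling steps, while an unmapped sequence is routed to the ORF check. The hard part in practice is everything the "Etc." stands for, which is where the heuristics would have to grow.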

Thus, I have been wondering: is anyone aware of efforts toward creating a heuristic-driven, fully integrated, one-stop "automagical" pipeline to identify and analyze sequences of interest? If not, would such software be feasible, and do you believe wet lab biologists or bioinformaticians would use it?

sequence • 2.2k views

updated 6.6 years ago by Biostar 20 • written 13.6 years ago by Eric Fournier ★ 1.4k

score 3 · Answer 1 · 2011-12-08

There are plenty of pipelines for analyzing various types of data. If you are talking about a one-stop shop for every possible type of analysis, Galaxy is probably the closest thing we have.

A lot of being a good computational biologist is knowing the technical landscape: knowing the various software packages available for a given type of analysis and keeping up to date with bleeding-edge analysis techniques. The problem for me with one-stop pipelines is that there is always going to be a trade-off between customization and usability. Can I use the newest and greatest software in this pipeline? Probably not until someone integrates it.

I think a pipeline like Galaxy is great when the technology is at a mature stage (e.g. microarrays), with years of community experience built up, where data can just be run through a standardized set of analysis modules and you magically get your results. I am not sure NGS is at that stage yet.

score 2 · Answer 2 · 2011-12-08

My initial reaction is that this sounds very much like the pipeline analysis undertaken at a large genome sequencing center, minus such activities as quality checks and gene modeling. Of course, your application of such a pipeline to gene analysis and protein motif discovery, along with SNPs and such, will be on a much smaller scale. Nonetheless, genome centers like Baylor College of Medicine, Washington University and the Wellcome Trust Sanger Institute would be places to look.

score 1 · Answer 3 · 2011-12-08

Although its primary aim is not analyzing sequences, I would like to cite the SADI framework developed by Mark Wilkinson et al.: "Using semantic web technologies, the framework describes some web services and can be discovered and utilized in a very intuitive way by biologist end-users". See http://www.ncbi.nlm.nih.gov/pubmed/22024447

A similar idea would be to use semantic web technologies (like DOAP) to describe software tools and their inputs/outputs.
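As a rough illustration of why machine-readable input/output descriptions help, here is a small Python sketch. It does not use DOAP, RDF or SADI; it stands in plain data classes for the tool descriptions, and the tool names and data-type labels are made up. Given such descriptions, a pipeline could work out by itself which tools to chain to get from the data you have to the result you want.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolDescription:
    """Stand-in for a semantic description of one tool (hypothetical fields)."""
    name: str
    input_type: str
    output_type: str


def find_tool_chain(tools, have, want):
    """Breadth-first search over data types.

    Returns the shortest list of tool names that converts data of type
    `have` into data of type `want`, or None if no chain exists.
    """
    if have == want:
        return []
    queue = deque([(have, [])])
    seen = {have}
    while queue:
        data_type, path = queue.popleft()
        for tool in tools:
            if tool.input_type != data_type or tool.output_type in seen:
                continue
            new_path = path + [tool.name]
            if tool.output_type == want:
                return new_path
            seen.add(tool.output_type)
            queue.append((tool.output_type, new_path))
    return None


# Made-up registry for illustration only.
REGISTRY = [
    ToolDescription("aligner", "fasta", "alignment"),
    ToolDescription("variant_caller", "alignment", "variants"),
    ToolDescription("orf_finder", "fasta", "orf"),
]
```

With this registry, `find_tool_chain(REGISTRY, "fasta", "variants")` yields the aligner followed by the variant caller. Real semantic descriptions would carry far richer typing than a single string per input and output.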