Forum: Biopieces Is A Bioinformatic Framework Of Tools Easily Used And Easily Created.
Martin A Hansen (Denmark) wrote, 6.3 years ago:

www.biopieces.org

The Biopieces are a collection of bioinformatics tools that can be pieced together in a very easy and flexible manner to perform both simple and complex tasks. The Biopieces work on a data stream in such a way that the data stream can be passed through several different Biopieces, each performing one specific task: modifying or adding records to the data stream, creating plots, or uploading data to databases and web services. The Biopieces are executed in a command line environment where the data stream is initialized by specific Biopieces which read data from files, databases, or web services, and output records to the data stream that is passed to downstream Biopieces until the data stream is terminated at the end of the analysis as outlined below:

read_data | calculate_something | write_results

The following example demonstrates how a next generation sequencing experiment can be cleaned and analyzed – including plotting of scores and length distribution, removal of adaptor sequence, trimming and filtering using quality scores, mapping to a specified genome, and uploading the data to the UCSC genome browser for further analysis:

read_fastq -i data.fq |                               #  Initialize data stream from a FASTQ file.
plot_scores -t png -o scores_unclean.png |            #  Plot scores before cleaning. 
find_adaptor -c 24 -a TCGTATGCCGTCTTC -p |            #  Locate adaptor - including partial adaptor.
clip_adaptor |                                        #  Clip any located adaptor.
trim_seq |                                            #  End trim sequences according to quality scores.
grab -e 'SEQ_LEN > 18' |                              #  Filter short sequences.
mean_scores -l |                                      #  Locate local quality score minima.
grab -e 'SCORES_MEAN_LOCAL >= 15' |                   #  Filter low local quality score minima.
write_fastq -o data_clean.fq |                        #  Write the cleaned data to a FASTQ file.
plot_scores -t png -o scores_clean.png |              #  Plot scores after cleaning. 
plot_distribution -k SEQ_LEN -t png -o lengths.png |  #  Plot sequence length distribution.  
bowtie_seq -c 24 -g hg19 -m 2 |                       #  Map sequences to the human genome with Bowtie.
upload_to_ucsc -d hg19 -t my_data -x                  #  Upload the results to the UCSC Genome Browser.
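
Several steps in the pipeline above (plot_scores, write_fastq, plot_distribution) sit mid-stream, so they evidently write a side output and still pass the records downstream. A minimal sketch of that pass-through pattern, using only standard Unix tools; `write_copy` is a hypothetical stand-in, not a Biopiece, and plain text lines stand in for records:

```shell
# write_copy: hypothetical stand-in for a mid-stream writer.
# It saves a copy of the stream to a file and passes the stream on unchanged.
write_copy() { tee "$1"; }

out_dir=$(mktemp -d)                        # scratch directory for the side outputs

printf 'ACGT\nGGCCA\nTT\n' |                # stand-in for a reader step
write_copy "$out_dir/all.txt" |             # side output: every line
awk 'length($0) > 2' |                      # filter: keep lines longer than 2 characters
write_copy "$out_dir/kept.txt" |            # side output: lines that survived the filter
wc -l                                       # terminate the stream with a line count
```

After the run, `all.txt` holds the unfiltered input and `kept.txt` only the lines that passed the filter, without any step having to re-read a file.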

The advantage of the Biopieces is that a user can easily solve simple and complex tasks without having any programming experience. Moreover, since the data format used to pass data between Biopieces is text based, different developers can quickly create new Biopieces in their favorite programming language - and all the Biopieces will maintain compatibility. Finally, templates exist for creating new Biopieces in Perl and Ruby.
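
To illustrate why a text-based record format makes new tools cheap to write, here is a rough sketch of a stream filter in shell and awk. It assumes records are made of `KEY: VALUE` lines and end with a line containing only `---`; that layout, the `SEQ` key, and the name `add_seq_len` are assumptions for illustration and should be checked against the Biopieces documentation before use:

```shell
# add_seq_len: hypothetical filter, not an actual Biopiece.
# Reads records from stdin, appends a SEQ_LEN line to every record that has
# a SEQ line, and writes all records to stdout.
# Assumed record layout: "KEY: VALUE" lines, terminated by a line "---".
add_seq_len() {
  awk '
    /^---$/ {                                   # end of the current record
      if (seq != "") print "SEQ_LEN: " length(seq)
      print "---"
      seq = ""                                  # reset for the next record
      next
    }
    /^SEQ: / { seq = substr($0, 6) }            # remember the sequence value
    { print }                                   # pass every other line through
  '
}

# Demo on a single inline record.
printf 'SEQ_NAME: read1\nSEQ: ACGTACGT\n---\n' | add_seq_len
```

Because the filter only reads stdin and writes stdout, it could sit anywhere in a pipe between other steps, which is the compatibility property described above.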

There are currently ~190 Biopieces.

Biopieces was developed with support from the Danish Agency for Science, Technology and Innovation (grant no 272-06-0325).


If things are truly being passed strictly through pipes, this approach seems horribly susceptible to failures. If your plotting step at the end of a long sequence of steps fails, do you have to remap your entire WGS experiment?

written 6.3 years ago by Chris Miller

@Chris, in real life we may divide the pipeline into parts for each heavy lifting step (e.g. mapping). Also, we run a few sequences through the pipe as a first test to check for syntax errors, etc. Biopieces has worked great for me and many others since 2007.
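
A minimal sketch of the "few sequences first" test, using only standard tools. `fastq_head` is a hypothetical helper, and it assumes the common FASTQ layout of exactly four lines per read (no wrapped sequence lines):

```shell
# fastq_head N: hypothetical helper that passes through the first N reads of a
# FASTQ stream, assuming 4 lines per read (header, sequence, '+', qualities).
fastq_head() { head -n "$(( $1 * 4 ))"; }

# Demo on two inline reads; only the first read is passed through.
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n' | fastq_head 1
```

Something like `fastq_head 1000 < data.fq > test.fq` would give a small file to push through the whole pipeline before the real run. To split off a heavy step, one script could end at `write_fastq -o data_clean.fq` and the next could start with `read_fastq -i data_clean.fq`, so a failure after mapping does not force the cleaning to be redone.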

modified 6.3 years ago • written 6.3 years ago by Martin A Hansen

As Martin A Hansen points out, long-running processes should be factored into standalone scripts. Where the pipes shine is in cutting down the endless proliferation of intermediate files; the mental overhead of managing those can be a major bottleneck.

modified 6.3 years ago • written 6.3 years ago by Istvan Albert

Makes sense. Thanks for clarifying.

written 6.3 years ago by Chris Miller

Very nice tool, inspired by the Unix philosophy, and it reminds me of EMBOSS. Do you also have a favourite way to define pipelines, e.g. an equivalent of Makefiles for Biopieces?

written 6.3 years ago by Giovanni M Dall'Olio

I favor simple bash scripts: https://code.google.com/p/biopieces/wiki/HowTo#Howto_script_Biopieces

written 6.3 years ago by Martin A Hansen

Duplicate of: The Biopieces are a collection of bioinformatics tools

written 6.3 years ago by Medhat

Well, this one is an ad; the other is just a tool description.

written 6.3 years ago by Istvan Albert

Sorry for the misunderstanding.

written 6.3 years ago by Medhat