Question: How To Learn Pipeline Development?
3
5.1 years ago by
biolab1.1k
biolab1.1k wrote:

Hi everyone, I am new to scripting. I have been learning Perl for a couple of months and can now write scripts of up to ~100 lines, although they are NOT concise. To complete a task, I usually need a combination of Perl scripts, many shell commands and other software. I would like to learn about pipeline development, which may be very useful for my work. I googled "bioinformatics pipeline" but could not find much information. Could anyone offer some suggestions on getting started with pipelines, ideally with some examples? Pointers to websites or books on pipelines would also be very useful. I would appreciate your advice very much!

pipeline • 2.8k views
ADD COMMENTlink modified 5.1 years ago by Ashutosh Pandey11k • written 5.1 years ago by biolab1.1k
5
5.1 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum119k wrote:

Learn make:

Here are 3 examples I gave to my students ( http://www.slideshare.net/lindenb/make-16134373 ). They all do the same job.


TRANSCRIPT=cat  # stand-in for a tool that would transcribe DNA to RNA
TRANSLATE=cat   # stand-in for a tool that would translate RNA to protein
merged.protein: file1.pep file2.pep file3.pep
    cat file1.pep file2.pep \
        file3.pep > merged.protein

file1.pep: file1.rna
    ${TRANSLATE} file1.rna > file1.pep

file1.rna : file1.dna
    ${TRANSCRIPT} file1.dna > file1.rna

file1.dna:
    echo "ATGCTAGTAGATGC" > file1.dna

file2.pep: file2.rna
    ${TRANSLATE} file2.rna > file2.pep

file2.rna : file2.dna
    ${TRANSCRIPT} file2.dna > file2.rna

file2.dna:
    echo "ATGCTAGTAGATGC" > file2.dna


file3.pep: file3.rna
    ${TRANSLATE} file3.rna > file3.pep

file3.rna : file3.dna
    ${TRANSCRIPT} file3.dna > file3.rna

file3.dna:
    echo "ATGCTAGTAGATGC" > file3.dna

... a second example

TRANSCRIPT=cat
TRANSLATE=cat

%.pep:%.rna
    ${TRANSLATE} $< > $@
%.rna:%.dna
    ${TRANSCRIPT} $< > $@

merged.protein: file1.pep file2.pep file3.pep
    cat $^ > $@

file1.dna:
    echo "ATGCTAGTAGATGC" > $@
file2.dna:
    echo "ATGCTAGTAGATGC" > $@
file3.dna:
    echo "ATGCTAGTAGATGC" > $@

and a 3rd example:

TRANSCRIPT=cat
TRANSLATE=cat
INDEXES=1 2 3
%.pep:%.rna
    ${TRANSLATE} $< > $@
%.rna:%.dna
    ${TRANSCRIPT} $< > $@

merged.protein: $(foreach INDEX,${INDEXES},file${INDEX}.pep )
    cat $^ > $@

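# make joins the continued lines into one before $(eval) parses them, so the
# recipe goes after ';' on the rule line; $$@ defers the expansion of $@.
$(foreach INDEX,${INDEXES},$(eval \
file${INDEX}.dna: ; \
    echo "ATGCTAGTAGATGC" > $$@ \
))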
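
The core idea behind the Makefiles above is that a target is rebuilt only when it is missing or older than the file it depends on. A minimal sketch of that check in plain shell may help to see what make automates. The `build` helper is hypothetical (it is not part of make), and `cat` is the same stand-in tool used in the examples:

```shell
#!/bin/bash
# Sketch of make's staleness check. 'build' is a hypothetical helper.
set -eu

# build TARGET SOURCE CMD...: run CMD with stdin=SOURCE and stdout=TARGET,
# but only if TARGET is missing or older than SOURCE.
build() {
    target=$1
    source=$2
    shift 2
    if [ ! -e "$target" ] || [ "$source" -nt "$target" ]; then
        echo "building $target"
        "$@" < "$source" > "$target"
    else
        echo "$target is up to date"
    fi
}

workdir=$(mktemp -d)
cd "$workdir"
echo "ATGCTAGTAGATGC" > file1.dna

first=$(build file1.rna file1.dna cat)    # target missing: the command runs
second=$(build file1.rna file1.dna cat)   # target is current: nothing to do
echo "$first"
echo "$second"
```

make does this bookkeeping for every rule and also works out the order of the steps from the dependencies, which is why it scales better than hand-written checks.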
ADD COMMENTlink modified 5.1 years ago • written 5.1 years ago by Pierre Lindenbaum119k

Thank you very much! It's really helpful!

ADD REPLYlink written 5.1 years ago by biolab1.1k
2
5.1 years ago by
USA SoCal
QVINTVS_FABIVS_MAXIMVS2.2k wrote:

It is possible to write a shell script that runs your Perl scripts as a pipeline. For example, you have NGS data that needs to be trimmed, filtered, and mapped.

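#!/bin/bash
# perl_pipeline.sh: each step is assumed to write the input file of the next step
set -e   # stop at the first step that fails
perl trimmer.pl data.fastq
perl filter.pl data_trim.fastq
perl map.pl data_trim_filt.fastq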

To run it, type this in your terminal:

bash perl_pipeline.sh

I started using a cluster to analyze my data and I find it useful to write shell scripts with dependencies in order to run hundreds of scripts in parallel. It's a powerful feeling.
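
On a single machine, the same idea (independent steps in parallel, then a step that depends on all of them) can be sketched with background jobs and `wait`. This is only an illustration: the file names are made up, `tr` stands in for a real tool, and on a cluster the scheduler would take over the job of starting and waiting for each step.

```shell
#!/bin/bash
# Sketch: run one independent step per sample in parallel, then a step
# that depends on all of them. 'tr' is a stand-in for a real tool.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical inputs, created here so the sketch is self-contained.
for i in 1 2 3; do
    echo "ATGCTAGTAGATGC" > "sample$i.dna"
done

# Independent steps: start each one in the background and remember its PID.
pids=""
for i in 1 2 3; do
    tr 'T' 'U' < "sample$i.dna" > "sample$i.rna" &
    pids="$pids $!"
done

# Dependency: wait for every background job; 'set -e' aborts if one failed.
for pid in $pids; do
    wait "$pid"
done

# Dependent step: runs only after all per-sample steps have succeeded.
cat sample1.rna sample2.rna sample3.rna > merged.rna
cat merged.rna
```

Waiting on each PID separately matters here: a bare `wait` would return success even if one of the background steps had failed.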

Hope you liked this simple example!

Edit:

If you're really lazy like me (a good trait in a computational scientist), you can write a Perl script that writes the bash script for you. Read the names of the files from their directory and then print the commands in a loop, like this:

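my @files = glob("*.fastq");    # assumption: the inputs are FASTQ files in the current directory
open(OUT, '>', 'perl_pipeline.sh') or die "Cannot write perl_pipeline.sh: $!";
foreach (@files) {
    print OUT "perl perl_script.pl $_\n";
}
close(OUT);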
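
The same trick also works without Perl. Below is a rough shell sketch that writes one command per input file into a new script. The names `perl_script.pl`, `generated_pipeline.sh` and the `*.fastq` files are placeholders, and the quoting assumes that file names contain no single quotes:

```shell
#!/bin/bash
# Sketch: generate a pipeline script with one command per input file.
# perl_script.pl and the *.fastq names are placeholders.
set -eu

workdir=$(mktemp -d)
cd "$workdir"
touch a.fastq b.fastq    # stand-ins for real data files

{
    echo '#!/bin/bash'
    for f in *.fastq; do
        # Quote the name so that file names with spaces survive.
        printf "perl perl_script.pl '%s'\n" "$f"
    done
} > generated_pipeline.sh

cat generated_pipeline.sh
```

The generated file can then be inspected before it is run with bash, which is a cheap way to check the commands first.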

Have fun!

ADD COMMENTlink modified 5.1 years ago • written 5.1 years ago by QVINTVS_FABIVS_MAXIMVS2.2k
1

And that's why you need something like SGE+qmake to run your independent analyses in parallel. You can hardly parallelize things with a simple bash script.

ADD REPLYlink written 5.1 years ago by Pierre Lindenbaum119k

Thanks for the info!

ADD REPLYlink written 5.1 years ago by QVINTVS_FABIVS_MAXIMVS2.2k
1
5.1 years ago by
Martin A Hansen3.0k
Denmark
Martin A Hansen3.0k wrote:

Have a look at Biopieces -> www.biopieces.org

ADD COMMENTlink written 5.1 years ago by Martin A Hansen3.0k
1
5.1 years ago by
Philadelphia
Ashutosh Pandey11k wrote:

This post may give you some ideas about existing pipeline-building frameworks:

C: Which bioinformatic friendly pipeline building framework?

ADD COMMENTlink written 5.1 years ago by Ashutosh Pandey11k