Question: Write Script To Combine All Steps Of Bwa
6
gravatar for Bioscientist
8.3 years ago by
Bioscientist1.7k
Bioscientist1.7k wrote:

For BWA, first do alignment (generate .sai), then sampe the paired-ends (generate .sam), then convert .sam to .bam. Now I'm going to write script to combine all these steps. ie, input is fastq, then output is .bam file.

Now a big question is, I need to guarantee, say, the first step (generating .sai) is COMPLETE (instead of being aborted) before going to the second step; and the second step (generating .sam) before the third. I there any way to check that for BWA? Or what's the sign for the completeness of each step?

thanks!

Edit: Seems BWA return with exit code, which makes this question look stupid. Then the question could be: How can we write script to check if each step of BWA really returns exit code or not?

Edit again:

sorry maybe I don't make clear about my question in the above link. My concern is the quality of the output for each step of BWA. I usually check the error report in the log file. For example, there could be errors like "2.8% bases are trimmed", "cannot infer insert size" "weird pairing".

Then my question is, I want write a script to automatically check the errors for each step , and screen out those files with such error report. And when combine all steps in one script, command 2 will only execute when the output of command 1 has no errors.

thanks!!!

bwa • 5.0k views
ADD COMMENTlink modified 8.3 years ago by Ketil4.0k • written 8.3 years ago by Bioscientist1.7k

doesn't BWA return exit codes?

ADD REPLYlink written 8.3 years ago by Russh1.2k

yeah, so you mean exit code will mean the completeness of each step? SO we just do nothing and just wait for the program naturally finish? thx

ADD REPLYlink written 8.3 years ago by Bioscientist1.7k

Couldn't you use something like ./step1_of_bwa if [ $? != 0 ]; then DO_SOMETHING; else DO_SOMETHING_ELSE; fi

ADD REPLYlink written 8.3 years ago by Russh1.2k
10
gravatar for Ketil
8.3 years ago by
Ketil4.0k
Germany
Ketil4.0k wrote:

I use a makefile (as mentioned in a related question: A: Run Hundreds Of Bwa Commands Without Waiting

Edit: (Unless given an option to override this,) 'make' will stop execution when a command exits with an error code, so this should work okay with bwa. Also you get parallelism for "free".

Some excerpts below ($GENOME points to the reference sequence, input files are assumed to be named foo.1.txt and foo.2.txt):

GENIDX   = $(join $(GENOME),.sa)

##################################################
# Index the genome using BWA
# maybe 'bwa index -a bwtsw' for large genomes?
%.sa %.rsa %.rpac %.rbwt %.pac %.bwt %.ann %.amb: %
        bwa index -a is $<

##################################################
# Align illumina reads
%.sai: %.txt $(GENIDX)
        bwa aln $(GENOME) $< > $@

%.sam: %.1.sai %.2.sai %.1.txt %.2.txt 
        bwa sampe $(GENOME) $*.1.sai $*.2.sai $*.1.txt $*.2.txt > $@

##################################################
# Align 454 reads using bwasw

%.sam: %.fq $(GENIDX)
        bwa bwasw $(GENOME) $< > $@

##################################################
# Convert to sorted and indexed BAM file

%.bam: %.sam
        samtools view -S $< -b -u > tmp_$@
        samtools sort tmp_$@ $*
        rm tmp_$@

%.bam.bai: %.bam
        samtools index $<
ADD COMMENTlink modified 6 days ago by RamRS24k • written 8.3 years ago by Ketil4.0k
3
gravatar for Chris Miller
8.3 years ago by
Chris Miller21k
Washington University in St. Louis, MO
Chris Miller21k wrote:

Assuming that BWA returns exit codes (and I think it does), then bash has you covered:

command1 && command2 && command3

(commands 2 and 3 will only execute if 1 completes successfully)

ADD COMMENTlink written 8.3 years ago by Chris Miller21k

Thanks Chris, but how can I write the command? How can I connect the following commands with &&? I'm really a beginner

bwa aln index_prefix file1.fastq > File_1.sai
bwa aln index_prefix file2.fastq > File_2.sai
bwa sampe index _prefix File_1.sai File_2.sai  file1.fastq  file2.fastq > Samfile.sam
samtools  view genome.fai SamFile.sam  > BamFile.bam
ADD REPLYlink modified 6 days ago by RamRS24k • written 8.3 years ago by Bioscientist1.7k
bwa aln index_prefix file1.fastq > File_1.sai && bwa aln index_prefix file2.fastq > File_2.sai && bwa sampe index _prefix File_1.sai File_2.sai file1.fastq file2.fastq > Samfile.sam && samtools view genome.fai SamFile.sam > BamFile.bam
ADD REPLYlink modified 6 days ago by RamRS24k • written 8.3 years ago by Chris Miller21k
2
gravatar for Andra Waagmeester
8.3 years ago by
Maastricht, the Netherlands
Andra Waagmeester3.2k wrote:

You might consider Taverna (http://www.taverna.org.uk/) for this. It is a workflow manager for webservice, but they do consider local apps:

http://www.taverna.org.uk/documentation/faq/building-workflows/local-perl-and-command-line-scripts/

ADD COMMENTlink modified 6 days ago by RamRS24k • written 8.3 years ago by Andra Waagmeester3.2k
2
gravatar for Ketil
8.3 years ago by
Ketil4.0k
Germany
Ketil4.0k wrote:

To answer your other question on how to check for errors ( https://www.biostars.org/p/9499/ - the question is closed with a pointer here):

I think the individual steps of BWA can be hard to check, as I don't think the intermediate formats (from bwa index and bwa aln) are standardized, nor intended to be processed by other tools than bwa.

But I always run 'samtools flagstats' on the final BAM files. This will give you output like:

136758880 in total
0 QC failure
0 duplicates
129945658 mapped (95.02%)
136758880 paired in sequencing
68379440 read1
68379440 read2
117028736 properly paired (85.57%)
127333578 with itself and mate mapped
2612080 singletons (1.91%)
9772328 with mate mapped to a different chr
5172859 with mate mapped to a different chr (mapQ>=5)

This is very useful to check both the sampling and preparation, the sequencing, as well as the mapping.

ADD COMMENTlink modified 6 days ago by RamRS24k • written 8.3 years ago by Ketil4.0k
1
gravatar for 2184687-1231-83-
8.3 years ago by
2184687-1231-83-5.0k wrote:

I make sure the previous steps have returned successfully with a perl script like this: http://raw.github.com/avilella/hashbrown/master/scripts/bwa.pl

perl ~/hashbrown/scripts/bwa.pl 
    -db $ref 
    -reads $readset 
    -tag mytag 
    -bwa_exe ~/src/bwa/latest/bwa 
    -samtools_exe ~/src/samtools/latest/samtools/samtools 
    -tmpdir /tmp/
ADD COMMENTlink modified 6 days ago by RamRS24k • written 8.3 years ago by 2184687-1231-83-5.0k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1511 users visited in the last hour