Write Script To Combine All Steps Of Bwa
5
6
Entering edit mode
10.5 years ago
Bioscientist ★ 1.7k

For BWA, first do alignment (generate .sai), then sampe the paired-ends (generate .sam), then convert .sam to .bam. Now I'm going to write script to combine all these steps. ie, input is fastq, then output is .bam file.

Now a big question is, I need to guarantee, say, the first step (generating .sai) is COMPLETE (instead of being aborted) before going to the second step; and the second step (generating .sam) before the third. I there any way to check that for BWA? Or what's the sign for the completeness of each step?

thanks!

Edit: Seems BWA return with exit code, which makes this question look stupid. Then the question could be: How can we write script to check if each step of BWA really returns exit code or not?

Edit again:

sorry maybe I don't make clear about my question in the above link. My concern is the quality of the output for each step of BWA. I usually check the error report in the log file. For example, there could be errors like "2.8% bases are trimmed", "cannot infer insert size" "weird pairing".

Then my question is, I want write a script to automatically check the errors for each step , and screen out those files with such error report. And when combine all steps in one script, command 2 will only execute when the output of command 1 has no errors.

thanks!!!

bwa • 6.0k views
0
Entering edit mode

doesn't BWA return exit codes?

0
Entering edit mode

yeah, so you mean exit code will mean the completeness of each step? SO we just do nothing and just wait for the program naturally finish? thx

0
Entering edit mode

Couldn't you use something like ./step1_of_bwa if [ $? != 0 ]; then DO_SOMETHING; else DO_SOMETHING_ELSE; fi ADD REPLY 10 Entering edit mode 10.5 years ago Ketil 4.1k I use a makefile (as mentioned in a related question: A: Run Hundreds Of Bwa Commands Without Waiting Edit: (Unless given an option to override this,) 'make' will stop execution when a command exits with an error code, so this should work okay with bwa. Also you get parallelism for "free". Some excerpts below ($GENOME points to the reference sequence, input files are assumed to be named foo.1.txt and foo.2.txt):

GENIDX   = $(join$(GENOME),.sa)

##################################################
# Index the genome using BWA
# maybe 'bwa index -a bwtsw' for large genomes?
%.sa %.rsa %.rpac %.rbwt %.pac %.bwt %.ann %.amb: %
bwa index -a is $< ################################################## # Align illumina reads %.sai: %.txt$(GENIDX)
bwa aln $(GENOME)$< > $@ %.sam: %.1.sai %.2.sai %.1.txt %.2.txt bwa sampe$(GENOME) $*.1.sai$*.2.sai $*.1.txt$*.2.txt > $@ ################################################## # Align 454 reads using bwasw %.sam: %.fq$(GENIDX)
bwa bwasw $(GENOME)$< > $@ ################################################## # Convert to sorted and indexed BAM file %.bam: %.sam samtools view -S$< -b -u > tmp_$@ samtools sort tmp_$@ $* rm tmp_$@

%.bam.bai: %.bam
samtools index $<  ADD COMMENT 3 Entering edit mode 10.5 years ago Assuming that BWA returns exit codes (and I think it does), then bash has you covered: command1 && command2 && command3  (commands 2 and 3 will only execute if 1 completes successfully) ADD COMMENT 0 Entering edit mode Thanks Chris, but how can I write the command? How can I connect the following commands with &&? I'm really a beginner bwa aln index_prefix file1.fastq > File_1.sai bwa aln index_prefix file2.fastq > File_2.sai bwa sampe index _prefix File_1.sai File_2.sai file1.fastq file2.fastq > Samfile.sam samtools view genome.fai SamFile.sam > BamFile.bam  ADD REPLY 0 Entering edit mode bwa aln index_prefix file1.fastq > File_1.sai && bwa aln index_prefix file2.fastq > File_2.sai && bwa sampe index _prefix File_1.sai File_2.sai file1.fastq file2.fastq > Samfile.sam && samtools view genome.fai SamFile.sam > BamFile.bam  ADD REPLY 2 Entering edit mode 10.5 years ago You might consider Taverna (http://www.taverna.org.uk/) for this. It is a workflow manager for webservice, but they do consider local apps: http://www.taverna.org.uk/documentation/faq/building-workflows/local-perl-and-command-line-scripts/ ADD COMMENT 2 Entering edit mode 10.5 years ago Ketil 4.1k To answer your other question on how to check for errors ( https://www.biostars.org/p/9499/ - the question is closed with a pointer here): I think the individual steps of BWA can be hard to check, as I don't think the intermediate formats (from bwa index and bwa aln) are standardized, nor intended to be processed by other tools than bwa. But I always run 'samtools flagstats' on the final BAM files. This will give you output like: 136758880 in total 0 QC failure 0 duplicates 129945658 mapped (95.02%) 136758880 paired in sequencing 68379440 read1 68379440 read2 117028736 properly paired (85.57%) 127333578 with itself and mate mapped 2612080 singletons (1.91%) 9772328 with mate mapped to a different chr 5172859 with mate mapped to a different chr (mapQ>=5)  This is very useful to check both the sampling and preparation, the sequencing, as well as the mapping. ADD COMMENT 1 Entering edit mode 10.5 years ago I make sure the previous steps have returned successfully with a perl script like this: http://raw.github.com/avilella/hashbrown/master/scripts/bwa.pl perl ~/hashbrown/scripts/bwa.pl -db$ref
-tag mytag
-bwa_exe ~/src/bwa/latest/bwa
-samtools_exe ~/src/samtools/latest/samtools/samtools
-tmpdir /tmp/