Write Script To Combine All Steps Of Bwa
5
6
Entering edit mode
12.8 years ago
Bioscientist ★ 1.7k

For BWA, first do alignment (generate .sai), then sampe the paired-ends (generate .sam), then convert .sam to .bam. Now I'm going to write script to combine all these steps. ie, input is fastq, then output is .bam file.

Now a big question is, I need to guarantee, say, the first step (generating .sai) is COMPLETE (instead of being aborted) before going to the second step; and the second step (generating .sam) before the third. I there any way to check that for BWA? Or what's the sign for the completeness of each step?

thanks!

Edit: Seems BWA return with exit code, which makes this question look stupid. Then the question could be: How can we write script to check if each step of BWA really returns exit code or not?

Edit again:

sorry maybe I don't make clear about my question in the above link. My concern is the quality of the output for each step of BWA. I usually check the error report in the log file. For example, there could be errors like "2.8% bases are trimmed", "cannot infer insert size" "weird pairing".

Then my question is, I want write a script to automatically check the errors for each step , and screen out those files with such error report. And when combine all steps in one script, command 2 will only execute when the output of command 1 has no errors.

thanks!!!

bwa • 7.7k views
ADD COMMENT
0
Entering edit mode

doesn't BWA return exit codes?

ADD REPLY
0
Entering edit mode

yeah, so you mean exit code will mean the completeness of each step? SO we just do nothing and just wait for the program naturally finish? thx

ADD REPLY
0
Entering edit mode

Couldn't you use something like ./step1_of_bwa if [ $? != 0 ]; then DO_SOMETHING; else DO_SOMETHING_ELSE; fi

ADD REPLY
10
Entering edit mode
12.8 years ago
Ketil 4.1k

I use a makefile (as mentioned in a related question: A: Run Hundreds Of Bwa Commands Without Waiting

Edit: (Unless given an option to override this,) 'make' will stop execution when a command exits with an error code, so this should work okay with bwa. Also you get parallelism for "free".

Some excerpts below ($GENOME points to the reference sequence, input files are assumed to be named foo.1.txt and foo.2.txt):

GENIDX   = $(join $(GENOME),.sa)

##################################################
# Index the genome using BWA
# maybe 'bwa index -a bwtsw' for large genomes?
%.sa %.rsa %.rpac %.rbwt %.pac %.bwt %.ann %.amb: %
        bwa index -a is $<

##################################################
# Align illumina reads
%.sai: %.txt $(GENIDX)
        bwa aln $(GENOME) $< > $@

%.sam: %.1.sai %.2.sai %.1.txt %.2.txt 
        bwa sampe $(GENOME) $*.1.sai $*.2.sai $*.1.txt $*.2.txt > $@

##################################################
# Align 454 reads using bwasw

%.sam: %.fq $(GENIDX)
        bwa bwasw $(GENOME) $< > $@

##################################################
# Convert to sorted and indexed BAM file

%.bam: %.sam
        samtools view -S $< -b -u > tmp_$@
        samtools sort tmp_$@ $*
        rm tmp_$@

%.bam.bai: %.bam
        samtools index $<
ADD COMMENT
3
Entering edit mode
12.8 years ago

Assuming that BWA returns exit codes (and I think it does), then bash has you covered:

command1 && command2 && command3

(commands 2 and 3 will only execute if 1 completes successfully)

ADD COMMENT
0
Entering edit mode

Thanks Chris, but how can I write the command? How can I connect the following commands with &&? I'm really a beginner

bwa aln index_prefix file1.fastq > File_1.sai
bwa aln index_prefix file2.fastq > File_2.sai
bwa sampe index _prefix File_1.sai File_2.sai  file1.fastq  file2.fastq > Samfile.sam
samtools  view genome.fai SamFile.sam  > BamFile.bam
ADD REPLY
0
Entering edit mode
bwa aln index_prefix file1.fastq > File_1.sai && bwa aln index_prefix file2.fastq > File_2.sai && bwa sampe index _prefix File_1.sai File_2.sai file1.fastq file2.fastq > Samfile.sam && samtools view genome.fai SamFile.sam > BamFile.bam
ADD REPLY
2
Entering edit mode
12.8 years ago

You might consider Taverna (http://www.taverna.org.uk/) for this. It is a workflow manager for webservice, but they do consider local apps:

http://www.taverna.org.uk/documentation/faq/building-workflows/local-perl-and-command-line-scripts/

ADD COMMENT
2
Entering edit mode
12.8 years ago
Ketil 4.1k

To answer your other question on how to check for errors ( https://www.biostars.org/p/9499/ - the question is closed with a pointer here):

I think the individual steps of BWA can be hard to check, as I don't think the intermediate formats (from bwa index and bwa aln) are standardized, nor intended to be processed by other tools than bwa.

But I always run 'samtools flagstats' on the final BAM files. This will give you output like:

136758880 in total
0 QC failure
0 duplicates
129945658 mapped (95.02%)
136758880 paired in sequencing
68379440 read1
68379440 read2
117028736 properly paired (85.57%)
127333578 with itself and mate mapped
2612080 singletons (1.91%)
9772328 with mate mapped to a different chr
5172859 with mate mapped to a different chr (mapQ>=5)

This is very useful to check both the sampling and preparation, the sequencing, as well as the mapping.

ADD COMMENT
1
Entering edit mode
12.8 years ago

I make sure the previous steps have returned successfully with a perl script like this: http://raw.github.com/avilella/hashbrown/master/scripts/bwa.pl

perl ~/hashbrown/scripts/bwa.pl 
    -db $ref 
    -reads $readset 
    -tag mytag 
    -bwa_exe ~/src/bwa/latest/bwa 
    -samtools_exe ~/src/samtools/latest/samtools/samtools 
    -tmpdir /tmp/
ADD COMMENT

Login before adding your answer.

Traffic: 2049 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6