Heys, I'm working with two genomes from a whole-genome sequencing study. The coverage is 10x and 8x respectively and i'm aligning them to the reference genome of a sister species. This analysis was already carried out by a previous colleague of the lab, and I would like to obtain the same results as him, as I need to map other samples. He told me he used bwa version 0.7.7 (which I assume is incorrect, missing a 1 before the last 7) with these parameters:
-l 16500 -n 0.01 -o 2
From this and bwa's website (http://bio-bwa.sourceforge.net/bwa.shtml) I assume he used the aln logarythm, as mem does not have these commands.
As I always did alignments with mem algorythm, I run it and the size of the two BAM files obtained were 10 times bigger than his, which seems to me a huge difference...
I'm approaching this the right way? In order to repeat his analyses I'm running:
bwa aln -l 16500 -n 0.01 -o 2 -t 10 $GEN $fastq1 $fastq2 -o $outdir/genome1.bam
Where GEN is the reference genome and fastq1 and fastq2 my paired data of my first genome.
Thanks in advance for any input!
Please read the manual of bwa.
alndoes not produce bam output. It produces an intermediate output that has to be processed withsampe.sampecan then output sam, and this you can convert to bam withsamtools. If you want exact results then ask the colleague for a script. You also use-otwice, and-ois not an argument to specify the output file.bwawrites to stdout so capture it withaln (...) > out.saiThanks for your reply. As I'm used to work with mem I did not completely check the aln algorythm. Do you agree that I should use the aln algorythm if he told me he used these parameters?
-l 16500 -n 0.01 -o 2If you are unsure about previous work, why don't you simply realign all relevant data with a tool that you are confident with and can be sure how output was produced. I never trust analysis when exact code is not available. Who knows what others have done, and if I build on existing results there must be code to check how they were produced, at least this is how I'd do it.
Thanks for your messages! The colleague is not in the lab anymore and there are no means to contact him. Thanks anyway!
If your colleague did not use
memwith a recent version ofbwathen you should perhaps see this: A: When and why is bwa aln better then bwa mem?If you must repeat what they did they ask your colleague for the exact commands used. If your reads are shorter than 50 bp then
alnmay be slightly better.That make sense, the reads are 35 bp!
If I remember correctly, bwa aln needs to be followed by bwa sampe (for paired) bwa samse (for single-end). I assume, the bam files you have, are actually sai files. Furthermore, your command doesn't make sense as you have the
-oswitch twice, once for gap open (the right switch for bwa aln) and once for output, which doesn't exist in bwa aln.Adapted from the bwa documentation for your case it should be:
Edit: Just for the record, the other comments seemed to have come in while I was writing.
thanks, really nice answer! I thought the commands from aln were the same as mem, I'll check that before asking next time. Thanks again!