My data are 2x100 bp HiSeq paired-end (PE) reads. Some pairs overlap, so I merged those into single-end (SE) reads. For my analysis I used the aln algorithm:

    bwa aln -t 8 -n 0.1 GDR45_Spades_k85_conc.fasta GDR18_trimmed_SE.fastq > GDR18_0.sai
    bwa aln -t 8 -n 0.1 GDR45_Spades_k85_conc.fasta GDR18_trimmed_1.fastq > GDR18_1.sai
    bwa aln -t 8 -n 0.1 GDR45_Spades_k85_conc.fasta GDR18_trimmed_2.fastq > GDR18_2.sai
    bwa sampe GDR45_Spades_k85_conc.fasta GDR18_1.sai GDR18_2.sai GDR18_trimmed_1.fastq GDR18_trimmed_2.fastq > GDR-18_pe.sam
    bwa samse GDR45_Spades_k85_conc.fasta GDR18_0.sai GDR18_trimmed_SE.fastq > GDR-18_se.sam
    samtools view -Sb GDR-18_pe.sam > GDR-18_pe.bam
    samtools view -Sb GDR-18_se.sam > GDR-18_se.bam
    samtools merge GDR-18.bam GDR-18_pe.bam GDR-18_se.bam
Now I am wondering: what are the advantages of switching to the bwa mem algorithm? One reason I need to use it is that it has the "-a" option, and I will need to look for SVs later on.
So now I would just issue:

    bwa mem -a -t 8 GDR45_Spades_k85_conc.fasta ../reads/GDR18_trimmed_1.fastq ../reads/GDR18_trimmed_2.fastq > GDR-18_pe.sam
    bwa mem -a -t 8 GDR45_Spades_k85_conc.fasta ../reads/GDR18_trimmed_SE.fastq > GDR-18_se.sam
    samtools view -Sb GDR-18_pe.sam > GDR-18_pe.bam
    samtools view -Sb GDR-18_se.sam > GDR-18_se.bam
    samtools merge GDR-18.bam GDR-18_pe.bam GDR-18_se.bam
It was pretty important to allow some mismatches with the aln algorithm, so I set -n to 0.1. What would be the equivalent here?
The purpose of this is to find variation in a fairly polymorphic genome (Watterson's theta 0.0041).
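
For reference, the mem workflow above could be scripted roughly as follows. This is only a sketch under some assumptions: bwa and samtools (1.3 or newer, for the `sort -o` syntax) are on the PATH, the reference has already been indexed with `bwa index`, and the `run` and `mem_cmd` helpers and the `DRY_RUN` variable are hypothetical names made up for illustration. It pipes each alignment straight into `samtools sort`, so the BAM files handed to `samtools merge` are coordinate-sorted, and by default it only prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch only: prints the commands by default; set DRY_RUN=0 to execute them.
# Assumes bwa and samtools >= 1.3 are on PATH and the reference is bwa-indexed.
set -euo pipefail

REF="GDR45_Spades_k85_conc.fasta"
READS="../reads"
THREADS=8
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print a command line, or execute it when DRY_RUN=0.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$1"
    else
        bash -o pipefail -c "$1"
    fi
}

# Hypothetical helper: build the "bwa mem | samtools sort" pipeline
# for one output BAM and one (SE) or two (PE) FASTQ files.
mem_cmd() {
    local out="$1"
    shift
    printf 'bwa mem -a -t %s %s %s | samtools sort -o %s -' \
        "$THREADS" "$REF" "$*" "$out"
}

run "$(mem_cmd GDR-18_pe.bam "$READS/GDR18_trimmed_1.fastq" "$READS/GDR18_trimmed_2.fastq")"
run "$(mem_cmd GDR-18_se.bam "$READS/GDR18_trimmed_SE.fastq")"
run "samtools merge GDR-18.bam GDR-18_pe.bam GDR-18_se.bam"
```

Running it as is prints the three command lines for inspection; running it with `DRY_RUN=0` in the environment executes them.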