Question: Bwa alignment problem of single end DNA reads
gravatar for akhan
4.7 years ago by
United States
akhan0 wrote:


I am trying to align Genomic DNA sequence reads (64 bases in length) to reference human genome. The sequence reads are single ended. I am using bwa to align. My question is whether is it possible to align sequence reads by giving  a parameter option to zero mismatch. I want to extract reads those are perfectly matches to reference genome. I have tried with following command line, it generates bam files with no errors with zero reads for  zero mismatch criteria. Also I have tried to use fraction, but in these case I get more reads mapped to reference genome than when I used more relaxed criteria such as allowing maximum of 4 mismatches. If I understand correctly, more stringent criteria will generate less number of reads mapped than relaxed criteria.  I would appreciate any help with this issue explaining the parameter, what it does and why I don't get mapped reads for zero mismatch criteria. Thanks for your help.


command for Using zero mismatch parameter

bwa aln -n 0 ./hg38bwaidx ./merged_w1_0_nM.fastq.gz >  zero_nM-0.txt.bwa
bwa samse ./hg38bwaidx zero_nM-0.txt.bwa ./merged_w1_0_nM.fastq.gz > zero_nM-0.txt.sam
samtools view -b -S -o zero_nM-0.bam zero_nM-0.txt.sam

command for using fraction parameter(presumably should be 0.1% mismatch of the reads of 64 bases are allowed )

bwa aln -n 0.001 ./hg38bwaidx ./merged_w1_0_nM.fastq.gz >  zero_nM-0.001.txt.bwa
bwa samse ./hg38bwaidx zero_nM-0.001.txt.bwa ./merged_w1_0_nM.fastq.gz > zero_nM-0.001.txt.sam
samtools view -b -S -o zero_nM-one.bam zero_nM-0.001.txt.sam


bwa alignment • 2.2k views
ADD COMMENTlink modified 4.7 years ago by dariober11k • written 4.7 years ago by akhan0
gravatar for Brian Bushnell
4.7 years ago by
Walnut Creek, USA
Brian Bushnell17k wrote:

BBMap can do this with the "perfectmode" flag: ref=hg38.fasta in=reads.fq out=perfect.sam perfectmode

Alternately, "semiperfectmode" will allow reads that map perfectly but extend off the end of contigs (in other words, they overlap Ns in the reference).  If you give a suffix of .fastq to the output file the reads will be output in fastq instead of sam.

ADD COMMENTlink written 4.7 years ago by Brian Bushnell17k
gravatar for dariober
4.7 years ago by
WCIP | Glasgow | UK
dariober11k wrote:

You could parse the output SAM file to keep only the header lines and the records containing the tag "NM:i:0" (edit distance of 0):

awk '$0 ~ "^@" || $0 ~ "\tNM:i:0"' aligned.sam | samtools view ...

(Anyway, why doesn't bwa aln -n 0 ... suite you?)

However, are you sure you want to keep perfect match reads? Depending on your experiment you might bias in favour of the reference sequence and against genuine SNPs.

ADD COMMENTlink written 4.7 years ago by dariober11k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 806 users visited in the last hour