How to generate a BAM/SAM file to show a mutation using BWA aligner
1
0
Entering edit mode
5.5 years ago
neranjan ▴ 60

Hi,

I am trying to generate Single nucleotide mutations to a sequence, and trying to get SAM files and its related CIGAR sequences.

My test genome is: (ref.fa)

>genome
CGGACACACAAAAAGAAAGAAGAATTTTTAGGATCTTTTGTGTGCGAATA
GATGGGCCCATCGATTGCCATCCCGGGGAAAATCATCATCATCATCATCA

Then I created a fastq sequence (seq_1.fq) which I inserted a Nucleotide A, to the first line and ran the BWA alignment, then the SAM file

bwa mem ref.fa seq_1.fq >  seq_1.sam

samtools sort -o seq_1.bam --output-fmt BAM seq_1.sam

samtools view seq_1.bam | sam2pairwise

which shows the insertion by giving the CIGAR seq as 29M1I21M:

    genome  0   genome  1   60  29M1I21M    *   0   0
CGGACACACAAAAAGAAAGAAGAATTTTTAAGGATCTTTTGTGTGCGAATA
||||||||||||||||||||||||||||| |||||||||||||||||||||
CGGACACACAAAAAGAAAGAAGAATTTTT-AGGATCTTTTGTGTGCGAATA

Question: But when I tried to replace a T using G from 'TTTTT' to 'TTGTT' (seq_2.fq) and ran the BWA align and samtools it does not give me the desired CIGAR sequence !!! BUT it just gives it as 50M

 bwa mem ref.fa seq_2.fq >  seq_2.sam 

samtools sort -o seq_2.bam--output-fmt BAM seq_2.sam 

samtools view seq_2.bam | sam2pairwise

genome  0   genome  1   60  50M *   0   0
CGGACACACAAAAAGAAAGAAGAATTGTTAGGATCTTTTGTGTGCGAATA
||||||||||||||||||||||||||||||||||||||||||||||||||
CGGACACACAAAAAGAAAGAAGAATTGTTAGGATCTTTTGTGTGCGAATA

as it does not recognize the mutation which was introduced to the sequence.

So am missing something here ? How can I generate a CIGAR sequence which states, that there is a mismatch or SNP in the sequence ?

Thank you very much.

SNP alignment bwa • 1.8k views
ADD COMMENT
5
Entering edit mode
5.5 years ago

So am missing something here ?

yes. The 'M' operator doesn't mean that there is a perfect match. It means that the two sequences are best aligned if the read and the ref are aligned at this position. You want the derived symbols of 'M' which are '=' (match) and 'X' (mismatch) . See the SAM specification.

see also: the NM tag (edit distance) and How to convert from a cigar string to extended cigar string?

ADD COMMENT

Login before adding your answer.

Traffic: 1278 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6