Question: How to generate a BAM/SAM file to show a mutation using BWA aligner
0
gravatar for neranjan
4 months ago by
neranjan0
US
neranjan0 wrote:

Hi,

I am trying to generate Single nucleotide mutations to a sequence, and trying to get SAM files and its related CIGAR sequences.

My test genome is: (ref.fa)

>genome
CGGACACACAAAAAGAAAGAAGAATTTTTAGGATCTTTTGTGTGCGAATA
GATGGGCCCATCGATTGCCATCCCGGGGAAAATCATCATCATCATCATCA

Then I created a fastq sequence (seq_1.fq) which I inserted a Nucleotide A, to the first line and ran the BWA alignment, then the SAM file

bwa mem ref.fa seq_1.fq >  seq_1.sam

samtools sort -o seq_1.bam --output-fmt BAM seq_1.sam

samtools view seq_1.bam | sam2pairwise

which shows the insertion by giving the CIGAR seq as 29M1I21M:

    genome  0   genome  1   60  29M1I21M    *   0   0
CGGACACACAAAAAGAAAGAAGAATTTTTAAGGATCTTTTGTGTGCGAATA
||||||||||||||||||||||||||||| |||||||||||||||||||||
CGGACACACAAAAAGAAAGAAGAATTTTT-AGGATCTTTTGTGTGCGAATA

Question: But when I tried to replace a T using G from 'TTTTT' to 'TTGTT' (seq_2.fq) and ran the BWA align and samtools it does not give me the desired CIGAR sequence !!! BUT it just gives it as 50M

 bwa mem ref.fa seq_2.fq >  seq_2.sam 

samtools sort -o seq_2.bam--output-fmt BAM seq_2.sam 

samtools view seq_2.bam | sam2pairwise

genome  0   genome  1   60  50M *   0   0
CGGACACACAAAAAGAAAGAAGAATTGTTAGGATCTTTTGTGTGCGAATA
||||||||||||||||||||||||||||||||||||||||||||||||||
CGGACACACAAAAAGAAAGAAGAATTGTTAGGATCTTTTGTGTGCGAATA

as it does not recognize the mutation which was introduced to the sequence.

So am missing something here ? How can I generate a CIGAR sequence which states, that there is a mismatch or SNP in the sequence ?

Thank you very much.

bwa snp alignment • 231 views
ADD COMMENTlink modified 4 months ago by h.mon23k • written 4 months ago by neranjan0
4
gravatar for Pierre Lindenbaum
4 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum116k wrote:

So am missing something here ?

yes. The 'M' operator doesn't mean that there is a perfect match. It means that the two sequences are best aligned if the read and the ref are aligned at this position. You want the derived symbols of 'M' which are '=' (match) and 'X' (mismatch) . See the SAM specification.

see also: the NM tag (edit distance) and How to convert from a cigar string to extended cigar string?

ADD COMMENTlink written 4 months ago by Pierre Lindenbaum116k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1313 users visited in the last hour