Question: How to get cigar string from bwa or bowtie
0
gravatar for juan.crescente
3 months ago by
juan.crescente20 wrote:

I have two sequenes and I want to get the CIGAR string of a local alignment. I've tried with bwa and bowtie

The first one give me this, all *. When I change a nucleotide in the query, all it does is change the sequence reported in the output (8th column), but nothing else!

$ ./bwa mem   ../seq1.fa ../seq2.fa 
[M::bwa_idx_load_from_disk] read 0 ALT contigs
@SQ SN:subject  LN:1141
@PG ID:bwa  PN:bwa  VN:0.7.17-r1198-dirty   CL:./bwa mem ../seq1.fa ../seq2.fa
[M::process] read 1 sequences (13 bp)...
[M::mem_process_seqs] Processed 1 reads in 0.000 CPU sec, 0.000 real sec
subject 4   *   0   0   *   *   0   0   AGCAAAGCAAGTT   *   AS:i:0  XS:i:0
[main] Version: 0.7.17-r1198-dirty
[main] CMD: ./bwa mem ../seq1.fa ../seq2.fa
[main] Real time: 0.002 sec; CPU: 0.006 sec

With bowtie, if I change a base in the query it gets me the same CIGAR (12M). Doesn't seems to be logic to me.

$ bowtie  seq1.fa.fai -f  seq2.fa  --sam
@HD VN:1.0  SO:unsorted
@SQ SN:subject  LN:1141
@PG ID:Bowtie   VN:1.2.2    CL:"/Users/juan/Documents/manu/dev/bioinfoch/2/E_TEs/bowtie-1.2.2-macos-x86_64/bowtie-align-s --wrapper basic-0 seq1.fa.fai -f seq2.fa --sam"
subject 0   subject 331 255 12M *   0   0   AGCAAAGCAAGT    IIIIIIIIIIII    XA:i:0  MD:Z:12 NM:i:0  XM:i:2
# reads processed: 1
# reads with at least one reported alignment: 1 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 1 alignments
➜  E_TEs 
$ bowtie  seq1.fa.fai -f  seq2.fa  --sam
@HD VN:1.0  SO:unsorted
@SQ SN:subject  LN:1141
@PG ID:Bowtie   VN:1.2.2    CL:"/Users/juan/Documents/manu/dev/bioinfoch/2/E_TEs/bowtie-1.2.2-macos-x86_64/bowtie-align-s --wrapper basic-0 seq1.fa.fai -f seq2.fa --sam"
subject 0   subject 331 255 12M *   0   0   AGCAAAGTAAGT    IIIIIIIIIIII    XA:i:1  MD:Z:7C4    NM:i:1  XM:i:2
# reads processed: 1
# reads with at least one reported alignment: 1 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 1 alignments
bowtie • 177 views
ADD COMMENTlink modified 3 months ago by Santosh Anand4.7k • written 3 months ago by juan.crescente20
1

The sequence is to short to work with bwa mem.

Further more a M in a CIGAR String means alignment match. This can be a match or a mismatch to the reference. The MD tag gives you the information whether it is a match or a mismatch to the reference.

There are also tools that can convert "normal" CIGAR strings to extended ones. See How to convert from a cigar string to extended cigar string?

ADD REPLYlink modified 3 months ago • written 3 months ago by finswimmer11k

first read has sam flag = 4 your read is unmapped, no cigar string.

ADD REPLYlink written 3 months ago by Pierre Lindenbaum120k

I see, with bwa mem. What about the CIGAR in bowtie?

ADD REPLYlink written 3 months ago by juan.crescente20
0
gravatar for Santosh Anand
3 months ago by
Santosh Anand4.7k
Santosh Anand4.7k wrote:

M means that the nucleotide maps to the REF. Maps here means that there are same number of nucleotide in the REF (i.e. No INDELs), but they can be either match or mismatch

Bowtie: Here 12M means the "map" is over 12 nucleotide sequence (i.e. without INDELs), but it can be either match or mismatch.

If you need to see the differences, see the MD tag MD:Z:7C4 (7 Match, one C mismatch and then 4 match)

BWA: BWA is not able to find any match. Note that bwa is looking for full length match of 13bp (vs bowtie of 12bp).

ADD COMMENTlink written 3 months ago by Santosh Anand4.7k

is there a way to get a string represeting the alignment straightforward like MATCH-MATCH-MISMATCH

ADD REPLYlink written 3 months ago by juan.crescente20

Parse MD tag. Matches will appear as numbers, mismatches as [ATCG]

ADD REPLYlink written 3 months ago by Santosh Anand4.7k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 802 users visited in the last hour