Understanding bowtie2 output
9.4 years ago
jyu429 ▴ 120

Hi, I ran a local alignment with bowtie2 and the output is below. I'm confused about how to reconstruct the alignment from this output. Should I look at the CIGAR string, 1S115M134S, and how do I interpret it? Thanks!

H06JUADXX130110:1:1213:20611:8765       16      1-8     174     22      1S115M134S      *        0       0       GAGGCTATGCGCAGAAGTTCCGGGGCAGAGTAACCATGACCAGGGACACCTCCATAAGCACAGCCTACATGGAGTTGAGGAGACTGACATCTGAGGACACGGCCGTGTATTACTGTGCGAGAGGCTCGAGTACAGCAGCAGCCGATAACTACTACTACTACTACTACATGGACGTCTGGGGCAAAGGGACCACGGTCACCGTCTCCTCAGGTAAGAATGGCCACTCTAGGGCCTTTCATTTCCCCTACTG      ##########################################################################BAAAAA@@@;=@B@?BABAB=>?;85B?94?B@AABA@@3;>B:@CABBB@A8@@??@<A@@BBACBA8@@ABBAABA@=>A>AAAAA@A@?AAAA??;@BACB??AAABB>@?>@@A8>>@@?@8?AAA@?@@@>>??A>??@>@?@@@@??@>?@???@??????@?@??><@?      AS:i:186        XN:i:0  XM:i:8  XO:i:0  XG:i:0  NM:i:8  MD:Z:9A10A9C12A29C4C2C4G28      YT:Z:UU
bowtie2 • 9.7k views
9.4 years ago
EagleEye 7.5k

This explanation should help you understand it: http://genome.sph.umich.edu/wiki/SAM#section_2

You may have heard the term CIGAR, but wondered what it means. Hopefully this section will help clarify it.

The sequence being aligned to a reference may have additional bases that are not in the reference, or may be missing bases that are in the reference. The CIGAR string is a sequence of base lengths, each followed by its associated operation. It indicates which bases align with the reference (as either a match or a mismatch), which reference bases are deleted from the read, and which read bases are insertions that are not in the reference.

For example:

RefPos:     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
Reference:  C  C  A  T  A  C  T  G  A  A  C  T  G  A  C  T  A  A  C
Read: ACTAGAATGGCT

Aligning these two:

RefPos:     1  2  3  4  5  6  7     8  9 10 11 12 13 14 15 16 17 18 19
Reference:  C  C  A  T  A  C  T     G  A  A  C  T  G  A  C  T  A  A  C
Read:                   A  C  T  A  G  A  A     T  G  G  C  T

With the alignment above, you get:

POS: 5
CIGAR: 3M1I3M1D5M

The POS indicates that the read aligns starting at position 5 on the reference. The CIGAR says that the first 3 bases in the read sequence align with the reference (3M). The next base in the read does not exist in the reference (1I). Then 3 bases align with the reference (3M). The next reference base does not exist in the read sequence (1D), then 5 more bases align with the reference (5M). Note that at position 14, the base in the read differs from the reference, but it still counts as an M since it aligns to that position.
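To make the walk-through above concrete, here is a minimal Python sketch that rebuilds the gapped alignment from POS, CIGAR, the read, and the reference. The function names (parse_cigar, render_alignment) are made up for illustration and are not part of bowtie2 or samtools. It assumes POS is 1-based, simply hides soft-clipped bases, and ignores H and P.

```python
import re

# One CIGAR element is a length followed by a single operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) pairs, e.g. '3M1I' -> [(3, 'M'), (1, 'I')]."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def render_alignment(ref, read, pos, cigar):
    """Return (reference_row, read_row) with '-' marking gaps.

    Hypothetical helper: `pos` is the 1-based SAM POS field.
    """
    ref_i = pos - 1  # convert 1-based POS to a 0-based string index
    read_i = 0
    ref_row, read_row = [], []
    for n, op in parse_cigar(cigar):
        if op in "M=X":  # aligned bases (match or mismatch): advance both
            ref_row.append(ref[ref_i:ref_i + n])
            read_row.append(read[read_i:read_i + n])
            ref_i += n
            read_i += n
        elif op == "I":  # bases in the read that are absent from the reference
            ref_row.append("-" * n)
            read_row.append(read[read_i:read_i + n])
            read_i += n
        elif op in "DN":  # reference bases that are absent from the read
            ref_row.append(ref[ref_i:ref_i + n])
            read_row.append("-" * n)
            ref_i += n
        elif op == "S":  # soft clip: bases are in SEQ but not aligned, so skip them
            read_i += n
        # H and P consume neither sequence, so nothing to do
    return "".join(ref_row), "".join(read_row)


ref_row, read_row = render_alignment(
    "CCATACTGAACTGACTAAC", "ACTAGAATGGCT", 5, "3M1I3M1D5M"
)
print(ref_row)   # ACT-GAACTGACT
print(read_row)  # ACTAGAA-TGGCT
```

The two printed rows line up column by column, so the insertion shows as a gap in the reference row and the deletion as a gap in the read row.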


See page 5 of this PDF: http://samtools.github.io/hts-specs/SAMv1.pdf

M   alignment match (can be a sequence match or mismatch)
I   insertion to the reference
D   deletion from the reference
N   skipped region from the reference
S   soft clipping (clipped sequences present in SEQ)
H   hard clipping (clipped sequences NOT present in SEQ)
P   padding (silent deletion from padded reference)
=   sequence match
X   sequence mismatch
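The number in front of each letter is the length of that operation, and the operations are applied in order starting at POS. As a rough sketch (the cigar_blocks helper is hypothetical, not a samtools or bowtie2 function), the code below walks a CIGAR and reports where each operation falls. It relies on the SAM spec rule that M, I, S, = and X consume read bases while M, D, N, = and X consume reference bases. Coordinates are 1-based and inclusive, and read coordinates refer to SEQ as stored in the SAM record (for a flag 16 record, that is the reverse-complemented read).

```python
import re

# Which operations advance along the read (SEQ) and along the reference.
CONSUMES_READ = set("MIS=X")
CONSUMES_REF = set("MDN=X")


def cigar_blocks(pos, cigar):
    """List (op, length, read_span, ref_span) for each CIGAR element.

    Spans are 1-based inclusive (start, end) tuples, or None when the
    operation does not consume that sequence. `pos` is the SAM POS field.
    """
    read_i, ref_i = 1, pos
    blocks = []
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        read_span = ref_span = None
        if op in CONSUMES_READ:
            read_span = (read_i, read_i + n - 1)
            read_i += n
        if op in CONSUMES_REF:
            ref_span = (ref_i, ref_i + n - 1)
            ref_i += n
        blocks.append((op, n, read_span, ref_span))
    return blocks


# POS and CIGAR taken from the record in the question.
blocks = cigar_blocks(174, "1S115M134S")
for block in blocks:
    print(block)
# ('S', 1, (1, 1), None)
# ('M', 115, (2, 116), (174, 288))
# ('S', 134, (117, 250), None)
```

For the record in the question, this reads as: the first base of SEQ is soft-clipped, bases 2 to 116 align to reference positions 174 to 288, and the last 134 bases are soft-clipped. Any mismatches inside the 115M block are not visible in the CIGAR itself.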

Thanks so much!


Do the numbers in front of the letters mean how many times the operation occurred? How do I know what positions they occurred at though?
