Question

In Cigar String, What Is The Difference Between 'N' And 'D' ?

10.1 years ago

Chen Sun ★ 1.1k

In SAM file, the CIGAR string has the following options:

op    Description
M    Alignment match (can be a sequence match or mismatch)
I    Insertion to the reference
D    Deletion from the reference
N    Skipped region from the reference
S    Soft clip on the read (clipped sequence present in <seq>)
H    Hard clip on the read (clipped sequence NOT present in <seq>)
P    Padding (silent deletion from the padded reference sequence)

I cannot tell the difference between 'D' and 'N' when I analyze split-read mappings. Could someone give me an example that illustrates the difference?

cigar bam sam • 16k views

Can you show a CIGAR string of the type you encountered?

Comment by Varun Gupta ★ 1.3k, 10.1 years ago

Ram · Answer 1 · 2015-05-19

REF:  ATCGATCGATCGATCGATCGATCGATCGATCG
          ||||||||||||||||||||||||||
QUERY:    ATC-ATCG-------------ATCAT

Aligned to the reference, the query would have the CIGAR 3M1D4M13N5M if the N operation is being used. N is there to distinguish large skips due to introns from deletions within exons, so it only makes sense when you're aligning things like cDNA/expression data. Both D and N consume reference bases without consuming read bases; only the interpretation differs. Genomic reads would just have the alignment 3M1D4M13D5M. Does that make things clearer?
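To make the point concrete, here is a minimal Python sketch (not part of the original answer) that sums the reference and read lengths implied by a CIGAR string. The helper names `parse_cigar` and `spans` are made up for illustration; the sets of operations that consume the reference or the read follow the SAM specification. Both strings above cover the same 26 reference bases with the same 12 read bases, so D and N differ only in what they mean.

```python
import re

# Operations that advance along the reference / along the read (SAM spec).
CONSUMES_REF = set("MDN=X")
CONSUMES_QUERY = set("MIS=X")

def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) tuples. Hypothetical helper."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]

def spans(cigar):
    """Return (reference bases covered, read bases covered) for a CIGAR."""
    ops = parse_cigar(cigar)
    ref_len = sum(n for n, op in ops if op in CONSUMES_REF)
    query_len = sum(n for n, op in ops if op in CONSUMES_QUERY)
    return ref_len, query_len

print(spans("3M1D4M13N5M"))  # spliced (RNA-seq style) alignment
print(spans("3M1D4M13D5M"))  # same gap reported as a plain deletion
```

Both calls print `(26, 12)`: the coordinates are identical, and only downstream tools that treat N as a splice junction will behave differently.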

Ram · Answer 2 · 2014-03-27

10.1 years ago

lomereiter ▴ 500

The usage of 'N' is explained in the SAM format documentation as follows:

For mRNA-to-genome alignment, an N operation represents an intron. For other types of alignments, the interpretation of N is not defined.
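In practice this means you can pull intron coordinates straight out of the N operations. Below is a rough Python sketch, not taken from the SAM documentation: `split_gaps` is a hypothetical helper, intervals are 0-based and half-open, and the start offset of 4 is an assumption read off the alignment diagram in the other answer.

```python
import re

def split_gaps(start, cigar):
    """Return (deletions, introns) as 0-based half-open reference intervals.

    `start` is the 0-based reference position of the first aligned base.
    """
    deletions, introns = [], []
    ref_pos = start
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op == "D":
            deletions.append((ref_pos, ref_pos + length))
        elif op == "N":
            introns.append((ref_pos, ref_pos + length))
        # Only these operations advance the reference coordinate.
        if op in "MDN=X":
            ref_pos += length
    return deletions, introns

print(split_gaps(4, "3M1D4M13N5M"))  # ([(7, 8)], [(12, 25)])
```

The 1 bp gap is reported as a deletion and the 13 bp gap as an intron. If the aligner had written `13D` instead, the same interval would land in the deletions list.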