In a CIGAR String, What Is the Difference Between 'N' and 'D'?
7.8 years ago
Chen ★ 1.1k

In a SAM file, the CIGAR string can contain the following operations:

op    Description
M    Alignment match (can be a sequence match or mismatch)
I    Insertion to the reference
D    Deletion from the reference
N    Skipped region from the reference
S    Soft clip on the read (clipped sequence present in <seq>)
H    Hard clip on the read (clipped sequence NOT present in <seq>)
P    Padding (silent deletion from the padded reference sequence)

I cannot tell the difference between 'D' and 'N' when I analyze split-read mappings. Could someone give me an example that illustrates the difference?

sam cigar bam • 12k views

Can you show a CIGAR string of the kind you encountered?

6.7 years ago
amblina ▴ 110
REF:  ATCGATCGATCGATCGATCGATCGATCGATCG
          ||||||||||||||||||||||||||
QUERY:    ATC-ATCG-------------ATCAT

Aligned to the reference, the query would have the CIGAR 3M1D4M13N5M if the N operation is used. N distinguishes large skips due to introns from deletions within exons, so it only makes sense when you're aligning things like cDNA/expression data. Genomic reads would simply get the alignment 3M1D4M13D5M. Does that make things clearer?
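To make the point concrete, here is a minimal Python sketch (not taken from any particular library; the helper names `parse_cigar` and `spans` are made up for illustration). It assumes the usual SAM convention that M, D, N, = and X consume reference bases while M, I, S, = and X consume read bases, and it writes the final five-base block of the example as 5M. Under that assumption, D and N behave identically as far as coordinates go: both advance along the reference without using up any read bases, so the only difference is how the gap is interpreted.

```python
import re

# Which operations advance along the reference / the read,
# following the usual SAM convention (assumed here).
CONSUMES_REF = set("MDN=X")
CONSUMES_QUERY = set("MIS=X")

def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) tuples."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]

def spans(cigar):
    """Return (reference bases covered, read bases used) for a CIGAR string."""
    ops = parse_cigar(cigar)
    ref_len = sum(n for n, op in ops if op in CONSUMES_REF)
    query_len = sum(n for n, op in ops if op in CONSUMES_QUERY)
    return ref_len, query_len

# Spliced (N) and purely genomic (D) versions of the example alignment.
print(spans("3M1D4M13N5M"))  # (26, 12)
print(spans("3M1D4M13D5M"))  # (26, 12)
```

Both strings cover 26 reference bases with 12 read bases, so a tool that only needs coordinates can treat the two operations alike, whereas a tool that calls variants or junctions has to tell them apart.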

7.8 years ago
lomereiter ▴ 470

The SAM format documentation explains the use of 'N' as follows:

For mRNA-to-genome alignment, an N operation represents an intron. For other types of alignments, the interpretation of N is not defined.
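In practice this means a downstream script can report N gaps as candidate introns and D gaps as deletions. The sketch below is one hedged way to do that in plain Python; the function name `ref_gaps` and the 0-based, half-open coordinate convention are choices made for this illustration, not something the SAM documentation prescribes.

```python
import re

def ref_gaps(cigar, start=0):
    """Return (deletions, introns) as lists of half-open reference intervals.

    `start` is the 0-based reference position of the first aligned base
    (an assumption of this sketch; SAM's POS field itself is 1-based).
    """
    deletions, introns = [], []
    pos = start
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op == "D":
            deletions.append((pos, pos + length))
        elif op == "N":
            introns.append((pos, pos + length))
        # Only these operations move the position along the reference.
        if op in "MDN=X":
            pos += length
    return deletions, introns

# The example alignment from the other answer starts at reference offset 4.
print(ref_gaps("3M1D4M13N5M", start=4))  # ([(7, 8)], [(12, 25)])
```

With the spliced CIGAR, the 1-base gap is reported as a deletion and the 13-base gap as an intron; with 13D in place of 13N, both gaps would land in the deletions list.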
