In a CIGAR String, What Is the Difference Between 'N' and 'D'?
10.1 years ago
Chen Sun ★ 1.1k

In a SAM file, the CIGAR string can contain the following operations:

op    Description
M    Alignment match (can be a sequence match or mismatch)
I    Insertion to the reference
D    Deletion from the reference
N    Skipped region from the reference
S    Soft clip on the read (clipped sequence present in <seq>)
H    Hard clip on the read (clipped sequence NOT present in <seq>)
P    Padding (silent deletion from the padded reference sequence)
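Part of why 'D' and 'N' are easy to confuse is that they behave identically when you count bases. Both consume reference bases and neither consumes read bases. Below is a minimal Python sketch of the consumption rules from the SAM specification. The helper names `parse_cigar`, `query_length` and `reference_span` are illustrative only and do not come from any library.

```python
import re

# Which CIGAR operations consume read (query) bases and which consume
# reference bases, following the consumption table in the SAM specification.
CONSUMES_QUERY = set("MIS=X")
CONSUMES_REF = set("MDN=X")

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) tuples, rejecting malformed input."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Re-join the parsed pieces to make sure nothing was silently skipped.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def query_length(cigar):
    """Number of read bases stored in SEQ (H and P contribute nothing)."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_QUERY)


def reference_span(cigar):
    """Number of reference bases covered from POS to the end of the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_REF)


if __name__ == "__main__":
    for c in ["3M1D4M13N5M", "3M1D4M13D5M"]:
        print(c, query_length(c), reference_span(c))
    # prints:
    # 3M1D4M13N5M 12 26
    # 3M1D4M13D5M 12 26
```

Both strings give 12 read bases and 26 reference bases. The difference between the two operations is therefore one of interpretation (deleted bases vs. a skipped region such as an intron), not of arithmetic.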

I cannot tell the difference between 'D' and 'N' when I analyze split-read mappings. Could someone give me an example that illustrates the difference?


Can you show a CIGAR string of this type that you encountered?

8.9 years ago
amblina ▴ 140
REF:  ATCGATCGATCGATCGATCGATCGATCGATCG
          ||||||||||||||||||||||||||
QUERY:    ATC-ATCG-------------ATCAT

If the N operation is used, the query aligned to the reference gets the CIGAR 3M1D4M13N5M. N exists to distinguish deletions within exons from large skips caused by introns, so it only makes sense when aligning cDNA/expression data to a genome. A genomic read with the same alignment would simply be 3M1D4M13D5M. Does that make things clearer?
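To make the example concrete, here is a small Python sketch that walks the two CIGAR strings and reports the reference intervals of each D and N, plus the aligned blocks you get when the read is split at N only. The 1-based POS of 5 is read off the diagram, where the query starts under the 5th reference base. The helpers `gaps` and `aligned_blocks` are hypothetical. Splitting only at N (and keeping a D inside its block) is an assumption matching the intron interpretation above; individual tools may split differently.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_REF = set("MDN=X")  # ops that advance along the reference


def gaps(pos, cigar):
    """Return 1-based, inclusive reference intervals for D and N operations.

    pos is the SAM POS field (1-based leftmost aligned reference base).
    """
    deletions, skips = [], []
    ref = pos  # next reference base to be consumed
    for n, op in CIGAR_RE.findall(cigar):
        n = int(n)
        if op == "D":
            deletions.append((ref, ref + n - 1))
        elif op == "N":
            skips.append((ref, ref + n - 1))
        if op in CONSUMES_REF:
            ref += n
    return deletions, skips


def aligned_blocks(pos, cigar):
    """Split the alignment into reference blocks at N only (a D stays inside
    its block), i.e. the exon-like pieces of a spliced read."""
    blocks = []
    start = ref = pos
    for n, op in CIGAR_RE.findall(cigar):
        n = int(n)
        if op == "N":
            if ref > start:
                blocks.append((start, ref - 1))
            ref += n
            start = ref  # a new block begins after the skipped region
        elif op in CONSUMES_REF:
            ref += n
    if ref > start:
        blocks.append((start, ref - 1))
    return blocks


if __name__ == "__main__":
    for c in ["3M1D4M13N5M", "3M1D4M13D5M"]:
        print(c, gaps(5, c), aligned_blocks(5, c))
```

With N, the 1 bp deletion at position 8 stays inside the first block and the read splits into two blocks, (5, 12) and (26, 30), with an intron-like skip over 13-25. With D, the same 13 bases are a plain deletion and the read remains a single block from 5 to 30.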

10.1 years ago
lomereiter ▴ 500

The use of 'N' is explained in the SAM format specification as follows:

For mRNA-to-genome alignment, an N operation represents an intron. For other types of alignments, the interpretation of N is not defined.
