I want to know what a skipped region in the reference (N in the CIGAR string) looks like in the alignment. To explain what I mean, I'll use the example from the SAM format specification (http://genome.sph.umich.edu/wiki/SAM), which does not include an N:
Ref + read

    RefPos:      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
    Reference:   C  C  A  T  A  C  T  G  A  A  C  T  G  A  C  T  A  A  C
    Read:      ACTAGAATGGCT

Alignment

    RefPos:      1  2  3  4  5  6  7     8  9 10 11 12 13 14 15 16 17 18 19
    Reference:   C  C  A  T  A  C  T     G  A  A  C  T  G  A  C  T  A  A  C
    Read:                    A  C  T  A  G  A  A     T  G  G  C  T

Cigar

    POS: 5
    CIGAR: 3M1I3M1D5M
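Now, at reference position 11 the reference has a base (C) that is missing from the read, i.e. an insertion in the reference relative to the read, which the CIGAR records as 1D. However, I would think that you can't distinguish between a skipped region and such an insertion in the reference from the alignment alone. So the CIGAR string could just as well have been 3M1I3M1N5M.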
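To make the comparison concrete, here is a minimal Python sketch of how I understand the two CIGAR strings to be laid out on the reference. It assumes the "consumes query / consumes reference" table from the SAM specification (M, I, S, = and X advance the read; M, D, N, = and X advance the reference). The function name `walk_cigar` and its return shape are my own, not part of any library.

```python
import re

# Which operations advance the read and which advance the reference,
# as I read the "consumes query / consumes reference" table in the SAM spec.
CONSUMES_QUERY = set("MIS=X")
CONSUMES_REF = set("MDN=X")


def walk_cigar(pos, cigar):
    """Walk a CIGAR string starting at 1-based reference position `pos`.

    Returns (steps, ref_end, read_len), where each step is
    (op, length, ref_start, read_offset): ref_start is the 1-based reference
    position and read_offset the 0-based read offset at which the op begins.
    """
    ref, read = pos, 0
    steps = []
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        steps.append((op, length, ref, read))
        if op in CONSUMES_REF:
            ref += length
        if op in CONSUMES_QUERY:
            read += length
    # ref now points one past the last reference base covered
    return steps, ref - 1, read


with_d = walk_cigar(5, "3M1I3M1D5M")
with_n = walk_cigar(5, "3M1I3M1N5M")

for (op_d, n, ref_start, read_off), (op_n, _, _, _) in zip(with_d[0], with_n[0]):
    print(f"{n}{op_d} vs {n}{op_n}: ref {ref_start}, read offset {read_off}")

# Both variants cover reference positions 5..16 and use all 12 read bases.
print(with_d[1:], with_n[1:])
```

Under these assumptions the two strings place every read base at exactly the same reference position; the only difference is the letter used for the gap at position 11.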
So how does the alignment of a skipped region differ from that of an insertion in the reference sequence? Is it only a skipped region if the C at position 11 is an N?