I am learning about the different NGS file formats. Most are fairly easy to understand, at least in their general aspects. However, when I try to understand the SAM files generated in my lab, I can't easily make sense of the different fields.
The first line of the body of one of my SAM files looks like this:
M00321:561:000000000-JM5F9:1:2107:12468:12982 65 1 14588 9 117M = 14588 0 CCGTCACCCCCTCCCAAGGAAGTAGGTCTGAGCAGCTTGTCCTGGCTGTGTCCATGTCAGAGCAACGGCCCAAGTCTGGGTCTGGGGGGGAAGGTGTCATGGAGCCCCCTACGATTC CCCCCGCFFEEG7@@FFFGGFFCF<<FFGGFGFEEFD<FEFGGF@FGGFGEEFFGGFFF<EAFGGGGGG7@@@EF,C<CECEGCFGCCFE:<C>FFFCFF99:<,8<:*C@C:7*CF NM:i:0 MD:Z:117 MC:Z:117M AS:i:117 XS:i:112 RG:Z:1_2 XA:Z:15,-102516461,117M,1;9,+14699,117M,2;2,-114356309,117M,3;12,-90921,117M,3;
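For reference, here is a minimal Python sketch that splits a record like the one above into the 11 mandatory SAM columns plus its optional `TAG:TYPE:VALUE` fields. It assumes the line is tab-separated, as the SAM specification requires (the fields appear space-separated above, probably from copying). `parse_sam_line` and `SAM_COLUMNS` are hypothetical names, not part of any library; for real work a dedicated library such as pysam is the usual choice.

```python
# Names of the 11 mandatory SAM columns, in the order the SAM specification defines.
SAM_COLUMNS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
               "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def parse_sam_line(line: str) -> dict:
    """Split one tab-separated SAM alignment line into named fields plus optional tags."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) < 11:
        raise ValueError(f"expected at least 11 tab-separated fields, got {len(parts)}")
    record = dict(zip(SAM_COLUMNS, parts[:11]))
    # These mandatory columns are integers; RNAME, SEQ, QUAL etc. stay as text.
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[key] = int(record[key])
    # Optional fields look like TAG:TYPE:VALUE, e.g. NM:i:0 or MD:Z:117.
    # Only integer ('i') tags are converted; all other types are kept as strings.
    tags = {}
    for field in parts[11:]:
        tag, typ, value = field.split(":", 2)
        tags[tag] = int(value) if typ == "i" else value
    record["TAGS"] = tags
    return record


# Illustrative only: the first columns of the record above, with SEQ and QUAL
# truncated to their first 10 characters and only two of the tags kept.
example = "\t".join([
    "M00321:561:000000000-JM5F9:1:2107:12468:12982", "65", "1", "14588", "9",
    "117M", "=", "14588", "0", "CCGTCACCCC", "CCCCCGCFFE", "NM:i:0", "AS:i:117",
])
rec = parse_sam_line(example)
print(rec["FLAG"], rec["RNAME"], rec["SEQ"], rec["QUAL"])
# → 65 1 CCGTCACCCC CCCCCGCFFE
```

Because RNAME is kept as text, a value such as `1` stays the string `"1"`: it is the name of a sequence in the reference the reads were aligned to (commonly a chromosome named `1`), not a number.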
And this is a table explaining the SAM format:
Things I don't understand:
- The second column (FLAG, 65 in my example) makes no sense to me according to the next table.
- What exactly does the third column (the number 1 in my example) mean? Shouldn't it be a string?
- Finally, why does my sequence contain a "white space" followed by more sequence, then a couple of @, more sequence, and again a couple of << plus other characters that are not nucleotides?
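A second sketch, assuming the definitions in the SAM specification: FLAG is a bitwise combination of the values listed below, and QUAL (the 11th column) stores one Phred quality score per base as the ASCII character with code score + 33. `decode_flag` and `phred33_scores` are hypothetical helper names. Under these definitions, 65 = 64 + 1 (paired, first read of the pair), and characters such as `@`, `<`, `,` and `*` in the QUAL column are quality scores, not bases.

```python
# Bit values of the SAM FLAG field and their meanings (SAM specification).
FLAG_BITS = {
    0x1: "paired (template has multiple segments)",
    0x2: "each segment properly aligned",
    0x4: "segment unmapped",
    0x8: "next segment (mate) unmapped",
    0x10: "SEQ reverse complemented",
    0x20: "SEQ of next segment reverse complemented",
    0x40: "first segment in the template (read 1)",
    0x80: "last segment in the template (read 2)",
    0x100: "secondary alignment",
    0x200: "not passing quality controls",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag: int) -> list[str]:
    """Return the description of every bit that is set in a SAM FLAG value."""
    return [desc for bit, desc in FLAG_BITS.items() if flag & bit]


def phred33_scores(qual: str) -> list[int]:
    """Convert a SAM QUAL string (Phred+33 ASCII encoding) to integer quality scores."""
    if qual == "*":
        return []  # a QUAL of exactly '*' means no qualities are stored
    return [ord(ch) - 33 for ch in qual]


print(decode_flag(65))
# → ['paired (template has multiple segments)', 'first segment in the template (read 1)']
print(phred33_scores("CCCCCGCFFE"))  # first 10 quality characters of the record
# → [34, 34, 34, 34, 34, 38, 34, 37, 37, 36]
```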