SAM file header: how is @SQ LN calculated?
7.6 years ago
n.roggli • 0

I would like to understand how the LN values in the header are calculated. The SAM specification says that LN is the reference sequence length, but that doesn't seem to match what I see. I have the following in a SAM file:

@SQ     SN:I    LN:15072434

but that doesn't match the nucleotide count of the reference sequence:

gunzip -c Caenorhabditis_elegans.WBcel235.dna_rm.chromosome.I.fa.gz | grep -e '^[^>]' | wc -c

returns 15323642

7.6 years ago
n.roggli • 0

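Found why: I was counting newline characters too. wc -c counts every byte, including the newline that ends each sequence line. The difference between the two numbers, 15323642 - 15072434 = 251208, is exactly the number of lines you would expect if the sequence is wrapped at 60 bases per line. Stripping the newlines first: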

gunzip -c Caenorhabditis_elegans.WBcel235.dna_rm.chromosome.I.fa.gz | grep -e '^[^>]' | tr -d '\n' | wc -c

returns the expected value 15072434
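
For files with more than one sequence, the same idea can be applied per record. The following is only a minimal sketch, assuming a gzipped FASTA file and an awk that follows POSIX behaviour; `fasta_lengths` is a hypothetical helper name, not part of samtools or any other tool. It sums the length of each sequence line without its trailing newline, and uses the first word of the header line as the name, which is what usually ends up in SN.

```shell
# Print "name<TAB>length" for every record in a gzipped FASTA file.
# fasta_lengths is a hypothetical helper, named here for illustration only.
fasta_lengths() {
    gunzip -c "$1" | awk '
        /^>/ {
            # New record: report the previous one, if any.
            if (name != "") print name "\t" len
            name = substr($1, 2)   # first word of the header, without ">"
            len = 0
            next
        }
        {
            sub(/\r$/, "")         # drop a Windows carriage return, if present
            len += length($0)      # awk has already removed the newline
        }
        END { if (name != "") print name "\t" len }
    '
}
```

Running `fasta_lengths Caenorhabditis_elegans.WBcel235.dna_rm.chromosome.I.fa.gz` should then print one line for sequence I, whose length can be compared directly with the LN value in the @SQ line.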
