Question

ERROR::INVALID_TAG_NM:Record A, Read name B, NM tag (nucleotide differences) in file [0] does not match reality [1]

0

Entering edit mode

5 months ago

curious ▴ 900

I seem to be getting a malformed bam from bwa mem

After bwa mem & using GRCh38DH, 1000 Genomes Project version as ref I run ValidateSamFile and get many errors as follows:

ERROR::INVALID_TAG_NM:Record 349909, Read name LH00400:21:22N52NLT4:7:1101:27171:1154, NM tag (nucleotide differences) in file [0] does not match reality [1]

I inspect one of the offending reads and see:

CIGAR: 101M48S

RNAME: chr2

POS: 16145020

NM tag: NM:i:0

I pulled the corresponding ref sequence to compare to the read sequence:

samtools faidx my_ref.fa chr2:16145020-16145168

The read and ref are identical for the first 100 bases, eg:

ref[:100] == read[:100]

True

but not after that:

ref[:101] == read[:101]

False

starting with base 101 ref is all 'N''s

ref[100:]

'NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN'

what is going on here?

My main two questions are:

Why does the CIGAR indicate 101M when it seems only the first 100 bases match between ref and read?

where exactly is this INVALID_TAG_NM error coming from and how to fix (or can this be ignored)? I tried running samtools calmd -br {input.bam} {input.ref} and I can get ValidateSamFile to pass, but am concerned about why I would have to do this at all.

bwa • 1.3k views

ADD COMMENT • link 5 months ago by curious ▴ 900

score 0 · Answer 1 · 2025-06-02

0

Entering edit mode

5 months ago

Pierre Lindenbaum 166k

or can this be ignored

https://gatk.broadinstitute.org/hc/en-us/articles/360036854731-ValidateSamFile-Picard#--IGNORE

List of validation error types to ignore.

--IGNORE  INVALID_TAG_NM

where exactly is this INVALID_TAG_NM error coming from

https://github.com/utgenome/picard/blob/8ef0bb735d06ffd1d4a399e15632713adbad7bb6/src/java/net/sf/picard/sam/SamFileValidator.java#L418C34-L418C48

ADD COMMENT • link 5 months ago by Pierre Lindenbaum 166k

0

Entering edit mode

Thanks for the response

I think I am more wondering if this is a typical thing that people see and am wondering what is wrong with my data that might be causing it (which I am struggling a bit to grasp)

A google search of "ERROR::INVALID_TAG_NM" produces hardly any hits. I tried BWA MEM v0.7.18, 0.7.19 & BWA MEM and getting the same error with all of them.

The error is happening for ~0.5% of my reads, so it is kind of a lot.

Do you think running samtools calmd -br {input.bam} {input.ref} post-alignment is a reasonable solution to fixing the error?

ADD REPLY • link 5 months ago by curious ▴ 900