Question

Alternate Tag Types for Picard

6.0 years ago

Ben ▴ 30

After running Picard ValidateSamFile, I get warnings like the ones below for every read: the "NM" tags are missing.

WARNING: Read name SRR6251016.24364087_TGTTATGAGA, A record is missing a read group
WARNING: Record 1, Read name SRR6251016.24364087_TGTTATGAGA, NM tag (nucleotide differences) is missing

I am using BAM files produced by a STAR mapping pipeline, which contain "nM" tags as shown below. These are identical in function to NM tags but are named differently.

SRR6251016.24364087_TGTTATGAGA  99  chr1    3043025 255 70M =   3043191 236 AGAAAATTGGACATAGTACTACCGGAGGATCCAGCAATACCTCTCCTGGGCATATATCCAGAAGATGCCC  EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEE<<EEEEEEEEEEEEEEAAEEEE  NH:i:1  HI:i:1  AS:i:136    nM:i:1
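A minimal sketch in plain Python (no SAM library assumed) that splits one tab-separated SAM record into its optional TAG:TYPE:VALUE fields, using the record above with its tabs restored. `parse_optional_tags` is a hypothetical helper name. It shows why the warning appears: tag names are case-sensitive, so a tool that looks for NM does not find nM. (The SAM specification reserves tags containing lowercase letters for local, program-specific use.)

```python
def parse_optional_tags(sam_line: str) -> dict[str, tuple[str, str]]:
    """Return {TAG: (TYPE, VALUE)} for the optional fields of one SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    # The first 11 columns are mandatory; optional fields follow as TAG:TYPE:VALUE.
    for field in fields[11:]:
        tag, typ, value = field.split(":", 2)
        tags[tag] = (typ, value)
    return tags


record = "\t".join([
    "SRR6251016.24364087_TGTTATGAGA", "99", "chr1", "3043025", "255", "70M",
    "=", "3043191", "236",
    "AGAAAATTGGACATAGTACTACCGGAGGATCCAGCAATACCTCTCCTGGGCATATATCCAGAAGATGCCC",
    "EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEE<<EEEEEEEEEEEEEEAAEEEE",
    "NH:i:1", "HI:i:1", "AS:i:136", "nM:i:1",
])

tags = parse_optional_tags(record)
print(tags["nM"])    # ('i', '1')
print("NM" in tags)  # False: nM and NM are different keys
```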

Does anyone know how to make Picard recognise these tags (which have the same format)? I need to run Picard MarkDuplicates next in my analysis.

picard RNA-Seq alignment • 1.6k views

Why do you want Picard to recognize these tags? I think MarkDuplicates only compares the 5' ends of reads and does not consider the NM tag.

Comment by Jianyu ▴ 580, 6.0 years ago
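To illustrate the comment above, here is a simplified sketch of the kind of position key that duplicate marking is built on: reference name, unclipped 5' coordinate, and strand. This is not Picard's implementation; Picard also takes the library and, for pairs, the mate's coordinates and orientation into account. The helper names are hypothetical, and the mate's CIGAR (70M) is assumed, with its position taken from the record's PNEXT field. Note that nothing here reads NM or nM.

```python
import re

# CIGAR operations, e.g. "5S65M" -> [(5, "S"), (65, "M")]
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def clipped_bases(ops):
    """Count soft/hard-clipped bases at the start of an op list."""
    total = 0
    for n, op in ops:
        if op not in "SH":
            break
        total += n
    return total


def unclipped_five_prime(pos, cigar, flag):
    """1-based reference coordinate of the read's 5' end, counting clipped bases."""
    ops = parse_cigar(cigar)
    if not flag & 0x10:  # forward strand: 5' end is the leftmost base
        return pos - clipped_bases(ops)
    # Reverse strand: 5' end is the rightmost base of the alignment plus trailing clips.
    ref_span = sum(n for n, op in ops if op in "MDN=X")
    return pos + ref_span - 1 + clipped_bases(list(reversed(ops)))


def duplicate_key(rname, pos, cigar, flag):
    """Simplified key: reads sharing a key would be duplicate candidates."""
    strand = "-" if flag & 0x10 else "+"
    return (rname, unclipped_five_prime(pos, cigar, flag), strand)


print(duplicate_key("chr1", 3043025, "70M", 99))   # ('chr1', 3043025, '+')
print(duplicate_key("chr1", 3043191, "70M", 147))  # ('chr1', 3043260, '-')
```

The reverse-mate coordinate 3043260 equals 3043025 + 236 - 1, which matches the template length (236) in the posted record.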

score 7 · Accepted Answer · 2019-10-30

6.0 years ago

John Marshall 3.1k

If STAR's nM was actually identical in function to the standard NM, one might ask why on earth they were making life hard for everyone by using a different tag name. But in fact it is not:

nM is the number of mismatches per (paired) alignment, not to be confused with NM, which is the number of mismatches in each mate.
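A sketch of the distinction in that definition. It assumes that NM is the per-mate edit distance defined by the SAM specification (mismatches plus inserted and deleted bases, which can be recovered from CIGAR and MD). It also assumes, following the quote, that nM counts mismatches summed over both mates. The MD strings below are made up for illustration, and the function names are hypothetical.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# MD tokens: a match run (digits), a deletion (^ plus bases), or one mismatched base.
MD_RE = re.compile(r"(\d+)|(\^[A-Z]+)|([A-Z])")


def mismatches_from_md(md):
    """Count mismatched bases in an MD string (deleted bases are not counted)."""
    return sum(1 for _run, _deletion, base in MD_RE.findall(md) if base)


def nm_for_mate(cigar, md):
    """Per-mate edit distance: mismatches + inserted bases + deleted bases."""
    indel_bases = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in "ID")
    return mismatches_from_md(md) + indel_bases


def pair_mismatches(md_tags):
    """nM-style count under the stated assumption: mismatches summed over both mates."""
    return sum(mismatches_from_md(md) for md in md_tags)


mate1 = ("70M", "35A34")  # one mismatch
mate2 = ("70M", "70")     # perfect match
print(nm_for_mate(*mate1), nm_for_mate(*mate2))  # 1 0
print(pair_mismatches([mate1[1], mate2[1]]))     # 1
```

So a pair with nM:i:1 could correspond to NM:i:1 on one mate and NM:i:0 on the other. The pair-level value alone does not tell you which mate holds the mismatch, and it does not count indels.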

Look at STAR's --outSAMattributes option, which can also be used to output NM. One might ask the STAR developers why NM and the other tags that Picard's “typical usage” validation expects are not in STAR's standard set of attributes…
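A sketch of requesting NM alongside STAR's standard attributes (NH HI AS nM) through --outSAMattributes, with the argument list built in Python. The genome directory and FASTQ paths are hypothetical placeholders. Actually running the command would need STAR on the PATH, for example via `subprocess.run(cmd, check=True)`, and the reads would have to be realigned.

```python
import shlex


def star_command(genome_dir, read1, read2,
                 attributes=("NH", "HI", "AS", "nM", "NM")):
    """Build a STAR argv that adds NM to the standard SAM attributes."""
    return [
        "STAR",
        "--genomeDir", genome_dir,
        "--readFilesIn", read1, read2,
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outSAMattributes", *attributes,
    ]


# Hypothetical paths, for illustration only.
cmd = star_command("genome_index/", "sample_R1.fastq", "sample_R2.fastq")
print(shlex.join(cmd))
```

Keeping nM in the list preserves STAR's pair-level count, and NM adds the per-mate value that ValidateSamFile looks for.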