I'm trying to mark duplicates using Picard tools, but I've run into this error:
Exception in thread "main" net.sf.samtools.SAMFormatException: SAM validation error: ERROR: Record 37, Read name FCC2CCMACXX:5:1101:6198:22031#, RG ID on SAMRecord not found in header: 2_DB31
The first line of my .bam file reads:
FCC2CCMACXX:4:1101:13561:50127# 99 chrM 1 15 49S51M = 340 439 "sequence here" "q score here" NM:i:1 AS:i:46 XS:i:59 RG:Z:1_DB31
And if I look at the header with samtools view -H, I see lines such as:
@HD VN:1.3 SO:coordinate
@SQ SN:chrM LN:16571
@SQ SN:chr1 LN:249250621
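For reference, here is a minimal sketch of the kind of check I believe is failing: it compares the RG:Z: tags on the records against the @RG ID: entries declared in the header. It assumes tab-delimited SAM text on standard input (for example, the output of samtools view -h). The function name missing_read_groups and the script name check_rg.py are my own placeholders, and this is not Picard's actual validation code.

```python
import sys


def missing_read_groups(sam_lines):
    """Return sorted RG IDs used by records but not declared in @RG header lines.

    Hypothetical helper: expects tab-delimited SAM text, header included.
    """
    declared = set()  # IDs found on @RG header lines
    used = set()      # IDs found in RG:Z: tags on alignment records
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("@"):
            # Header line: only @RG lines declare read groups.
            if fields[0] == "@RG":
                for field in fields[1:]:
                    if field.startswith("ID:"):
                        declared.add(field[3:])
            continue
        # Alignment record: optional tags start at the 12th column.
        for tag in fields[11:]:
            if tag.startswith("RG:Z:"):
                used.add(tag[5:])
    return sorted(used - declared)


if __name__ == "__main__":
    # Print one undeclared read group ID per line.
    for rg_id in missing_read_groups(sys.stdin):
        print(rg_id)
```

I would run it as something like samtools view -h my.bam | python check_rg.py, where my.bam stands in for the real file. Any ID it prints is used by at least one read but has no matching @RG line in the header.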
Do I need to modify my original .bam file somehow?
EDIT: The reason I ask is that I am trying to feed this data into GATK. If I use something like samtools rmdup to remove duplicates, will this dataset still work with GATK?