Question: Picard-tools mark duplicates error, missing @RG
kezcleal (United Kingdom) wrote, 4.0 years ago:

Hi, I'm trying to mark duplicates using Picard tools but have come across this error:


Exception in thread "main" net.sf.samtools.SAMFormatException: SAM validation error: ERROR: Record 37, Read name FCC2CCMACXX:5:1101:6198:22031#, RG ID on SAMRecord not found in header: 2_DB31


The first line of my .bam file reads:


FCC2CCMACXX:4:1101:13561:50127#    99    chrM    1    15    49S51M    =    340    439    "sequence here"   "q score here"    NM:i:1    AS:i:46    XS:i:59    RG:Z:1_DB31


And if I look at the header with samtools view -H, I see lines such as:


@HD    VN:1.3    SO:coordinate

@SQ    SN:chrM    LN:16571

@SQ    SN:chr1    LN:249250621 


Do I need to modify my original .bam file somehow?

EDIT: The reason I ask is that I am trying to feed this data into GATK. If I use something like samtools rmdup to remove duplicates, will the dataset still work with GATK?
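
For readers hitting the same error: the message says that a record carries an RG:Z: tag whose ID has no matching @RG line in the header. A minimal Python sketch of that check follows, assuming the BAM has first been dumped to SAM text (for example with samtools view -h). The function name `missing_read_groups` and the toy records are illustrative only and are not part of Picard or samtools.

```python
def missing_read_groups(sam_text):
    """Return the RG IDs used by records but not declared in any @RG header line."""
    declared = set()
    used = set()
    for line in sam_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if line.startswith("@"):
            # Header line: only @RG lines declare read groups.
            if fields[0] == "@RG":
                for field in fields[1:]:
                    if field.startswith("ID:"):
                        declared.add(field[3:])
            continue
        # Alignment record: optional tags start at the 12th column.
        for field in fields[11:]:
            if field.startswith("RG:Z:"):
                used.add(field[5:])
    return sorted(used - declared)


# Toy input modelled on the post: a header without @RG lines, records with RG tags.
sample = "\n".join([
    "@HD\tVN:1.3\tSO:coordinate",
    "@SQ\tSN:chrM\tLN:16571",
    "r1\t99\tchrM\t1\t15\t4M\t=\t340\t439\tACGT\tIIII\tNM:i:1\tRG:Z:1_DB31",
    "r2\t99\tchrM\t5\t15\t4M\t=\t340\t439\tACGT\tIIII\tRG:Z:2_DB31",
])
print(missing_read_groups(sample))  # ['1_DB31', '2_DB31']
```

An empty list would mean every read group referenced by a record is declared in the header.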






bruce.moran wrote, 4.0 years ago:

You could add read groups using Picard, or try adding 'VALIDATION_STRINGENCY=LENIENT' to your command, which will pass over these kinds of errors. Be aware that the errors still exist, though: GATK requires read groups for calling variants, so downstream you will wish you had added them. For RNA-seq, where you just want to call counts, you should be OK without read groups.
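
Since the reads in the question already carry per-read RG tags (1_DB31, 2_DB31), one possible way to add read groups is to put matching @RG lines into the header, rather than reassigning every read to a single group as Picard's AddOrReplaceReadGroups does. The Python sketch below only builds the new header text. The function name `add_rg_lines` is hypothetical, and the SM, LB and PL values are placeholders that would need to be replaced with the real sample, library and platform.

```python
def add_rg_lines(header_text, rg_ids, sample, platform="ILLUMINA"):
    """Append one @RG line per ID to a SAM header, skipping IDs already declared."""
    lines = [ln for ln in header_text.splitlines() if ln.strip()]
    present = {
        field[3:]
        for ln in lines
        if ln.split("\t")[0] == "@RG"
        for field in ln.split("\t")[1:]
        if field.startswith("ID:")
    }
    for rg_id in rg_ids:
        if rg_id not in present:
            # SM, LB and PL are placeholder values (assumptions), not taken from the data.
            lines.append(f"@RG\tID:{rg_id}\tSM:{sample}\tLB:{sample}\tPL:{platform}")
            present.add(rg_id)
    return "\n".join(lines) + "\n"


# Header as shown in the question (truncated to two lines), plus the two IDs from the post.
header = "@HD\tVN:1.3\tSO:coordinate\n@SQ\tSN:chrM\tLN:16571\n"
new_header = add_rg_lines(header, ["1_DB31", "2_DB31"], sample="DB31")
print(new_header, end="")
```

The resulting text could be saved to a file and applied with samtools reheader (for example `samtools reheader new_header.sam in.bam > out.bam`, where both file names are made up here). Treat this as a sketch and validate the output before running GATK on it.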
