Question: Picard-tools mark duplicates error, missing @RG
kezcleal (United Kingdom) wrote, 3.7 years ago:

Hi, I'm trying to mark duplicates using Picard tools but have come across this error:

 

Exception in thread "main" net.sf.samtools.SAMFormatException: SAM validation error: ERROR: Record 37, Read name FCC2CCMACXX:5:1101:6198:22031#, RG ID on SAMRecord not found in header: 2_DB31

 

The first line of my .bam file reads:

 

FCC2CCMACXX:4:1101:13561:50127#    99    chrM    1    15    49S51M    =    340    439    "sequence here"   "q score here"    NM:i:1    AS:i:46    XS:i:59    RG:Z:1_DB31

 

And if I look at the header with samtools view -H, I see lines such as:

 

@HD    VN:1.3    SO:coordinate

@SQ    SN:chrM    LN:16571

@SQ    SN:chr1    LN:249250621 
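
The error above says that a read carries an RG:Z tag whose ID has no matching @RG line in the header, and the header excerpt shown lists only @HD and @SQ lines. A minimal sketch of that check, assuming the SAM text (for example the output of samtools view -h) is available as a list of lines. The function name missing_read_groups and the toy records are hypothetical and are not part of samtools or Picard:

```python
def missing_read_groups(sam_lines):
    """Return RG IDs used on records but not declared in @RG header lines."""
    declared = set()
    used = set()
    for line in sam_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("@"):
            # Header line: only @RG lines declare read group IDs.
            if fields[0] == "@RG":
                for field in fields[1:]:
                    if field.startswith("ID:"):
                        declared.add(field[3:])
        else:
            # Alignment record: optional tags start at column 12.
            for field in fields[11:]:
                if field.startswith("RG:Z:"):
                    used.add(field[5:])
    return sorted(used - declared)


# Toy input modelled on the post (sequences and qualities are made up).
example = [
    "@HD\tVN:1.3\tSO:coordinate",
    "@SQ\tSN:chrM\tLN:16571",
    "read1\t99\tchrM\t1\t15\t4M\t=\t340\t439\tACGT\tIIII\tNM:i:1\tRG:Z:1_DB31",
    "read2\t99\tchrM\t5\t15\t4M\t=\t340\t439\tACGT\tIIII\tNM:i:1\tRG:Z:2_DB31",
]
print(missing_read_groups(example))
```

A non-empty result means the header lacks @RG lines for those IDs, which is the condition Picard's validation reports.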

 

Do I need to modify my original .bam file somehow?

EDIT: The reason I ask is that I am trying to feed this data into GATK. If I use something like samtools rmdup to remove duplicates, will this dataset still work with GATK?


bruce.moran (Ireland) wrote, 3.7 years ago:

You could add read groups using Picard, or try adding 'VALIDATION_STRINGENCY=LENIENT' to your command, which will pass over these kinds of errors. Be aware that the underlying problem still exists, though: GATK, for example, requires read groups for calling variants, so downstream you will wish you had added them. For RNA-seq, where you just want counts, you should be OK without read groups.
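
Since the reads in the question already carry RG:Z tags, one way to add read groups is to declare the missing @RG lines in the header. Note that this is a different route from the Picard one suggested above. A minimal sketch follows; the function name add_read_group_lines, the sample name DB31, and the platform value are assumptions for illustration and should be replaced with the real metadata:

```python
def add_read_group_lines(header_lines, rg_ids, sample, platform="ILLUMINA"):
    """Return header lines with an @RG line appended for each undeclared ID."""
    lines = [line.rstrip("\n") for line in header_lines]
    declared = set()
    for line in lines:
        fields = line.split("\t")
        if fields[0] == "@RG":
            declared.update(f[3:] for f in fields[1:] if f.startswith("ID:"))
    for rg_id in rg_ids:
        if rg_id not in declared:
            # SM (sample) and PL (platform) are hypothetical values here.
            lines.append("\t".join(
                ["@RG", f"ID:{rg_id}", f"SM:{sample}", f"PL:{platform}"]
            ))
            declared.add(rg_id)
    return lines


header = ["@HD\tVN:1.3\tSO:coordinate", "@SQ\tSN:chrM\tLN:16571"]
new_header = add_read_group_lines(header, ["1_DB31", "2_DB31"], sample="DB31")
print("\n".join(new_header))
```

The resulting text could be written to a file and applied with samtools reheader (samtools reheader new_header.sam in.bam > out.bam), which swaps the header without touching the records. Whether 1_DB31 and 2_DB31 really belong to one sample is an assumption that should be checked first.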
