MarkDuplicates showing error, not recognising SM tag present in bam Header
5.0 years ago

Hi all!

I am following the latest GATK blog posts to do WGS data analysis. However, I am stuck at the MarkDuplicates step because of the following error:

Exception in thread "main" htsjdk.samtools.SAMFormatException: Error parsing SAM header. @RG line missing SM tag. Line:
@RG     ID:AD0772_S2_L004       LB:L004 PL:ILLUMINA     PU:HK522DSXX; File /scratch/parashar/align_tc/AD0772_S2_L004_sort.bam; Line number 95
        at htsjdk.samtools.SAMTextHeaderCodec.reportErrorParsingLine(SAMTextHeaderCodec.java:258)
        at htsjdk.samtools.SAMTextHeaderCodec.access$200(SAMTextHeaderCodec.java:46)
        at htsjdk.samtools.SAMTextHeaderCodec$ParsedHeaderLine.requireTag(SAMTextHeaderCodec.java:358)
        at htsjdk.samtools.SAMTextHeaderCodec.parseRGLine(SAMTextHeaderCodec.java:168)
        at htsjdk.samtools.SAMTextHeaderCodec.decode(SAMTextHeaderCodec.java:110)
        at htsjdk.samtools.BAMFileReader.readHeader(BAMFileReader.java:704)
        at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:298)
        at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:176)
        at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:396)
        at picard.sam.markduplicates.util.AbstractMarkDuplicatesCommandLineProgram.openInputs(AbstractMarkDuplicatesCommandLineProgram.java:220)
        at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:533)
        at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:257)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

I checked my SAM file to make sure the @RG header line was added during alignment, and yes, it was there:

$ grep "@RG" AD0772_S2_L004_aln.sam
Output:
@RG ID:AD0772_S2_L004   LB:L004 PL:ILLUMINA PU:HK522DSXX
@PG ID:bwa  PN:bwa  VN:0.7.17-r1188 CL:bwa mem -R @RG\tID:AD0772_S2_L004\tLB:L004\tPL:ILLUMINA\tPU:HK522DSXX /home/parashar/archive/Megha/bwa_0.7.17/hg19.fa /home/parashar/archive/wgs/raw_data/AD0772_S2_L004_R1_001.fastq.gz /home/parashar/archive/wgs/raw_data/AD0772_S2_L004_R2_001.fastq.gz

I also checked the sorted file that I was using as input to MarkDuplicates:

$samtools view AD0772_S2_L004_sort.bam | grep "@RG"

It gives the same result.

I am not able to figure out why this is happening!

The command I used for MarkDuplicates is:

cat samlist.txt | parallel --max-procs=5 "picard MarkDuplicates I={}_sort.bam O={}_dedup.bam M=mark_dup_metrics.txt ASSUME_SORTED=true 2> {}.stderr"  

My inputs are fine and the command runs. I am using Picard version V-2.21.1-0.

Note: I produced my sorted files using Sambamba and my BAM files using samtools.

next-gen markduplicates picard alignment
cat samlist.txt | parallel --max-procs=5

parallel is cool, but you would be better off using a workflow manager (Snakemake, Nextflow, etc.).

5.0 years ago

Yes, there is an '@RG' line, but there is no sample (SM) associated with that read group. See https://software.broadinstitute.org/gatk/documentation/article.php?id=6472

SM = Sample: the name of the sample sequenced in this read group.
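
To spot this before Picard complains, something like the following may help. It is only a sketch: `rg_missing_sm` is a made-up helper name, and it assumes the header text arrives on stdin as tab-separated lines, which is what `samtools view -H` prints (plain `samtools view` without `-h` or `-H` does not print the header at all).

```shell
# Hypothetical helper (not part of samtools or Picard):
# read SAM header text on stdin and print every @RG line that has no SM: field.
rg_missing_sm() {
  awk -F'\t' '
    $1 == "@RG" {
      has_sm = 0
      # Scan the tab-separated fields after the record type for an SM tag.
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SM:/) has_sm = 1
      }
      if (!has_sm) print
    }
  '
}
```

Assuming samtools is on the PATH, usage would be `samtools view -H AD0772_S2_L004_sort.bam | rg_missing_sm`. No output means every read group has a sample name.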

I beg your pardon for the trouble, but you saved the day. Thanks. It seems my test samples had the SM tag; I somehow forgot to add it to the control one, AD0772_S2_L004_aln.sam.
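
For anyone landing here with the same problem, here is a minimal sketch of building the read-group string with SM included, so it cannot be forgotten for one sample. `make_rg` is a made-up helper name, and the sample name `AD0772` in the usage note is an assumption for illustration; use whatever your sample is actually called.

```shell
# Hypothetical helper: build a read-group string for bwa mem -R.
# Arguments: ID, SM (sample), LB (library), PL (platform), PU (platform unit).
# The output contains literal "\t" sequences, in the same form as the -R
# string shown in the @PG line of the question.
make_rg() {
  printf '@RG\\tID:%s\\tSM:%s\\tLB:%s\\tPL:%s\\tPU:%s' "$1" "$2" "$3" "$4" "$5"
}
```

The result would be passed to the aligner as `bwa mem -R "$(make_rg AD0772_S2_L004 AD0772 L004 ILLUMINA HK522DSXX)" ...`. For a BAM that is already aligned, re-running the alignment should not be necessary: Picard's `AddOrReplaceReadGroups` (with `RGID`, `RGLB`, `RGPL`, `RGPU` and `RGSM`) can rewrite the read group in place of the one missing SM.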
