Question: Gatk Realignertargetcreator Sam/Bam Header Error
0
gravatar for Houkto
6.2 years ago by
Houkto210
Houkto210 wrote:

Hi all, I get an GATK error

SAM/BAM file SAMFileReader{/merged_bam_markdup.bam} is malformed: Read HWI-ST303_0093:5:5:13416:34802#0
is either missing the read group or its read group is not defined in the BAM header, both of which are 
required by the GATK.

when I ran

java -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -T RealignerTargetCreator -R ref.fa -I 
merged_bam_markdup.bam -o read.intervals

My Bam file has a header

samtools view -h merged_bam_files_indexed_markduplicate.bam | grep ^@RG
@RG     ID:test1      PL:Illumina     PU:HWI-ST303    LB:test     PI:75   SM:test   CN:japan
@RG     ID:test2      PL:Illumina     PU:HWI-ST303    LB:test     PI:75   SM:test    CN:japan

A grep of the read within the error

HWI-ST303_0093:5:5:13416:34802#0        99      1       1090    29      23S60M17S       =       
1150    160 
TGTTTGGGTTGAAGATTGATACTGGAAGAAGATTAGAATTGTAGAAAGGGGAAAACGATGTTAGAAAGTTAATACGGCTTACTCCAGATCCTTGGATCTC       
GGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGFGGGGGGGGGDGFGFGGGGGFEDFGEGGGDGEG?FGGDDGFFDGGEDDFFFFEDG?E
MD:Z:60 PG:Z:MarkDuplicates     RG:Z:test1      XG:i:0  AM:i:29 NM:i:0  SM:i:29 XM:i:0  XO:i:0  XT:A:M

Following GATK recommended solution using Picard

java -XX:MaxDirectMemorySize=4G -jar picard-tools-1.85/AddOrReplaceReadGroups.jar I= test.bam O=
test.header.bam SORT_ORDER=coordinate RGID=test RGLB=test  RGPL=Illumina RGSM=test/ RGPU=HWI-ST303
RGCN=japan CREATE_INDEX=True

Got this error

Exception in thread "main" net.sf.samtools.SAMFormatException: SAM validation error: ERROR: Record
 12247781, Read name HWI-ST303_0093:5:26:10129:50409#0, MAPQ should be 0 for unmapped read.

I also tried

java -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -T RealignerTargetCreator -R reference.fa
 -I merged_bam_files_indexed_markduplicate.bam -o reads.intervals --validation_strictness LENIENT

but I got the first error message

Any suggestions would be great ?

N.B: I used to ran the same command (first one) with GATK version 1.2 and did not result in error

I also tried

picard/MarkDuplicates.jar I=test.bam O=test_markduplicate.bam M=test.matrix AS=true
VALIDATION_STRINGENCY=LENIANT

then indexed with samtools,then I got the following error

Ignoring SAM validation error: ERROR: Record (number), Read name HWI-ST303_0093:5:5:13416:34802#0,
RG ID on SAMRecord not found in header: test1

My pipeline

bwa aln -q 20 ref.fa read > files.sai
bwa sampe ref.fa file1.sai file2.sai read1 read2 > test1.sam
samtools view -bS test1.sam | samtools sort - test
samtools  index test1.bam
samtools merge -rh RG.txt test test1.bam test2.bam
gatk bam sam • 4.7k views
ADD COMMENTlink modified 6.2 years ago by samsara580 • written 6.2 years ago by Houkto210
1
gravatar for Ashutosh Pandey
6.2 years ago by
Philadelphia
Ashutosh Pandey11k wrote:

I dont use samtools for merging. I use Picard for that. In your pipeline can you try using Picard for the last step (merging).

ADD COMMENTlink written 6.2 years ago by Ashutosh Pandey11k

thanks ashutoshmits, will that fix the problem ?

ADD REPLYlink written 6.2 years ago by Houkto210

Use Picard for merging. Try to use Picard for all BAM operations, it does a better job than samtools with the headers / metadata and produces BAM files that are more compliant with GATK. Picard and GATK are both developed maintained at the Broad Institute.

ADD REPLYlink written 6.2 years ago by William4.4k

I am not sure about it but you can give it a try. As William mentioned that Picard does a better job then samtools for merging the bam files.

ADD REPLYlink written 6.2 years ago by Ashutosh Pandey11k
0
gravatar for samsara
6.2 years ago by
samsara580
The Earth
samsara580 wrote:

When you merge BAM files you should group the reads (for GATK pipeline). The error is saying RG ID is not found. You can group reads using picard's AddReplaceReadGroups. First add read groups (RG) for both BAM/SAM files and then merge them.

ADD COMMENTlink modified 6.2 years ago • written 6.2 years ago by samsara580
1

I did that already, and I got an error about MAPQ should be 0. I think the problem is not with the header but with bwa. I am still waiting for a solution.

ADD REPLYlink written 6.2 years ago by Houkto210

It seems that the problem is with the aligner. Have you tried Bowtie instead ? Following links could be helpful http://seqanswers.com/forums/showthread.php?t=4246 Final Solution For "Mapq Should Be 0 For Unmapped Read."

ADD REPLYlink written 6.2 years ago by samsara580

World, I am thinking about it, however, it is kinda hard now. Since I worked on the same data using earlier version of GATK 1.2 and no error was reported. But these errors with the new version of GATK are known because its something that BWA does but they are not suppressed with VALIDATION_STRINGENCY=LENIANT.

ADD REPLYlink modified 6.2 years ago • written 6.2 years ago by Houkto210

You can use the --rg-id <text> --rg <text> in Bowtie2 if you are willing to realign the data. </text></text>

ADD REPLYlink written 6.2 years ago by Zev.Kronenberg11k

Thanks Zev, I looked at bowtie2 manual and I think I will go with the default settings (--sensitive) but if you can direct me to a pipeline of bowtie2 it would be great.

ADD REPLYlink written 6.2 years ago by Houkto210
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1260 users visited in the last hour