Error with MuTect: SAM/BAM file SAMFileReader is malformed
9.2 years ago
eem0306 • 0

I tried to run MuTect with the following command line:

/usr/lib/jvm/java-1.6.0/bin/java -Xmx2g -jar /home/exman/muTect-1.1.4.jar \
--analysis_type MuTect \
--reference_sequence /data/eem0306/ref/1.fa \
--cosmic /data/eem0306/ref/b37_cosmic_v54_120711.vcf \
--dbsnp /data/eem0306/ref/dbsnp_138.hg19.nochr.vcf \
--input_file:normal /data/eem0306/somatic.caller/sample/N.rmdup.realigned.BQSR.h.bam \
--input_file:tumor /data/eem0306/somatic.caller/using.mutect/myAnalysis20/N20.result.sorted.h.qsort.bwasw.h.filter.sorted.bam \
--out /data/eem0306/somatic.caller/using.mutect/2nd.myAnalysis20/n20.results \
--coverage_file n20.coverage.wig.txt

but I repeatedly got the following error:

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 2.2-25-g2a68eab):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
**##### ERROR MESSAGE: SAM/BAM file SAMFileReader{/data/eem0306/somatic.caller/using.mutect/myAnalysis20/N20.result.sorted.h.qsort.bwasw.h.filter.sorted.bam} is malformed: Read D0ENMACXX111207:7:1202:2132:140703 is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK.  Please use http://www.broadinstitute.org/gsa/wiki/index.php/ReplaceReadGroups to fix this problem**
##### ERROR ------------------------------------------------------------------------------------------

The header of my BAM file is:

@HD     VN:1.4  GO:none SO:coordinate
@SQ     SN:1    LN:249250621    UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta   AS:GRCh37       M5:1b22b98cdeb4a9304cb5d48026a85128     SP:Homo Sapiens
@RG     ID:C09DF.1      PL:illumina     PU:C09DFACXX111207.1.TTGAGCCT   LB:Solexa-76163 DT:2011-12-07T14:00:00+0900     SM:HCC1143 BL   CN:BI
@RG     ID:C09DF.2      PL:illumina     PU:C09DFACXX111207.2.TTGAGCCT   LB:Solexa-76163 DT:2011-12-07T14:00:00+0900     SM:HCC1143 BL   CN:BI
@RG     ID:D0EN0.4      PL:illumina     PU:D0EN0ACXX111207.4.TTGAGCCT   LB:Solexa-76163 DT:2011-12-07T14:00:00+0900     SM:HCC1143 BL   CN:BI
@RG     ID:D0EN0.7      PL:illumina     PU:D0EN0ACXX111207.7.TTGAGCCT   LB:Solexa-76163 DT:2011-12-07T14:00:00+0900     SM:HCC1143 BL   CN:BI
@RG     ID:D0EN0.8      PL:illumina     PU:D0EN0ACXX111207.8.TTGAGCCT   LB:Solexa-76163 DT:2011-12-07T14:00:00+0900     SM:HCC1143 BL   CN:BI
@RG     ID:D0ENM.1      PL:illumina     PU:D0ENMACXX111207.1.CCAGTTAG   LB:Sage-75643   DT:2011-12-07T14:00:00+0900     SM:HCC1143      CN:BI
@RG     ID:D0ENM.2      PL:illumina     PU:D0ENMACXX111207.2.CCAGTTAG   LB:Sage-75643   DT:2011-12-07T14:00:00+0900     SM:HCC1143      CN:BI
@RG     ID:D0ENM.3      PL:illumina     PU:D0ENMACXX111207.3.CCAGTTAG   LB:Sage-75643   DT:2011-12-07T14:00:00+0900     SM:HCC1143      CN:BI
@RG     ID:D0ENM.5      PL:illumina     PU:D0ENMACXX111207.5.CCAGTTAG   LB:Sage-75643   DT:2011-12-07T14:00:00+0900     SM:HCC1143      CN:BI
@RG     ID:D0ENM.6      PL:illumina     PU:D0ENMACXX111207.6.CCAGTTAG   LB:Sage-75643   DT:2011-12-07T14:00:00+0900     SM:HCC1143      CN:BI
@RG     ID:D0ENM.7      PL:illumina     PU:D0ENMACXX111207.7.CCAGTTAG   LB:Sage-75643   DT:2011-12-07T14:00:00+0900     SM:HCC1143      CN:BI
@PG     ID:GATK IndelRealigner  CL:knownAlleles=[(RodBinding name=knownAlleles source=/bio/lib/ref/1000G_phase1.indels.b37.vcf), (RodBinding name=knownAlleles2 source=/bio/lib/ref/Mills_and_1000G_gold_standard.indels.b37.vcf)] targetIntervals=n20t80.rmdup_intervals.list LODThresholdForCleaning=5.0 consensusDeterminationModel=USE_READS entropyThreshold=0.15 maxReadsInMemory=150000 maxIsizeForMovement=3000 maxPositionalMoveAllowed=200 maxConsensuses=30 maxReadsForConsensuses=120 maxReadsForRealignment=20000 noOriginalAlignmentTags=false nWayOut=null generate_nWayOut_md5s=false check_early=false noPGTag=false keepPGTags=false indelsFileForDebugging=null statisticsFileForDebugging=null SNPsFileForDebugging=null
@PG     ID:GATK TableRecalibration      VN:1.3-14-g59da26a      CL:default_read_group=null default_platform=null force_read_group=null force_platform=null window_size_nqs=5 homopolymer_nback=7 exception_if_no_tile=false solid_recal_mode=SET_Q_ZERO solid_nocall_strategy=THROW_EXCEPTION recal_file=/seq/picard/D0ENMACXX/C1-210_2011-12-07_2011-12-18/1/Sage-75643/D0ENMACXX.1.recal_data.csv preserve_qscores_less_than=5 smoothing=1 max_quality_score=50 doNotWriteOriginalQuals=false no_pg_tag=false fail_with_no_eof_marker=false skipUQUpdate=false Covariates=[ReadGroupCovariate, QualityScoreCovariate, CycleCovariate, DinucCovariate]
@PG     ID:GATK TableRecalibration.1    VN:1.3-14-g59da26a      CL:default_read_group=null default_platform=null force_read_group=null force_platform=null window_size_nqs=5 homopolymer_nback=7 exception_if_no_tile=false solid_recal_mode=SET_Q_ZERO solid_nocall_strategy=THROW_EXCEPTION recal_file=/seq/picard/D0ENMACXX/C1-210_2011-12-07_2011-12-18/3/Sage-75643/D0ENMACXX.3.recal_data.csv preserve_qscores_less_than=5 smoothing=1 max_quality_score=50 doNotWriteOriginalQuals=false no_pg_tag=false fail_with_no_eof_marker=false skipUQUpdate=false Covariates=[ReadGroupCovariate, QualityScoreCovariate, CycleCovariate, DinucCovariate]
@PG     ID:GATK TableRecalibration.2    VN:1.3-14-g59da26a      CL:default_read_group=null default_platform=null force_read_group=null force_platform=null window_size_nqs=5 homopolymer_nback=7 exception_if_no_tile=false solid_recal_mode=SET_Q_ZERO solid_nocall_strategy=THROW_EXCEPTION recal_file=/seq/picard/D0ENMACXX/C1-210_2011-12-07_2011-12-18/7/Sage-75643/D0ENMACXX.7.recal_data.csv preserve_qscores_less_than=5 smoothing=1 max_quality_score=50 doNotWriteOriginalQuals=false no_pg_tag=false fail_with_no_eof_marker=false skipUQUpdate=false Covariates=[ReadGroupCovariate, QualityScoreCovariate, CycleCovariate, DinucCovariate]
@PG     ID:GATK TableRecalibration.3    VN:1.3-14-g59da26a      CL:default_read_group=null default_platform=null force_read_group=null force_platform=null window_size_nqs=5 homopolymer_nback=7 exception_if_no_tile=false solid_recal_mode=SET_Q_ZERO solid_nocall_strategy=THROW_EXCEPTION recal_file=/seq/picard/D0ENMACXX/C1-210_2011-12-07_2011-12-18/5/Sage-75643/D0ENMACXX.5.recal_data.csv preserve_qscores_less_than=5 smoothing=1 max_quality_score=50 doNotWriteOriginalQuals=false no_pg_tag=false fail_with_no_eof_marker=false skipUQUpdate=false Covariates=[ReadGroupCovariate, QualityScoreCovariate, CycleCovariate, DinucCovariate]
@PG     ID:GATK TableRecalibration.4    VN:1.3-14-g59da26a      CL:default_read_group=null default_platform=null force_read_group=null force_platform=null window_size_nqs=5 homopolymer_nback=7 exception_if_no_tile=false solid_recal_mode=SET_Q_ZERO solid_nocall_strategy=THROW_EXCEPTION recal_file=/seq/picard/D0ENMACXX/C1-210_2011-12-07_2011-12-18/6/Sage-75643/D0ENMACXX.6.recal_data.csv preserve_qscores_less_than=5 smoothing=1 max_quality_score=50 doNotWriteOriginalQuals=false no_pg_tag=false fail_with_no_eof_marker=false skipUQUpdate=false Covariates=[ReadGroupCovariate, QualityScoreCovariate, CycleCovariate, DinucCovariate]
@PG     ID:GATK TableRecalibration.5    VN:1.3-14-g59da26a      CL:default_read_group=null default_platform=null force_read_group=null force_platform=null window_size_nqs=5 homopolymer_nback=7 exception_if_no_tile=false solid_recal_mode=SET_Q_ZERO solid_nocall_strategy=THROW_EXCEPTION recal_file=/seq/picard/D0ENMACXX/C1-210_2011-12-07_2011-12-18/2/Sage-75643/D0ENMACXX.2.recal_data.csv preserve_qscores_less_than=5 smoothing=1 max_quality_score=50 doNotWriteOriginalQuals=false no_pg_tag=false fail_with_no_eof_marker=false skipUQUpdate=false Covariates=[ReadGroupCovariate, QualityScoreCovariate, CycleCovariate, DinucCovariate]
@PG     ID:bwa  PN:bwa  VN:0.5.9-r16    CL:bwa aln Homo_sapiens_assembly19.fasta -q 5 -l 32 -k 2 -t 4 -o 1 -f D0ENMACXX.1.Sage-75643.Homo_sapiens_assembly19.2.sai D0ENMACXX.1.Sage-75643.2.fastq.gz; bwa aln Homo_sapiens_assembly19.fasta -q 5 -l 32 -k 2 -t 4 -o 1 -f D0ENMACXX.1.Sage-75643.Homo_sapiens_assembly19.1.sai D0ENMACXX.1.Sage-75643.1.fastq.gz; bwa sampe -P -f D0ENMACXX.1.Sage-75643.Homo_sapiens_assembly19.aligned_bwa.sam Homo_sapiens_assembly19.fasta D0ENMACXX.1.Sage-75643.Homo_sapiens_assembly19.1.sai D0ENMACXX.1.Sage-75643.Homo_sapiens_assembly19.2.sai D0ENMACXX.1.Sage-75643.1.fastq.gz D0ENMACXX.1.Sage-75643.2.fastq.gz
@PG     ID:bwa.1        PN:bwa  VN:0.5.9-r16    CL:bwa aln Homo_sapiens_assembly19.fasta -q 5 -l 32 -k 2 -t 4 -o 1 -f D0ENMACXX.3.Sage-75643.Homo_sapiens_assembly19.2.sai D0ENMACXX.3.Sage-75643.2.fastq.gz; bwa aln Homo_sapiens_assembly19.fasta -q 5 -l 32 -k 2 -t 4 -o 1 -f D0ENMACXX.3.Sage-75643.Homo_sapiens_assembly19.1.sai D0ENMACXX.3.Sage-75643.1.fastq.gz; bwa sampe -P -f D0ENMACXX.3.Sage-75643.Homo_sapiens_assembly19.aligned_bwa.sam Homo_sapiens_assembly19.fasta D0ENMACXX.3.Sage-75643.Homo_sapiens_assembly19.1.sai D0ENMACXX.3.Sage-75643.Homo_sapiens_assembly19.2.sai D0ENMACXX.3.Sage-75643.1.fastq.gz D0ENMACXX.3.Sage-75643.2.fastq.gz
@PG     ID:bwa.2        PN:bwa  VN:0.5.9-r16    CL:bwa aln Homo_sapiens_assembly19.fasta -q 5 -l 32 -k 2 -t 4 -o 1 -f D0ENMACXX.7.Sage-75643.Homo_sapiens_assembly19.2.sai D0ENMACXX.7.Sage-75643.2.fastq.gz; bwa aln Homo_sapiens_assembly19.fasta -q 5 -l 32 -k 2 -t 4 -o 1 -f D0ENMACXX.7.Sage-75643.Homo_sapiens_assembly19.1.sai D0ENMACXX.7.Sage-75643.1.fastq.gz; bwa sampe -P -f D0ENMACXX.7.Sage-75643.Homo_sapiens_assembly19.aligned_bwa.sam Homo_sapiens_assembly19.fasta D0ENMACXX.7.Sage-75643.Homo_sapiens_assembly19.1.sai D0ENMACXX.7.Sage-75643.Homo_sapiens_assembly19.2.sai D0ENMACXX.7.Sage-75643.1.fastq.gz D0ENMACXX.7.Sage-75643.2.fastq.gz
.....

and some of the reads look like this:

D0ENMACXX111207:7:1202:2132:140703      163     1       10002   20      90M11S  =       10181   274     AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCAACCCTAACCCTAACCCATACTCTACCCAGTACCCTAACCCTAACCCTTACCCTAACCC   =;?ACC?AEE.DDA7=7?CDDEECFEDCFC<AFE?H=+EBA,1@F7>@@70>F################################################   AS:i:58 XS:i:54 XF:i:0  XE:i:1  NM:i:8  XT:i:1
D0ENMACXX111207:6:2202:6438:59394       163     1       10003   20      11M1I84M5S      =       10183   280     ACCCTAACCCTAAACCCTNACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCGAACCCTAACCCTTACCC   =>=?BA?FCCDCB@GBCD#AFEECEDDFFFEDGFFFDDFFF@BEFBF@@EFEEFAE@D=AAABEFCAABB*C=8=;F########################   AS:i:83 XS:i:79 XF:i:0  XE:i:1  NM:i:2  XT:i:1
D0ENMACXX111207:1:2304:12628:161494     99      1       10009   28      59M1D42M        =       10195   287     ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTACCCTAACCCTCACCCTAACCCTAACCCTAACCCTAACCCAA   @@@AFDCFEEGFDFFFGECFEFGFCFFFGCDFFFGBAEFFGFCFFCF=DC<FE?DCFFBDEFF2?:F##################################   AS:i:90 XS:i:80 XF:i:3  XE:i:2  NM:i:2
D0ENMACXX111207:3:2302:13424:148033     163     1       10011   22      101M    =       10353   427     CCTAGCCCTAGCCCTAGCCCTAGCCCTAGCCCTAGCCCTAGCCCTAGCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAAC   <@D=BBADEB@DEDGE?GFFFEFDEEGE9EGGEE@@CEGE<GGCGED;@ADGBGCFE@EEFFGGEFEED0CD?E;AC2;D2;BFDD###############   AS:i:69 XS:i:63 XF:i:3  XE:i:2  NM:i:8
D0ENMACXX111207:5:2201:12464:124446     99      1       10027   6       29M2I70M        =       10298   366     ACCCTAACCCTAACCCTAACCCTAACCCTAAAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCCTAACCCCAACCC   @@=BCEEFCECCF;F>@CEBGGCFB9FFFFB@-AF>E@6FFGCF?FFFFB@>DBF;7B9FF=FEEFGE@FF?/@8GCE<<+:?EE@@?>A?B?E#######   AS:i:86 XS:i:76 XF:i:3  XE:i:1  NM:i:3

I had replaced the header of my BAM file with the header of the original BAM file. Is the error caused by a mismatch between the RG information in the header and the RG information in the reads?

How can I solve this problem?

SAMFileReader software-error MuTect • 3.3k views
9.2 years ago

The error message says exactly what's wrong: you need to add read groups to the file. You can do this with Picard's AddOrReplaceReadGroups tool.
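
As a rough sketch of what that could look like, the snippet below builds and prints a Picard AddOrReplaceReadGroups command. It assumes a recent Picard distributed as a single picard.jar (older releases shipped one jar per tool), and every path and RG value in it (tumor.bam, tumor1, lib1, unit1, TUMOR) is a placeholder rather than something taken from the post. Note that this tool assigns all reads to one new read group, so the per-lane @RG entries in the existing header would not be preserved.

```shell
#!/usr/bin/env bash
# Sketch: assign one read group to every read in a BAM with Picard.
# All file names and RG values here are hypothetical placeholders.
PICARD_JAR="${PICARD_JAR:-picard.jar}"
IN_BAM="${IN_BAM:-tumor.bam}"
OUT_BAM="${OUT_BAM:-tumor.rg.bam}"

# Print the full command on one line so it can be reviewed before running.
build_rg_cmd() {
  printf '%s ' java -jar "$PICARD_JAR" AddOrReplaceReadGroups \
    "I=$IN_BAM" "O=$OUT_BAM" \
    RGID=tumor1 RGLB=lib1 RGPL=illumina RGPU=unit1 RGSM=TUMOR
  printf '\n'
}

build_rg_cmd
```

Printing the command instead of executing it is a deliberate choice here: the RGSM value in particular should be checked against how the tumor and normal samples are named before anything is rewritten.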

Edit: Just adding the read groups to the header isn't enough. Each alignment must also carry an RG tag whose ID matches one of the @RG lines in the header, and the reads shown in the question have no RG tag at all.
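
One way to confirm this on a given file is to count the alignment records that lack an RG:Z: tag. The helper below is only a sketch: count_missing_rg is a made-up name, it reads SAM text on standard input, and the usage line assumes samtools is installed and uses a placeholder file name.

```shell
# Count SAM records (header lines skipped) that carry no RG:Z: tag.
count_missing_rg() {
  awk -F'\t' '
    /^@/ { next }                      # skip @HD/@SQ/@RG/@PG header lines
    {
      has_rg = 0
      for (i = 12; i <= NF; i++)       # optional tags start at column 12
        if ($i ~ /^RG:Z:/) { has_rg = 1; break }
      if (!has_rg) missing++
    }
    END { print missing + 0 }          # print 0 when nothing is missing
  '
}

# Typical use (samtools and the file name are assumptions):
#   samtools view tumor.bam | head -n 100000 | count_missing_rg
```

A non-zero count means the header alone cannot fix the file, and the read groups have to be written into the records themselves.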
