Question: Error with MuTect: SAM/BAM file SAMFileReader is malformed
4.1 years ago, eem0306 (Korea, Republic Of) wrote:

I tried to run MuTect with the following command line:

/usr/lib/jvm/java-1.6.0/bin/java -Xmx2g -jar /home/exman/muTect-1.1.4.jar \
--analysis_type MuTect \
--reference_sequence /data/eem0306/ref/1.fa \
--cosmic /data/eem0306/ref/b37_cosmic_v54_120711.vcf \
--dbsnp /data/eem0306/ref/dbsnp_138.hg19.nochr.vcf \
--input_file:normal /data/eem0306/somatic.caller/sample/N.rmdup.realigned.BQSR.h.bam \
--input_file:tumor /data/eem0306/somatic.caller/using.mutect/myAnalysis20/N20.result.sorted.h.qsort.bwasw.h.filter.sorted.bam \
--out /data/eem0306/somatic.caller/using.mutect/2nd.myAnalysis20/n20.results \
--coverage_file n20.coverage.wig.txt


but I repeatedly got the following error:


##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 2.2-25-g2a68eab):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: SAM/BAM file SAMFileReader{/data/eem0306/somatic.caller/using.mutect/myAnalysis20/N20.result.sorted.h.qsort.bwasw.h.filter.sorted.bam} is malformed: Read D0ENMACXX111207:7:1202:2132:140703 is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK.  Please use http://www.broadinstitute.org/gsa/wiki/index.php/ReplaceReadGroups to fix this problem
##### ERROR ------------------------------------------------------------------------------------------


The header of my BAM file is:

@HD     VN:1.4  GO:none SO:coordinate
@SQ     SN:1    LN:249250621    UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta   AS:GRCh37       M5:1b22b98cdeb4a9304cb5d48026a85128     SP:Homo Sapiens
@RG     ID:C09DF.1      PL:illumina     PU:C09DFACXX111207.1.TTGAGCCT   LB:Solexa-76163 DT:2011-12-07T14:00:00+0900     SM:HCC1143 BL   CN:BI
@RG     ID:C09DF.2      PL:illumina     PU:C09DFACXX111207.2.TTGAGCCT   LB:Solexa-76163 DT:2011-12-07T14:00:00+0900     SM:HCC1143 BL   CN:BI
@RG     ID:D0EN0.4      PL:illumina     PU:D0EN0ACXX111207.4.TTGAGCCT   LB:Solexa-76163 DT:2011-12-07T14:00:00+0900     SM:HCC1143 BL   CN:BI
@RG     ID:D0EN0.7      PL:illumina     PU:D0EN0ACXX111207.7.TTGAGCCT   LB:Solexa-76163 DT:2011-12-07T14:00:00+0900     SM:HCC1143 BL   CN:BI
@RG     ID:D0EN0.8      PL:illumina     PU:D0EN0ACXX111207.8.TTGAGCCT   LB:Solexa-76163 DT:2011-12-07T14:00:00+0900     SM:HCC1143 BL   CN:BI
@RG     ID:D0ENM.1      PL:illumina     PU:D0ENMACXX111207.1.CCAGTTAG   LB:Sage-75643   DT:2011-12-07T14:00:00+0900     SM:HCC1143      CN:BI
@RG     ID:D0ENM.2      PL:illumina     PU:D0ENMACXX111207.2.CCAGTTAG   LB:Sage-75643   DT:2011-12-07T14:00:00+0900     SM:HCC1143      CN:BI
@RG     ID:D0ENM.3      PL:illumina     PU:D0ENMACXX111207.3.CCAGTTAG   LB:Sage-75643   DT:2011-12-07T14:00:00+0900     SM:HCC1143      CN:BI
@RG     ID:D0ENM.5      PL:illumina     PU:D0ENMACXX111207.5.CCAGTTAG   LB:Sage-75643   DT:2011-12-07T14:00:00+0900     SM:HCC1143      CN:BI
@RG     ID:D0ENM.6      PL:illumina     PU:D0ENMACXX111207.6.CCAGTTAG   LB:Sage-75643   DT:2011-12-07T14:00:00+0900     SM:HCC1143      CN:BI
@RG     ID:D0ENM.7      PL:illumina     PU:D0ENMACXX111207.7.CCAGTTAG   LB:Sage-75643   DT:2011-12-07T14:00:00+0900     SM:HCC1143      CN:BI
@PG     ID:GATK IndelRealigner  CL:knownAlleles=[(RodBinding name=knownAlleles source=/bio/lib/ref/1000G_phase1.indels.b37.vcf), (RodBinding name=knownAlleles2 source=/bio/lib/ref/Mills_and_1000G_gold_standard.indels.b37.vcf)] targetIntervals=n20t80.rmdup_intervals.list LODThresholdForCleaning=5.0 consensusDeterminationModel=USE_READS entropyThreshold=0.15 maxReadsInMemory=150000 maxIsizeForMovement=3000 maxPositionalMoveAllowed=200 maxConsensuses=30 maxReadsForConsensuses=120 maxReadsForRealignment=20000 noOriginalAlignmentTags=false nWayOut=null generate_nWayOut_md5s=false check_early=false noPGTag=false keepPGTags=false indelsFileForDebugging=null statisticsFileForDebugging=null SNPsFileForDebugging=null
@PG     ID:GATK TableRecalibration      VN:1.3-14-g59da26a      CL:default_read_group=null default_platform=null force_read_group=null force_platform=null window_size_nqs=5 homopolymer_nback=7 exception_if_no_tile=false solid_recal_mode=SET_Q_ZERO solid_nocall_strategy=THROW_EXCEPTION recal_file=/seq/picard/D0ENMACXX/C1-210_2011-12-07_2011-12-18/1/Sage-75643/D0ENMACXX.1.recal_data.csv preserve_qscores_less_than=5 smoothing=1 max_quality_score=50 doNotWriteOriginalQuals=false no_pg_tag=false fail_with_no_eof_marker=false skipUQUpdate=false Covariates=[ReadGroupCovariate, QualityScoreCovariate, CycleCovariate, DinucCovariate]
@PG     ID:GATK TableRecalibration.1    VN:1.3-14-g59da26a      CL:default_read_group=null default_platform=null force_read_group=null force_platform=null window_size_nqs=5 homopolymer_nback=7 exception_if_no_tile=false solid_recal_mode=SET_Q_ZERO solid_nocall_strategy=THROW_EXCEPTION recal_file=/seq/picard/D0ENMACXX/C1-210_2011-12-07_2011-12-18/3/Sage-75643/D0ENMACXX.3.recal_data.csv preserve_qscores_less_than=5 smoothing=1 max_quality_score=50 doNotWriteOriginalQuals=false no_pg_tag=false fail_with_no_eof_marker=false skipUQUpdate=false Covariates=[ReadGroupCovariate, QualityScoreCovariate, CycleCovariate, DinucCovariate]
@PG     ID:GATK TableRecalibration.2    VN:1.3-14-g59da26a      CL:default_read_group=null default_platform=null force_read_group=null force_platform=null window_size_nqs=5 homopolymer_nback=7 exception_if_no_tile=false solid_recal_mode=SET_Q_ZERO solid_nocall_strategy=THROW_EXCEPTION recal_file=/seq/picard/D0ENMACXX/C1-210_2011-12-07_2011-12-18/7/Sage-75643/D0ENMACXX.7.recal_data.csv preserve_qscores_less_than=5 smoothing=1 max_quality_score=50 doNotWriteOriginalQuals=false no_pg_tag=false fail_with_no_eof_marker=false skipUQUpdate=false Covariates=[ReadGroupCovariate, QualityScoreCovariate, CycleCovariate, DinucCovariate]
@PG     ID:GATK TableRecalibration.3    VN:1.3-14-g59da26a      CL:default_read_group=null default_platform=null force_read_group=null force_platform=null window_size_nqs=5 homopolymer_nback=7 exception_if_no_tile=false solid_recal_mode=SET_Q_ZERO solid_nocall_strategy=THROW_EXCEPTION recal_file=/seq/picard/D0ENMACXX/C1-210_2011-12-07_2011-12-18/5/Sage-75643/D0ENMACXX.5.recal_data.csv preserve_qscores_less_than=5 smoothing=1 max_quality_score=50 doNotWriteOriginalQuals=false no_pg_tag=false fail_with_no_eof_marker=false skipUQUpdate=false Covariates=[ReadGroupCovariate, QualityScoreCovariate, CycleCovariate, DinucCovariate]
@PG     ID:GATK TableRecalibration.4    VN:1.3-14-g59da26a      CL:default_read_group=null default_platform=null force_read_group=null force_platform=null window_size_nqs=5 homopolymer_nback=7 exception_if_no_tile=false solid_recal_mode=SET_Q_ZERO solid_nocall_strategy=THROW_EXCEPTION recal_file=/seq/picard/D0ENMACXX/C1-210_2011-12-07_2011-12-18/6/Sage-75643/D0ENMACXX.6.recal_data.csv preserve_qscores_less_than=5 smoothing=1 max_quality_score=50 doNotWriteOriginalQuals=false no_pg_tag=false fail_with_no_eof_marker=false skipUQUpdate=false Covariates=[ReadGroupCovariate, QualityScoreCovariate, CycleCovariate, DinucCovariate]
@PG     ID:GATK TableRecalibration.5    VN:1.3-14-g59da26a      CL:default_read_group=null default_platform=null force_read_group=null force_platform=null window_size_nqs=5 homopolymer_nback=7 exception_if_no_tile=false solid_recal_mode=SET_Q_ZERO solid_nocall_strategy=THROW_EXCEPTION recal_file=/seq/picard/D0ENMACXX/C1-210_2011-12-07_2011-12-18/2/Sage-75643/D0ENMACXX.2.recal_data.csv preserve_qscores_less_than=5 smoothing=1 max_quality_score=50 doNotWriteOriginalQuals=false no_pg_tag=false fail_with_no_eof_marker=false skipUQUpdate=false Covariates=[ReadGroupCovariate, QualityScoreCovariate, CycleCovariate, DinucCovariate]
@PG     ID:bwa  PN:bwa  VN:0.5.9-r16    CL:bwa aln Homo_sapiens_assembly19.fasta -q 5 -l 32 -k 2 -t 4 -o 1 -f D0ENMACXX.1.Sage-75643.Homo_sapiens_assembly19.2.sai D0ENMACXX.1.Sage-75643.2.fastq.gz; bwa aln Homo_sapiens_assembly19.fasta -q 5 -l 32 -k 2 -t 4 -o 1 -f D0ENMACXX.1.Sage-75643.Homo_sapiens_assembly19.1.sai D0ENMACXX.1.Sage-75643.1.fastq.gz; bwa sampe -P -f D0ENMACXX.1.Sage-75643.Homo_sapiens_assembly19.aligned_bwa.sam Homo_sapiens_assembly19.fasta D0ENMACXX.1.Sage-75643.Homo_sapiens_assembly19.1.sai D0ENMACXX.1.Sage-75643.Homo_sapiens_assembly19.2.sai D0ENMACXX.1.Sage-75643.1.fastq.gz D0ENMACXX.1.Sage-75643.2.fastq.gz
@PG     ID:bwa.1        PN:bwa  VN:0.5.9-r16    CL:bwa aln Homo_sapiens_assembly19.fasta -q 5 -l 32 -k 2 -t 4 -o 1 -f D0ENMACXX.3.Sage-75643.Homo_sapiens_assembly19.2.sai D0ENMACXX.3.Sage-75643.2.fastq.gz; bwa aln Homo_sapiens_assembly19.fasta -q 5 -l 32 -k 2 -t 4 -o 1 -f D0ENMACXX.3.Sage-75643.Homo_sapiens_assembly19.1.sai D0ENMACXX.3.Sage-75643.1.fastq.gz; bwa sampe -P -f D0ENMACXX.3.Sage-75643.Homo_sapiens_assembly19.aligned_bwa.sam Homo_sapiens_assembly19.fasta D0ENMACXX.3.Sage-75643.Homo_sapiens_assembly19.1.sai D0ENMACXX.3.Sage-75643.Homo_sapiens_assembly19.2.sai D0ENMACXX.3.Sage-75643.1.fastq.gz D0ENMACXX.3.Sage-75643.2.fastq.gz
@PG     ID:bwa.2        PN:bwa  VN:0.5.9-r16    CL:bwa aln Homo_sapiens_assembly19.fasta -q 5 -l 32 -k 2 -t 4 -o 1 -f D0ENMACXX.7.Sage-75643.Homo_sapiens_assembly19.2.sai D0ENMACXX.7.Sage-75643.2.fastq.gz; bwa aln Homo_sapiens_assembly19.fasta -q 5 -l 32 -k 2 -t 4 -o 1 -f D0ENMACXX.7.Sage-75643.Homo_sapiens_assembly19.1.sai D0ENMACXX.7.Sage-75643.1.fastq.gz; bwa sampe -P -f D0ENMACXX.7.Sage-75643.Homo_sapiens_assembly19.aligned_bwa.sam Homo_sapiens_assembly19.fasta D0ENMACXX.7.Sage-75643.Homo_sapiens_assembly19.1.sai D0ENMACXX.7.Sage-75643.Homo_sapiens_assembly19.2.sai D0ENMACXX.7.Sage-75643.1.fastq.gz D0ENMACXX.7.Sage-75643.2.fastq.gz
.....


and some of the reads look like this:
D0ENMACXX111207:7:1202:2132:140703      163     1       10002   20      90M11S  =       10181   274     AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCAACCCTAACCCTAACCCATACTCTACCCAGTACCCTAACCCTAACCCTTACCCTAACCC   =;?ACC?AEE.DDA7=7?CDDEECFEDCFC<AFE?H=+EBA,1@F7>@@70>F################################################   AS:i:58 XS:i:54 XF:i:0  XE:i:1  NM:i:8  XT:i:1
D0ENMACXX111207:6:2202:6438:59394       163     1       10003   20      11M1I84M5S      =       10183   280     ACCCTAACCCTAAACCCTNACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCGAACCCTAACCCTTACCC   =>=?BA?FCCDCB@GBCD#AFEECEDDFFFEDGFFFDDFFF@BEFBF@@EFEEFAE@D=AAABEFCAABB*C=8=;F########################   AS:i:83 XS:i:79 XF:i:0  XE:i:1  NM:i:2  XT:i:1
D0ENMACXX111207:1:2304:12628:161494     99      1       10009   28      59M1D42M        =       10195   287     ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTACCCTAACCCTCACCCTAACCCTAACCCTAACCCTAACCCAA   @@@AFDCFEEGFDFFFGECFEFGFCFFFGCDFFFGBAEFFGFCFFCF=DC<FE?DCFFBDEFF2?:F##################################   AS:i:90 XS:i:80 XF:i:3  XE:i:2  NM:i:2
D0ENMACXX111207:3:2302:13424:148033     163     1       10011   22      101M    =       10353   427     CCTAGCCCTAGCCCTAGCCCTAGCCCTAGCCCTAGCCCTAGCCCTAGCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAAC   <@D=BBADEB@DEDGE?GFFFEFDEEGE9EGGEE@@CEGE<GGCGED;@ADGBGCFE@EEFFGGEFEED0CD?E;AC2;D2;BFDD###############   AS:i:69 XS:i:63 XF:i:3  XE:i:2  NM:i:8
D0ENMACXX111207:5:2201:12464:124446     99      1       10027   6       29M2I70M        =       10298   366     ACCCTAACCCTAACCCTAACCCTAACCCTAAAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCCTAACCCCAACCC   @@=BCEEFCECCF;F>@CEBGGCFB9FFFFB@-AF>E@6FFGCF?FFFFB@>DBF;7B9FF=FEEFGE@FF?/@8GCE<<+:?EE@@?>A?B?E#######   AS:i:86 XS:i:76 XF:i:3  XE:i:1  NM:i:3
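
To see exactly which reads trip this check, here is a minimal sketch that applies the rule from the error message to SAM text, assuming the BAM has first been converted with samtools view -h. It flags every alignment that has no RG:Z tag, or whose RG:Z value is not the ID of any @RG header line. The script name check_rg.py and the function find_read_group_problems are hypothetical. None of the five reads shown above carries an RG:Z tag, so each of them would be reported as missing one.

```python
import sys
from typing import Iterable, List, Tuple


def find_read_group_problems(sam_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (read_name, problem) for each alignment GATK would reject.

    Hypothetical helper; expects tab-separated SAM text with the header first.
    """
    header_ids = set()
    problems = []
    for raw in sam_lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Collect the ID of every @RG header line.
            if line.startswith("@RG"):
                for field in line.split("\t")[1:]:
                    if field.startswith("ID:"):
                        header_ids.add(field[3:])
            continue
        fields = line.split("\t")
        # Optional tags start at column 12 (index 11) and look like TAG:TYPE:VALUE.
        rg = None
        for tag in fields[11:]:
            if tag.startswith("RG:Z:"):
                rg = tag[5:]
                break
        if rg is None:
            problems.append((fields[0], "missing RG tag"))
        elif rg not in header_ids:
            problems.append((fields[0], f"RG {rg} not defined in header"))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (assumption): python check_rg.py tumor.sam
    with open(sys.argv[1]) as handle:
        for name, problem in find_read_group_problems(handle):
            print(f"{name}\t{problem}")
```

For example, samtools view -h tumor.bam > tumor.sam followed by python check_rg.py tumor.sam | head lists the first offending reads.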

Note that I replaced the header of my BAM file with the header of the original BAM file.
Is the error caused by the read group (RG) information in the header not matching the RG information of the reads?

How can I solve this problem?

Answer: 4.1 years ago, Devon Ryan (Freiburg, Germany) wrote:

The error message says exactly what's wrong: you need to add read groups to the file. You can do this with Picard's AddOrReplaceReadGroups tool.

Edit: Adding read groups to the header alone isn't enough. Each alignment must also have an associated read group, i.e. an RG:Z tag whose value matches the ID of an @RG line in the header.
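
AddOrReplaceReadGroups assigns one new read group to every read, which would collapse the per-lane groups listed in this header. As a hypothetical alternative: the header's PU values (e.g. D0ENMACXX111207.7.CCAGTTAG) appear to share the flowcell and lane with the read names (e.g. D0ENMACXX111207:7:1202:2132:140703), so a script could attach the matching RG:Z tag to each read. This sketch assumes every read name follows that FLOWCELL:LANE:... pattern, that every PU has the form FLOWCELL.LANE.BARCODE, and that the input is SAM text from samtools view -h; the names add_rg.py, parse_rg_line and add_read_groups are made up for illustration. Reads whose lane has no @RG line are left untagged and would still fail in GATK.

```python
import sys
from typing import Dict, Iterable, Iterator, Optional, Tuple


def parse_rg_line(line: str) -> Optional[Tuple[Tuple[str, str], str]]:
    """Return ((flowcell, lane), rg_id) for an @RG line, or None.

    ASSUMPTION: PU has the form FLOWCELL.LANE.BARCODE, as in this header.
    """
    tags = {}
    for field in line.split("\t")[1:]:
        key, sep, value = field.partition(":")
        if sep:
            tags[key] = value
    pu_parts = tags.get("PU", "").split(".")
    if "ID" not in tags or len(pu_parts) < 2:
        return None
    return (pu_parts[0], pu_parts[1]), tags["ID"]


def add_read_groups(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield SAM lines, appending RG:Z:<id> to alignments without an RG tag.

    ASSUMPTION: read names have the form FLOWCELL:LANE:TILE:X:Y.
    Alignments whose flowcell/lane matches no @RG line are yielded unchanged.
    """
    lane_to_rg: Dict[Tuple[str, str], str] = {}
    for raw in sam_lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # SAM header lines precede all alignments, so the lookup table
            # is complete before the first read is processed.
            if line.startswith("@RG"):
                parsed = parse_rg_line(line)
                if parsed is not None:
                    lane_to_rg[parsed[0]] = parsed[1]
            yield line
            continue
        fields = line.split("\t")
        if any(tag.startswith("RG:Z:") for tag in fields[11:]):
            yield line  # already has a read group; leave it alone
            continue
        name_parts = fields[0].split(":")
        rg_id = lane_to_rg.get((name_parts[0], name_parts[1])) if len(name_parts) >= 2 else None
        yield (line + "\tRG:Z:" + rg_id) if rg_id else line


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (assumption): python add_rg.py tumor.sam > tumor.rg.sam
    with open(sys.argv[1]) as handle:
        for out_line in add_read_groups(handle):
            print(out_line)
```

For example: samtools view -h tumor.bam > tumor.sam, then python add_rg.py tumor.sam | samtools view -bS - > tumor.rg.bam, and index tumor.rg.bam with samtools index before rerunning MuTect. Coordinate order is unchanged because only a tag is appended.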

Powered by Biostar version 2.3.0