I have been trying to use MuTect so that I can compare its results with those from VarScan and other tools. Running MuTect requires pre-processing with GATK and Picard tools. Here is what I have done so far:
1. Mapped reads using BWA.
2. Converted to sorted BAM using Picard:
java -Xmx4g \
  -Djava.io.tmpdir=/tmp \
  -jar SortSam.jar \
  SO=coordinate \
  INPUT=Trimmed_ERR361938_trimmed_bwa.sam \
  OUTPUT=Test.bam \
  VALIDATION_STRINGENCY=LENIENT \
  CREATE_INDEX=true
3. Marked duplicates using Picard:
java -Xmx4g \
  -Djava.io.tmpdir=/tmp \
  -jar picard-tools-1.119/SortSam.jar \
  SO=coordinate \
  INPUT=Trimmed_ERR361938_trimmed_bwa.sam \
  OUTPUT=Test.bam \
  VALIDATION_STRINGENCY=LENIENT \
  CREATE_INDEX=true
4. Realigned around indels using GATK:
java -Xmx4g \
  -jar GenomeAnalysisTK.jar \
  -T RealignerTargetCreator \
  -R /steno-internal/chirag/data/indexGenome/hg19/bwa/hg19.fa \
  -o input.bam.list \
  -I input.marked.bam
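A note on step 3: the command shown there still calls SortSam.jar on the SAM file, so duplicates may not actually have been marked. Below is a hedged sketch of what a MarkDuplicates call could look like with Picard 1.119, assuming Test.bam from step 2 as input and input.marked.bam (the file used in step 4) as output. The metrics file name and the function name are made up for illustration, and the function only prints the command so it can be inspected before running.

```shell
# Dry-run sketch: prints a MarkDuplicates command instead of executing it.
# PICARD_DIR is an assumed location; adjust it to your installation.
PICARD_DIR="${PICARD_DIR:-picard-tools-1.119}"

# Arguments: input BAM, output BAM, metrics file (hypothetical name).
mark_duplicates_cmd() {
    echo "java -Xmx4g -Djava.io.tmpdir=/tmp" \
        "-jar $PICARD_DIR/MarkDuplicates.jar" \
        "INPUT=$1 OUTPUT=$2 METRICS_FILE=$3" \
        "VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true"
}

mark_duplicates_cmd Test.bam input.marked.bam input.marked.metrics.txt
```

Copy the printed line into the terminal once the file names look right.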
Running step 4, I now get this error:
##### ERROR
##### ERROR MESSAGE: SAM/BAM file input.marked.bam is malformed: SAM file doesn't have any read groups defined in the header. The GATK no longer supports SAM files without read groups
##### ERROR
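To confirm what GATK is complaining about, the BAM header can be checked for @RG lines. This is a minimal sketch, assuming samtools is installed; the helper name has_read_groups is made up for illustration.

```shell
# Succeeds (exit status 0) if the SAM header text on stdin contains an @RG line.
has_read_groups() {
    grep -q '^@RG'
}

# Usage with a real file (assumes samtools is on PATH):
#   samtools view -H input.marked.bam | has_read_groups \
#       && echo "read groups present" || echo "no read groups"
```

If no @RG line is found, the header has to be fixed before GATK will accept the file.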
There is a Picard tool that should fix this, but I am not sure about some of its parameters:
java -jar ~/unixTools/picard-tools-1.119/AddOrReplaceReadGroups.jar
These parameters are required:
- LB=String Read Group library. Required.
- PU=String Read Group platform unit (e.g. run barcode). Required.
- SM=String Read Group sample name. Required.
How do I find the values for these parameters, given that I am analyzing many published read sets?
Is there another way to fix this step?
Thanks in advance!
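For what it's worth, GATK mainly needs the read-group fields to exist. When the real library or platform unit of a published run is unknown, placeholder values are often used, and the archive record for the run accession may list the library and platform if the real ones are wanted. The sample name (SM) should still distinguish samples, for example tumor from normal. Below is a hedged sketch that derives placeholders from a sample label such as ERR361938 (taken from the file name in the question). The `_lib` and `_unit` suffixes, the `illumina` platform value, the output file name and both function names are assumptions for illustration. As far as I recall, the tool also expects a platform field (PL). The functions only print the command.

```shell
# Dry-run sketch: prints an AddOrReplaceReadGroups command instead of running it.
PICARD_DIR="${PICARD_DIR:-$HOME/unixTools/picard-tools-1.119}"

# Build placeholder read-group fields from one sample label.
# The naming scheme and RGPL=illumina are assumptions, not known metadata.
rg_fields() {
    echo "RGID=$1 RGLB=${1}_lib RGPL=illumina RGPU=${1}_unit RGSM=$1"
}

# Arguments: input BAM, output BAM, sample label.
add_read_groups_cmd() {
    echo "java -Xmx4g -jar $PICARD_DIR/AddOrReplaceReadGroups.jar" \
        "INPUT=$1 OUTPUT=$2 $(rg_fields "$3")" \
        "VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true"
}

add_read_groups_cmd input.marked.bam input.marked.rg.bam ERR361938
```

The resulting BAM (here input.marked.rg.bam) would then be the one passed to RealignerTargetCreator with -I.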
From my experience, you'll face three obstacles in succession as you embark on this journey (between steps 3 and 4 from your question). Possible tools you might need as you go:
I am following these steps. Number 3 is fine, and I have not yet encountered number 2. Number 1 is where I am having a problem.
FYI (In case you have not seen it before),
PS: Huh, just realized that it's an old post :-)