This is a problem that bugged me for a few days. I figure the solution might be helpful to others:
I was seeing the following error message when I tried to run GATK:
##### ERROR MESSAGE: Lexicographically sorted human genome sequence detected in reference.
##### ERROR For safety's sake the GATK requires human contigs in karyotypic order: 1, 2, ..., 10, 11, ..., 20, 21, 22, X, Y with M either leading or trailing these contigs.
##### ERROR This is because all distributed GATK resources are sorted in karyotypic order, and your processing will fail when you need to use these files.
##### ERROR You can use the ReorderSam utility to fix this problem: http://gatkforums.broadinstitute.org/discussion/58/companion-utilities-reordersam
##### ERROR reference contigs = [chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr1, chr20, chr21, chr22, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chrM, chrX, chrY]
I understood that I should be able to fix this problem using Picard ReorderSam.jar, but the .bam file seemed to be properly ordered and GATK still didn't recognize this to be the case.
Here are the commands that I used for pre-processing:
bwa mem $reference $read1 $read2 > $sam
samtools view -bS $sam > $bam
#remove unaligned reads
samtools view -F 0x04 -b $bam > $filtered_bam
#sort sample
java -jar /opt/picard-tools-1.105/SortSam.jar I=$filtered_bam O=$sorted_bam SORT_ORDER=coordinate CREATE_INDEX=True
#MarkDuplicates
java -jar /opt/picard-tools-1.105/MarkDuplicates.jar INPUT=$sorted_bam OUTPUT=$nodup_bam METRICS_FILE=$metrics_file REMOVE_DUPLICATES=true
#add read groups
java -jar /opt/picard-tools-1.105/AddOrReplaceReadGroups.jar INPUT=$nodup_bam OUTPUT=$rg_bam RGLB=1 RGPL=illumina RGPU=barcode RGSM=test CREATE_INDEX=True
#reorder sample
java -jar /opt/picard-tools-1.105/ReorderSam.jar I=$rg_bam O=$karyotype_bam REFERENCE=$karotypic_fasta CREATE_INDEX=True
#start running GATK
java -jar /opt/GenomeAnalysisTK-2.8-1-g932cd3a/GenomeAnalysisTK.jar -T RealignerTargetCreator -R $reference -I $karyotype_bam -o $target_intervals
I converted the final .bam file to a .sam file and viewed the beginning of it to confirm that the chromosomes were in the correct order in the header. I also checked the first few reads. Everything seemed to be OK.
Also, I originally ran BWA the old-fashioned way (with bwa aln followed by bwa sampe). I then ran BWA-MEM as part of the troubleshooting process. The error message stayed the same either way.
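Solution: I was trying to use the karyotypic reference to reorder an alignment made against a normal hg19 reference. In hindsight, the error message is about the reference contigs (the file passed to GATK with -R), which is why reordering the .bam file alone didn't help. Running the pipeline from the beginning with only the karyotypic reference provided by GATK solved the problem.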
#In other words, $karotypic_fasta should be used instead of $reference throughout, so the BWA commands should be as follows:
bwa index -a bwtsw $karotypic_fasta
bwa mem $karotypic_fasta $read1 $read2 > $sam
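A quick sanity check might have saved me some time here: look at the contig order in the FASTA index (.fai) of whatever reference you are about to hand to BWA and GATK. The sketch below is only a rough heuristic, not anything GATK itself does. The function names contig_names and looks_karyotypic are made up for illustration, and it assumes a .fai file already exists next to the reference (for example, created with samtools faidx).

```shell
# Hypothetical helper: print contig names from a FASTA index (.fai), one per line.
# The first tab-separated column of a .fai file is the contig name.
contig_names() {
    cut -f1 "$1"
}

# Hypothetical helper: succeed (exit 0) only if chr2 (or 2) directly follows
# chr1 (or 1). That holds for karyotypic order (1, 2, ..., 22, X, Y) but not
# for a lexicographically sorted reference (..., chr19, chr1, chr20, ...).
# This is a heuristic assumption, not the check GATK performs.
looks_karyotypic() {
    contig_names "$1" | awk '
        prev == "chr1" || prev == "1" { found = 1; ok = ($0 == "chr2" || $0 == "2") }
        { prev = $0 }
        END { exit !(found && ok) }'
}

# Example usage (assumes "$reference.fai" exists):
#   if looks_karyotypic "$reference.fai"; then
#       echo "contig order looks karyotypic"
#   else
#       echo "contig order does NOT look karyotypic; GATK may reject this reference"
#   fi
```

If the check fails for the reference you are about to pass to GATK with -R, it is probably simpler to switch to the karyotypic reference before aligning than to try to repair the .bam file afterwards.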