Question

Forum:Solution: Gatk Doesn't Recognize Karyotypic Ordering

5

Entering edit mode

11.5 years ago

Charles Warden 8.3k

This is a problem that bugged me for a few days. Figure the solution might be helpful to others:

I was seeing the following error message when I tried to run GATK:

##### ERROR MESSAGE: Lexicographically sorted human genome sequence detected in reference.
##### ERROR For safety's sake the GATK requires human contigs in karyotypic order: 1, 2, ..., 10, 11, ..., 20, 21, 22, X, Y with M either leading or trailing these contigs.
##### ERROR This is because all distributed GATK resources are sorted in karyotypic order, and your processing will fail when you need to use these files.
##### ERROR You can use the ReorderSam utility to fix this problem: http://gatkforums.broadinstitute.org/discussion/58/companion-utilities-reordersam
##### ERROR   reference contigs = [chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr1, chr20, chr21, chr22, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chrM, chrX, chrY]

I understood that I should be able fix this problem using Picard ReorderSam.jar, but the problem was that the .bam file seemed properly ordered yet GATK didn't recognize this to be the case.

Here are the commands that I used for pre-processing:

bwa mem $reference $read1 $read2 > $sam
samtools view -bS $sam > $bam

#remove unaligned reads
samtools view -F 0x04 -b $bam > $filtered_bam

#sort sample
java -jar /opt/picard-tools-1.105/SortSam.jar I=$filtered_bam O=$sorted_bam SORT_ORDER=coordinate CREATE_INDEX=True

#MarkDuplicates
 java -jar /opt/picard-tools-1.105/MarkDuplicates.jar INPUT=$sorted_bam OUTPUT=$nodup_bam METRICS_FILE=$metrics_file REMOVE_DUPLICATES=true

#add read groups
java -jar /opt/picard-tools-1.105/AddOrReplaceReadGroups.jar INPUT=$nodup_bam OUTPUT=$rg_bam RGLB=1 RGPL=illumina RGPU=barcode RGSM=test CREATE_INDEX=True 

#reorder sample
java -jar /opt/picard-tools-1.105/ReorderSam.jar" I=$rg_bam O=$karyotype_bam REFERENCE=$karotypic_fasta CREATE_INDEX=True

#start running GATK
java -jar /opt/GenomeAnalysisTK-2.8-1-g932cd3a/GenomeAnalysisTK.jar -T RealignerTargetCreator -R $reference -I $karyotype_bam -o $target_intervals

I converted the final .bam file to a .sam file, and I viewed the beginning of the file to confirm that the chromosomes are in the correct order in the header and I also checked the first few reads. Everything seemed to be OK.

Also, I originally ran bwa the old-fashioned way (with bwa aln followed by bwa sampe). I then ran bwa-mem as part of the troubleshooting process. The error message stayed the same either way.

Solution: I was trying to use the Karyotype reference to reorder an alignment from a normal hg19 reference. Running the pipeline from the beginning with only the Karyotype reference provided by GATK solved the problem

#In other words, $karotypic_fasta should be used instead of $reference, so the BWA command should be as follows
bwa index -a bwtsw $karotypic_fasta
bwa mem $karotypic_fasta $read1 $read2 > $sam

picard gatk hg19 • 6.3k views

ADD COMMENT • link updated 2.3 years ago by Ram 45k • written 11.5 years ago by Charles Warden 8.3k

Ram · Answer 1 · 2015-03-26

0

Entering edit mode

10.3 years ago

roysomak4 ▴ 40

Hi Charles,

Thank you for sharing your experience. I am facing a same problem with the lexicographically sorted data. I am relatively new to using bioinformatics tools, therefore please pardon my ignorance!

I have a set of BAM files from an exome sequencing run on a SOLiD 5500. The XSEQ files were aligned using LifeScope to generate the BAM files.

You approach makes logical sense and will give it a try on my machine.

My question: Is it necessary to run ReorderSam if the BAM files are realigned using the karyotypic reference with bwa?

ADD COMMENT • link updated 3.1 years ago by Ram 45k • written 10.3 years ago by roysomak4 ▴ 40

0

Entering edit mode

To be honest, it has been a while since I ran into this problem. If I didn't need the reorder step, then I probably wouldn't have listed it.

More specifically, the original BWA alignment won't be sorted (which is why you have to use samtools to sort the file). However, samtools sorts by coordinate and not karyotype ordering, so I think this is why you need to run the ReorderSam step.

However, feel free to see what happens if you skip that step - I'm sure others would like to know if there is a more efficient solution!

ADD REPLY • link updated 3.1 years ago by Ram 45k • written 10.3 years ago by Charles Warden 8.3k