How to format "I" and "D" in vcf version 4.2 for liftover analysis in GATK
5 weeks ago

Hello everyone

I am facing challenges with liftover of a VCF file from hg19 to hg38 using GATK because of 'I' and 'D' annotations representing insertions and deletions in the VCF file.

The command I ran for the liftover:

gatk LiftoverVcf -I SNP_GRCh37.vcf  -O Liftover_with_Indels/lifted_over.vcf -C hg19ToHg38.over.chain.gz -WMC true  -R genome.fa --REJECT Liftover_with_Indels/rejeceted_variants.vcf --RECOVER_SWAPPED_REF_ALT True

Despite converting the VCF file to version 4.2 using vcftools, I still get this error:

htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 200: Insertions/Deletions are not supported when reading 3.x VCF's. Please convert your file to VCF4 using 
VCFTools, available at, for input source: file:///SNP_GRCh37.vcf
    at htsjdk.variant.vcf.AbstractVCFCodec.generateException(
    at htsjdk.variant.vcf.AbstractVCFCodec.checkAllele(
    at htsjdk.variant.vcf.AbstractVCFCodec.parseAlleles(
    at htsjdk.variant.vcf.AbstractVCFCodec.parseVCFLine(
    at htsjdk.variant.vcf.AbstractVCFCodec.decodeLine(
    at htsjdk.variant.vcf.AbstractVCFCodec.decode(
    at htsjdk.variant.vcf.AbstractVCFCodec.decode(
    at htsjdk.tribble.AsciiFeatureCodec.decode(
    at htsjdk.tribble.AsciiFeatureCodec.decode(
    at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.readNextRecord(
    at htsjdk.tribble.TribbleIndexedFeatureReader$
    at htsjdk.tribble.TribbleIndexedFeatureReader$
    at picard.vcf.LiftoverVcf.doWork(
    at picard.cmdline.CommandLineProgram.instanceMain(
    at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(
    at org.broadinstitute.hellbender.Main.mainEntry(
    at org.broadinstitute.hellbender.Main.main(

Any suggestions on how to convert the 'I' and 'D' annotations into a format compatible with VCF 4.2 would be greatly appreciated. I've been struggling with this problem for a few days now.


I don't think this can work. From the GATK documentation:

For each variant, the tool will look for the target coordinate, reverse-complement and left-align the variant if needed, and, in the case that the reference and alternate alleles of a SNP have been swapped in the new genome build, it will adjust the SNP, and correct AF-like INFO fields and the relevant genotypes.

So, as far as I understand, the alleles must consist of A, T, G and C. Unless you find a way to restore the REF and ALT sequences, you'd be better off re-calling the variants from the BAM with modern tools.
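If the SNPs alone are worth lifting over, one option is to set the 'I'/'D' records aside first. Judging by the stack trace (the exception comes from `checkAllele`), the parser seems to reject the allele strings themselves, which would explain why changing the header version did not help. Below is a minimal Python sketch under that assumption. The function names are hypothetical and not part of GATK or vcftools, and the only check is whether REF or any ALT allele starts with 'I' or 'D'.

```python
from typing import Iterable, List, Tuple


def has_legacy_indel(ref: str, alt: str) -> bool:
    """Return True if REF or any ALT allele starts with 'I' or 'D'.

    Plain base strings (A, C, G, T, N) and symbolic alleles such as
    <DEL> never start with those letters, so they are not flagged.
    """
    alleles = [ref] + alt.split(",")
    return any(allele[:1].upper() in ("I", "D") for allele in alleles)


def split_vcf_lines(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split VCF lines into (kept, set_aside).

    Header lines and records with ordinary alleles go to `kept`;
    records carrying an 'I'/'D' allele go to `set_aside`.
    """
    kept: List[str] = []
    set_aside: List[str] = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            kept.append(line)  # header lines are always kept
            continue
        fields = line.rstrip("\n").split("\t")
        # REF is column 4 and ALT is column 5 in a VCF record
        if len(fields) >= 5 and has_legacy_indel(fields[3], fields[4]):
            set_aside.append(line)
        else:
            kept.append(line)
    return kept, set_aside


def split_vcf_file(in_path: str, kept_path: str, set_aside_path: str) -> None:
    """Read an uncompressed VCF and write the two subsets to separate files."""
    with open(in_path) as handle:
        kept, set_aside = split_vcf_lines(handle)
    with open(kept_path, "w") as out:
        out.writelines(kept)
    with open(set_aside_path, "w") as out:
        out.writelines(set_aside)
```

For example, `split_vcf_file("SNP_GRCh37.vcf", "snps_only.vcf", "legacy_indels.vcf")` (the two output names are made up here) would leave a SNP-only file to pass to LiftoverVcf. The records that were set aside still need their real REF and ALT sequences restored before any tool can use them.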


Pierre Lindenbaum is correct. Also, note that the GATK option --RECOVER_SWAPPED_REF_ALT True does not work with indels. In general, if your VCF includes indels, avoid tools such as GATK/LiftoverVcf or CrossMap/VCF, as explained here


Don't forget to follow up on your threads. If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one answer if they all work. If an answer was not really helpful or did not work, provide detailed feedback so others know not to use that answer.


