I am trying to lift over an hg38 whole-genome-sequenced VCF to an hg19 VCF.
I am planning to use GATK Picard for this.
However, I am not sure which liftover chain file to use from this path:
hg38ToHg19.over.chain.gz, which, as the name indicates, lifts hg38 to hg19.
The file names reflect the assembly conversion data contained within
in the format <db1>To<Db2>.over.chain.gz. For example, a file named
hg15ToHg16.over.chain.gz contains the liftOver data needed to
convert hg15 (Human Build 33) coordinates to hg16 (Human Build 34).
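As a rough illustration of that naming convention, here is a minimal Python sketch that splits a chain file name into its source and target assemblies. The function name `parse_chain_name` is made up for this example, and the pattern only covers names of the form <db1>To<Db2>.over.chain.gz; anything else returns None.

```python
import re

# Matches the <db1>To<Db2>.over.chain[.gz] convention described above.
# The lazy first group stops at the first "To" that is followed by a capital letter.
_CHAIN_NAME = re.compile(
    r"^([A-Za-z0-9]+?)To([A-Z][A-Za-z0-9]*)\.over\.chain(?:\.gz)?$"
)


def parse_chain_name(filename):
    """Return (source, target) assembly names, or None if the name does not match."""
    match = _CHAIN_NAME.match(filename)
    if match is None:
        return None
    source, target = match.groups()
    # The target is capitalised in the file name (Hg19), so lower-case its first letter.
    return source, target[0].lower() + target[1:]


if __name__ == "__main__":
    print(parse_chain_name("hg38ToHg19.over.chain.gz"))
```

For the file in question, this gives hg38 as the source and hg19 as the target, which is the direction needed here.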
You can use the UCSC liftover chain file:
Or the Ensembl chain file:
But notice they are not equivalent: the UCSC chain file usually covers more bases, while the Ensembl chain file includes chains for contigs with and without the chr prefix.
Also notice that you will not be able to use either chain file to lift over to GRCh37 contigs without the chr prefix with Picard/LiftoverVcf, as you would need something like hg38ToB37.over.chain.gz instead. If you want to avoid worrying about contig names, you can use BCFtools/liftover available here (binaries available here), which will work seamlessly with any chain file regardless of the contig name format.
Good edit @zx8754, thanks!
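If you are unsure which contig naming a given chain file produces, one option is to look at its header lines before running Picard/LiftoverVcf. The sketch below is only an illustration: the function names are hypothetical, and it assumes the standard chain header layout (chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id), with the query (q) columns describing the assembly you are lifting to.

```python
import gzip


def open_chain(path):
    """Open a plain or gzip-compressed chain file as text."""
    opener = gzip.open if str(path).endswith(".gz") else open
    return opener(path, "rt")


def destination_contigs(lines):
    """Collect the query (destination) contig names from chain header lines."""
    names = set()
    for line in lines:
        fields = line.split()
        # Header lines start with "chain"; qName is the 8th field (index 7).
        # Alignment block lines and blank lines are skipped.
        if len(fields) >= 12 and fields[0] == "chain":
            names.add(fields[7])
    return names


def chr_prefix_style(names):
    """Classify contig names as 'chr', 'no-chr', 'mixed', or 'empty'."""
    if not names:
        return "empty"
    with_chr = sum(1 for name in names if name.startswith("chr"))
    if with_chr == len(names):
        return "chr"
    if with_chr == 0:
        return "no-chr"
    return "mixed"
```

Usage would look something like `with open_chain("hg38ToHg19.over.chain.gz") as handle: print(chr_prefix_style(destination_contigs(handle)))`. A result of "chr" suggests the output contigs will carry the chr prefix, so a reference FASTA without that prefix would not match.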