9 months ago
JourneyToAbyss
I am using the 1000 Genomes files provided by the plink 2 author and lifting the positions over to hg19. To do this, I first convert to a VCF file, sort it with bcftools, and then use CrossMap to perform the liftover. Of 70,692,015 variants (only chr1-22 and XY included), 16,559,055 failed to map.
Is this to be expected? Or is something suspect with my pipeline?
The liftover tool should provide a log of the variants that failed the process.
It did and nothing stood out with a quick look. I am curious about other people's experience with liftover and the expected loss in variants from GRCh38 to hg19.
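One way to go beyond a quick look is to tally the failed records per chromosome. The sketch below assumes the failed variants are available as VCF-style, tab-separated lines with the chromosome in the first column; the example records are made up for illustration. If whole chromosomes show up with all of their variants failing, that may point to a systematic problem rather than to genuinely unmappable regions.

```python
from collections import Counter


def failures_per_chromosome(lines):
    """Count non-header, non-empty VCF-style records per chromosome (column 1)."""
    counts = Counter()
    for line in lines:
        # Skip blank lines and VCF header lines ("##..." and "#CHROM...").
        if not line.strip() or line.startswith("#"):
            continue
        chrom = line.split("\t", 1)[0]
        counts[chrom] += 1
    return counts


# Made-up records for illustration; in practice pass an open file handle
# for the file of failed variants instead of this list.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "chr1\t10177\t.\tA\tAC\n",
    "chr1\t10352\t.\tT\tTA\n",
    "chrX\t60020\t.\tT\tA\n",
]

for chrom, n in failures_per_chromosome(example).most_common():
    print(chrom, n)
```

Comparing these counts with the per-chromosome totals of the input shows whether the losses are spread evenly or concentrated in a few places.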
How did you do the liftover?
I used CrossMap to convert the VCF file, which I generated from plink2's 1000 Genomes files.
https://crossmap.sourceforge.net/#convert-vcf-format-files
As this program uses the reference genome (not the over.chain file), maybe this isn't appropriate. I am going to try using the over.chain file and merely updating the positions.
You should definitely provide the chain file AND the reference genome to CrossMap. How else is CrossMap supposed to know how to lift the variants?
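As a rough sketch of what that call looks like, the snippet below only assembles the command line and prints it without running anything. The argument order follows the CrossMap documentation for VCF conversion (chain file, input VCF, reference FASTA of the target assembly, output VCF). All file names except the chain file are hypothetical placeholders, and the executable may be called `CrossMap` or `CrossMap.py` depending on the installed version.

```python
import shlex


def crossmap_vcf_command(chain, in_vcf, target_fasta, out_vcf, exe="CrossMap"):
    """Build the argument list for a CrossMap VCF liftover (nothing is executed)."""
    # The FASTA must be the *target* assembly (hg19 here), not the source one.
    return [exe, "vcf", chain, in_vcf, target_fasta, out_vcf]


cmd = crossmap_vcf_command(
    "hg38ToHg19.over.chain.gz",  # chain file named later in this thread
    "input.hg38.vcf",            # hypothetical input file name
    "hg19.fa",                   # hypothetical hg19 reference FASTA
    "output.hg19.vcf",           # hypothetical output file name
)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once the paths point at real files.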
Using a version of the high-coverage 1000 Genomes Project callset with 63,993,411 non-singleton bi-allelic SNVs and 9,459,059 non-singleton bi-allelic indels, together with the hg38ToHg19.over.chain.gz UCSC chain file, I get 916,020 SNVs and 63,685 indels dropped using CrossMap/VCF and 872,258 SNVs and 55,590 indels dropped using BCFtools/liftover, so what you are seeing is not expected.
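To put the counts quoted in this thread on the same scale, here is a small calculation of the drop rates as percentages. It uses only the numbers given above; the function and variable names are just for illustration.

```python
def drop_rate(dropped, total):
    """Return the percentage of variants that failed to lift over."""
    return 100.0 * dropped / total


# Counts from the original question.
question = drop_rate(16_559_055, 70_692_015)

# Counts from the callset described in this reply (SNVs + indels).
total_callset = 63_993_411 + 9_459_059
crossmap = drop_rate(916_020 + 63_685, total_callset)
bcftools = drop_rate(872_258 + 55_590, total_callset)

print(f"question:          {question:.2f}%")
print(f"CrossMap/VCF:      {crossmap:.2f}%")
print(f"BCFtools/liftover: {bcftools:.2f}%")
```

That works out to roughly 23% dropped in the question versus a little over 1% with either tool here, which is the gap that makes the original result look suspicious.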