I am new to the field and am trying to lift over my genotyping data for a GWAS from hg38 to hg19. When I run Picard LiftoverVcf or CrossMap on my SNPs before QC, I lose ~15% of all SNPs during liftover, mainly due to "mismatching reference alleles". The loss rate gets higher after basic SNP QC (listed below). Any advice on whether this liftover rate is acceptable (and, if not, where would be a good place to start troubleshooting) is very much appreciated. Further details are below; if more information is needed, please let me know. Many thanks in advance for your time and advice.
Genotyping platform: Illumina Infinium Global Screening Array, giving around 600K human variants.
Percentage of variants lost during liftover
When I run Picard LiftoverVcf on our VCF without any QC, I lose ~16% of my SNPs:
- 13% are "variants lifted over but had mismatching reference alleles after lift over."
- 3% are "variants failed to liftover"
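To see exactly how the ~16% breaks down, the rejected VCF can be tallied by reason. The function below is a minimal sketch, assuming that LiftoverVcf records the rejection reason in the FILTER column (column 7) of rejected.vcf and that both files are uncompressed; the function name `summarize_rejects` is hypothetical. Percentages are relative to the number of data records in the input VCF.

```bash
#!/usr/bin/env bash
# Sketch: tally LiftoverVcf rejections by reason and report each as a
# percentage of the records in the input VCF.
# Assumptions: uncompressed VCFs; the rejection reason is in FILTER (column 7).
summarize_rejects() {
  local input_vcf=$1 rejected_vcf=$2 total
  # Number of data lines in the input (every line not starting with '#').
  total=$(grep -vc '^#' "$input_vcf")
  # Count each FILTER value among rejected records; print reason, count, percent,
  # sorted by count (largest first).
  grep -v '^#' "$rejected_vcf" \
    | awk -F '\t' -v total="$total" '
        { n[$7]++ }
        END {
          for (reason in n)
            printf "%s\t%d\t%.1f%%\n", reason, n[reason], 100 * n[reason] / total
        }' \
    | sort -k2,2nr
}
```

For example, `summarize_rejects ./Myhg38.vcf ./rejected.vcf` prints one line per reason with its count and percentage. A record carrying several filters (e.g. `A;B`) is counted under the combined string rather than split.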
Liftover failure rate after basic QC:
If I apply liftover after some basic QC (listed below), the failure rate remains high, predominantly due to "mismatching reference":
1. After a missingness filter of 0.02 for SNPs and samples: 15% of ~600K variants lost
2. After MAF (0.05) and autosomal-SNP filters: 27% of ~250K variants lost
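Because most losses are reference-allele mismatches, one check that may help (a suggestion, not something tried in the post) is whether the REF alleles in the hg38 VCF already agree with hg38 before liftover. If many disagree, that would suggest the input alleles rather than the chain file. The sketch below assumes uncompressed files, identical chromosome names in the VCF and FASTA (e.g. `chr1` in both), fixed-width sequence lines within each FASTA record, and checks single-base REF alleles only; `check_ref_vs_fasta` and `hg38.fa` are hypothetical names. On real data, `bcftools norm --check-ref w -f hg38.fa` performs a comparable check.

```bash
#!/usr/bin/env bash
# Sketch: compare each VCF REF allele with the base at the same position in a
# reference FASTA (the hg38 build the VCF claims to be on), before liftover.
# Assumptions: uncompressed VCF and FASTA, identical chromosome names in both,
# all sequence lines of a FASTA record (except the last) share one width,
# and only single-base REF alleles are checked.
check_ref_vs_fasta() {
  local vcf=$1 fasta=$2
  awk -F '\t' '
    # Pass 1 (the VCF): store CHROM, POS and REF of every data line.
    FNR == NR {
      if ($0 ~ /^#/) next
      n++; chr[n] = $1; pos[n] = $2; ref[n] = toupper($4)
      cnt[$1]++; idx[$1, cnt[$1]] = n
      next
    }
    # Pass 2 (the FASTA): a header line starts a new chromosome.
    /^>/ {
      split(substr($0, 2), h, /[ \t]/)
      c = h[1]; line = 0; width = 0
      next
    }
    {
      if (!(c in cnt)) next              # no VCF records on this chromosome
      if (width == 0) {                  # first sequence line: learn line width
        width = length($0); w[c] = width
        for (j = 1; j <= cnt[c]; j++)    # mark the FASTA lines holding VCF sites
          keep[c, int((pos[idx[c, j]] - 1) / width)] = 1
      }
      if ((c, line) in keep) seq[c, line] = toupper($0)
      line++
    }
    END {
      for (i = 1; i <= n; i++) {
        # Skip multi-base REF alleles and chromosomes absent from the FASTA.
        if (length(ref[i]) != 1 || !(chr[i] in w)) { skipped++; continue }
        c = chr[i]; p = pos[i] - 1
        base = substr(seq[c, int(p / w[c])], p % w[c] + 1, 1)
        if (base == ref[i]) ok++; else bad++
      }
      printf "match\t%d\nmismatch\t%d\nnot_checked\t%d\n", ok, bad, skipped
    }
  ' "$vcf" "$fasta"
}
```

For example, `check_ref_vs_fasta ./Myhg38.vcf ./hg38.fa` prints how many single-base REF alleles match, mismatch, or were not checked. Only the FASTA lines that contain VCF sites are kept in memory, so the whole genome is streamed rather than loaded.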
The Picard commands used:
java -jar ./picard.jar CreateSequenceDictionary -R ./hg19.fa -O ./hg19.dict
java -jar ./picard.jar LiftoverVcf -I ./Myhg38.vcf -O ./Myhg19.vcf -CHAIN ./hg38ToHg19.over.chain.gz -REJECT ./rejected.vcf -R ./hg19.fa