Question: How to switch REF and ALT alleles in a VCF file if the REF is incorrect according to RefSeq?
2.1 years ago
United States
miaowzai130 wrote:

I have a VCF file, and I believe it was converted from a PED file by PLINK, as illustrated in this blog:

One header line in the VCF file reads ##INFO=<ID=PR,Number=0,Type=Flag,Description="Provisional reference allele, may not be based on real reference genome">.

For some variant loci, the REF and ALT alleles have been switched in the VCF file for an unknown reason. For example, RefSeq has G at locus 1234 and the variant is T, but the VCF file records T (REF) and G (ALT).

I only have the VCF file and do not have the original PED file. Is there any tool or method to check whether the REF alleles agree with RefSeq, and to switch the REF and ALT columns (or just remove those loci) in the VCF where they are wrong?


2.1 years ago by
United States
chrchang523 wrote:

Given a file with the correct reference alleles for each variant ID, you can use plink's --a2-allele flag to fix them in the VCF.

Thanks for the quick answer. I have never used plink. I looked at the entry for --a2-allele in the plink documentation and I don't completely understand it. Could you elaborate on the method? Thank you!

It depends on what your RefSeq file looks like. The basic structure of the command would be

plink --vcf [name of plink-exported VCF with incorrect reference alleles] --a2-allele [name of RefSeq file] [1-based column index of ref alleles] [1-based column index of variant IDs] --recode vcf --real-ref-alleles
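
If you only have a reference FASTA rather than a ready-made table of reference alleles, something like the following Python sketch could build the two-column file that --a2-allele reads. It is a rough illustration under several assumptions, not a tested pipeline: the function names read_fasta and classify_variants are made up for this example, the sequence names in the FASTA are assumed to match the CHROM column of the VCF, and only biallelic SNPs are handled. Variants where neither REF nor ALT matches the reference base (for instance because of a strand flip) are reported separately so they can be inspected or removed.

```python
import io


def read_fasta(handle):
    """Parse FASTA text into {sequence_name: uppercase_sequence}."""
    seqs, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            # Keep only the first word of the header as the sequence name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def classify_variants(vcf_handle, reference):
    """Return (a2_rows, unresolved_ids) for the records in a VCF.

    a2_rows holds (variant ID, reference base) pairs for SNPs where REF or
    ALT equals the reference base. unresolved_ids lists everything else:
    non-SNP records, unknown chromosomes, out-of-range positions, and SNPs
    where neither allele matches the reference.
    """
    a2_rows, unresolved = [], []
    for line in vcf_handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        chrom, pos, vid, ref, alt = line.rstrip("\n").split("\t")[:5]
        seq = reference.get(chrom)
        is_snp = len(ref) == 1 and len(alt) == 1
        if not is_snp or seq is None or int(pos) > len(seq):
            unresolved.append(vid)
            continue
        base = seq[int(pos) - 1]  # VCF POS is 1-based
        if base in (ref.upper(), alt.upper()):
            a2_rows.append((vid, base))
        else:
            unresolved.append(vid)
    return a2_rows, unresolved


# Toy data standing in for the real reference and VCF (hypothetical values).
reference = read_fasta(io.StringIO(">1\nACGTG\n"))
vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t3\trs_swapped\tT\tG\t.\t.\tPR\n"
    "1\t2\trs_ok\tC\tA\t.\t.\tPR\n"
    "1\t4\trs_neither\tA\tC\t.\t.\tPR\n"
)
a2_rows, unresolved = classify_variants(vcf, reference)

# Column 1 = variant ID, column 2 = reference allele.
for vid, base in a2_rows:
    print(f"{vid}\t{base}")
```

With the IDs in column 1 and the alleles in column 2, the column arguments in the command above would be 2 (ref alleles) followed by 1 (variant IDs). The IDs collected in unresolved could be written to a separate list and dropped in a prior step, for example with plink's --exclude.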
