how to switch REF and ALT alleles in VCF files if the REF is incorrect according to RefSeq?
2
2
Entering edit mode
7.0 years ago
miaowzai ▴ 390

I have a VCF file, and I believe it is converted from PED file by PLINK, as illustrated in this blog:

There is one comment saying ##INFO=<ID=PR,Number=0,Type=Flag,Description="Provisional reference allele, may not be based on real reference genome"> in the VCF file.

For a some variant loci, the REF and ALT had been switched in the VCF file for unknown reason. For example, it should be G at locus 1234 in RefSeq, and the variant is T. But the VCF file records T(REF) and G(ALT).

I only have the VCF file, and do not have the original PED file. Is there any tool or method to check if the REF alleles are correct by RefSeq and switch REF and ALT columns (or just remove this loci) in the VCF if they're wrong?

Thanks!

VCF REF allele ALT allele PLINK • 9.3k views
ADD COMMENT
0
Entering edit mode

I guess the solution should be like this way:

  1. check the reference allele with genome reference (hg19.fa or hg38.fa)

  2. if the reference allele is not same with the human genome, then check whether the alternative allele is same with human reference

  3. if the alternative allele is same with human genome, then switch it. For the sample gentoype, 0 -> 1 and 1 -> 0

  4. if neither reference and alternative allele is not same as reference, then .... remove it??

ADD REPLY
3
Entering edit mode
7.0 years ago

Given a file with the correct reference alleles for each variant ID, you can use plink's --a2-allele flag to fix them in the VCF.

ADD COMMENT
1
Entering edit mode

Thanks for the quick answer. I never used plink. I looked at the manual of --a2-allele in plink documentation and I don't completely understand. Could you elaborate the method? Thank you!

ADD REPLY
2
Entering edit mode

It depends on what your RefSeq file looks like. The basic structure of the command would be

plink --vcf [name of plink-exported VCF with incorrect reference alleles] --a2-allele [name of RefSeq file] [1-based column index of ref alleles] [1-based column index of variant IDs] --recode vcf --real-ref-alleles
ADD REPLY
1
Entering edit mode
4.8 years ago
Shicheng Guo ★ 9.4k

Now I get the best solution with bcftools

bcftools view -T ^remove.txt $i -Ov | bcftools norm -m-any -Ov --check-ref w -f ~/hpc/db/hg19/hg19.fa | bcftools annotate -x ID,INFO,FORMAT -I +'%CHROM:%POS' -Oz -o $i.trim.vcf.gz
ADD COMMENT
1
Entering edit mode

That's bcftools, not vcftools. They're two different tools.

ADD REPLY

Login before adding your answer.

Traffic: 2722 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6