Easy Way To Change Reference Alleles In Large Vcf File?
13.0 years ago
Kking • 0

Before I begin coding, does a tool already exist that allows you to easily switch reference alleles in a large VCF file (~400K variants) based on a reference genome, re-encoding all the genotypes properly?

We have a large amount of legacy data in PLINK format that we would like to use with some of the modules in GATK's VariantEval method to compare with whole exome data. I tried converting the PLINK data to VCF using PLINK v1.08. However, it does not have a mechanism for specifying the reference allele, and the output did not match our sequencing files.
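For anyone who does end up coding this, the per-record operation could look roughly like the Python sketch below. It is only an illustration under stated assumptions, not an existing tool: it handles biallelic SNPs only, it assumes GT is the first FORMAT subfield, and it assumes the caller has already looked up `ref_base` from the reference FASTA at that position. The function names `flip_genotype` and `fix_record` are made up for this example. It does not attempt strand flips, and it does not update INFO fields or other per-sample fields (AD, PL and so on) that also depend on allele order.

```python
def flip_genotype(sample_field):
    """Swap allele indices 0 and 1 in the GT subfield, e.g. '0/1:35' -> '1/0:35'.

    Assumes a biallelic site and that GT is the first colon-separated subfield.
    Separators ('/' or '|') and missing calls ('.') are left untouched.
    """
    parts = sample_field.split(":")
    swap = {"0": "1", "1": "0"}
    parts[0] = "".join(swap.get(ch, ch) for ch in parts[0])
    return ":".join(parts)


def fix_record(line, ref_base):
    """Return (record, status) for one tab-delimited VCF data line.

    status is 'ok' if REF already matches ref_base, 'swapped' if ALT matched
    and the alleles and genotypes were exchanged, and 'mismatch' if neither
    allele matches (for example a strand problem), in which case the record
    is returned unchanged for manual review.
    """
    fields = line.rstrip("\n").split("\t")
    ref, alt = fields[3], fields[4]
    base = ref_base.upper()
    if ref.upper() == base:
        return "\t".join(fields), "ok"
    if len(ref) == 1 and len(alt) == 1 and alt.upper() == base:
        fields[3], fields[4] = alt, ref
        # Columns 10 onward (index 9+) hold the per-sample genotype fields.
        fields[9:] = [flip_genotype(s) for s in fields[9:]]
        return "\t".join(fields), "swapped"
    return "\t".join(fields), "mismatch"


if __name__ == "__main__":
    record = "1\t100\trs1\tA\tG\t.\t.\t.\tGT\t0/0\t0/1\t1/1\t./."
    fixed, status = fix_record(record, "G")
    print(status)
    print(fixed)
```

Records reported as `mismatch` are probably the ones to inspect by hand, since they may reflect strand differences between the genotyping array and the reference.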

vcf plink reference • 5.7k views
12.7 years ago

I have the same problem converting PLINK files to VCF. I loaded the PLINK files into a project using PLINK/SEQ and then wrote them out as VCF files with the write-vcf command. However, the alleles in the resulting VCF files do not match REFDB. Does anyone know how to solve this problem? Thanks.


This is not an answer. Please add this as a comment to the original question. Thank you.

12.3 years ago

Maybe you could do something like this:

grep "^#" yourFile.vcf > newvcffile.vcf

grep -v "^#" yourFile.vcf | awk 'BEGIN{OFS="\t"} {print $1, $3, $2, $4... }' >> newvcffile.vcf
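In the commands above, `^` anchors the pattern to the start of a line, so `grep "^#"` keeps only the VCF header lines (which all begin with `#`), and `grep -v "^#"` keeps everything else, i.e. the variant records. The same split can be sketched in Python as a single pass over the file. This is a rough illustration only: `rewrite_vcf` and `swap_ref_alt` are hypothetical names, and `swap_ref_alt` only exchanges the REF and ALT columns (4 and 5), so the genotype columns would still need to be re-encoded for a real conversion.

```python
import io


def rewrite_vcf(src, dst, transform):
    """Copy header lines unchanged and apply `transform` to each record line."""
    for line in src:
        if line.startswith("#"):
            dst.write(line)             # same lines that grep "^#" selects
        else:
            dst.write(transform(line))  # same lines that grep -v "^#" selects


def swap_ref_alt(line):
    """Exchange the REF and ALT columns of one tab-delimited record.

    Illustration only: genotypes are NOT re-encoded here.
    """
    fields = line.rstrip("\n").split("\t")
    fields[3], fields[4] = fields[4], fields[3]
    return "\t".join(fields) + "\n"


if __name__ == "__main__":
    # A tiny in-memory stand-in for a VCF file.
    src = io.StringIO(
        "##fileformat=VCFv4.1\n"
        "#CHROM\tPOS\tID\tREF\tALT\n"
        "1\t100\trs1\tA\tG\n"
    )
    dst = io.StringIO()
    rewrite_vcf(src, dst, swap_ref_alt)
    print(dst.getvalue(), end="")
```

Reading line by line keeps memory use flat, which should be comfortable for a file of around 400K variants.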


Can you explain what you mean by ^# in grep? How does this help change the reference allele coding? Thank you.

