Question: Easy Way To Change Reference Alleles In Large Vcf File?
9.4 years ago by
Kking0 wrote:

Before I begin coding, does a tool already exist that allows you to easily switch reference alleles in a large VCF file (~400K variants) based on a reference genome, re-encoding all the genotypes properly?

We have a large amount of legacy data in PLINK format that we would like to use with some of the modules in GATK's VariantEval method to compare with whole exome data. I tried converting the PLINK data to VCF using PLINK v1.08. However, it does not have a mechanism for specifying the reference allele, and the output did not match our sequencing files.

vcf reference plink • 4.5k views
9.1 years ago by
Baojian Fan0 wrote:

I have the same problem converting PLINK files to VCF files. I loaded the PLINK files into a project using PLINK/SEQ and then wrote them out as VCF files with the write-vcf command. However, the alleles in the resulting VCF files do not match REFDB. Does anyone know how to solve this problem? Thanks.
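One way to see which sites disagree is to compare the VCF REF column against a table of expected reference bases. The following is only a sketch, assuming a hypothetical three-column, tab-delimited table (chromosome, position, reference base) prepared beforehand from the reference genome. The name `list_ref_mismatches` is illustrative and is not a PLINK/SEQ command.

```shell
# Hypothetical helper: report sites whose REF differs from the expected base.
# $1: tab-delimited table of CHROM, POS, reference base (assumed format)
# $2: VCF file
list_ref_mismatches() {
  awk 'BEGIN { FS = OFS = "\t" }
    NR == FNR { base[$1 FS $2] = toupper($3); next }   # first file: load the table
    /^#/ { next }                                       # skip VCF header lines
    {
      key = $1 FS $2
      if (!(key in base)) next                          # site absent from the table
      if (toupper($4) == base[key]) next                # REF already matches
      # ALT matches the reference: REF and ALT are swapped; otherwise neither does
      status = (toupper($5) == base[key]) ? "swap" : "neither"
      print $1, $2, $4, $5, base[key], status
    }' "$1" "$2"
}
```

Sites reported as "swap" are candidates for exchanging REF and ALT. Sites reported as "neither" need a closer look, since they may reflect a strand difference rather than a simple allele swap.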


This is not an answer. Please add this as a comment to the original question. Thank you.

written 9.0 years ago by lh332k
8.8 years ago by
Raony Guimaraes100 wrote:

Maybe you could do something like this:

grep "^#" yourFile.vcf > newvcffile.vcf

grep -v "^#" yourFile.vcf | awk 'BEGIN { FS = OFS = "\t" } { ref = $4; $4 = $5; $5 = ref; print }' >> newvcffile.vcf
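Rewriting the REF and ALT columns is only half the job: the genotype codes in the sample columns also have to be flipped, otherwise 0 and 1 point at the wrong alleles. A minimal sketch of that step, assuming a tab-delimited VCF, biallelic sites only, and GT as the first FORMAT subfield. `swap_ref_alt` is a hypothetical helper name, and it swaps every biallelic record, so the input should already be restricted to the sites that need swapping. Fields such as AC, AF, AD or PL are not updated here.

```shell
# Hypothetical helper: swap REF/ALT on biallelic records and recode GT.
# Assumes a tab-delimited VCF with GT as the first FORMAT subfield.
swap_ref_alt() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^#/ { print; next }                 # header lines pass through unchanged
    $5 ~ /,/ { print; next }             # multiallelic site: left untouched
    {
      ref = $4; $4 = $5; $5 = ref        # swap REF and ALT
      for (i = 10; i <= NF; i++) {       # sample columns start at field 10
        n = split($i, parts, ":")        # parts[1] is the GT subfield
        gt = parts[1]
        # exchange allele codes 0 and 1 via a temporary placeholder
        gsub(/0/, "x", gt); gsub(/1/, "0", gt); gsub(/x/, "1", gt)
        $i = gt
        for (j = 2; j <= n; j++) $i = $i ":" parts[j]
      }
      print
    }' "$1"
}
```

Usage would look like `swap_ref_alt sitesToSwap.vcf > newvcffile.vcf`, where `sitesToSwap.vcf` is a placeholder file name. Missing genotypes such as `./.` pass through unchanged.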


Can you explain in more detail what you mean by ^# in grep? How does this help change the reference allele coding? Thank you.
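For context, `^` anchors a regular expression to the start of a line, so `grep "^#"` keeps only the lines beginning with `#` (the VCF meta-information and column header lines), while `grep -v "^#"` keeps everything else (the variant records). The split only preserves the header; any allele change has to happen in the awk step applied to the records. A small illustration on a made-up two-variant file:

```shell
# Build a tiny, made-up VCF for illustration (not real data).
demo=$(mktemp)
printf '##fileformat=VCFv4.1\n' > "$demo"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$demo"
printf '1\t100\trs1\tA\tG\t.\t.\t.\n' >> "$demo"
printf '1\t200\trs2\tC\tT\t.\t.\t.\n' >> "$demo"

# Count lines starting with "#": the header.
header_lines=$(grep -c "^#" "$demo")
# Count lines NOT starting with "#": the variant records.
record_lines=$(grep -vc "^#" "$demo")

echo "header: $header_lines, records: $record_lines"
# prints: header: 2, records: 2
```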

written 8.5 years ago by Hypotheses90