Question: How to Turn Multiple ALT Alleles into Genotype Calls
17 months ago by
jimkozubek wrote:

I have a VCF file in which many lines have multiple ALT alleles, such as:

1       104159  .       CA      C,TA,TTT

I have an algorithm that takes genotype data as a 0,1,2 matrix. Are there any standards or best practices for turning VCF lines with multiple ALT alleles (genotypes such as 0/0, 0/1, 0/2, 1/2, 2/3) into 0,1,2 genotype form?


Thanks for the pro tip!

written 17 months ago by jimkozubek
17 months ago by
Kevin Blighe wrote:

Yes, you can split multi-allelic records into separate bi-allelic records, one per ALT allele, with:

bcftools norm -Ov -m-any MyVariants.vcf > MyVariantsSplit.vcf ;
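
As a quick sanity check after splitting, you could count how many records still list more than one ALT allele. The sketch below is only an illustration: count_multiallelic is a hypothetical helper name (not part of bcftools), and it assumes an uncompressed, tab-delimited VCF on standard input.

```shell
# Hypothetical helper (not part of bcftools): count data records whose
# ALT column (field 5) still lists more than one comma-separated allele.
count_multiallelic() {
  awk -F'\t' '!/^#/ && $5 ~ /,/ { n++ } END { print n + 0 }'
}

# The example line from the question has three ALT alleles in one record:
printf '1\t104159\t.\tCA\tC,TA,TTT\n' | count_multiallelic   # prints 1

# On the split file the count should be 0:
#   count_multiallelic < MyVariantsSplit.vcf
```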

A useful addition is to also check that the REF allele of each variant matches a chosen reference genome (passing the reference with -f also left-aligns and normalizes indels), for example:

bcftools norm -Ov -m-any MyVariants.vcf | bcftools norm -Ov -f human_g1k_v37.fasta > MyVariantsSplitRefChecked.vcf ;
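
Once every record is bi-allelic, getting to a 0,1,2 matrix is a matter of counting ALT alleles in each sample's GT field. Here is a minimal sketch of that step, not a standard tool: vcf_to_dosage is a hypothetical helper name, it assumes an uncompressed split VCF on standard input with GT as the first FORMAT key, and it prints NA for any genotype containing a missing allele. How genotypes that carried a different ALT allele of the original site are recoded in each split record depends on the tool and its options, so it is worth inspecting a few split records before trusting the counts.

```shell
# Hypothetical helper: turn each sample's GT in a split, bi-allelic VCF
# (read from stdin) into an ALT-allele count: 0, 1 or 2 for diploid calls.
# Genotypes with a missing allele (".") are printed as NA.
vcf_to_dosage() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ { next }                         # skip header lines
       {
         out = $1 ":" $2 ":" $4 ":" $5        # variant key CHROM:POS:REF:ALT
         for (i = 10; i <= NF; i++) {         # sample columns start at field 10
           split($i, f, ":")                  # assumes GT is the first FORMAT key
           n = split(f[1], a, "[/|]")         # alleles, unphased "/" or phased "|"
           d = 0; miss = 0
           for (j = 1; j <= n; j++) {
             if (a[j] == ".") miss = 1
             else if (a[j] != "0") d++        # any non-REF allele counts as ALT
           }
           out = out OFS (miss ? "NA" : d)
         }
         print out
       }'
}

# Example usage (dosage.tsv is a placeholder output name):
#   vcf_to_dosage < MyVariantsSplitRefChecked.vcf > dosage.tsv
```

Each output row is one split variant and each column after the key is one sample, so the result can be read straight into a matrix.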

