How to Turn Multiple ALT Alleles into 0/1/2 Genotype Calls
5.8 years ago
jimkozubek ▴ 30

I have a VCF file and many lines have multiple ALT alleles such as:

1       104159  .       CA      C,TA,TTT

I have an algorithm that takes genotype data in a 0,1,2 matrix form. I am wondering if there are any standards or best practices in how to turn VCF lines with multiple ALT alleles (0/0,0/1,0/2,1/2,2/3) into 0,1,2 genotype form.


Thanks for the pro tip!


This did not work for me. Can you share the solution if you have one?

If I use the command below to split them into biallelic records, it does not work correctly.

Another option might be to drop these sites, but I would rather not. I would appreciate it if you could share the solution. Thanks!

5.8 years ago

Yes, you can split multi-allelic calls with

bcftools norm -Ov -m-any MyVariants.vcf > MyVariantsSplit.vcf ;

A useful addition is to also check that each REF allele matches a chosen reference genome, which left-aligns and normalises indels at the same time, for example:

bcftools norm -Ov -m-any MyVariants.vcf | bcftools norm -Ov -f human_g1k_v37.fasta > MyVariantsSplitRefChecked.vcf ;
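For the 0,1,2 coding itself, here is a minimal Python sketch, assuming diploid calls and treating each ALT allele as its own column. The function name `alt_dosages` is made up for illustration and is not part of bcftools. It counts how many copies of each ALT allele a GT string carries, so a multi-allelic site gives one 0/1/2 value per ALT. On a split (biallelic) record you would call it with `n_alt=1`. It is also worth checking by hand on a few sites how genotypes such as 1/2 were recoded by the split.

```python
import re

def alt_dosages(gt, n_alt):
    """Count copies of each ALT allele in a GT string such as '1/2' or '0|1'.

    Returns a list of length n_alt. Entry k-1 is the number of copies of
    ALT allele k (0, 1 or 2 for a diploid call). If any allele is missing
    ('.'), every entry is None.
    """
    alleles = re.split(r"[/|]", gt)  # handles unphased (/) and phased (|) calls
    if "." in alleles:
        return [None] * n_alt
    indices = [int(a) for a in alleles]
    return [sum(1 for a in indices if a == k) for k in range(1, n_alt + 1)]

# Example for a site with three ALT alleles (C,TA,TTT):
for gt in ["0/0", "0/1", "0/2", "1/2", "2/3"]:
    print(gt, alt_dosages(gt, 3))
```

Each ALT column then reads as "copies of this allele versus everything else", which is one common convention but not the only one, so whether it suits your algorithm is for you to judge.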

Kevin
