Genotype
2.8 years ago

What is the correct way of setting the genotype after splitting multi-allelic sites in a VCF file?


Please search the forum before posting a new question. This question is an exact duplicate (at least, the title matches; you put near-zero effort into your post) of: What is the correct way of setting the genotype after splitting multi-allelic sites in a VCF file?

2.8 years ago

Typically, multi-allelic calls are split into separate records and then any indels are left-aligned. You may also wish to reset the ID field, and/or check that each base in your REF column is consistent with the reference genome.

I elaborate on this in Solution 1, here: Remove duplicate SNPs only based on SNP ID in bcftools

That is:

bcftools norm -m-any -Ou myfile.vcf.gz | \
  bcftools norm --check-ref w -f human_g1k_v37.fasta -Ob - > out.bcf ;
bcftools index out.bcf ;
  • -m-any splits any multi-allelic calls into separate records
  • bcftools norm, in conjunction with -f human_g1k_v37.fasta, will left-align (and normalize) indels
  • --check-ref w checks each base in your VCF's REF column against the supplied FASTA file and issues a warning if any inconsistency is identified
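Since the question is specifically about genotypes: when a site with, e.g., ALT alleles C,G is split, every sample's GT has to be recoded for each new record. The sketch below is a minimal, hypothetical Python illustration (split_genotype and split_record are names made up here, not bcftools internals). It keeps the target ALT as 1 and fills any other ALT allele with either 0 (reference) or . (missing); tools differ in which convention they apply, so check the documentation of the tool you use. It deliberately ignores per-allele INFO/FORMAT fields (AD, PL, and other Number=A/R/G fields) and REF/ALT trimming, which a real splitter must also handle.

```python
from typing import List, Tuple

Record = Tuple[str, int, str, str, List[str]]  # (CHROM, POS, REF, ALT, per-sample GT)


def split_genotype(gt: str, alt_index: int, other: str = "0") -> str:
    """Recode one GT string for the biallelic record carrying ALT number alt_index (1-based).

    The target ALT becomes "1", REF ("0") and missing (".") are kept, and any
    other ALT allele is replaced by `other` ("0" = reference, "." = missing).
    Assumes a single separator type ('/' or '|') per GT, which is preserved.
    """
    sep = "|" if "|" in gt else "/"
    recoded = []
    for allele in gt.split(sep):
        if allele == ".":
            recoded.append(".")
        elif int(allele) == alt_index:
            recoded.append("1")
        elif int(allele) == 0:
            recoded.append("0")
        else:
            recoded.append(other)
    return sep.join(recoded)


def split_record(chrom: str, pos: int, ref: str, alts: List[str],
                 genotypes: List[str], other: str = "0") -> List[Record]:
    """Split one multi-allelic site into one biallelic record per ALT allele."""
    return [
        (chrom, pos, ref, alt, [split_genotype(gt, i, other) for gt in genotypes])
        for i, alt in enumerate(alts, start=1)
    ]


if __name__ == "__main__":
    for rec in split_record("1", 100, "A", ["C", "G"], ["1/2", "0|2", "./."]):
        print(rec)
```

For example, a 1/2 heterozygote at A>C,G becomes 1/0 (A>C) and 0/1 (A>G) with other="0", or 1/. and ./1 with other=".". The "." convention is the more conservative one, since it does not assert that the sample carries the reference allele.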

Regarding the FASTA: please use the same reference FASTA that was used for the original alignment.
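Left-alignment re-pads an indel with bases taken from the reference sequence, which is one reason the FASTA has to match the one used for alignment. Below is a minimal Python sketch of a common normalization procedure (drop a shared last base, re-pad from the left with the preceding reference base whenever an allele becomes empty, repeat, then drop shared leading bases); it is an illustration, not bcftools' own code. It assumes the contig sequence is available as a plain Python string, and, unlike --check-ref w, which only warns, it raises an error on a REF mismatch.

```python
from typing import Tuple


def left_align(seq: str, pos: int, ref: str, alt: str) -> Tuple[int, str, str]:
    """Left-align and trim one biallelic variant.

    seq: the contig's reference sequence as a plain string (index 0 = position 1)
    pos: 1-based VCF POS; ref/alt: VCF REF and a single ALT allele
    """
    # REF consistency check (a hard error here; --check-ref w only warns)
    if seq[pos - 1:pos - 1 + len(ref)] != ref:
        raise ValueError(f"REF {ref} does not match the reference at POS {pos}")
    if ref == alt:
        raise ValueError("REF and ALT are identical")

    changed = True
    while changed:
        changed = False
        # Drop a shared last base
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # Re-pad an empty allele with the preceding reference base
        if (not ref or not alt) and pos > 1:
            pos -= 1
            base = seq[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True

    # Drop shared leading bases, keeping at least one base in each allele
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


if __name__ == "__main__":
    contig = "GGGCACACAG"  # toy sequence with a CA repeat at positions 4-9
    print(left_align(contig, 7, "ACA", "A"))
```

On the toy sequence GGGCACACAG, a CA deletion written as POS 7, ACA>A is moved to its leftmost equivalent, POS 3, GCA>G; a SNV is returned unchanged.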

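To reset the ID field to, e.g., CHROM:POS:REF:ALT, please do:

bcftools annotate -x ID -I +'%CHROM:%POS:%REF:%ALT' -Ob -o out.id.bcf out.bcf

(-x ID clears the existing IDs; -I +'...' then assigns the new ID to every record whose ID is missing. Note the % in front of CHROM: without it, the literal text "CHROM" would be written into the ID.)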
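What the ID step does can be sketched in a few lines: it rebuilds column 3 from columns 1, 2, 4 and 5. The Python below is a hypothetical illustration over plain-text VCF lines (reset_id_column is an invented name, not part of any library); the only_missing flag mimics the + prefix, which fills in only IDs that are missing ('.'). It refuses multi-allelic ALTs because the CHROM:POS:REF:ALT scheme assumes the sites have already been split.

```python
from typing import Iterable, Iterator


def reset_id_column(vcf_lines: Iterable[str], only_missing: bool = False) -> Iterator[str]:
    """Rewrite the ID column of biallelic VCF data lines as CHROM:POS:REF:ALT.

    only_missing=True mimics the '+' prefix: only IDs equal to '.' are replaced.
    Header lines ('#...') are passed through unchanged (minus the newline).
    """
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            yield line
            continue
        fields = line.split("\t")
        chrom, pos, old_id, ref, alt = fields[:5]
        if "," in alt:
            raise ValueError(f"multi-allelic ALT at {chrom}:{pos}; split sites first")
        if not only_missing or old_id == ".":
            fields[2] = f"{chrom}:{pos}:{ref}:{alt}"
        yield "\t".join(fields)


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "1\t100\trs123\tA\tC\t50\tPASS\t.\n",
        "1\t200\t.\tG\tT\t40\tPASS\t.\n",
    ]
    for out in reset_id_column(example, only_missing=True):
        print(out)
```

With only_missing=True, rs123 is kept and the missing ID becomes 1:200:G:T. Because the bcftools command clears all IDs with -x ID first, the + rule ends up applying to every record, which corresponds to only_missing=False here.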

Kevin
