Question

Phased And Unphased Genotypes In Vcf Files: Does The Order Of Alleles Matter?

27

13.2 years ago

Chronos ▴ 610

As this page explains, phased genotypes are sensitive to allele order.

I assume that the order of alleles in phased VCF genotypes (such as 0|1 and 1|0) matters as well, but I could not find any confirmation of that in the format description.

Or is order-sensitive allele listing so common that it needs no explicit description? (I'm new to the field.)

vcf genotyping • 30k views

ADD COMMENT • link updated 11.0 years ago by sliders ▴ 80 • written 13.2 years ago by Chronos ▴ 610

Ram · Answer 1 · 2011-04-30

14

13.0 years ago

Jorge Amigo 14k

The phase of an allele records which chromosome of the homologous pair it was found on. As far as I know, the main reason to use allele phasing information is to increase the correctness of the haplotypes and haplotype blocks inferred from the genotypes. Once you know which allele sits on which chromosome, it makes sense to write every genotype with its alleles in the same order, because with everything sorted this way you can easily build haplotypes by reading sequentially through the first alleles only, and then through the second alleles only.

To make this a little more visual (and simplistic too, so geneticists, please accept my apologies in advance), take the table from the webpage you mentioned:

IND, id1, id2, id3, id4, id5
rs1, AT, TT, ??, AT, AA
rs2, GC, CC, GG, CC, CG
rs3, CC, ??, ??, CG, GG
rs4, AC, CC, AA, AC, AA

If you look at individual 1 (id1), you get two different haplotypes: AGCA (from the first chromosome of the pair) and TCCC (from the second chromosome of the pair). This information would not be available if the genotypes were unphased, in which case a haplotype inference algorithm would have to be applied.
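The column-wise reading described above can be sketched in a few lines of Python. This is only an illustration, assuming the table is held in a plain dictionary and that every two-letter genotype is phased, with the first letter on one chromosome and the second letter on the other. The names `individuals`, `genotypes` and `haplotypes_for` are made up for this example, and missing genotypes (`??`) are simply carried through as `?`.

```python
# Phased genotypes from the table above: marker -> one genotype per individual.
# Assumption: the first letter of each genotype is on one chromosome of the
# pair and the second letter is on the other, consistently across markers.
individuals = ["id1", "id2", "id3", "id4", "id5"]
genotypes = {
    "rs1": ["AT", "TT", "??", "AT", "AA"],
    "rs2": ["GC", "CC", "GG", "CC", "CG"],
    "rs3": ["CC", "??", "??", "CG", "GG"],
    "rs4": ["AC", "CC", "AA", "AC", "AA"],
}


def haplotypes_for(individual):
    """Return the two haplotypes of one individual as a pair of strings."""
    column = individuals.index(individual)
    calls = [genotypes[marker][column] for marker in genotypes]
    first = "".join(call[0] for call in calls)   # first alleles only
    second = "".join(call[1] for call in calls)  # second alleles only
    return first, second


print(haplotypes_for("id1"))  # ('AGCA', 'TCCC')
```

With unphased data the same loop would still run, but the two strings it returns would not correspond to real chromosomes.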

ADD COMMENT • link updated 4.8 years ago by Ram 43k • written 13.0 years ago by Jorge Amigo 14k

3

I know this post is "old", but it was a helpful springboard for digging further into the subject. If it was helpful to me now, it will surely be helpful to others "tomorrow". Below is an excerpt (copy-pasted) from The Variant Call Format and VCFtools, Danecek et al. (2011):

GT, genotype, encodes alleles as numbers: 0 for the reference allele, 1 for the first allele listed in ALT column, 2 for the second allele listed in ALT and so on. The number of alleles suggests ploidy of the sample and the separator indicates whether the alleles are phased ("|") or unphased ("/") with respect to other data lines (Figure 1).
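As a rough sketch of what that encoding means in practice, a GT string can be split into allele indices plus a phased flag. The helper names `parse_gt` and `same_genotype` are hypothetical, and the code assumes a GT uses a single kind of separator. The comparison treats allele order as meaningful only when both genotypes are phased.

```python
from collections import Counter


def parse_gt(gt):
    """Split a VCF GT string into (allele indices, phased flag).

    Missing alleles ('.') are returned as None.
    Assumption: the GT uses one separator type throughout.
    """
    phased = "|" in gt
    separator = "|" if phased else "/"
    alleles = tuple(None if a == "." else int(a) for a in gt.split(separator))
    return alleles, phased


def same_genotype(gt_a, gt_b):
    """Compare two GT strings; allele order counts only if both are phased."""
    alleles_a, phased_a = parse_gt(gt_a)
    alleles_b, phased_b = parse_gt(gt_b)
    if phased_a and phased_b:
        return alleles_a == alleles_b
    # Unphased: compare as unordered collections of allele indices.
    return Counter(alleles_a) == Counter(alleles_b)


print(parse_gt("0|1"))              # ((0, 1), True)
print(same_genotype("0/1", "1/0"))  # True
print(same_genotype("0|1", "1|0"))  # False
```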

ADD REPLY • link 11.1 years ago by napoleonbesong ▴ 30

1

This was useful, but it still left the meaning of the allele order ambiguous for me, i.e. which alleles are on the same chromosome (in the same phase). A look at Fig 1 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3137218/figure/F1/) in the original paper you referenced confirms that, for phased data, alleles from different variants that occupy the same position in the GT field are on the same chromosome (provided they are in the same phase set, which is implied if no PS field is present).

ADD REPLY • link 11.0 years ago by sliders ▴ 80

0

So when a VCF has 0|1 or 1|0, is it safe to assume that the first column (before |) always represents one haplotype, and the second column (after |) always represents the other haplotype?

ADD REPLY • link 13.0 years ago by Chronos ▴ 610

0

More or less. You will be able to build one haplotype from the alleles in the first column, and another one from the alleles in the second column.

ADD REPLY • link 13.0 years ago by Jorge Amigo 14k

0

Thanks, this is what I wanted to be sure of.

ADD REPLY • link 13.0 years ago by Chronos ▴ 610

score 7 · Answer 2 · 2013-05-02

Fig 1 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3137218/figure/F1/) in the original VCF/VCFtools paper referenced above confirms that, for phased data, alleles from different variants that occupy the same position in the GT field are on the same chromosome (provided they are in the same phase set, which is implied if no PS field is present).
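To illustrate that rule, here is a minimal sketch that groups phased calls by phase set and reads the GT slots column-wise. The records are invented for the example, the tuple layout and the function name `phased_haplotypes` are assumptions rather than any library's API, and the code only handles diploid calls without missing alleles. A PS of `None` stands for "no PS field present", so all such calls land in one implied phase set.

```python
from collections import defaultdict

# Hypothetical records for one sample: (POS, REF, ALT list, GT, PS or None).
records = [
    (100, "A", ["T"], "0|1", None),
    (150, "G", ["C"], "1|0", None),
    (200, "C", ["G", "T"], "2|1", None),
    (250, "A", ["C"], "0/1", None),  # unphased: slot order carries no information
]


def phased_haplotypes(records):
    """Group phased calls by phase set and read the GT slots column-wise."""
    blocks = defaultdict(lambda: ([], []))
    for pos, ref, alts, gt, ps in records:
        if "|" not in gt:
            continue  # skip unphased calls
        alleles = [ref] + alts  # index 0 is REF, 1.. are the ALT alleles
        left, right = (alleles[int(i)] for i in gt.split("|"))
        blocks[ps][0].append((pos, left))   # first GT slot
        blocks[ps][1].append((pos, right))  # second GT slot
    return dict(blocks)


for phase_set, (first, second) in phased_haplotypes(records).items():
    print(phase_set, first, second)
```

Calls that carry different PS values end up in separate blocks, and slot order is only comparable within a block.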