Question: Phased And Unphased Genotypes In VCF Files: Does The Order Of Alleles Matter?
Chronos (Germany) wrote, 9.7 years ago:

As this page explains, phased genotypes are sensitive to the order of alleles.

I assume that the order of alleles in phased VCF genotypes (such as 0|1 and 1|0) matters as well, but I could not find any confirmation of that in the format description.

Or is order-sensitive allele listing such a common convention that it doesn't need an explicit description? (I'm new to the field.)

Jorge Amigo (Santiago de Compostela, Spain) wrote, 9.5 years ago:

The phase status of an allele records which chromosome of the homologous pair it was found on. As far as I know, the main reason to use allele phasing information is to improve the correctness of the haplotypes and haplotype blocks inferred from the genotypes. Once you know which allele is on which chromosome of the pair, it makes sense to list every allele pair in the same order, because then you can easily build haplotypes by reading the first alleles sequentially, and then the second alleles sequentially.

To be a little more visual (and simplistic too, so my apologies in advance to all geneticists), take the table from the webpage you mentioned:

IND, id1, id2, id3, id4, id5
rs1, AT, TT, ??, AT, AA
rs2, GC, CC, GG, CC, CG
rs3, CC, ??, ??, CG, GG
rs4, AC, CC, AA, AC, AA

If you look at individual 1 (id1), you get two different haplotypes: AGCA (from the first chromosome of the pair) and TCCC (from the second). This information would not be known if the genotypes were unphased, in which case some other haplotyping (phasing) algorithm would have to be applied.
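The column-by-column reading described above can be sketched in a few lines of Python. This is a minimal illustration, assuming diploid genotypes that are all phased and belong to the same phase set; the function name `haplotypes_from_phased` is hypothetical, and the genotypes of id1 are rewritten VCF-style with an explicit `|` separator.

```python
def haplotypes_from_phased(genotypes):
    """Build two haplotypes from an ordered list of phased diploid genotypes.

    Each genotype is a string such as "A|T": the allele before "|" is taken
    to be on one chromosome of the pair, the allele after "|" on the other.
    Assumes every genotype is phased and all share one phase set.
    """
    first, second = [], []
    for gt in genotypes:
        if "|" not in gt:
            raise ValueError(f"unphased genotype: {gt!r}")
        a, b = gt.split("|")  # diploid only: exactly two alleles expected
        first.append(a)
        second.append(b)
    return "".join(first), "".join(second)


# id1 from the table above (rs1..rs4), separator made explicit
id1 = ["A|T", "G|C", "C|C", "A|C"]
print(haplotypes_from_phased(id1))  # ('AGCA', 'TCCC')
```

An unphased genotype such as "A/T" raises an error here, because its alleles cannot be assigned to either haplotype without a separate phasing step.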


I know this post is "old", but it was a helpful springboard for digging further into the subject. If it helped me now, it will surely help others "tomorrow". Below is an excerpt (copy-paste) from The Variant Call Format and VCFtools, Danecek et al. (2011):

GT, genotype, encodes alleles as numbers: 0 for the reference allele, 1 for the first allele listed in ALT column, 2 for the second allele listed in ALT and so on. The number of alleles suggests ploidy of the sample and the separator indicates whether the alleles are phased ("|") or unphased ("/") with respect to other data lines (Figure 1).

Reply by napoleonbesong, 7.7 years ago.
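The GT encoding in the excerpt can be illustrated with a minimal Python sketch: allele indices are looked up against REF and the ALT list, the number of alleles gives the ploidy, and the separator distinguishes phased from unphased calls. The function `decode_gt` is hypothetical (not part of VCFtools or any library); it treats "." as a missing allele and, as a simplification, reports a call with mixed separators or no separator at all (such as a haploid "1") as unphased.

```python
import re


def decode_gt(gt, ref, alts):
    """Decode a VCF GT string (e.g. "0|1", "1/2") for one site.

    ref  : the REF allele
    alts : the ALT alleles, in the order they appear in the ALT column
    Returns (alleles, ploidy, phased); missing calls (".") become None.
    """
    # index 0 = REF, 1 = first ALT, 2 = second ALT, and so on
    lookup = [ref] + list(alts)
    calls = re.split(r"[|/]", gt)
    # "|" means phased, "/" unphased; mixed separators or no separator
    # (haploid) are reported as unphased in this sketch
    phased = "|" in gt and "/" not in gt
    alleles = [None if c == "." else lookup[int(c)] for c in calls]
    return alleles, len(calls), phased


print(decode_gt("0|1", "A", ["G"]))       # (['A', 'G'], 2, True)
print(decode_gt("1/2", "A", ["G", "T"]))  # (['G', 'T'], 2, False)
print(decode_gt("./.", "A", ["G"]))       # ([None, None], 2, False)
```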

This was useful, but it still left the meaning of the allele order ambiguous for me, i.e. which alleles are on the same chromosome (in the same phase). A look at Fig 1 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3137218/figure/F1/) in the original paper you referenced confirms that, for phased data, alleles from different variants that occupy the same position in the GT field are on the same chromosome (provided they are in the same phase set, which is implied when no PS field is present).

Reply by sliders, 7.5 years ago.

So when a VCF has 0|1 or 1|0, is it safe to assume that the first column (before |) always represents one haplotype and the second column (after |) always represents the other?

Reply by Chronos, 9.5 years ago.

More or less: you will be able to build one haplotype from the alleles in the first column and another one from the alleles in the second column.

Reply by Jorge Amigo, 9.5 years ago.

Thanks, this is what I wanted to be sure of.

Reply by Chronos, 9.5 years ago.
sliders wrote, 7.5 years ago:

Fig 1 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3137218/figure/F1/) in the original VCF/VCFtools paper referenced above confirms that, for phased data, alleles from different variants that occupy the same position in the GT field are on the same chromosome (provided they are in the same phase set, which is implied when no PS field is present).
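The phase-set caveat can be sketched in Python too. Under stated assumptions (one sample, one chromosome, diploid calls, records already sorted by position), phased calls are grouped by their PS value, and calls without a PS fall into one shared set, keyed `None` here. The function `haplotypes_by_phase_set`, the tuple layout of `records` and the PS values in the example are all hypothetical.

```python
from collections import defaultdict


def haplotypes_by_phase_set(records):
    """Group phased diploid calls into per-phase-set haplotype pairs.

    records: iterable of (pos, ref, alts, gt, ps) for one sample on one
    chromosome; ps is the PS value, or None when no PS is given.
    Returns {ps: (first_haplotype, second_haplotype)}, each haplotype being
    a list of (pos, allele) pairs.
    """
    blocks = defaultdict(lambda: ([], []))
    for pos, ref, alts, gt, ps in records:
        if "|" not in gt or "/" in gt:
            continue  # unphased call: no haplotype assignment possible
        left, right = gt.split("|")  # assumes exactly two alleles
        if "." in (left, right):
            continue  # missing allele: skipped in this sketch
        lookup = [ref] + list(alts)
        first, second = blocks[ps]
        first.append((pos, lookup[int(left)]))
        second.append((pos, lookup[int(right)]))
    return dict(blocks)


records = [
    (100, "A", ["T"], "0|1", 100),
    (150, "G", ["C"], "1|0", 100),
    (300, "C", ["G"], "0|1", 300),
    (400, "A", ["C"], "0/1", 300),  # unphased, skipped
]
for ps, (first, second) in haplotypes_by_phase_set(records).items():
    print(ps, first, second)
# 100 [(100, 'A'), (150, 'C')] [(100, 'T'), (150, 'G')]
# 300 [(300, 'C')] [(300, 'G')]
```

Within each phase set the two lists are the two haplotypes. Nothing in this data says whether the first haplotype of set 100 and the first haplotype of set 300 lie on the same chromosome, which is why the same-position rule only holds inside a phase set.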
