Question: Phased And Unphased Genotypes In Vcf Files: Does The Order Of Alleles Matter?
9.5 years ago by
Chronos590
Germany
Chronos590 wrote:

As this page explains, phased genotypes are sensitive to the order of their alleles.

I assume that the order of alleles in VCF phased genotypes (like 0|1 and 1|0) is important as well, but I failed to find any confirmation of that in the format description.

Or is order-sensitive allele listing so common that it doesn't need an explicit description? (I'm new to the field.)

vcf genotyping • 19k views
modified 7.3 years ago by sliders60 • written 9.5 years ago by Chronos590
9.3 years ago by
Jorge Amigo11k
Santiago de Compostela, Spain
Jorge Amigo11k wrote:

the phase status of an allele records which chromosome of the homologous pair it was found on. as far as I know, the main reason to use allele phasing information is to improve the accuracy of the haplotypes and haplotype blocks inferred from it. it makes sense to write every allele pair in the same order once you know which allele sits on which chromosome of the pair: with everything ordered consistently, you can easily build haplotypes by reading first only the first allele of each genotype and then only the second allele.

trying to be a little more visual (and simplistic too, so I ask all geneticists to accept my apologies in advance), take the table from the webpage you've mentioned:

IND, id1, id2, id3, id4, id5
rs1, AT, TT, ??, AT, AA
rs2, GC, CC, GG, CC, CG
rs3, CC, ??, ??, CG, GG
rs4, AC, CC, AA, AC, AA

if you look at individual 1 (id1) you will get 2 different haplotypes: AGCA (from the first chromosome of the pair) and TCCC (from the second chromosome of the pair). this information wouldn't be known if the genotypes were unphased, in which case a haplotype inference (phasing) algorithm would have to be applied first.
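To make the id1 example concrete, here is a minimal Python sketch. It assumes, as the answer describes, that the first letter of every pair lies on one chromosome of the pair and the second letter on the other. The function name `build_haplotypes` is made up for illustration.

```python
# Phased genotypes of individual id1, copied from the table above.
# Assumption: within each pair, the first letter is on one chromosome
# of the pair and the second letter is on the other.
id1_genotypes = {"rs1": "AT", "rs2": "GC", "rs3": "CC", "rs4": "AC"}


def build_haplotypes(genotypes):
    """Return (hap1, hap2) by joining the first and the second alleles in marker order."""
    hap1 = "".join(gt[0] for gt in genotypes.values())
    hap2 = "".join(gt[1] for gt in genotypes.values())
    return hap1, hap2


hap1, hap2 = build_haplotypes(id1_genotypes)
print(hap1, hap2)  # AGCA TCCC
```

With unphased genotypes the same loop would still run, but the two strings would not correspond to real haplotypes, because the order within each pair would be arbitrary.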

modified 13 months ago by RamRS28k • written 9.3 years ago by Jorge Amigo11k

I know this post is "old", but it was a helpful springboard for digging further into the subject. If it was helpful to me now, it will surely be helpful to others "tomorrow". Below is an excerpt (copy-pasted) from "The Variant Call Format and VCFtools", Danecek et al. (2011):

GT, genotype, encodes alleles as numbers: 0 for the reference allele, 1 for the first allele listed in ALT column, 2 for the second allele listed in ALT and so on. The number of alleles suggests ploidy of the sample and the separator indicates whether the alleles are phased (”|”) or unphased (”/”) with respect to other data lines (Figure 1).
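As an illustration of the encoding described in that excerpt, here is a small Python sketch that decodes a GT string against the REF and ALT columns. The function `parse_gt` is a hypothetical helper, not part of VCFtools, and it makes one simplifying assumption: a genotype counts as phased if it contains `|` anywhere.

```python
import re


def parse_gt(gt, ref, alts):
    """Decode a VCF GT string into (alleles, phased).

    gt   -- genotype string such as "0|1" or "1/2"
    ref  -- value of the REF column
    alts -- list of values from the ALT column
    Missing alleles (".") are returned as None.
    """
    # Simplifying assumption: any "|" separator marks the genotype as phased.
    phased = "|" in gt
    # Index 0 is the reference allele, 1 the first ALT allele, and so on.
    table = [ref] + list(alts)
    alleles = [
        None if idx == "." else table[int(idx)]
        for idx in re.split(r"[|/]", gt)
    ]
    return alleles, phased


print(parse_gt("0|1", "A", ["T"]))        # (['A', 'T'], True)
print(parse_gt("1/2", "G", ["C", "GA"]))  # (['C', 'GA'], False)
```

The number of decoded alleles reflects the ploidy suggested by the GT field, and only the separator tells you whether their order is meaningful.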

written 7.4 years ago by napoleonbesong20

This was useful, but it still left the meaning of the allele order ambiguous for me, i.e. which alleles are on the same chromosome (in the same phase). A look at Fig 1 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3137218/figure/F1/) in the original paper you referenced confirms that, for phased data, alleles from different variants that occupy the same position in the GT field are on the same chromosome (provided they are in the same phase set, which is implied if no PS field is present).

modified 7.3 years ago • written 7.3 years ago by sliders60

So when a VCF has 0|1 or 1|0, it is safe to assume that the first column (before |) always represents one haplotype, and the second column (after |) always represents the other haplotype?

written 9.3 years ago by Chronos590

more or less. you will be able to build a haplotype with the alleles on the first column, and another one with the alleles on the second column.

written 9.3 years ago by Jorge Amigo11k

Thanks, this is what I wanted to be sure of.

written 9.3 years ago by Chronos590
7.3 years ago by
sliders60
sliders60 wrote:

Fig 1 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3137218/figure/F1/) in the original VCF/VCFtools paper referenced above confirms that, for phased data, alleles from different variants that occupy the same position in the GT field are on the same chromosome (provided they are in the same phase set, which is implied if no PS field is present).
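A rough Python sketch of that rule follows. The records are invented for illustration, the function name `haplotypes_by_phase_set` is hypothetical, and the code assumes diploid genotypes without missing alleles. Phased records with no PS value are grouped into one implied phase set (key `None`), following the reading given above.

```python
from collections import defaultdict

# Hypothetical records for one sample: (POS, REF, ALT list, GT, PS or None).
records = [
    (100, "A", ["T"], "0|1", None),
    (200, "G", ["C"], "1|0", None),
    (300, "C", ["G"], "0/1", None),  # unphased, so it is skipped below
    (400, "T", ["A"], "1|0", 400),
    (500, "G", ["A"], "0|1", 400),
]


def haplotypes_by_phase_set(records):
    """Map each phase set to a pair of haplotypes, each a list of (POS, allele)."""
    sets = defaultdict(lambda: ([], []))
    for pos, ref, alts, gt, ps in records:
        if "|" not in gt:
            continue  # the order of unphased alleles carries no meaning
        alleles = [ref] + alts
        # Assumption: diploid genotype with no missing (".") alleles.
        first, second = (alleles[int(i)] for i in gt.split("|"))
        hap1, hap2 = sets[ps]
        hap1.append((pos, first))   # allele before "|"
        hap2.append((pos, second))  # allele after "|"
    return dict(sets)


for ps, (hap1, hap2) in haplotypes_by_phase_set(records).items():
    print(ps, hap1, hap2)
```

Alleles are only chained together within one phase set. Nothing in this sketch says whether the first haplotype of set `None` and the first haplotype of set `400` are on the same chromosome.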

modified 7.3 years ago • written 7.3 years ago by sliders60