Question: What Are Phased And Unphased Genotypes?
Nick wrote (6.6 years ago):

As the title implies: what are phased and unphased genotypes? I am playing with 1000 Genomes data and am not sure whether I should handle phased and unphased genotypes differently.

Documentation on the internet seems to be quite sparse...

Larry_Parnell (Boston, MA USA) wrote (6.6 years ago):

Phased data are ordered along one chromosome and so from these data you know the haplotype. Unphased data are simply the genotypes without regard to which one of the pair of chromosomes holds that allele.
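In VCF files, including the 1000 Genomes releases, the GT field separates the two allele indices with "|" when the call is phased and "/" when it is unphased. A minimal sketch of reading that distinction, assuming simple calls with a single separator type; `Genotype`, `parse_gt` and `same_unphased` are hypothetical names, not part of any library:

```python
from typing import NamedTuple, Optional, Tuple


class Genotype(NamedTuple):
    alleles: Tuple[Optional[int], ...]  # allele indices; None stands for a missing "."
    phased: bool                        # True when the alleles were separated by "|"


def parse_gt(gt: str) -> Genotype:
    """Parse a VCF-style GT string such as '0|1', '1/0' or './.'.

    Assumes one separator type per call (mixed separators are not handled).
    """
    phased = "|" in gt
    sep = "|" if phased else "/"
    alleles = tuple(None if a == "." else int(a) for a in gt.split(sep))
    return Genotype(alleles, phased)


def same_unphased(a: Genotype, b: Genotype) -> bool:
    """Compare two calls as unordered allele sets, ignoring phase."""
    def key(g: Genotype):
        return sorted(-1 if x is None else x for x in g.alleles)
    return key(a) == key(b)


g1, g2 = parse_gt("1|0"), parse_gt("0|1")
print(g1.phased, g1.alleles)   # True (1, 0)
print(same_unphased(g1, g2))   # True: both are heterozygous at this site
print(g1 == g2)                # False: opposite allele-to-chromosome assignment
```

For a single site taken on its own, 1|0 and 0|1 describe the same genotype; the ordering only becomes informative relative to other phased sites on the same chromosome.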


A diploid genotype comes from two chromosomes. Phased means I know not only the genotype but also which chromosome each allele came from. This lets you interpret which sets of alleles are being inherited together; google "haplotype" if this isn't clear.

(Reply by David Quigley, 6.6 years ago)

No. A lot of depth is not needed to call the major and minor alleles. First, there is no such thing as major/minor for an individual; those are population values. Allele calls for an individual's sample are based on sequence quality, so two reads can do it: one with an A and one with a G. If both are high quality, the subject is a heterozygote. SNPs from the 1000G data are in dbSNP 132, I believe.

I don't quite understand your second question. Either rephrase that or give me some time to think about this.

(Reply by Larry_Parnell, 6.6 years ago)
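To illustrate that major/minor is a population-level label rather than a property of one individual's call, here is a small sketch that pools allele counts across individuals at one biallelic site. The function names and the (0, 1) tuple encoding are assumptions for illustration:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def allele_frequencies(genotypes: Iterable[Tuple[int, int]]) -> Dict[int, float]:
    """Frequency of each allele at one site, pooled over all individuals."""
    counts: Counter = Counter()
    for g in genotypes:
        counts.update(g)  # a diploid individual contributes two allele copies
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def minor_allele(genotypes: Iterable[Tuple[int, int]]) -> int:
    """The least frequent allele in the sample (ties go to the allele seen first)."""
    freqs = allele_frequencies(genotypes)
    return min(freqs, key=freqs.get)


population = [(0, 0), (0, 1), (0, 0), (1, 1)]
print(allele_frequencies(population))  # {0: 0.625, 1: 0.375}
print(minor_allele(population))        # 1
```

The individual (0, 1) is simply a heterozygote; calling allele 1 "minor" only makes sense relative to this sample of four people.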

So the genotype at hand would need a lot of depth and allele counts to determine the major and minor alleles, right? Also, even though phased data are "ordered", the order of the bases doesn't really matter, right (Aa is the same as aA)?

(Reply by Nick, 6.6 years ago)

Sorry, I wasn't clear with the second question. Say I have a genotype called as 1|0. This is the same as 0|1, right? Also, in the example you provided above, supposing one of the reads was horrible (we aren't sure if the called G is really a G), would we then have an "unphased" AG genotype instead of a "phased" AG genotype?

(Reply by Nick, 6.6 years ago)
Genotepes (Nantes, France) wrote (6.6 years ago):

Hi

Actually (I think) the phased or unphased status is not related to any measure of quality. For each individual, there are two chromosomes, labelled paternal and maternal (arbitrarily, when you do not have the genotypes of the parents). The names are self-explanatory.
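When the parents are genotyped, the paternal/maternal labels stop being arbitrary at most sites. A minimal sketch of trio-based phasing at one biallelic site, assuming correct calls and no de novo mutation; `phase_from_trio` is a hypothetical helper, and it returns None in the one case transmission cannot resolve (all three individuals heterozygous):

```python
from itertools import product
from typing import Optional, Tuple


def phase_from_trio(child: Tuple[int, int],
                    father: Tuple[int, int],
                    mother: Tuple[int, int]) -> Optional[str]:
    """Return the child's call as 'paternal|maternal', or None if ambiguous.

    Raises ValueError when no combination of parental alleles explains the child.
    """
    target = sorted(child)
    # every (paternal, maternal) allele pair the parents could transmit that matches the child
    options = {(p, m) for p, m in product(set(father), set(mother))
               if sorted((p, m)) == target}
    if not options:
        raise ValueError("child genotype not explainable by the parents")
    if len(options) > 1:
        return None  # e.g. child 0/1 with both parents 0/1: phase stays unknown
    p, m = options.pop()
    return f"{p}|{m}"


print(phase_from_trio((0, 1), (0, 0), (0, 1)))  # 0|1 (the 1 must come from the mother)
print(phase_from_trio((0, 1), (0, 1), (0, 1)))  # None
```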

For a heterozygous genotype at a SNP position (which is called conditional on some quality score), you may know which allele is on the maternal chromosome and which one is on the paternal chromosome. The genotype is then "ordered". If, for a heterozygous call (still conditional on the quality) at another SNP position, you can also assign which allele is on the paternal chromosome and which is on the maternal one, then you are able to phase these two SNPs, or more precisely, to phase the alleles at these SNPs. You then get a haplotype, i.e. a run of "ordered" SNPs.

In this context, having the ordered genotypes 0|1 at SNP 1 and 1|0 at SNP 2 is not the same as having 0|1 at SNP 1 and 0|1 at SNP 2.

The first gives the haplotypes 0 1 (on one chromosome) and 1 0 (on the other), while the second gives 0 0 and 1 1.
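The two cases in Genotepes's example (0|1 at SNP 1 with 1|0 at SNP 2, versus 0|1 at both SNPs) can be checked with a short sketch. It assumes the first allele of every phased pair sits on the same chromosome, which is the usual reading of consecutive phased calls within one phase set; the function names are illustrative:

```python
from typing import List, Tuple


def haplotypes(phased: List[Tuple[int, int]]) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Split phased calls (one (left, right) pair per SNP) into two haplotypes.

    Assumes the left allele of every pair lies on the same chromosome.
    """
    first = tuple(g[0] for g in phased)
    second = tuple(g[1] for g in phased)
    return first, second


def unphased(phased: List[Tuple[int, int]]) -> List[Tuple[int, ...]]:
    """Drop the phase information by sorting the alleles at each SNP."""
    return [tuple(sorted(g)) for g in phased]


case1 = [(0, 1), (1, 0)]  # 0|1 at SNP 1, 1|0 at SNP 2
case2 = [(0, 1), (0, 1)]  # 0|1 at SNP 1, 0|1 at SNP 2
print(unphased(case1) == unphased(case2))  # True: both are double heterozygotes
print(haplotypes(case1))                   # ((0, 1), (1, 0))
print(haplotypes(case2))                   # ((0, 0), (1, 1))
```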

Now, one could use pre-estimated phase information from a reference panel population (typically different from the population in which you call your alleles) to help call an allele when the quality is low. This is what BEAGLECALL does, usually in a chip-genotyping context.

As for the 1000G data, having phased data helps in getting a better estimate of linkage disequilibrium. It also means that the file format may differ, so you need to take care when using it as input. But apart from the input format and the extra LD information, the way you would use phased and unphased data here is not really different.
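Why phase helps with linkage disequilibrium: the standard pairwise measures D and r² are defined on haplotype frequencies, which phased data give directly. A minimal sketch for two biallelic SNPs coded 0/1, assuming complete phased haplotypes with no missing data (with unphased genotypes these frequencies would have to be estimated instead, for example with an EM algorithm); `ld_stats` is a hypothetical name:

```python
from typing import List, Tuple


def ld_stats(haps: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (D, r^2) between two biallelic SNPs from phased haplotypes.

    Each haplotype is (allele at SNP A, allele at SNP B), coded 0/1.
    """
    n = len(haps)
    p_a = sum(a for a, _ in haps) / n                          # freq of allele 1 at SNP A
    p_b = sum(b for _, b in haps) / n                          # freq of allele 1 at SNP B
    p_ab = sum(1 for a, b in haps if a == 1 and b == 1) / n    # freq of the 1-1 haplotype
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")          # undefined if a SNP is monomorphic
    return d, r2


print(ld_stats([(0, 1), (1, 0)]))  # (-0.25, 1.0)
print(ld_stats([(0, 0), (1, 1)]))  # (0.25, 1.0)
```

The two inputs are the haplotype pairs of two individuals with identical unphased genotypes (heterozygous at both SNPs), yet they contribute D of opposite sign.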

Christian

PS: sorry if I went too far back to the basics.


I realise the formatting did not come out as I expected.

Trying to re-display this.

Genotype 1 would give the haplotype pair 1 0 / 0 1, while genotype 2 would give 0 0 / 1 1.

(Reply by Genotepes, 6.6 years ago)


Hi, I've read about the concept of phased haplotypes and ordered genotypes but have never worked with any data. When the OP says they have a genotype called as 0|1, what do the numbers mean? Is it paternal allele/maternal allele, so paternal allele = 0 and maternal allele = 1 for this SNP?

(Reply by Pi, 6.6 years ago)

My experience is with phased data from 1000 Genomes for imputation programs (so not VCF files). There, you have one line per chromosome (in a .haplo type file); I think the paternal one is first. The 0 and 1 refer to a code from a descriptive marker file. Let's say rs1 has alleles A and G and rs2 has C and T. Then

ind1 0 1
ind1 0 0

means that ind1 bears the haplotypes A-T and A-C. If the convention is paternal/maternal, then 0/0 - 0/1. Could you tell us which file you are using? What I was referring to is the 1000G processed files intended for software like IMPUTE or MACH.

(Reply by Genotepes, 6.6 years ago)

Excuse me:

You say that in a phased genotype the paternal allele is the first one. Is there any material that confirms this?

Many thanks.

(Reply by 89759864480, 2.9 years ago)

Hi. I have never used any of this data. I was just reading the question out of general interest and wasn't familiar with the notation :)

(Reply by Pi, 6.6 years ago)

If the paternal haplotype is A-T and the maternal haplotype is A-C, why isn't the second notation 0/0 - 1/0, or, using the bases, A/A - T/C?

(Reply by Pi, 6.6 years ago)

Let me check... You're right, -1 for me. It was after a long night. Apologies. Let me rephrase (although I am sure you understand): if ind1 is 0 1 at rs1 and 0 0 at rs2, he will have the haplotypes (00)/(10), or (AC)/(GC).

I really screwed up the example, but it's not easy to do haplotype things here...

Apologies

(Reply by Genotepes, 6.5 years ago)
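The exchange above hinges on reading the same two rows either across (haplotypes) or down (per-marker genotypes). A sketch of both readings using the rs1/rs2 example from this thread, assuming a hypothetical in-memory layout rather than the exact IMPUTE or MACH file formats; `MARKERS`, `decode_rows` and `per_marker_genotypes` are made-up names, and which row is paternal is deliberately not assumed:

```python
from typing import Dict, List, Tuple

# Hypothetical marker table: marker id -> (allele coded 0, allele coded 1)
MARKERS: List[Tuple[str, Tuple[str, str]]] = [("rs1", ("A", "G")), ("rs2", ("C", "T"))]


def decode_rows(rows: List[str]) -> List[str]:
    """Read each row across: one chromosome's alleles, decoded to bases like 'A-T'."""
    haps = []
    for row in rows:
        codes = [int(c) for c in row.split()]
        bases = [alleles[c] for (_, alleles), c in zip(MARKERS, codes)]
        haps.append("-".join(bases))
    return haps


def per_marker_genotypes(rows: List[str]) -> Dict[str, str]:
    """Read the two rows down each column: first-row allele / second-row allele."""
    first, second = rows[0].split(), rows[1].split()
    return {mid: f"{a}/{b}" for (mid, _), a, b in zip(MARKERS, first, second)}


ind1 = ["0 1", "0 0"]  # one line per chromosome
print(decode_rows(ind1))           # ['A-T', 'A-C']
print(per_marker_genotypes(ind1))  # {'rs1': '0/0', 'rs2': '1/0'}
```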
User 2184687-1231-83- wrote (6.6 years ago):

If you are analysing the 1000G data treating each SNP as an independent data point, you most probably don't need phased data. If what you are studying involves correlations between, say, pairs of SNPs, and can be influenced by recombination (like linkage disequilibrium or selective sweeps), then you need phased data.
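The point about pairs of SNPs can be made concrete: an unphased individual who is heterozygous at k ≥ 1 sites is compatible with 2^(k-1) distinct haplotype pairs, so per-SNP analyses are unaffected while pairwise ones are not. A brute-force sketch (exponential in the number of sites, so only for illustration; `compatible_phasings` is a hypothetical name):

```python
from itertools import product
from typing import List, Set, Tuple

Hap = Tuple[int, ...]


def compatible_phasings(genotypes: List[Tuple[int, int]]) -> Set[Tuple[Hap, Hap]]:
    """All unordered haplotype pairs consistent with unphased diploid genotypes."""
    pairs: Set[Tuple[Hap, Hap]] = set()
    # try both allele orientations at every site
    for flips in product((False, True), repeat=len(genotypes)):
        h1 = tuple(g[1] if f else g[0] for g, f in zip(genotypes, flips))
        h2 = tuple(g[0] if f else g[1] for g, f in zip(genotypes, flips))
        # store each pair in a canonical order so mirror images collapse
        pairs.add((h1, h2) if h1 <= h2 else (h2, h1))
    return pairs


print(sorted(compatible_phasings([(0, 1), (0, 1)])))         # [((0, 0), (1, 1)), ((0, 1), (1, 0))]
print(len(compatible_phasings([(0, 1), (0, 0), (0, 1), (0, 1)])))  # 4
```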

Jerry Zhu wrote (19 days ago):

This is well explained here: https://www.illumina.com/techniques/sequencing/dna-sequencing/whole-genome-sequencing/phased-sequencing.html
