Question: 1000 Genomes: Phased Or Not?
Chronos (Germany) wrote, 9.7 years ago:

Q1. Running

zcat ALL.chrX.BI_Beagle.20100804.genotypes.vcf.gz | grep -v ^## | cut -f 345 | cut -d ':' -f 1 | grep -v '\./\.' | grep -v '|' | head

yields

NA18981
0/0
0/0
1/1
0/0
0/0
0/0
0/0
0/0
0/0

while running

zcat ALL.chrX.BI_Beagle.20100804.genotypes.vcf.gz | grep -v ^## | cut -f 345 | cut -d ':' -f 1 | grep -v '\./\.' | head

yields

NA18981
0|0
0|0
0|0
0|0
0|0
0|0
0|0
1|0
0|0

Is NA18981 phased or not? Is it partially phased? If so, what rule or convention explains the partial phasing? (I know that microsatellite calls are left unphased in otherwise phased genomes, but I don't believe I have seen any in this file.)

Q2. For autosomes (I've only checked chromosomes 1 and 2, but I assume the pattern holds for all autosomes), all 629 samples appear to be phased: at every position their genotypes are either unknown (./.) or phased (e.g. 0|1). So are all 629 samples really phased on all autosomes?
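
The checks in Q1 and Q2 can be done in one pass by tallying genotype classes for a single sample column instead of eyeballing `head` output. The following is a minimal sketch, not a validated tool: `gt_tally` is a hypothetical helper name, and it assumes a tab-separated VCF on stdin with GT as the first colon-separated subfield of each sample column. Any genotype containing a dot (including partially missing calls such as `./1`) is counted as missing.

```shell
# Tally genotype classes for one sample column of a VCF read from stdin.
# Hypothetical helper; usage: zcat file.vcf.gz | gt_tally 345
gt_tally() {
  awk -F '\t' -v col="$1" '
    /^#/ { next }                          # skip meta-information and header lines
    {
      split($col, f, ":")                  # assumption: GT is the first subfield
      gt = f[1]
      if (gt ~ /\./)        n["missing"]++   # ./. or any call containing a dot
      else if (gt ~ /[|]/)  n["phased"]++    # pipe separator, e.g. 0|1
      else if (gt ~ /\//)   n["unphased"]++  # slash separator, e.g. 0/1
      else                  n["haploid"]++   # no separator, e.g. 1
    }
    END {
      printf "phased=%d unphased=%d haploid=%d missing=%d\n", n["phased"], n["unphased"], n["haploid"], n["missing"]
    }'
}
```

For the sample in Q1 this would be run as `zcat ALL.chrX.BI_Beagle.20100804.genotypes.vcf.gz | gt_tally 345`. A non-zero `unphased` count alongside a non-zero `phased` count would confirm the mixed pattern described above, and a `haploid` count of zero would confirm that no non-diploid genotypes are present.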

Somewhat related: http://biostar.stackexchange.com/questions/5315/phased-and-unphased-genotypes-in-vcf-files-does-the-order-of-alleles-matter

modified 9.6 years ago by lh3 • written 9.7 years ago by Chronos

You should check the documentation of the programs used to phase the data. They almost certainly contain a section about phasing chr X and haploid data. Chr X is hemizygous at certain positions (in males, outside the pseudo-autosomal regions), so phasing can be problematic.

written 9.7 years ago by Jarretinha

They used Beagle (judging from the chrX filename), and I'll have to read its manual sooner or later. Jarretinha, did you mean that the 1000 Genomes Project treats the pseudo-autosomal segments of Y as diploid segments at the corresponding chrX coordinates? That would make sense (as there is no Y chromosome anywhere in the data), but then why are there no non-diploid genotypes on chrX? (They would have no slash or pipe in them, just a number or a dot.)

written 9.6 years ago by Chronos

Please be more descriptive!

written 9.6 years ago by Thaman

+1 I actually think the question is very clear.

written 9.6 years ago by lh3

ok, the question is very clear now after the edit. I've removed the -1.

written 8.5 years ago by Giovanni M Dall'Olio
lh3 (United States) wrote, 9.6 years ago:

You should not trust the X chromosome genotype calls. They were not made in the proper way. On autosomes, the majority of SNPs are arbitrarily phased, but some are unphased due to call set merging. The upcoming release will be much cleaner.

As to phasing itself, you can always arbitrarily phase heterozygotes. The question is how many switch errors we make. The answer to this question is largely unknown on a data set like 1000g.
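
To make the notion of a switch error concrete: walking along the heterozygous sites in order, a switch is counted each time the inferred phase changes orientation relative to a trusted reference phasing. The sketch below illustrates that counting rule only. It assumes such a reference exists (for example, one derived from trio data), which is exactly what is mostly unavailable for a data set like 1000g. `switch_errors` is a hypothetical helper name, and the two-column input layout is an assumption made for the example.

```shell
# Count switch errors between a reference phasing and a test phasing.
# Hypothetical helper. Input on stdin: two whitespace-separated columns of
# genotypes (reference, test), one site per line, in genomic order.
switch_errors() {
  awk '
    $1 !~ /[|]/ || $2 !~ /[|]/ { next }      # need phased calls in both columns
    {
      split($1, r, /[|]/); split($2, t, /[|]/)
      if (r[1] == r[2]) next                 # homozygous in reference: no phase to compare
      if (r[1] == t[1] && r[2] == t[2])      orient = "same"
      else if (r[1] == t[2] && r[2] == t[1]) orient = "flip"
      else next                              # genotypes disagree: skip the site
      if (prev != "" && orient != prev) switches++   # orientation changed since last het site
      prev = orient
      hets++
    }
    END { printf "het_sites=%d switch_errors=%d\n", hets, switches }'
}
```

Note that a test phasing that is flipped at every heterozygous site scores zero switch errors, which matches the point above: the absolute labelling of the two haplotypes is arbitrary, and only changes of orientation along the chromosome count as errors.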


A more consistent data set is now available at ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20101123/interim_phase1_release/

The genotypes there are complete and phased for all 1094 individuals.

written 9.3 years ago by Laura