Question: Why Are There Two Alleles For Chromosome X In Males (In 1000Genomes Vcf Files)?
8
6.8 years ago by
agnieszka110
agnieszka110 wrote:

If you look at variants on the X chromosome in the current VCF files from 1000genomes (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp//release/20110521/ALL.chrX.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz), male individuals often have two alleles. For example, for HG00096 we find:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG00096
X 60034 . ACC A 256 PASS AVGPOST=0.9664;LDAF=0.0610;THETA=0.0087;ERATE=0.0027;RSQ=0.7797;AC=117;AN=2184;VT=INDEL;AF=0.05;ASN_AF=0.07;AMR_AF=0.05;AFR_AF=0.09;EUR_AF=0.02 GT:DS:GL 0|0:0.150:0,0,0

X 60052 rs186434315 T A 100 PASS AC=752;AN=2184;VT=SNP;AA=.;AVGPOST=0.9538;RSQ=0.9370;SNPSOURCE=LOWCOV;LDAF=0.3410;ERATE=0.0006;THETA=0.0058;AF=0.34;ASN_AF=0.21;AMR_AF=0.35;AFR_AF=0.33;EUR_AF=0.46 GT:DS:GL 0|1:1.000:-0.18,-0.47,-2.40
...
X 63621 rs189671919 G A 100 PASS AC=273;AN=2184;RSQ=0.5045;ERATE=0.0093;LDAF=0.1881;VT=SNP;AA=.;THETA=0.0076;SNPSOURCE=LOWCOV;AVGPOST=0.8028;AF=0.12;ASN_AF=0.10;AMR_AF=0.14;AFR_AF=0.09;EUR_AF=0.16 GT:DS:GL 1|1:1.650:-3.22,-0.47,-0.18
...
X 85928 rs145862927 A T 100 PASS AVGPOST=0.9752;LDAF=0.0251;AN=2184;VT=SNP;AA=.;AC=31;ERATE=0.0010;SNPSOURCE=LOWCOV;THETA=0.0113;RSQ=0.5623;AF=0.01;ASN_AF=0.01;AMR_AF=0.01;AFR_AF=0.0020;EUR_AF=0.03 GT:DS:GL 1|0:1.000:-2.28,-0.01,-1.55

So we have phased genotypes of type '0|0', '1|0', '0|1' and '1|1'. How is this possible if a male individual has only one X chromosome? The alleles are phased, so where does the second allele (to the right of '|') go when it is present ('1')?

I found some notes about pseudo-autosomal regions, but I do not fully understand them. Does it mean that the second allele is on the male's Y chromosome?
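
For readers who want to pull these genotypes out programmatically, here is a minimal Python sketch of how the FORMAT column (GT:DS:GL) maps onto a sample column. The helper names `sample_field` and `split_gt` are made up for illustration and are not part of any VCF library.

```python
def sample_field(format_col, sample_col, key="GT"):
    """Look up one FORMAT key (e.g. 'GT') in a colon-separated sample column."""
    keys = format_col.split(":")
    values = sample_col.split(":")
    return dict(zip(keys, values)).get(key)


def split_gt(gt):
    """Return (alleles, phased) for a GT string such as '1|0' or '0/1'."""
    phased = "|" in gt
    alleles = gt.replace("|", "/").split("/")
    return alleles, phased


# FORMAT and HG00096 columns copied from the X:85928 record above.
gt = sample_field("GT:DS:GL", "1|0:1.000:-2.28,-0.01,-1.55")
print(gt, split_gt(gt))
```

The '|' separator only says that the two alleles are assigned to two haplotypes; on its own it does not say which chromosome each haplotype sits on.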

vcf 1000genomes variant • 4.0k views
ADD COMMENTlink modified 6.8 years ago by lh331k • written 6.8 years ago by agnieszka110
1

Google "Pseudoautosomal region".

ADD REPLYlink written 6.8 years ago by lh331k
1

I understand that crossing over occurs between X and Y in the pseudoautosomal regions. What I don't get is why the VCF files show two alleles on the X chromosome for a male individual, when males have only one X chromosome.

ADD REPLYlink written 6.8 years ago by agnieszka110

There are males with two X chromosomes and one Y.

ADD REPLYlink written 6.8 years ago by Asaf6.1k
2

This case (two alleles on the X chromosome for a male individual) holds for all males in the VCF files from 1000genomes. I don't think that all of them have this disorder.

ADD REPLYlink modified 6.8 years ago • written 6.8 years ago by agnieszka110
5
6.8 years ago by
bdemarest460
Salt Lake City, UT, USA
bdemarest460 wrote:

The loci in your example are within PAR1 (pseudoautosomal region 1). If a male is heterozygous then it must be the case that one allele is on X and the other allele is on Y. I suspect that if you check other regions of X in males you will find very little evidence of heterozygosity.

From http://genome.ucsc.edu/cgi-bin/hgGateway:

"The Y chromosome in this assembly contains two pseudoautosomal regions (PARs) that were taken from the corresponding regions in the X chromosome and are exact duplicates:

chrY:10001-2649520 and chrY:59034050-59363566 
chrX:60001-2699520 and chrX:154931044-155260560"
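
A rough way to check this on your own data is to tally heterozygous male genotypes inside and outside the PARs. The Python sketch below assumes the chrX coordinates quoted above (1-based, inclusive) and takes (position, GT) pairs that you have already extracted for one male sample; the function names are hypothetical.

```python
# chrX PAR intervals as quoted above from UCSC (1-based, inclusive).
CHRX_PARS = [(60001, 2699520), (154931044, 155260560)]


def in_par(pos, pars=CHRX_PARS):
    """True if a chrX position falls inside one of the PAR intervals."""
    return any(start <= pos <= end for start, end in pars)


def is_het(gt):
    """True if a diploid GT such as '0|1' or '1/0' carries two different alleles."""
    alleles = gt.replace("|", "/").split("/")
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]


def count_hets(records):
    """Count heterozygous calls in (pos, gt) pairs, split by PAR membership."""
    counts = {"par": 0, "non_par": 0}
    for pos, gt in records:
        if is_het(gt):
            counts["par" if in_par(pos) else "non_par"] += 1
    return counts


# Positions and HG00096 genotypes taken from the question.
example = [(60034, "0|0"), (60052, "0|1"), (63621, "1|1"), (85928, "1|0")]
print(count_hets(example))  # {'par': 2, 'non_par': 0}
```

All four example sites lie in PAR1, so both heterozygous calls are counted there. Running the same tally over a whole chrX file for a male sample would show whether non-PAR heterozygosity really is rare.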
ADD COMMENTlink written 6.8 years ago by bdemarest460
5
6.8 years ago by
lh331k
United States
lh331k wrote:

Let's consider another case: the two homologous copies of chromosome 1. With standard technology, you can see that a site is heterozygous, but you do not know for sure which homologous chromosome each allele belongs to. In the PAR, X and Y behave almost exactly like two homologous autosomes. One of the alleles you see is from chrX and the other from chrY, but you do not know which allele is on X and which is on Y.

Btw, PAR is the key reason why we should NOT use the UCSC genomes for mapping. UCSC puts identical PAR sequences on both X and Y. When you map reads, reads from the PAR are distributed randomly between the two identical copies with mapQ=0. The end result is that you get no variants from the PAR with the current pipelines. Most of us do not care about the PAR, but we should use the better strategy when possible.
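
One possible workaround, sketched here as an assumption rather than a description of any particular pipeline, is to hard-mask the PAR copies on chrY with N so that PAR reads can only align to chrX. The Python below uses the GRCh37/hg19 chrY coordinates quoted earlier in this thread; `mask_intervals` is a made-up helper that works on a sequence already loaded as a string.

```python
# chrY PAR intervals on GRCh37/hg19 as quoted earlier in the thread
# (1-based, inclusive). Other assemblies use different coordinates.
CHRY_PARS = [(10001, 2649520), (59034050, 59363566)]


def mask_intervals(seq, intervals):
    """Return seq with every 1-based inclusive interval overwritten by 'N'."""
    out = bytearray(seq, "ascii")
    for start, end in intervals:
        lo = max(start - 1, 0)      # convert to 0-based start
        hi = min(end, len(out))     # clip to the sequence length
        if lo < hi:
            out[lo:hi] = b"N" * (hi - lo)
    return out.decode("ascii")


# Toy demonstration on a short sequence with a made-up interval.
print(mask_intervals("ACGTACGTAC", [(3, 6)]))  # ACNNNNGTAC

# On a real chrY sequence string you would call:
# masked_chrY = mask_intervals(chrY_seq, CHRY_PARS)
```

With the chrY copy masked, PAR variants in males show up as diploid calls on chrX, which matches the two-allele genotypes in the question.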

ADD COMMENTlink modified 6.8 years ago • written 6.8 years ago by lh331k

Thanks for this explanation. I am interested in PAR variants--which genome assembly should I be using? Does it solve the problem by omitting PARs from one chromosome?

ADD REPLYlink written 6.8 years ago by bdemarest460

ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/

There are two versions: human_g1k_v37.fasta.gz and phase2_reference_assembly_sequence. The latter contains extra pieces missing from GRCh37.

ADD REPLYlink modified 6.8 years ago • written 6.8 years ago by lh331k