Why are the PAR regions of chrY bigger than the corresponding regions on chrX in the T2T human genome assembly?
11 months ago
Duarte Molha

As you can see in the PAR file listed here:

https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/analysis_set/chm13v2.0_PAR.bed

If you download the file, you will see that the coordinates listed for PAR1 are:

PAR1
chrX    0   2394410
chrY    0   2458320

and for PAR2:

PAR2
chrX    153925834   154259566
chrY    62122809    62460029

In terms of sizes:

PAR1 size in chrX =>     2394410 - 0 = 2,394,410
PAR1 size in chrY =>     2458320 - 0 = 2,458,320
PAR2 size in chrX => 154259566 - 153925834 = 333,732
PAR2 size in chrY =>   62460029 - 62122809 = 337,220
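The size arithmetic can be double-checked with a few lines of Python. This is a minimal sketch that hard-codes the four intervals quoted above rather than parsing the downloaded file (its exact column layout is not reproduced here). It relies on BED intervals being 0-based and half-open, so a length is simply end - start with no +1.

```python
# PAR coordinates as quoted from chm13v2.0_PAR.bed (BED: 0-based, half-open).
PARS = {
    ("PAR1", "chrX"): (0, 2394410),
    ("PAR1", "chrY"): (0, 2458320),
    ("PAR2", "chrX"): (153925834, 154259566),
    ("PAR2", "chrY"): (62122809, 62460029),
}


def bed_length(start: int, end: int) -> int:
    """Length of a BED interval; the end is exclusive, so no +1 is needed."""
    return end - start


def par_sizes(pars: dict) -> dict:
    """Map each (PAR, chromosome) key to its interval length in bp."""
    return {key: bed_length(start, end) for key, (start, end) in pars.items()}


if __name__ == "__main__":
    sizes = par_sizes(PARS)
    for (par, chrom), size in sorted(sizes.items()):
        print(f"{par} size in {chrom} => {size:,}")
    for par in ("PAR1", "PAR2"):
        extra = sizes[(par, "chrY")] - sizes[(par, "chrX")]
        print(f"{par}: chrY is {extra:,} bp longer than chrX")
```

Running it reproduces the four sizes above and shows that the chrY copy is 63,910 bp longer in PAR1 and 3,488 bp longer in PAR2.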

It was my understanding that a PAR is, by definition, a region shared between the chrX and chrY chromosomes.

So how is this possible?

Thank you for your help.

genome assembly T2T
11 months ago
LauferVA

Duarte -

I think the question you are asking is "how do they hybridize?", correct? For that, think about how homologous recombination (HR) and non-allelic homologous recombination (NAHR) operate on the sex chromosomes specifically.

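If you mean operationally: the CHM13 cell line was derived from a complete hydatidiform mole, which arose when an empty ovum was fertilized by a sperm whose genome was then duplicated. That sperm carried an X chromosome, so the result is an XX diploid genome. Because CHM13 has no Y chromosome, the chrY in CHM13v2.0 was assembled from a different individual (HG002), so the chrX and chrY PARs in this assembly come from two different genomes.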

Whether CHM13 was passaged a few times or many, it is not unreasonable to expect that the PARs of an X and a Y chromosome could differ by a few kilobases (about 3.5 kb in PAR2 here) or more (about 64 kb in PAR1): structural variants (SVs) arise in genomes every generation. In fact, part of the reason for the very high amount of repeat sequence in chrX and chrY likely has to do with genomic fluidity in both the non-PAR and PAR regions. Once the pangenome is publicly available, we'll be able to lay many chrX PARs and many chrY PARs down next to each other, and at that time I'd imagine we are likely to see a skewed normal distribution with alpha > 0.

A final point is that some genomic structural variants are selected for BECAUSE they have the propensity to produce duplications and deletions, facilitating, for instance, rapid change in gene family copy number in a relatively short evolutionary time frame.

VAL


Thank you, Vincent.

I believe my question stems from a possible misunderstanding of what occurs during meiosis.

I previously thought that during prophase I of meiosis, homologous chromosomes (including the PAR regions of the X and Y chromosomes) undergo crossover between homologous regions. Consequently, there should be no significant difference in the reference between males and females within the PAR regions of chrX and chrY, as these regions are continually exchanged in a way similar to autosomal chromosome pairs.

While individuals may exhibit unique variations, duplications, deletions, and expansions, the PAR regions should be identical in terms of the reference, assuming we consider T2T a reference assembly.

From a practical standpoint, I want to analyze regions by the number of positions in the genome where they match perfectly, such as when designing aCGH probes and NGS baits. It is crucial to understand how many regions these are homologous to. For GRCh38 (hg38) and GRCh37 (hg19), accounting for PAR matches was relatively simple, as the usual references have the PAR regions of chrY hardmasked with N, so all baits that match the PAR regions only return hits on chrX.
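The masking step itself is simple if one wants to reproduce that GRCh37/GRCh38-style behaviour on T2T. The sketch below assumes the chromosome sequence is already held in memory as a Python string (FASTA reading is left out), and `hardmask` is a hypothetical helper rather than part of any published toolkit. With the chrY PAR intervals masked, baits matching a PAR would only return hits on chrX.

```python
def hardmask(seq: str, intervals) -> str:
    """Return seq with every half-open [start, end) interval replaced by 'N'.

    Intervals use BED conventions (0-based start, exclusive end) and are
    clamped to the sequence bounds so the output length never changes.
    """
    chars = list(seq)
    for start, end in intervals:
        start = max(0, start)
        end = min(end, len(chars))
        if start < end:
            chars[start:end] = "N" * (end - start)
    return "".join(chars)


if __name__ == "__main__":
    # Toy sequence; for chm13v2.0 chrY one would pass the chrY PAR1 and PAR2
    # intervals from the BED file, e.g. [(0, 2458320), (62122809, 62460029)].
    print(hardmask("ACGTACGTAC", [(0, 2), (8, 12)]))  # NNGTACGTNN
```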

If desired, we can calculate the same position on chrY by simply adding an adjustment value. For example, position A in PAR2 of chrX corresponds to position B in PAR2 of chrY by a constant offset value N. This N value can be determined based on the length of the two chromosomes and the start and end positions of the PAR regions in these chromosomes.

I do not regard an NGS bait with only two perfect maps, one on PAR1 of X and the other on the corresponding PAR1 of Y, as two distinct homology areas. To me, they are the same as those on autosomal chromosome pairs.
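One way to encode this rule when counting perfect matches on an unmasked T2T reference is sketched below. It is a heuristic, not an established method: hits are given as hypothetical (chromosome, 0-based position) pairs, and within each PAR every chrX hit is assumed to have an allelic chrY partner, so a PAR contributes max(#chrX hits, #chrY hits) loci rather than their sum. Pairing individual hits exactly would need a real alignment of the chrX and chrY PAR sequences.

```python
from collections import Counter

# chm13v2.0 PAR intervals as quoted in the post (BED: 0-based, half-open).
PARS = {
    ("PAR1", "chrX"): (0, 2394410),
    ("PAR1", "chrY"): (0, 2458320),
    ("PAR2", "chrX"): (153925834, 154259566),
    ("PAR2", "chrY"): (62122809, 62460029),
}


def par_of(chrom: str, pos: int, pars=PARS):
    """Return the PAR name containing (chrom, pos), or None outside the PARs."""
    for (par, par_chrom), (start, end) in pars.items():
        if par_chrom == chrom and start <= pos < end:
            return par
    return None


def count_homology_loci(hits, pars=PARS) -> int:
    """Count distinct homology loci, treating X/Y PAR hits as allelic pairs."""
    par_hits = Counter()
    other = 0
    for chrom, pos in hits:
        par = par_of(chrom, pos, pars)
        if par is None:
            other += 1
        else:
            par_hits[(par, chrom)] += 1
    par_names = {par for par, _ in pars}
    paired = sum(max(par_hits[(p, "chrX")], par_hits[(p, "chrY")]) for p in par_names)
    return other + paired


if __name__ == "__main__":
    hits = [("chrX", 1000), ("chrY", 1050), ("chr7", 5000)]
    print(count_homology_loci(hits))  # 2
```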

However, this simple adjustment is not possible with T2T since the PAR regions are not of identical size.
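To make the contrast concrete, here is a minimal sketch of the constant-offset idea applied to the T2T PAR2 coordinates above (function names are hypothetical). A single offset N = chrY start - chrX start is only a trustworthy base-to-base mapping when the chrY PAR is an exact copy of the chrX PAR. Equal length is a necessary check but not a sufficient one, and here it already fails: shifting the chrX PAR2 end by N falls 3,488 bp short of the chrY PAR2 end.

```python
# chm13v2.0 PAR2 intervals as quoted in the post (BED: 0-based, half-open).
PAR2_X = (153925834, 154259566)
PAR2_Y = (62122809, 62460029)


def constant_offset(x_iv, y_iv) -> int:
    """Offset N such that y = x + N lines up the two PAR starts."""
    return y_iv[0] - x_iv[0]


def same_length(x_iv, y_iv) -> bool:
    """Necessary (not sufficient) condition for a constant-offset mapping."""
    return (x_iv[1] - x_iv[0]) == (y_iv[1] - y_iv[0])


def map_x_to_y(pos: int, x_iv, y_iv) -> int:
    """Shift a chrX PAR position onto chrY by a constant offset.

    Refuses to map when the two PARs differ in length, since a single
    offset cannot then be a base-to-base correspondence; a real mapping
    would need an alignment of the two PAR sequences.
    """
    if not x_iv[0] <= pos < x_iv[1]:
        raise ValueError(f"{pos} is outside the chrX PAR {x_iv}")
    if not same_length(x_iv, y_iv):
        raise ValueError("PARs differ in length; a constant offset is not valid")
    return pos + constant_offset(x_iv, y_iv)


if __name__ == "__main__":
    n = constant_offset(PAR2_X, PAR2_Y)
    print(n)                                # -91803025
    print(same_length(PAR2_X, PAR2_Y))      # False
    print(PAR2_Y[1] - (PAR2_X[1] + n))      # 3488
    # Equal-length toy intervals, where the offset approach does work:
    print(map_x_to_y(15, (10, 20), (110, 120)))  # 115
```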

Moreover, automatic genome annotation pipelines such as Ensembl and GENCODE treat the PAR regions as a single entity for gene annotation purposes, and this new information could disrupt that assumption.


Duarte, thank you for your thoughtful reply. I think the statement below could be contributing to the confusion.

"While individuals may exhibit unique variations, duplications, deletions, and expansions, the PAR regions should be identical in terms of reference, assuming we consider the T2T a reference assembly."

The T2T consortium, upon its completion, merged with the Human Pangenome Reference Consortium. While the stated purpose of T2T was to produce a gapless assembly of a haploid human genome, this was never more than a stepping stone towards the larger goal of routinely assembling diploid genomes into gapless, phased, whole human haplotypes.

Although it is at first difficult to see, in the final analysis, that goal transcends the idea of a linear reference genome and obviates it. The pangenome reference consortium is not aiming to provide a reference genome, it is aiming to provide a pangenome reference.

From that perspective, minor differences in length (a few kb here, a few tens of kb there) will be found all over the genome (PAR or otherwise), including many not previously known.

The upshot is that 1) no single genome will contain all the SVs we need, and 2) no matter when we make a new reference, from here on out we will still be continually discovering new SVs that were not known at the time.

The solution presented by the idea of a human pangenome reference is that we stop relying on a single linear reference. It may be that the way in which we name genomic variation will change (I also think the .vcf file will increasingly lose utility for related reasons).

In a context like this, the question becomes, how do we meaningfully denote what a genome is? How do we describe de novo SVs? Do we retrofit them on to one genome that we agree upon as the reference from now on, continually updating every SV not found in the linear reference as a "deletion" in the reference?

Your question has a direct resonance / corollary with respect to methods of genome assembly. Alignment-based algorithms may experience problems, it is true. In addition, as you mention, it's my belief that variant annotation software will all have to be re-made in time. Why? Well, once we can routinely sequence and phase segmental duplications, we will want to be able to accurately call variants within those SVs, which may be present or absent; in short, we will need a "structural variation-aware" genomic annotation algorithm.

"From a practical standpoint, I want to analyze regions with the number of perfect match positions on the genome, such as when designing aCGH probes and NGS baits. It is crucial to understand how many regions these are homologous to. For GRCh38 (hg38) and GRCh37 (hg19), accounting for PAR matches was relatively simple, as the usual references have the PAR regions of chrY hardmasked with N, so all baits that match the PAR regions only return hits on chrX."

My expertise ends here - you will know far more about these processes (aCGH probes; NGS baits) than I do. But, I can tell you what I would do. The HPRC is an open consortium.

Not only can you freely join it, I believe that you can gain access to the data that have been generated to date fairly easily.

I'd get as many phased human chrX and chrY haplotypes as I can, and then I'd write tools to determine empirically which sequences will work most robustly across human haplotypes.
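A minimal sketch of that kind of empirical check, assuming the phased chrX/chrY haplotype sequences have already been downloaded and loaded as strings (nothing here fetches HPRC data, and `count_exact` / `robust_fraction` are hypothetical helpers): count exact matches of a probe on both strands of each haplotype, then report the fraction of haplotypes in which it hits the expected number of times.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (N stays N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def count_exact(haplotype: str, probe: str) -> int:
    """Count exact, possibly overlapping matches of probe on both strands.

    A palindromic probe (equal to its reverse complement) is counted once
    per position rather than twice.
    """
    haplotype, probe = haplotype.upper(), probe.upper()
    total = 0
    for pattern in {probe, revcomp(probe)}:
        start = haplotype.find(pattern)
        while start != -1:
            total += 1
            start = haplotype.find(pattern, start + 1)
    return total


def robust_fraction(probe: str, haplotypes: dict, expected: int = 1) -> float:
    """Fraction of haplotypes in which the probe matches exactly `expected` times."""
    if not haplotypes:
        raise ValueError("no haplotypes given")
    good = sum(count_exact(seq, probe) == expected for seq in haplotypes.values())
    return good / len(haplotypes)


if __name__ == "__main__":
    # Toy stand-ins for real phased haplotypes.
    haps = {
        "hap1": "GGACGTTTGGG",
        "hap2": "GGGGGGGGGG",
        "hap3": "ACGTTTGACGTTTG",
    }
    print(robust_fraction("ACGTTTG", haps))  # 0.3333333333333333
```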


Thank you, Vincent. It has been a pleasure discussing this with you.


Good luck! If you'd like, please let me know what you end up doing.

With kindness, Vincent
