Question

mouse chrY centromere location



3.9 years ago

igor 13k

Mouse centromeres have been discussed here before (for example: Ucsc Mm10 Mouse Gap Table Has The Same Centromere Coordinates). Mouse chromosomes are telocentric, so each centromere sits at one end of the chromosome, and the UCSC mm10 gap table lists it at 110000-3000000 on every chromosome. There is one exception: the same table has no centromere entry for chrY, only a short arm at 100000-110000, as on the other chromosomes.
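To check which chromosomes lack a centromere entry, the gap table can be filtered by its type column. This is a minimal sketch assuming the standard UCSC gap table dump layout (bin, chrom, chromStart, chromEnd, ix, n, size, type, bridge); the example rows are illustrative, built from the coordinates quoted above, with placeholder values in the bin, ix, n and bridge columns rather than lines copied from the real file.

```python
import csv
from collections import defaultdict

# Assumed column order of the UCSC gap table dump (e.g. gap.txt.gz).
GAP_COLUMNS = ["bin", "chrom", "chromStart", "chromEnd",
               "ix", "n", "size", "type", "bridge"]


def gaps_by_type(lines, gap_type):
    """Return {chrom: [(start, end), ...]} for rows whose type matches.

    Coordinates are kept as UCSC stores them: 0-based, half-open.
    """
    found = defaultdict(list)
    reader = csv.DictReader(lines, fieldnames=GAP_COLUMNS, delimiter="\t")
    for row in reader:
        if row["type"] == gap_type:
            found[row["chrom"]].append((int(row["chromStart"]), int(row["chromEnd"])))
    return dict(found)


def chroms_missing(lines, gap_type, chroms):
    """List the chromosomes in `chroms` that have no gap of `gap_type`."""
    present = gaps_by_type(lines, gap_type)
    return [c for c in chroms if c not in present]


# Illustrative rows (placeholders outside chrom/start/end/type).
example = [
    "0\tchr1\t100000\t110000\t2\tN\t10000\tshort_arm\tno",
    "0\tchr1\t110000\t3000000\t3\tN\t2890000\tcentromere\tno",
    "0\tchrY\t100000\t110000\t2\tN\t10000\tshort_arm\tno",
]

print(gaps_by_type(example, "centromere"))                      # {'chr1': [(110000, 3000000)]}
print(chroms_missing(example, "centromere", ["chr1", "chrY"]))  # ['chrY']
```

On a real download, a file handle such as `gzip.open(path, "rt")` can be passed in place of the list, since `csv.DictReader` accepts any iterable of lines.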

Apparently, chrY really is different. From Soh et al.:

We obtained the complete sequence of the mouse Y centromere. Consisting of 90 kb of satellite repeats ... It is located between 3.5 Mb of short-arm and 86.0 Mb of long-arm sequence, confirming that the mouse Y is the only acrocentric chromosome among all the other telocentric mouse chromosomes.
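As a quick sanity check on "acrocentric", the arm lengths in the quote give an arm ratio (long/short) of about 24.6. The sketch below classifies that ratio using the arm-ratio bands of Levan et al. (1964), which I am assuming as the convention here; Soh et al. may use a different criterion, and Levan's bands were defined on cytological rather than sequence lengths.

```python
def arm_ratio(short_arm, long_arm):
    """Long-arm / short-arm length ratio; infinite when there is no short arm."""
    if short_arm == 0:
        return float("inf")
    return long_arm / short_arm


def levan_class(short_arm, long_arm):
    """Nomenclature class from the arm ratio r, using Levan et al. (1964) bands."""
    r = arm_ratio(short_arm, long_arm)
    if r == float("inf"):
        return "T (telocentric)"      # no short arm at all
    if r >= 7.0:
        return "t (acrocentric)"
    if r >= 3.0:
        return "st (subtelocentric)"
    if r >= 1.7:
        return "sm (submetacentric)"
    return "m (metacentric)"


short_mb, long_mb = 3.5, 86.0  # arm lengths quoted from Soh et al.
print(round(arm_ratio(short_mb, long_mb), 1))  # 24.6
print(levan_class(short_mb, long_mb))          # t (acrocentric)
```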

David Adler (University of Washington) generated some idiograms (in 1994?) where you can see chrY is a bit unusual in this regard:

[Figure: mouse chromosome idiograms by David Adler]

For automatically retrieving the chrY centromere, is there a reference file with the proper coordinates?



score 2 · Accepted Answer · 2020-07-03

I contacted UCSC and was forwarded to the Genome Reference Consortium (GRC) via NCBI, since the UCSC gap files are derived from the GRC's AGP files. This is the official response, in case anyone is curious:

Dr. Page and his lab have long served as collaborators of the GRC. As a consequence, we were made aware of the location of the mouse centromere way back in 2009, and did include it in GRCm38. As described in the Soh et al publication, the centromere is a ~90 kb region found in the BAC clone AC175459.4 (bp 57238-147035). That clone is included in the tiling path for the GRCm38 assembly, and the corresponding position of the centromere in the chromosome is CM001014.2/NC_000087.7: 4072168-4161965.
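The two intervals in the response can be cross-checked with a little arithmetic: both are 89,798 bp (the ~90 kb of Soh et al.), and if the clone is placed in forward orientation the clone-to-chromosome offset is the same at both ends. The reverse orientation would also fit these two endpoints, so the orientation column of the AGP component line is what actually settles it. The last step converts NCBI's 1-based, closed coordinates to the 0-based, half-open convention used by BED and UCSC tables, assuming mm10 chrY is the same sequence as NC_000087.7 (my understanding, since mm10 is built from GRCm38).

```python
# Centromere coordinates from the GRC response above (1-based, closed).
CLONE_CEN = (57238, 147035)      # positions within BAC AC175459.4
CHROM_CEN = (4072168, 4161965)   # positions on CM001014.2 / NC_000087.7


def closed_length(start, end):
    """Length of a 1-based, fully closed interval."""
    return end - start + 1


def to_bed(start, end):
    """1-based closed -> 0-based half-open (BED / UCSC table convention)."""
    return start - 1, end


print(closed_length(*CLONE_CEN), closed_length(*CHROM_CEN))      # 89798 89798
print(CHROM_CEN[0] - CLONE_CEN[0], CHROM_CEN[1] - CLONE_CEN[1])  # 4014930 4014930
print(to_bed(*CHROM_CEN))                                        # (4072167, 4161965)
```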

If you examine the assembly AGPs (the files which describe the FASTA; see https://www.ncbi.nlm.nih.gov/assembly/agp/AGP_Specification/; https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/635/GCA_000001635.8_GRCm38.p6/GCA_000001635.8_GRCm38.p6_assembly_structure/Primary_Assembly/assembled_chromosomes/AGP/), you'll see that the 3 Mb gaps at the start of each chromosome have distinct specifications. On all chromosomes except Y, the first 100 kb represent the telomere, the next 10 kb represent the short arm, and the next 2.89 Mb, the centromere. On chr. Y, the gap is only 101000 bp, and represents only the telomere and short arm.
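The gap structure described above can be read straight from an AGP file by collecting the gap lines that precede the first sequenced component of each object. A minimal sketch, assuming AGP 2.0 column order (object, object_beg, object_end, part_number, component_type, then gap_length and gap_type for N/U gap lines). The example lines are illustrative: the gap lengths come from the response, but the object name, the component line and the trailing scaffold gap are hypothetical placeholders.

```python
GAP_COMPONENT_TYPES = {"N", "U"}  # AGP 2.0 gap component types


def leading_gaps(agp_lines):
    """Return {object: [(gap_type, length), ...]} for the gaps that come
    before the first sequenced component of each object."""
    result = {}
    done = set()  # objects whose first sequenced component was already seen
    for line in agp_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and AGP header comments
        fields = line.rstrip("\n").split("\t")
        obj, comp_type = fields[0], fields[4]
        if obj in done:
            continue
        if comp_type in GAP_COMPONENT_TYPES:
            result.setdefault(obj, []).append((fields[6], int(fields[5])))
        else:
            done.add(obj)
            result.setdefault(obj, [])
    return result


# Illustrative lines; "chr1" and "HYPOTHETICAL.1" are placeholders.
example_agp = [
    "chr1\t1\t100000\t1\tN\t100000\ttelomere\tno\tna",
    "chr1\t100001\t110000\t2\tN\t10000\tshort_arm\tno\tna",
    "chr1\t110001\t3000000\t3\tN\t2890000\tcentromere\tno\tna",
    "chr1\t3000001\t3050000\t4\tF\tHYPOTHETICAL.1\t1\t50000\t+",
    "chr1\t3050001\t3060000\t5\tN\t10000\tscaffold\tyes\tpaired-ends",
]

gaps = leading_gaps(example_agp)
print(gaps["chr1"])                    # [('telomere', 100000), ('short_arm', 10000), ('centromere', 2890000)]
print(sum(n for _, n in gaps["chr1"]))  # 3000000
```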

The chromosome Y centromere is not listed in the AGP files because this centromere is not a gap; it is sequenced. The AGP format only has mark-up available for biological gaps (telomere, centromere, short arm, heterochromatin), and does not support a similar mark-up if those regions have been sequenced. For GRCm39, we will be using a separate file to explicitly define the chr Y centromere.
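Until such a file is available for GRCm38/mm10, one practical workaround is to take centromeres from the gap table and add the chrY interval by hand. The sketch below merges the two into BED rows; the chrY interval is the one quoted by GRC, converted to 0-based, half-open coordinates, under the assumption that mm10 chrY matches NC_000087.7. The function names are hypothetical helpers, not part of any UCSC or NCBI tool.

```python
# chrY centromere from the GRC response (1-based 4072168-4161965),
# converted to 0-based, half-open BED coordinates.
CHRY_CENTROMERE_BED = ("chrY", 4072168 - 1, 4161965)


def centromere_bed(gap_centromeres, overrides):
    """Merge gap-table centromeres with explicitly supplied intervals.

    gap_centromeres: {chrom: (start, end)}, 0-based half-open.
    overrides: iterable of (chrom, start, end); these replace gap-derived entries.
    Rows are returned in lexicographic chromosome order, not karyotype order.
    """
    merged = dict(gap_centromeres)
    for chrom, start, end in overrides:
        merged[chrom] = (start, end)
    return [(c, s, e) for c, (s, e) in sorted(merged.items())]


def to_bed_text(rows):
    """Format rows as 4-column BED text with a 'centromere' name field."""
    return "".join(f"{c}\t{s}\t{e}\tcentromere\n" for c, s, e in rows)


rows = centromere_bed({"chr1": (110000, 3000000)}, [CHRY_CENTROMERE_BED])
print(to_bed_text(rows), end="")
# chr1	110000	3000000	centromere
# chrY	4072167	4161965	centromere
```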

The first sequenced base (non-N) of chr Y does not include telomeric repeats, so we know that we’re still missing some sequence at this end. Thus, we still include a gap to account for this “missing” sequence, and we have used the default short-arm length for this for consistency with the other chromosomes.
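The check GRC describes (does the first non-N stretch start with telomeric repeat?) can be sketched as below. It assumes the vertebrate telomeric unit TTAGGG, which reads as CCCTAA on the opposite strand, accepts any rotation of either unit, and requires an arbitrary minimum of three tandem copies; on real data the input would be the first few kb of chrY from the FASTA. The toy sequences in the test are made up.

```python
import re


def _rotations(unit):
    """All cyclic rotations of a repeat unit, e.g. TTAGGG, TAGGGT, ..."""
    return {unit[k:] + unit[:k] for k in range(len(unit))}


# Telomeric repeat units on either strand, in any phase.
TELOMERIC_UNITS = _rotations("TTAGGG") | _rotations("CCCTAA")


def first_sequenced_base(seq):
    """0-based index of the first non-N base, or -1 if the sequence is all N."""
    m = re.search(r"[^Nn]", seq)
    return m.start() if m else -1


def starts_with_telomeric_repeat(seq, min_copies=3):
    """True if the sequence right after the leading Ns begins with at least
    `min_copies` tandem copies of a telomeric repeat unit (case-insensitive)."""
    i = first_sequenced_base(seq)
    if i < 0:
        return False
    head = seq[i:].upper()
    return any(head.startswith(unit * min_copies) for unit in TELOMERIC_UNITS)
```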

Although it does not fully answer my question, it substantially clarifies the issue.