Question: No Reads Ever Map To The First 3 Million Bases Of Chromosomes In The Mouse Genome! Why?
2
6.8 years ago by
Sukhdeep Singh9.9k
Netherlands
Sukhdeep Singh9.9k wrote:

I have observed this many times and can now confirm that ChIP fragments never map to the first 3M bases at the start of each chromosome, and sometimes not to the last few hundred thousand bases at the chromosome end either.

Is it because of the centromere and telomere? Are these regions not transcribed, or are they repeats?

Cheers

chipseq ngs mapping rna-seq • 5.1k views
ADD COMMENTlink modified 6.4 years ago by Biostar ♦♦ 20 • written 6.8 years ago by Sukhdeep Singh9.9k
4

Heterochromatin, I would say.

ADD REPLYlink written 6.8 years ago by fo3c430

Yes, great, but "heterochromatin" is also used loosely for non-expressed or poorly expressed DNA, which occurs within the chromosome as well. If a protein binds and regulates such a poorly expressed locus, shouldn't it still show binding there?

ADD REPLYlink written 6.8 years ago by Sukhdeep Singh9.9k
2

These kinds of regions are represented as runs of N ("NNNNNNNNNNNNNNNNNN") in the reference FASTA file.
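To check this on your own reference, here is a minimal Python sketch that scans FASTA lines and reports where the runs of N are. The function name `find_n_runs` and the `min_len` parameter are made up for this example, and the toy sequences are not real genome data. Coordinates are 0-based, half-open (BED-style).

```python
from typing import Dict, Iterable, List, Tuple


def find_n_runs(fasta_lines: Iterable[str], min_len: int = 1) -> Dict[str, List[Tuple[int, int]]]:
    """Return {sequence_name: [(start, end), ...]} for runs of N/n.

    Coordinates are 0-based, half-open. Runs that continue across
    line breaks in the FASTA file are merged into one interval.
    """
    runs: Dict[str, List[Tuple[int, int]]] = {}
    name = None        # current sequence name
    pos = 0            # offset of the current line within the sequence
    run_start = None   # start of an N run that is still open

    def close_run(end: int) -> None:
        nonlocal run_start
        if run_start is not None and end - run_start >= min_len:
            runs[name].append((run_start, end))
        run_start = None

    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                close_run(pos)          # finish the previous sequence
            name = line[1:].split()[0]  # keep only the ID before any whitespace
            runs[name] = []
            pos = 0
            continue
        if name is None:
            continue                    # ignore sequence text before the first header
        for i, base in enumerate(line):
            if base in "Nn":
                if run_start is None:
                    run_start = pos + i
            else:
                close_run(pos + i)
        pos += len(line)

    if name is not None:
        close_run(pos)
    return runs


# Toy input (not real data): chr1 starts with a 6-base N run split over two lines.
toy = [">chr1 toy", "NNNN", "NNAC", "GTNN", "NA", ">chr2", "ACGT", "nn"]
toy_runs = find_n_runs(toy)
```

For a real genome you would pass an open file handle instead of the toy list, and probably set `min_len` to something like 1000 so that isolated ambiguous bases are skipped.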

ADD REPLYlink written 6.8 years ago by Ashutosh Pandey11k

But N would mean that there is no DNA present or that it couldn't be sequenced. Do you know whether the Ns were added deliberately, as a notation, so that nothing could get mapped there?

ADD REPLYlink written 6.8 years ago by Sukhdeep Singh9.9k
5

Sukhdeep, there is DNA present but the core of the centromere is composed of arrays of simple repeats which are difficult to sequence and virtually impossible to assemble. The length of these regions has been determined by cytogenetics because the core repeats are known for many species. The pericentromere is composed of more complex repeats, like nested retrotransposons (along with some coding genes as well), but this is still an incredibly problematic area of the genome to reconstruct given the similarity of the repeat regions. This is especially true in mouse since retrotransposons have been much more active in this region of the genome than in humans (though they both pale in comparison to the situation in plants). So, what you typically have for most species are assemblies where these regions are not represented at all.

ADD REPLYlink modified 6.8 years ago • written 6.8 years ago by SES8.2k

Thanks for the information :)

ADD REPLYlink written 6.8 years ago by Sukhdeep Singh9.9k
2

"Couldn't be sequenced" or couldn't be mapped/assembled?

ADD REPLYlink written 6.8 years ago by PoGibas4.8k

Yes, in the context of genome mapping it should be "mapped/assembled"; I was referring to naive base calling during sequencing.

ADD REPLYlink written 6.8 years ago by Sukhdeep Singh9.9k
9
6.8 years ago by
deanna.church1.1k
Bethesda, MD
deanna.church1.1k wrote:

The Genome Reference Consortium (http://genomereference.org) attempts to model biological gaps in the assemblies that we produce. Unfortunately, in the current assemblies, the models for both centromeres and telomeres are rather poor so they just consist of a run of Ns. We don't have good estimates of mouse telomere/centromere size, so we use a default of 3M Ns for these regions. This information is marked up in the AGP files that define the assembly:

mouse: ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Mus_musculus/GRCm38.p1/
human: ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh37.p11/

Note: even within the euchromatic regions there can be long runs of Ns representing gaps that we can't fill yet. In many cases we do have a good size estimate for the gap, typically based on experimental evidence like comparison to an optical map. For human, the problem is that some of the euchromatic gaps are polymorphic, so the size of the gap really depends on the individual you are assessing.
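As a rough illustration of reading that markup: in the AGP format, gap lines carry component type N (or U for a gap of unknown size) in the fifth column, followed by the gap length and the gap type (for example centromere or telomere). A minimal Python sketch that pulls those lines out is below. The names `Gap` and `parse_agp_gaps` are made up for this example, and the sample lines are invented, not copied from the GRCm38 files.

```python
from typing import Iterable, List, NamedTuple


class Gap(NamedTuple):
    obj: str        # object name (chromosome or accession) from column 1
    start: int      # 1-based inclusive, as written in the AGP file
    end: int        # 1-based inclusive
    length: int     # gap length from column 6
    gap_type: str   # gap type from column 7, e.g. "centromere"


def parse_agp_gaps(lines: Iterable[str]) -> List[Gap]:
    """Collect gap lines (component type N or U) from AGP text."""
    gaps = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue                      # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7 or fields[4] not in ("N", "U"):
            continue                      # not a gap line
        gaps.append(Gap(fields[0], int(fields[1]), int(fields[2]),
                        int(fields[5]), fields[6]))
    return gaps


# Invented example lines (illustrative values only).
sample_agp = [
    "##agp-version\t2.0",
    "chrTest\t1\t100\t1\tN\t100\ttelomere\tno\tna",
    "chrTest\t101\t3000\t2\tN\t2900\tcentromere\tno\tna",
    "chrTest\t3001\t5000\t3\tF\tAC000001.1\t1\t2000\t+",
]
sample_gaps = parse_agp_gaps(sample_agp)
centromere_len = sum(g.length for g in sample_gaps if g.gap_type == "centromere")
```

Filtering on `gap_type` separates the modelled centromere and telomere gaps from ordinary assembly gaps inside the euchromatic sequence.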

hope that helps.

ADD COMMENTlink written 6.8 years ago by deanna.church1.1k
6
6.8 years ago by
Sukhdeep Singh9.9k
Netherlands
Sukhdeep Singh9.9k wrote:

I got my answer. Below are ideograms of the mouse karyotype from Ensembl. The start of each chromosome in UCSC is the centromere, which can span the first ~3M bases. There are no genes in that region; see the second screenshot, of chr2 in mouse. I've checked a couple of other chromosomes as well.

So anything binding there might be noise. Centromeres and telomeres also consist largely of repetitive regions, which I generally remove, so no mapping is observed. Can someone comment on how to pull this information from the databases (UCSC): how much of each chromosome is spanned by centromere/telomere and contains no genes? One useful case would be modifying the chromosome coordinate file to replace the start 0 with the position where the centromere ends. This file has a use case with the BEDOPS-based binning script that calculates coverage, so it would save a little time and resources.

[Image: Ensembl ideograms of the mouse karyotype]

[Image: Ensembl view of mouse chr2]
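One possible way to do the coordinate-file adjustment asked about above, as a minimal Python sketch. It assumes the chromosome sizes and the N-run intervals are already known (for example from the reference FASTA or the assembly's gap annotation) and are given as 0-based, half-open intervals. The function names `trim_extents` and `to_bed`, and all numbers in the example, are made up for illustration.

```python
from typing import Dict, List, Tuple


def trim_extents(chrom_sizes: Dict[str, int],
                 n_runs: Dict[str, List[Tuple[int, int]]]) -> List[Tuple[str, int, int]]:
    """Shrink each chromosome extent past its leading and trailing N runs.

    chrom_sizes: {name: length}
    n_runs:      {name: [(start, end), ...]}, 0-based half-open
    Returns BED-style (name, start, end) tuples; internal gaps are left alone.
    """
    extents = []
    for name, size in chrom_sizes.items():
        start, end = 0, size
        runs = sorted(n_runs.get(name, []))
        if runs and runs[0][0] == 0:
            start = runs[0][1]            # skip the leading gap
        if runs and runs[-1][1] == size and runs[-1][0] >= start:
            end = runs[-1][0]             # skip the trailing gap
        if start < end:                   # drop sequences that are all N
            extents.append((name, start, end))
    return extents


def to_bed(extents: List[Tuple[str, int, int]]) -> str:
    """Format the extents as tab-separated BED lines."""
    return "".join(f"{name}\t{start}\t{end}\n" for name, start, end in extents)


# Invented numbers, scaled down for readability.
sizes = {"chrA": 1000, "chrB": 500}
gaps = {"chrA": [(0, 300), (600, 650), (900, 1000)]}
trimmed = trim_extents(sizes, gaps)
bed_text = to_bed(trimmed)
```

The resulting BED text could then stand in for the usual 0-to-length chromosome extents file when building bins, so that no bins are created over the unmappable leading and trailing gaps.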

ADD COMMENTlink modified 6.8 years ago by Casey Bergman18k • written 6.8 years ago by Sukhdeep Singh9.9k

I guess this is applicable not only to the mouse genome. Would like to hear comments on the other genomes as well.

ADD REPLYlink written 6.8 years ago by PoGibas4.8k

As SES said in the earlier comment, such regions are present in all genomes, with different degrees of variability among plants and animals, so I presume the scenario would be the same, though the length and position might differ.

ADD REPLYlink written 6.8 years ago by Sukhdeep Singh9.9k
3
6.8 years ago by
Philadelphia, PA
Jeremy Leipzig18k wrote:

Mice have telocentric chromosomes. I'm not sure why.

ADD COMMENTlink written 6.8 years ago by Jeremy Leipzig18k

I think the Y is acrocentric but the X and autosomes are telocentric. As to why, I don't think there is any explanation for why all the chromosomes show the same pattern despite the fact that people have been studying this for about one hundred years (though I'd like to find out I was wrong on that).

ADD REPLYlink written 6.8 years ago by SES8.2k