I'm not sure what you mean by "homology at the coded amino acid level (exome)" or how this fits with your first point of "whole genome, not sequence by sequence". Did you mean exome homology, transcriptome homology, CDS homology or protein homology? By default, these will all be "sequence by sequence".
On the first point, what the genomic homology is really depends on how you calculate it. Remember that the genome isn't a single contiguous piece of DNA. It is divided into multiple chromosomes, and mice and humans have different numbers of them: mice have 20 chromosome pairs and humans have 23. The mouse genome is also about 14% smaller than the human genome (roughly 2.5 Gb versus 2.9 Gb), although that size difference does not follow from the chromosome count.
The chromosomes are also structured differently in each species, so even a gene that is perfectly conserved may sit on a different chromosome in mouse than in human. You also have to account for differences in the regions immediately surrounding that gene: the promoters and other features, as well as the neighboring genes.
You also have instances where genes have been duplicated in one species but not the other, and whole families of genes can be expanded in one species compared to another.
This is probably why you can't reasonably sum things up into a single 'simple' value.
This breaks out some values:
It looks like about 40% of the human genome will align to the mouse genome at the nucleotide level, although I'm not sure this translates directly into 40% homology.
If you look at syntenic regions, about 90% of the mouse genome resides in syntenic blocks conserved between human and mouse, but I'm not sure what the nucleotide homology within these regions is.
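To make the difference between "fraction of the genome that aligns" and "similarity within the aligned part" concrete, here is a minimal Python sketch. The function name `coverage_and_identity`, the 50-base genome length and the toy sequences are all made up for illustration; real whole-genome alignments contain gaps and rearrangements that this ignores.

```python
def coverage_and_identity(genome_length, aligned_blocks):
    """Return (aligned fraction of the genome, identity within aligned blocks).

    aligned_blocks is a list of (ref_seq, other_seq) pairs that are assumed
    to be gapless and of equal length (a simplification for illustration).
    """
    # Total number of reference bases that fall inside any aligned block.
    aligned_bases = sum(len(ref) for ref, _ in aligned_blocks)
    # Number of aligned positions where both species carry the same base.
    matches = sum(
        a == b
        for ref, other in aligned_blocks
        for a, b in zip(ref, other)
    )
    coverage = aligned_bases / genome_length
    identity = matches / aligned_bases if aligned_bases else 0.0
    return coverage, identity


# Hypothetical toy data: a 50-base "genome" with two 10-base aligned blocks,
# each containing a single mismatch.
blocks = [
    ("ACGTACGTAC", "ACGTTCGTAC"),
    ("GGGCCCAAAT", "GGGCCTAAAT"),
]
coverage, identity = coverage_and_identity(50, blocks)
print(f"aligned fraction: {coverage:.2f}")
print(f"identity within aligned blocks: {identity:.2f}")
print(f"identical bases over the whole genome: {coverage * identity:.2f}")
```

The three numbers answer three different questions, which is part of why a single "percent homology" figure for two genomes is hard to pin down.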
4.5 years ago by
pld ♦ 4.8k