Levels Of Gene Homologous-Ness?
12.3 years ago
Mike Dewar ★ 1.6k

Showing my shiny list of differentially expressed mouse genes to a senior biologist in my group elicited something along the lines of the following response:

That's a shiny list, my lad, but lots of those mouse genes encode surface receptors which are poorly conserved in humans. Best you get back to your laptop and find a better list!

So I know how to find a 'homologous' gene, but I don't know how to find how homologous a particular gene is. Does anyone know any way to rate a gene in terms of how well its function is conserved across two species, so that I can build this in as weights in my classifier?

Answers along the lines of "the question you're asking makes no sense" are acceptable (assuming your heart's in the right place) as this is really testing the limits of my already very limited understanding of this data and the biology driving it. Also questioning the phrase "how homologous" is probably valid.

gene homology orthologues • 7.9k views

I like how your senior biologist talks. Is he a leprechaun?


As pointed out in all the answers there are ways of finding out if genes are homologous or not. I would just like to insist on Stefano's answer: homology is one thing, but it is only an indication of possible related function that does not always hold true. Also, as he says, it is not a trivial task to find that information out. In any case it's a great question! Good luck ;)


@Nicojo: That is why I emphasized in my response (see below) the availability at the Jackson Lab of an orthology table with columns such as Evidences used to support the mouse-human orthology, J Numbers for references supporting the orthology, and PubMed IDs for references supporting the orthology.


@Fred: very good reference indeed. However, orthology does not mean same function. I'll grant that the "evidences" that can be found in that resource will be helpful for determining that. But I'm not sure how your answer "emphasizes" Stefano's. And as I said, same function doesn't necessarily mean same mechanism either. Stefano gives an excellent example of homology with wildly different function with the wing/arm comparison. That is what I think is important and well said in Stefano's answer: I do not see that explained or emphasized in any other answer. But I may have missed it...

12.3 years ago

You can't quantify homology: two genes either are homologs or they are not. Homology is a conceptual framework for describing the evolutionary relationship between two genes or proteins. What you can quantify is the similarity between two genes, for example with BLAST searches. If the two genes come from different species, you could look at orthology (see Homology at Wikipedia, and the graphical overview of homologs, orthologs and paralogs at NCBI).

In your case you can either get orthologs from databases (for example COG, eggNOG or InParanoid) or find them yourself with a bi-directional best hit search using BLAST. A short description of the bi-directional best hit procedure:

BLAST human gene X against the mouse sequence database

Select the best hit, Y

BLAST mouse gene Y against the human sequence database

If Y picks up X as its best hit, X and Y are bi-directional best hits

Here you could define the best hit using percent identity, E-value, coverage, or a combination of two or more of these. If you also have protein domain information, you could consider the domain architecture of the query and the hit as an additional feature for defining the best hit.
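The procedure above can be sketched in a few lines of Python. This is only an illustration: it assumes the BLAST results have already been parsed into (query, subject, bit score) records, and it uses bit score alone to define "best". The `Hit` tuple, the function names and the toy gene names are all made up for the example.

```python
from collections import namedtuple

# One parsed BLAST hit. The field names are illustrative, not a BLAST format.
Hit = namedtuple("Hit", ["query", "subject", "bitscore"])


def best_hits(hits):
    """Map each query to the subject of its highest-scoring hit."""
    best = {}
    for hit in hits:
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return {query: hit.subject for query, hit in best.items()}


def bidirectional_best_hits(human_vs_mouse, mouse_vs_human):
    """Return sorted (human, mouse) pairs that are each other's best hit."""
    forward = best_hits(human_vs_mouse)
    reverse = best_hits(mouse_vs_human)
    return sorted(
        (human, mouse)
        for human, mouse in forward.items()
        if reverse.get(mouse) == human
    )


# Toy data: humanX and mouseY pick each other; humanW's best hit (mouseZ)
# prefers humanX, so humanW has no bi-directional best hit.
human_vs_mouse = [
    Hit("humanX", "mouseY", 950.0),
    Hit("humanX", "mouseZ", 400.0),
    Hit("humanW", "mouseZ", 300.0),
]
mouse_vs_human = [
    Hit("mouseY", "humanX", 940.0),
    Hit("mouseZ", "humanX", 410.0),
    Hit("mouseZ", "humanW", 290.0),
]

pairs = bidirectional_best_hits(human_vs_mouse, mouse_vs_human)
print(pairs)
```

Swapping bit score for E-value, coverage or a combined score only changes the comparison inside `best_hits`.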

I hope this helps you get started.

12.3 years ago
Ruchira ▴ 230

As others have pointed out, two genes are homologs if they evolved from a common ancestor. Two genes are orthologs of each other if, in the gene family tree, the last common ancestor of those two genes was a speciation event. Two genes are paralogs of each other if, in the gene family tree, the last common ancestor of those two genes was a duplication event. This is important to know because after duplication, the genes are more likely to separately evolve more specific functions (subfunctionalization) or even new functions (neofunctionalization).

Berkeley PHOG provides a precomputed database of orthologs derived from PhyloFacts gene family trees using tree distances. The default, most stringent variant, PHOG-S, aims to predict only clusters of genes which are all superorthologs of each other, that is, there are no duplication events in the portion of the gene family tree containing them. This variant has high precision but relatively lower sensitivity. The thresholded variant, PHOG-T, ignores putative duplication events that are very close to the leaves of the gene family tree (i.e., very recent in evolutionary history). How close is "very close" is a tunable threshold which you can set based on the taxonomic distance you're interested in. E.g., the "Close" threshold (available as a preset) was tuned for human-mouse orthologs. You can change the threshold on the fly and get more and more predicted orthologs of your query sequence. The ones you got first (at a lower threshold) are closer (by tree distance, and with fewer duplication events in between) than the ones you got later (after increasing the threshold). This suggests they may be closer in function as well, providing an answer to your original question.

For example, this search for superorthologs of ALG2_MOUSE provides a single human superortholog, ALG2_HUMAN. But searching at the close threshold brings up another closely related human sequence, Q8NBW5 (a clone sequence that might actually have differed from ALG2_HUMAN only due to sequencing errors). The sequence ALG2_HUMAN found at the superortholog threshold (i.e., threshold 0) is closer to ALG2_MOUSE than the sequence Q8NBW5 found at the higher, "close" threshold.

The Berkeley PHOG ortholog report is also available as CSV, e.g., orthologs for ALG2_MOUSE in CSV format. You can see from that example how to construct the URL if you would like to do so programmatically. We have a student working on providing them in OrthoXML format as well. Hope this helps. Please let me know if you have further questions!


Thanks so much for this explanation! I think that, as I'm interested in distances between a large number of genes and their orthologs, maybe throwing thousands of requests at the Berkeley PHOG may not be the right thing to do...?


At the moment, no, thanks for asking. :-) However, if they're all in mouse, I can put up the current set of PHOG predictions for mouse on the downloads page for you. Right now human, E. coli, and S. cerevisiae are there.

These predictions are based on the gene family trees in PhyloFacts 2.0. We're currently working on PhyloFacts 3.0 (currently in an alpha stage), which among other things will have greatly expanded coverage. If you find that some of your genes of interest have no predicted orthologs, please let me know and I can add in the PhyloFacts 3.0 predictions for you.

12.3 years ago

Khader did a nice job describing a best hit algorithm and homology in general. You can save yourself some computational work by using pre-computed homologs from someplace like Ensembl. Here's an example of the type of ortholog and paralog data they have.

This is not only a way to be lazy and save some work: the computational approach behind these predictions also extends beyond reciprocal best hits.

There is a Perl API if you need to automate.
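If you export a table of ortholog pairs with a percent-identity column from a resource like this, one possible way to turn it into the per-gene weights asked about in the question is sketched below. The column names (`mouse_gene`, `human_gene`, `percent_identity`) and the toy rows are hypothetical, not the headers of any real export, and percent identity is only a proxy for sequence conservation, not a measure of conserved function.

```python
import csv
import io


def identity_weights(tsv_text, gene_col="mouse_gene",
                     identity_col="percent_identity"):
    """Return {gene: weight in [0, 1]} from a tab-delimited ortholog table.

    Column names are hypothetical; adjust them to the headers of your export.
    A gene with several orthologs gets the weight of its most similar one.
    Rows with an empty identity field (no ortholog) are skipped.
    """
    weights = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        value = (row[identity_col] or "").strip()
        if not value:
            continue
        gene = row[gene_col]
        weight = float(value) / 100.0
        weights[gene] = max(weight, weights.get(gene, 0.0))
    return weights


# Toy table: GeneB has two orthologs, GeneC has none.
example = (
    "mouse_gene\thuman_gene\tpercent_identity\n"
    "GeneA\tGENEA\t92.5\n"
    "GeneB\tGENEB1\t40.0\n"
    "GeneB\tGENEB2\t55.0\n"
    "GeneC\t\t\n"
)

weights = identity_weights(example)
print(weights)
```

Genes that end up without an entry (like GeneC here) need an explicit decision in the classifier, for example a weight of zero or exclusion from the feature set.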


Nice suggestion Brad. Compara looks really interesting.


Thanks so much for this! Ensembl:Compara + BioMart + biomaRt has nailed this.

12.3 years ago

I am afraid I can't really help you, but I can point out a couple of things:

Genes either are or are not homologous. Two genes can be more or less similar, but a gene either is or is not homologous to another one.

What you can do is make sure that HumGeneA is actually homologous to MouseGeneA by building a phylogenetic tree with all possible members of that family from both organisms.

Unfortunately, there is little you can say about the function. A wing, a penguin flipper and our arm are homologous, but the function is very different! Similarly two homologous genes can have very different functions...

To increase knowledge about function you could try and see if they are part of conserved pathways, or if they interact with the same proteins... but this will require a lot of work that can hardly be automated, I guess...

12.3 years ago
Paulo Nuin ★ 3.7k

A good approach would be to start by getting the homologous genes from HomoloGene, downloading the sequences, and aligning each pair with something very stringent like EMBOSS' water (a Smith-Waterman alignment). water will give you the percent identity and similarity between the two genes, from which you can judge whether they are "really" homologous or not.
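To show what percent identity from a local alignment means, here is a minimal Smith-Waterman sketch. It is not what water runs: water uses a substitution matrix and affine gap penalties, while this toy version assumes a flat match/mismatch score and a linear gap penalty, with values chosen arbitrarily for the example.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). The scoring values are
    illustrative defaults, not those used by EMBOSS water.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the scoring matrix; cells never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero-scoring cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Identical columns as a percentage of the alignment length."""
    if not aligned_a:
        return 0.0
    same = sum(x == y for x, y in zip(aligned_a, aligned_b) if x != "-")
    return 100.0 * same / len(aligned_a)


score, top, bottom = smith_waterman("TTACGTAA", "GGACGTCC")
print(score, top, bottom, percent_identity(top, bottom))
```

Note that the identity is computed over the locally aligned region only, so a short, perfectly matching stretch can report 100% even when the full-length sequences differ a lot. Checking alignment coverage alongside identity guards against that.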

12.3 years ago

http://www.informatics.jax.org/

There you will find a download section with several tab-delimited tables dedicated to human and mouse orthology.

ftp://ftp.informatics.jax.org/pub/reports/index.html#orthology

I am sure one of these tables will fit your needs. For instance, there is a table called Human and Mouse Orthology with Sequence information, with columns such as:

  • Evidences used to support the mouse-human orthology (comma-delimited)
  • J Numbers for references supporting the orthology (comma-delimited)
  • PubMed IDs for references supporting the orthology (comma-delimited)