Question

How to find all the GO terms related to morphology/development?

1

9.1 years ago

cyril-cros ▴ 950

Hi,

I would like to compare a few very distantly related vertebrate species (with divergence times of 100-500 million years). To do this, my PI and I decided to find all orthologous genes related to anatomical change/morphology/development and align them. I would like to get a list of relevant GO terms and use it as a filter to select a subset of genes in which to look for orthologs.

I know I can instead try to find all my orthologous genes and then select those that are relevant, but I will lose quite a few candidate genes which could be shared by only a subset of the species I consider. It seems to me that most evolutionary genetics papers proceed this way, by finding unique orthologous genes and aligning them with no regard for GO terms.

The thing is, GO terms concern the cell level, with no regard for evolution or anatomy. What approach should I take?

gene ontology morphology evolution • 2.5k views

ADD COMMENT • link updated 9.1 years ago by SES 8.6k • written 9.1 years ago by cyril-cros ▴ 950

score 1 · Answer 1 · 2015-10-16

1

9.1 years ago

ebrown1955 ▴ 320

Usually there is a root term that has to do with development/morphology. If you use the OBO file to collect all the children of this root term (and their children in turn), you will have all the terms you need. You can use an OBO parser such as the one in Orange-Bioinformatics for Python. I'm currently developing one, but it's not ready yet :(
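To make the idea concrete, here is a minimal sketch that uses only the Python standard library rather than Orange-Bioinformatics. It reads `[Term]` stanzas, builds a parent-to-children map from `is_a` lines (and, as a design choice you may want to drop, `part_of` relationships), then walks down from a root. The embedded text is a hand-written toy excerpt in OBO syntax, not the real file, and the choice of `GO:0032502` as root is an assumption; check the IDs against the current ontology before relying on them.

```python
from collections import defaultdict, deque

# Toy excerpt in OBO syntax (hand-written for illustration, not the real file).
TOY_OBO = """\
[Term]
id: GO:0008150
name: biological_process

[Term]
id: GO:0032502
name: developmental process
is_a: GO:0008150 ! biological_process

[Term]
id: GO:0048856
name: anatomical structure development
is_a: GO:0032502 ! developmental process

[Term]
id: GO:0009653
name: anatomical structure morphogenesis
is_a: GO:0032502 ! developmental process
relationship: part_of GO:0048856 ! anatomical structure development

[Term]
id: GO:0008152
name: metabolic process
is_a: GO:0008150 ! biological_process
"""

def parse_obo_children(lines):
    """Return (names, children) built from the [Term] stanzas."""
    names = {}
    children = defaultdict(set)  # parent id -> set of direct child ids
    term_id = None
    in_term = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest.
            in_term = (line == "[Term]")
            term_id = None
            continue
        if not in_term or not line:
            continue
        if line.startswith("id: "):
            term_id = line[4:].strip()
        elif line.startswith("name: ") and term_id:
            names[term_id] = line[6:].strip()
        elif line.startswith("is_a: ") and term_id:
            parent = line[6:].split("!")[0].strip()
            children[parent].add(term_id)
        elif line.startswith("relationship: part_of ") and term_id:
            # Assumption: part_of links count as "children" for this purpose.
            parent = line[len("relationship: part_of "):].split("!")[0].strip()
            children[parent].add(term_id)
    return names, children

def descendants(root, children):
    """Breadth-first walk collecting every term below root."""
    seen = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

names, children = parse_obo_children(TOY_OBO.splitlines())
dev_terms = descendants("GO:0032502", children)
for go_id in sorted(dev_terms):
    print(go_id, names[go_id])
```

For the real ontology you would pass the lines of the downloaded OBO file instead of `TOY_OBO.splitlines()`; the resulting set of IDs can then serve as the filter on gene annotations.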

ADD COMMENT • link 9.1 years ago by ebrown1955 ▴ 320

0

Be sure to make a post here on biostars when it is done! Thanks.

ADD REPLY • link 9.1 years ago by Endre Bakken Stovner ▴ 970

Ram · Answer 2 · 2015-10-16

1

9.1 years ago

Endre Bakken Stovner ▴ 970

First $ pip install godb

Then $ ipython

> import godb
> import re
> df = godb.get_annotations()

> df.loc[df.Definition.str.contains("development", flags=re.IGNORECASE)].head(2)

          GO id Ontology                Term                 Synonym  \
532  GO:0000902       BP  cell morphogenesis  cellular morphogenesis
935  GO:0001555       BP       oocyte growth

                                            Definition
532  The developmental process in which the size or...
935  The developmental growth process in which an o...

See https://github.com/endrebak/godb for more.

ADD COMMENT • link updated 5.0 years ago by Ram 44k • written 9.1 years ago by Endre Bakken Stovner ▴ 970

1

God, pandas is the greatest thing ever...

ADD REPLY • link 9.1 years ago by Endre Bakken Stovner ▴ 970

0

That is beautiful.

ADD REPLY • link 9.1 years ago by ebrown1955 ▴ 320

0

Since `str.contains` takes regexes, you can do `df.Definition.str.contains("development|morpholo", flags=re.IGNORECASE)`.
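The same alternation can be tried out without godb or pandas. The sketch below is a plain-Python analogue, not godb's API: it reuses the two (truncated) definitions from the output above and adds one made-up placeholder row so that something fails to match. The tuple layout and the helper name `matching_ids` are assumptions for illustration.

```python
import re

# Two rows copied from the truncated output above; the third row is a
# made-up placeholder included only so that one row does not match.
rows = [
    ("GO:0000902", "cell morphogenesis", "The developmental process in which the size or..."),
    ("GO:0001555", "oocyte growth", "The developmental growth process in which an o..."),
    ("GO:XXXXXXX", "placeholder term", "A made-up definition without the keywords."),
]

# Same pattern as in the comment: either alternative may match, ignoring case.
pattern = re.compile(r"development|morpholo", flags=re.IGNORECASE)

def matching_ids(rows, pattern, field):
    """Return GO ids whose chosen field (1 = term, 2 = definition) matches."""
    return [row[0] for row in rows if pattern.search(row[field])]

by_definition = matching_ids(rows, pattern, 2)
by_term = matching_ids(rows, pattern, 1)

# Shorter stems also catch "morphogenesis", which "morpholo" does not.
stem_pattern = re.compile(r"develop|morpho", flags=re.IGNORECASE)
by_stem = matching_ids(rows, stem_pattern, 1)

print(by_definition)
print(by_term)
print(by_stem)
```

Note that "morpholo" matches "morphology" and "morphological" but not "morphogenesis", so the term name "cell morphogenesis" is only picked up here through a shorter stem such as "morpho". Keyword matching of this kind is a quick first pass; it can both miss terms and pull in unrelated ones.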

ADD REPLY • link 9.1 years ago by Endre Bakken Stovner ▴ 970

0

Thanks a lot! I saw you could get MySQL GO databases, but if there is a Python library it will be way easier.

ADD REPLY • link 9.1 years ago by cyril-cros ▴ 950

score 1 · Answer 3 · 2015-10-16

HMMER2GO* has a not-so-obvious option that will allow you to search for Pfam models by term and generate an HMM database in one go. For example,

$ hmmer2go pfamsearch -t development -o dev.out -d 
Found 722 HMMs for development in 5 database(s). HMMs can be found in the directory: development_hmms.

The "-d" says to build an HMM database from the results (a single HMM based on the search terms). The "dev.out" is just a summary of what was found. Here is what it looks like:

$ head dev.out
Accession    ID    Description
PF10798    YmgB    Biofilm development protein YmgB/AriR
PF02365    NAM    No apical meristem (NAM) protein
PF06133    Com_YlbF    Control of competence regulator ComK, YlbF/YmcA
PF06732    Pescadillo_N    Pescadillo N-terminus
PF07572    BCNT    Bucentaur or craniofacial development
PF10221    DUF2151    Cell cycle and development regulator
PF10539    Dev_Cell_Death    Development and cell death domain
PF10719    ComFB    Late competence development protein ComFB
PF11754    Velvet    Velvet factor

Note: This search term is really broad and will likely return more than what you are interested in. This is usually the case. The way I use this is to search without the "-d" option and look at the results. Then I refine my search and tell the command to construct the database once I am certain the search returns what I want. Next, just run HMMER and find your genes of interest. The benefit of this approach is that HMMER is sensitive and now essentially as fast as BLAST, but the lack of a global alignment mode in HMMERv3 makes domain extraction more challenging (just FYI).

*Full disclosure: I wrote this tool. I was always doing similar tasks and thought it might help others save a little time.

Edit: I forgot to mention that this program will also find ORFs and run the HMMER search against your newly constructed database (there is a brief tutorial on GitHub). Hope that helps.