Popular Methods Or Tools To Determine Gene Families In A Newly Sequenced Genome?
13.1 years ago
Dejian ★ 1.3k

We are sequencing a genome and want to identify the gene families in it. Is it straightforward to find all the gene families in a genome? Could you recommend some popular papers, methods, or tools for identifying gene families in a genome? Thank you in advance for sharing your knowledge.

Added: To avoid ambiguity, I mean gene family as defined on Wikipedia.

Tags: gene, genome, clustering

Can you add a few examples of the gene families you are interested in?

It depends a bit on the type of genome you are sequencing and what you mean by "gene families".

If you mean "protein domains", Pfam and InterPro are very good resources. For eukaryotic genomes, I would complement these with predictions from SMART. If you are looking at a prokaryotic genome and want something more focused on full-length genes than on protein domains, I would use the COG database.
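
As a rough illustration of the domain-based approach, proteins can be grouped by their ordered domain architecture once domain hits are available (for example from a Pfam or InterPro scan). This is a minimal sketch under stated assumptions: hits are assumed to be already parsed into (protein ID, domain accession, start position) tuples, and the gene IDs are hypothetical; it does not read the native output format of any particular tool.

```python
from collections import defaultdict


def domain_architectures(hits):
    """Map each protein to its domain accessions, ordered by start position."""
    per_protein = defaultdict(list)
    for protein, domain, start in hits:
        per_protein[protein].append((start, domain))
    return {p: tuple(d for _, d in sorted(doms)) for p, doms in per_protein.items()}


def group_by_architecture(hits):
    """Group proteins that share an identical ordered domain architecture."""
    families = defaultdict(list)
    for protein, arch in domain_architectures(hits).items():
        families[arch].append(protein)
    return {arch: sorted(members) for arch, members in families.items()}


if __name__ == "__main__":
    # Hypothetical, pre-parsed hits: (protein ID, Pfam accession, domain start).
    # The tuple format is an assumption, not any tool's native output.
    hits = [
        ("geneA", "PF00069", 10),   # protein kinase domain
        ("geneB", "PF00069", 25),
        ("geneC", "PF00017", 5),    # SH2 domain
        ("geneC", "PF00069", 120),
        ("geneD", "PF00017", 8),
        ("geneD", "PF00069", 130),
    ]
    for arch, members in group_by_architecture(hits).items():
        print(" - ".join(arch), members)
```

Grouping by exact architecture is strict: a protein carrying one extra domain lands in a different group, which is one reason domain databases and whole-gene clustering can give different answers.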

Thank you for the information. This is what I mean by gene family: http://en.wikipedia.org/wiki/Gene_family

Looking at the wiki does not answer my point. All of the databases I suggested contain families of homologous genes/proteins, which is all the Wikipedia definition says. But when you say you want families of homologous genes, you have to decide how far back in evolution you want to look. If two genes originate from the same gene in the Last Universal Common Ancestor (LUCA), do you want them in the same family? Or do you want broader families than that? What about gene fusions? If one half of a gene is homologous to another gene whereas the other half has a different origin, do you want them together?
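
These questions become concrete parameters once you actually cluster sequences. A minimal sketch, assuming pairwise hits (for example from an all-against-all BLAST search) are already available as (query, subject, E-value) tuples: proteins are linked when a hit passes an E-value cutoff, and each connected component is treated as a family (single-linkage clustering). This is a deliberately simple stand-in; OrthoMCL, for instance, uses Markov clustering (MCL) instead. All IDs and E-values are made up.

```python
from collections import defaultdict


def single_linkage_families(hits, evalue_cutoff):
    """Connected components of the graph whose edges are hits with E-value <= cutoff."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for query, subject, evalue in hits:
        # Register both proteins even if this hit is filtered out.
        find(query)
        find(subject)
        if evalue <= evalue_cutoff:
            union(query, subject)

    groups = defaultdict(set)
    for protein in parent:
        groups[find(protein)].add(protein)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Hypothetical hits: (query, subject, E-value)
    hits = [
        ("kinA", "kinB", 1e-50),  # close paralogs
        ("kinB", "kinC", 1e-8),   # distant homolog
        ("fusX", "kinA", 1e-30),  # fusion protein: one half matches the kinases...
        ("fusX", "tpX", 1e-25),   # ...the other half matches the transporters
        ("tpX", "tpY", 1e-60),
    ]
    print(single_linkage_families(hits, 1e-20))
    # [['fusX', 'kinA', 'kinB', 'tpX', 'tpY'], ['kinC']]
    print(single_linkage_families(hits, 1e-40))
    # [['fusX'], ['kinA', 'kinB'], ['kinC'], ['tpX', 'tpY']]
```

With the looser cutoff, the hypothetical fusion protein fusX bridges the kinase and transporter groups into one family while the distant homolog kinC stays out; tightening the cutoff splits everything into small, closely related groups. Single-linkage is particularly prone to this kind of chaining through multi-domain proteins.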

Try OrthoMCL, CD-HIT or Inparanoid.
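
CD-HIT, for comparison, clusters by sequence identity rather than by orthology. Its core idea is greedy incremental clustering: sequences are processed from longest to shortest, and each one either joins an existing cluster whose representative it matches above an identity threshold or becomes a new representative. A minimal sketch of that idea follows; it uses Python's difflib similarity ratio as a crude stand-in for alignment identity, which is an assumption for illustration only and not how CD-HIT computes identity (there is also no word-based filtering). The sequences are made up.

```python
from difflib import SequenceMatcher


def identity(a, b):
    """Crude stand-in for alignment identity (NOT what CD-HIT computes)."""
    return SequenceMatcher(None, a, b).ratio()


def greedy_cluster(seqs, threshold=0.9):
    """Greedy incremental clustering in the spirit of CD-HIT.

    seqs: dict of id -> sequence. Longest sequences are processed first and
    become representatives; each later sequence joins the first representative
    it matches at >= threshold, otherwise it starts a new cluster.
    """
    clusters = []  # list of (representative id, [member ids])
    for sid in sorted(seqs, key=lambda s: (-len(seqs[s]), s)):
        for rep, members in clusters:
            if identity(seqs[sid], seqs[rep]) >= threshold:
                members.append(sid)
                break
        else:
            clusters.append((sid, [sid]))
    return clusters


if __name__ == "__main__":
    # Hypothetical toy sequences
    seqs = {
        "p1": "MKTAYIAKQRQISFVKSHFSRQ",
        "p2": "MKTAYIAKQRQISFVKSHFSRA",  # differs from p1 at the last residue
        "p3": "MSLNNQWGGHHEEPRTCVDLLA",
    }
    print(greedy_cluster(seqs, threshold=0.9))
    # [('p1', ['p1', 'p2']), ('p3', ['p3'])]
```

Because high identity thresholds only group very close sequences, this kind of clustering is generally better suited to removing redundancy than to recovering deep gene families.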

I think these three tools are designed to identify orthologues. I have updated the question to clarify the definition of gene family. Thank you all the same.

It depends on how you apply these tools and what data you feed them. In fact, it is probably wise to include one or more outgroups in your gene family analysis and define both orthologs and paralogs, since without outgroup information your analysis will lack context and be of limited relevance.
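
A common first step when separating orthologs from paralogs between two genomes (for example, your new genome and an outgroup) is the reciprocal best hit criterion; Inparanoid, for instance, starts from mutual best hits and then adds in-paralogs around them. A minimal sketch of the reciprocal-best-hit step only, assuming similarity scores (such as BLAST bit scores) have already been computed in both directions and parsed into (query, subject, score) tuples; the gene names and scores are hypothetical.

```python
def best_hits(hits):
    """For each query, keep the subject with the highest score (first one wins ties)."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(hits_a_vs_b, hits_b_vs_a):
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in genome A."""
    best_ab = best_hits(hits_a_vs_b)
    best_ba = best_hits(hits_b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


if __name__ == "__main__":
    # Hypothetical scores: new genome (newG*) vs. outgroup (outH*), both directions.
    hits_ab = [
        ("newG1", "outH1", 300),
        ("newG2", "outH1", 280),  # newG2: a hypothetical recent duplicate of newG1
        ("newG3", "outH2", 150),
        ("newG3", "outH1", 40),
    ]
    hits_ba = [
        ("outH1", "newG1", 305),
        ("outH1", "newG2", 275),
        ("outH2", "newG3", 148),
    ]
    print(reciprocal_best_hits(hits_ab, hits_ba))
    # [('newG1', 'outH1'), ('newG3', 'outH2')]
```

Here newG2 gets no reciprocal best hit even though it is clearly homologous to outH1. Recovering such in-paralogs requires an extra step like the one Inparanoid performs, which this sketch omits.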

Your reply improved my understanding of gene families. The three tools will be a great help. Many thanks.
