Percentage Of Genes Involved In Metabolome
11.6 years ago
GR ▴ 400

Hi All,

Just a quick question: what percentage of the genes in a genome are involved in metabolic activity?


This is an extremely broad question. The percentage will differ depending on organism and how you define metabolic activity.

11.6 years ago

As noted in the comments, an extremely broad question. The Gene Ontology defines 'Metabolic Process' (GO:0008152) as:

The chemical reactions and pathways, including anabolism and catabolism, by which living organisms transform chemical substances. Metabolic processes typically transform small molecules, but also include macromolecular processes such as DNA repair and replication, and protein synthesis and degradation.

The term GO:0008152 is the parent of 22 child terms, many of which are also incredibly broad... I suppose one could identify the complete list of genes associated with all of these terms (or a subset selected to suit your own, perhaps narrower, definition of 'metabolic activity'). The resulting list of unique genes could then be compared to the total list of distinct genes annotated in GO, on a species-by-species basis, to get a general answer to your question.

If you decide to pursue this route, you can download taxon-specific files here: current annotations

GO terms for Metabolic Process and all child terms:

GO:0008152 GO:0009058 GO:0009056 GO:0044237 GO:0070988 GO:0042445 GO:0043170 GO:0032259 GO:0044033 GO:0044236 GO:0009892 GO:0006807 GO:0071704 GO:0019637 GO:0055114 GO:0042440 GO:0009893 GO:0044238 GO:0019222 GO:0045730 GO:0019748 GO:0044281 GO:1901275

Getting some basic counts like so:

zcat gene_association.goa_human.gz | cut -f 3 | sort | uniq | wc -l
zcat gene_association.goa_human.gz | grep -P "GO:0008152|GO:0009058|GO:0009056|GO:0044237|GO:0070988|GO:0042445|GO:0043170|GO:0032259|GO:0044033|GO:0044236|GO:0009892|GO:0006807|GO:0071704|GO:0019637|GO:0055114|GO:0042440|GO:0009893|GO:0044238|GO:0019222|GO:0045730|GO:0019748|GO:0044281|GO:1901275" | cut -f 3 | sort | uniq | wc -l

for an arbitrary selection of species gives estimates like this:

E. coli = 607/3861 (15.7%), Fly = 683/13818 (4.9%), Cow = 91/18840 (4.8%), Human = 1423/18930 (7.5%), Mouse = 1806/25479 (7.1%), Rat = 537/20608 (2.6%), Yeast = 690/6407 (10.8%), Arabidopsis = 2456/30308 (8.1%), C. elegans = 1005/16120 (6.2%)

There are many caveats to this approach. These are likely to be considerable underestimates in my opinion. Both the final numbers and variability from species to species are more likely to be a reflection of the incompleteness of the Gene Ontology project, the way the starting files were created for each species, etc. rather than a reflection of the relative importance of metabolism in the overall repertoire of genes for each species...
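A couple of the smaller caveats can be reduced by matching on columns rather than on whole lines. The sketch below is only one way to do it, and it assumes the standard GAF layout (comment lines starting with `!`, gene symbol in column 3, qualifier in column 4, GO ID in column 5). The function name `metabolic_fraction` and the `terms.txt` file (one GO ID per line) are hypothetical names used for illustration. Unlike the grep one-liner, it ignores header lines and annotations whose qualifier contains NOT, so its counts may differ slightly from the numbers above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<genes with a listed GO term> <all genes> <percent>".
# Usage: metabolic_fraction <annotations.gaf.gz> <terms.txt>
metabolic_fraction() {
    local gaf="$1" terms="$2"
    gzip -dc "$gaf" | awk -F'\t' '
        NR == FNR { wanted[$1] = 1; next }            # first input: the GO ID list
        /^!/      { next }                            # skip GAF header/comment lines
        !($3 in all) { all[$3] = 1; total++ }         # column 3: gene symbol
        ($5 in wanted) && $4 !~ /NOT/ && !($3 in hit) { hit[$3] = 1; hits++ }
        END {
            pct = (total > 0) ? 100 * hits / total : 0
            printf "%d\t%d\t%.1f\n", hits, total, pct
        }
    ' "$terms" -
}
```

For example, `metabolic_fraction gene_association.goa_human.gz terms.txt` prints three tab-separated values: the number of distinct symbols annotated to any listed term, the number of distinct symbols in the file, and the percentage.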


Thanks Griffith for the wonderful explanation. Indeed very helpful.
