Question

Number Of Species Is Restricted By Genome Combinatorics?

0

10.8 years ago

a1ultima ▴ 850

Some colleagues and I are interested in opinions on the following:

Conjecture

The maximum number of species must be limited by the combinatorial space that DNA can occupy, given that there is some maximum genome size physically attainable by life.

Explanation

E.g. suppose the maximum number of DNA base pairs that could fit in a genome were 3, and each base pair can be one of {A, G, T, C}. Then there are 4^3 = 64 possible genomes. Extrapolating to genomes of x base pairs, there are 4^x possible genomes.
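A minimal Python sketch of the counting argument above, for anyone who wants to check the numbers. The function names are made up for illustration, and the brute-force enumeration is only practical for very small x.

```python
import math
from itertools import product

BASES = "AGTC"

def enumerate_genomes(x: int) -> list[str]:
    """List every possible sequence of exactly x bases (tiny x only)."""
    return ["".join(p) for p in product(BASES, repeat=x)]

def count_genomes(x: int) -> int:
    """Closed-form count: 4 choices at each of x positions."""
    return len(BASES) ** x

def digits_in_bound(x: int) -> int:
    """Approximate number of decimal digits in 4^x, via logarithms."""
    return math.floor(x * math.log10(4)) + 1

print(len(enumerate_genomes(3)), count_genomes(3))  # both 64
# x = 1_000_000 is an arbitrary illustrative length, not a claim about any real genome.
print(digits_in_bound(1_000_000))
```

Even for modest x the bound has far too many digits to write out, which is why the logarithm is used instead of computing 4^x directly.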

Question

Would it be possible to claim that the underlying "blueprint" that codes for living diversity sets the absolute maximum for the total "diversity space"?
I.e. does it make sense to define the total number of species life can achieve with the simple function: S < 4^x, where x is the maximum genome size measured in DNA base pairs?

genome • 2.5k views

ADD COMMENT • link updated 5.3 years ago by Biostar 20 • written 10.8 years ago by a1ultima ▴ 850

1

A. I think you should sum it over all x in a reasonable range (maybe 1 Mbp to 10 Gbp).
B. I wonder whether epigenetics will have an influence: can two species differ only because of a different methylation pattern, for instance? I think that theoretically it's possible.
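The summation over genome lengths suggested in point A can be sketched as follows. This is only an illustration: the function names are hypothetical, and it assumes every length between the two limits is allowed. The sum is a geometric series, so it has the closed form (4^(b+1) - 4^a)/3 for lengths a through b.

```python
def total_genomes(min_len: int, max_len: int) -> int:
    """Number of sequences with length between min_len and max_len inclusive.

    Closed form of the geometric series sum(4**x for x in range(min_len, max_len + 1)).
    """
    return (4 ** (max_len + 1) - 4 ** min_len) // 3

def total_genomes_brute(min_len: int, max_len: int) -> int:
    """Same quantity by direct summation, for checking small cases."""
    return sum(4 ** x for x in range(min_len, max_len + 1))

print(total_genomes(1, 3), total_genomes_brute(1, 3))  # 4 + 16 + 64 = 84
# The sum is dominated by the longest genomes: it never exceeds (4/3) * 4^max_len.
print(3 * total_genomes(1, 20) < 4 * 4 ** 20)
```

So summing over all lengths changes the bound by a factor of less than 4/3 compared with using the maximum length alone.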

ADD REPLY • link 10.8 years ago by Asaf 10k

1

In addition to what Asaf mentioned, one could also imagine some sort of maternal effect based species differences (in a way, this would also be an epigenetic effect, if one uses a very broad meaning of the term). I'm not aware of any such thing ever being found, but one could theorize it.

Also, something to think about is how the underlying sequence is translated into protein. Recalling that not every nucleotide change in a coding region will result in an amino acid change (though other important things may be changed by that!), one might guess that the actual limit is a bit lower. Having said that, when you consider that you can engineer novel codons in, perhaps that's no longer the case.

And then there's ploidy to consider...

ADD REPLY • link 10.8 years ago by Devon Ryan 104k

Ram · Answer 1 · 2014-01-02

I imagine the answer to this would be complicated a great deal by details like reading frame, codon set, strandedness and how proteins constrain the definition of species, as well as redundancy issues.

I don't think you can do a one-to-one mapping of a combination to a species.

With respect to strandedness, each one of the 4^x combinations also specifies its reverse complement (assuming you don't consider single-stranded viral DNA). With the exception of reverse-complement palindromes, which are their own reverse complement, every double-stranded molecule is therefore counted twice among the 4^x combinations. Thus, there are roughly (4^x)/2 distinct double-stranded sequences. The exact count is (4^x + P)/2, where P is the number of palindromes: P = 4^(x/2) when x is even and P = 0 when x is odd.
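As a rough check on the strandedness argument, here is a small Python sketch that counts sequences when a sequence and its reverse complement are treated as the same double-stranded molecule. The function names are hypothetical; the brute-force version only works for small x, and the formula version assumes the standard A-T / C-G pairing.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def count_duplexes_brute(x: int) -> int:
    """Count length-x sequences, merging each with its reverse complement."""
    canonical = set()
    for bases in product("ACGT", repeat=x):
        seq = "".join(bases)
        # Use the alphabetically smaller strand as the representative.
        canonical.add(min(seq, reverse_complement(seq)))
    return len(canonical)

def count_duplexes_formula(x: int) -> int:
    """(4^x + P) / 2, where P counts reverse-complement palindromes."""
    palindromes = 4 ** (x // 2) if x % 2 == 0 else 0
    return (4 ** x + palindromes) // 2

for x in range(1, 7):
    print(x, count_duplexes_brute(x), count_duplexes_formula(x))
```

Palindromes pair with themselves, so they are not halved; that pushes the count slightly above (4^x)/2 for even x.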

Reading frames and codon redundancy mean that you get many more potential proteins out of a given base combination. For instance, as a simple case, a four-base sequence gives potentially four reading frames: two for the forward strand and two for the reverse strand, not accounting for base and codon redundancy. This increases the number of unique "proteins" (or chains of amino acids) that could result:

combination: { AGGT }
 -> 
sequences: { { 5'-AGGT-3' }, { 3'-TCCA-5' } }
 ->
mRNA: { { 5'-ACCU-3' }, { 5'-AGGU-3' } }
 -> 
codons: { { 5'-GGU-3' }, { 5'-AGG-3' }, { 5'-ACC-3' }, { 5'-CCU-3' } }
 ->
"proteins": { { Gly }, { Arg }, { Thr }, { Pro } }

However, some base combinations are redundant: their reading frames yield repeated codons, and the genetic code itself is degenerate (several codons map to the same amino acid). That reduces the number of reading frames that encode unique "proteins". Consider the following combination, which also has four bases but encodes only two "proteins":

combination: { AAAA }
 ->
sequences: { { 5'-AAAA-3' }, { 3'-TTTT-5' } }
 ->
mRNA: { { 5'-UUUU-3' }, { 5'-AAAA-3' } }
 ->
codons: { { 5'-UUU-3' }, { 5'-AAA-3' } }
 ->
"proteins": { { Phe }, { Lys } }
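The two worked examples above can be reproduced with a short Python sketch. It is deliberately simplified: it reads every three-base window on both strands, uses the standard genetic code, writes codons in DNA letters (T instead of U) with one-letter amino acid codes, and ignores start/stop signals and everything else about real translation. The function name `codon_products` is made up for illustration.

```python
from itertools import product

# Standard genetic code, with codons ordered by T, C, A, G at each position.
_BASES = "TCAG"
_AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
                "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def codon_products(seq: str) -> set[str]:
    """Amino acids (one-letter codes) from every 3-base window on both strands."""
    found = set()
    for strand in (seq, reverse_complement(seq)):
        for i in range(len(strand) - 2):
            found.add(CODON_TABLE[strand[i:i + 3]])
    return found

print(sorted(codon_products("AGGT")))  # ['G', 'P', 'R', 'T'] = Gly, Pro, Arg, Thr
print(sorted(codon_products("AAAA")))  # ['F', 'K'] = Phe, Lys
```

Both inputs have four bases and four reading windows, but AAAA collapses to two distinct products because its windows repeat.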

Consider also the set of combinations of five bases - and more - and you can see how this could get complicated quickly.

Proteins define the functionality of the genetic entity and its ability to exchange genomic data with another entity, which is what defines those two entities as members of the same species.

I think you'd need to tighten up how you define species - and how bases are translated to the proteins that define species (the "genetic code") - to say whether two combinations would yield different species.

For instance, if two base combinations make the same set of proteins, aren't they essentially the same species? But also consider the case where two base combinations share 99.9% of their potential protein output - are those two combinations equivalent or different species? Where would you set the cut-off etc.?