Why are A% ~= T% and G% ~= C% in a genome sequence?
7.3 years ago
fengbo ▴ 30

I tested three genomes: human, yeast, and E. coli. All of them give the same result: G% ~= C% and A% ~= T%.

result: https://github.com/zengfengbo/data
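For anyone who wants to reproduce the count, here is a minimal sketch of how the percentages could be computed on a single strand. The function names and the FASTA path are made up for illustration; this is not necessarily how the numbers in the linked repository were produced.

```python
from collections import Counter


def base_percentages(seq):
    """Return the percentage of A, C, G and T on the given strand only.

    Characters other than A, C, G, T (for example N) are ignored.
    """
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def read_fasta_sequence(path):
    """Concatenate all sequence lines of a FASTA file, skipping header lines."""
    parts = []
    with open(path) as handle:
        for line in handle:
            if not line.startswith(">"):
                parts.append(line.strip())
    return "".join(parts)


# Toy strand: strongly G-rich, so G% and C% are far apart here.
print(base_percentages("GGGGAGGGAGTGGCGGGGG"))
```

For a real genome you would call something like `base_percentages(read_fasta_sequence("genome.fa"))`, where `genome.fa` is a placeholder file name. Note that only the strand stored in the file is counted.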

genome • 1.7k views

The genome sequence is a single strand of DNA.

7.3 years ago

Isn't your finding just Chargaff's rules?


Yup. I spent years as a geneticist thinking it must be because C pairs with G and A pairs with T, as the two posters below wrote, but that's not the case, or at least it's only half of the reason. The other half is that when a piece of DNA moves around the genome, it is reinserted in a random orientation.

7.3 years ago

If one strand of the genome has about the same number of G and C, then, because every C on the + strand pairs with a G on the - strand, the number of G on the + strand is about the same as the number of G on the - strand.

Now imagine that we start with a genome that has more G on the + strand. My guess is that multiple chromosome rearrangements (inversions) during evolution slowly re-equilibrate the number of G until both strands carry about the same number, meaning that you have about the same number of C and G on the same strand.

initial genome (+ strand written 5'->3', - strand written 3'->5'):
GGGGAGGGAGTGGCGGGGG (+)       -> 1C and 15G
CCCCTCCCTCACCGCCCCC (-)

inversion (segment to the right of | replaced by its reverse complement) -> toward equilibrium
GGGGA | CCCCCGCCACTCCC (+)    -> 11C and 5G
CCCCT | GGGGGCGGTGAGGG (-)

inversion -> toward equilibrium
GGGGACCCCCGCCAC | GGGA (+)    -> 8C and 8G
CCCCTGGGGGCGGTG | CCCT (-)
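The inversion argument can also be tried as a small simulation. This is only a sketch under simple assumptions: breakpoints are chosen uniformly at random, every inversion replaces the segment with its reverse complement, and nothing else (mutation, selection) happens. The function names, sequence length, and base weights are arbitrary choices for illustration.

```python
import random

# Translation table for complementing A<->T and C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def invert(seq, start, end):
    """Invert seq[start:end]: on the + strand the segment is replaced
    by its reverse complement; the flanking sequence is unchanged."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


def simulate(seq, n_inversions, seed=0):
    """Apply n_inversions inversions with uniformly random breakpoints."""
    rng = random.Random(seed)
    for _ in range(n_inversions):
        # Two distinct cut points, so the inverted segment is never empty.
        start, end = sorted(rng.sample(range(len(seq) + 1), 2))
        seq = invert(seq, start, end)
    return seq


# Start from a G-rich + strand and watch G and C counts on that strand.
rng = random.Random(42)
start_seq = "".join(rng.choices("ACGT", weights=[10, 5, 75, 10], k=5000))
for n in (0, 10, 100, 1000):
    s = simulate(start_seq, n)
    print(n, "inversions: G =", s.count("G"), "C =", s.count("C"))
```

The total G + C on the strand never changes, because each inversion only turns G into C and C into G within the segment. What changes is how that total is split between G and C, and with enough random inversions the split should drift toward roughly half and half.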

5'GGGGA3'\ /5'CCCTCACCGCCCCC3' (+) -> 11C and 4G
3'CCCCT5'/ \3'GGGAGTGGCGGGGG5' (-)

How can A-3' be linked to 3'-G?


You are right, I forgot to reverse the fragments. Will edit later. EDITED.

7.3 years ago
beausoleilmo ▴ 580

I would guess that this is because A is always paired with T and that C is always paired with G: https://en.wikipedia.org/wiki/DNA

Adenine pairs with thymine and guanine pairs with cytosine. These are written as A-T and G-C base pairs.

The percentages might not match perfectly because of sequencing errors or something similar.


If you take into account both strands of the genome, then your reasoning is correct and we would expect exactly the same number of G and C because of base pairing. However, the answer might not be so simple, because I think fengbo is looking at only one strand of the genome.
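To make the one-strand versus two-strand distinction concrete, here is a small sketch using the toy sequence from this thread. The helper name `count_both_strands` is made up for illustration.

```python
from collections import Counter

# Translation table for complementing A<->T and C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_both_strands(plus_strand):
    """Count bases on the + strand and on its complementary - strand together."""
    return Counter(plus_strand + reverse_complement(plus_strand))


plus = "GGGGAGGGAGTGGCGGGGG"

# One strand only: nothing forces G and C (or A and T) to be equal.
print("one strand:  ", dict(Counter(plus)))

# Both strands: every G is paired with a C and every A with a T,
# so the counts are equal by construction.
print("both strands:", dict(count_both_strands(plus)))
```

Base pairing alone guarantees equality only in the second count. The near-equality seen on a single strand of real genomes needs an extra explanation, such as the inversions discussed above.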
