Why do we normalize for GC counts and not for AT counts when computing gene expression in RNA-seq data?
11.0 years ago

I'm very new to genetics, and I'm afraid I have the most basic question.

Why do we normalize for GC counts and not for AT counts? I know that a GC pair forms three hydrogen bonds and an AT pair only two. Is it that the stronger the bond, the easier it is to determine where it is or how many there are?

Thanks in advance!

gc counts • 7.5k views

"Why do we normalize for GC counts": normalize for what? Expression? NGS? ...


My bad. Yes, gene expression in RNA-seq. Libraries used to normalize the data (like EDASeq) normalize for GC content. I'm just curious why GC and not AT.

11.0 years ago
Michele Busby ★ 2.2k

The bigger answer is that there can be bias in RNA-Seq data as an artifact of the GC content of the transcripts.

This is an extreme example using data from our paper, Adiconis et al. 2013, on different methods for degraded or low-quantity RNA-Seq. The plot shows the read counts per transcript for identical samples prepared with two methods. One of the methods, DSN-Lite, seems to count fewer reads for high-GC transcripts than the other. So if you use this method, you have to account for this in downstream analyses.

These sorts of biases are why you see normalization by GC content.

[Figure: read counts per transcript for identical samples prepared with two methods]
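To make "accounting for this" concrete, here is a minimal sketch of one simple way to remove a GC-dependent trend from per-transcript counts: group transcripts into GC bins and rescale each bin so its median matches the overall median. This is a simplified illustration, not EDASeq's implementation; the function names (`gc_fraction`, `normalize_by_gc_bin`), the number of bins, and the toy numbers are all assumptions made up for the example.

```python
from statistics import median

def gc_fraction(seq):
    """Fraction of G and C among the A/C/G/T bases of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt

def normalize_by_gc_bin(counts, gc, n_bins=4):
    """Rescale counts so each GC bin has the same median as the whole set.

    Hypothetical helper: a crude bin-wise median scaling, for illustration only.
    """
    overall = median(counts)
    # Assign each transcript to an equal-width GC bin in [0, 1].
    bins = [min(int(g * n_bins), n_bins - 1) for g in gc]
    adjusted = list(counts)
    for b in set(bins):
        idx = [i for i, x in enumerate(bins) if x == b]
        m = median(counts[i] for i in idx)
        if m > 0:  # leave all-zero bins untouched
            for i in idx:
                adjusted[i] = counts[i] * overall / m
    return adjusted

# Toy data (invented): counts drop for the high-GC transcripts.
gc = [0.30, 0.35, 0.60, 0.65, 0.80, 0.85]
counts = [100, 120, 90, 110, 40, 60]
adjusted = normalize_by_gc_bin(counts, gc)
```

After scaling, the two high-GC transcripts are no longer systematically lower than the rest. Real tools typically fit a smooth curve rather than using coarse bins, so treat this only as a picture of the idea.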

11.0 years ago

It's just a convention. In most applications, normalizing for higher coverage that correlates with GC-rich regions is essentially the same as normalizing for lower coverage in AT-rich regions. The point is that base composition correlates with some parameter that's being adjusted.


Thank you Chris!

11.0 years ago
ugly.betty77 ★ 1.1k

Start with a cage containing five monkeys.

Inside the cage, hang a banana on a string and place a set of stairs under it. Before long, a monkey will go to the stairs and start to climb towards the banana. As soon as he touches the stairs, spray all of the other monkeys with cold water.

After a while, another monkey makes an attempt with the same result - all the other monkeys are sprayed with cold water. Pretty soon, when another monkey tries to climb the stairs, the other monkeys will try to prevent it.

Now, put away the cold water. Remove one monkey from the cage and replace it with a new one. The new monkey sees the banana and wants to climb the stairs. To his surprise and horror, all of the other monkeys attack him.

After another attempt and attack, he knows that if he tries to climb the stairs, he will be assaulted.

Next, remove another of the original five monkeys and replace it with a new one. The newcomer goes to the stairs and is attacked. The previous newcomer takes part in the punishment with enthusiasm! Likewise, replace a third original monkey with a new one, then a fourth, then the fifth. Every time the newest monkey takes to the stairs, he is attacked.

Most of the monkeys that are beating him have no idea why they were not permitted to climb the stairs or why they are participating in the beating of the newest monkey.

After replacing all the original monkeys, none of the remaining monkeys have ever been sprayed with cold water. Nevertheless, no monkey ever again approaches the stairs to try for the banana. Why not? Because as far as they know that's the way it's always been done round here.

Is that a good answer? :)

There is no difference between choosing the AT fraction and choosing the GC fraction, because they sum to 1. So the choice is merely a matter of convention.
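The "sum is 1" point can be checked with a small sketch: fit a straight line of log counts against the GC fraction, then against the AT fraction, and compare. The toy values and the helper `fit_line` are assumptions invented for the illustration; only the complementarity AT = 1 - GC comes from the argument above.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Toy data (invented): log counts fall as GC content rises.
gc = [0.30, 0.40, 0.50, 0.60, 0.70]
at = [1 - g for g in gc]  # AT fraction is the complement of GC fraction
log_counts = [5.1, 4.9, 4.6, 4.2, 3.9]

slope_gc, icpt_gc = fit_line(gc, log_counts)
slope_at, icpt_at = fit_line(at, log_counts)

# Residuals are what remains after removing the composition trend.
resid_gc = [y - (slope_gc * g + icpt_gc) for g, y in zip(gc, log_counts)]
resid_at = [y - (slope_at * a + icpt_at) for a, y in zip(at, log_counts)]
```

The two slopes have equal magnitude and opposite sign, and the residuals (the "normalized" values) agree up to floating-point error, so correcting for AT would give the same result as correcting for GC.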

BTW, the monkey experiment was actually done by a researcher in 1967.

http://wiki.answers.com/Q/Did_the_monkey_banana_and_water_spray_experiment_ever_take_place
