Question: What is the normalization w.r.t to gene expression counts or values?
murali (90), Germany, wrote 4.2 years ago:

What is the normalization w.r.t to gene expression counts or values?

 

What is the difference between FPKM/RPKM normalization and normalization of gene expression counts?

modified 3.8 years ago by Biostar ♦♦ 20 • written 4.2 years ago by murali (90)

Your first sentence doesn't quite parse. Are you asking what the normalization methods are? BTW, there's more than one way to generate FPKM/RPKM values, one of which uses normalized counts (as opposed to the original method that should never be used).

written 4.2 years ago by Devon Ryan (89k)

@Devon Ryan, yes. I want to get an overview of this normalization concept w.r.t gene expression counts.

Can you please suggest any paper related to this topic?

written 4.2 years ago by murali (90)

aha, w.r.t means "with respect to"... now I get it

written 4.0 years ago by Martombo (2.4k)
Martombo (2.4k), Seville, ES, wrote 4.0 years ago:

It's really quite simple: the first quantity used to measure gene expression in RNA-seq was RPKM (Reads Per Kilobase per Million mapped reads). It was just meant to give an idea of the read density in a certain region: 1 RPKM corresponds to one read mapping to a region of 1 kb when you sequence 1 million reads in your experiment. Ideally, this would allow you to compare the value between different samples. However, it has been observed that this measure is biased in specific situations (see slides), so it is not well suited for such a comparison, nor for differential expression analysis. The concept was extended to FPKM (Fragments Per Kilobase per Million mapped fragments) once paired-end sequencing was developed. So just use RPKM with single-end reads and FPKM with paired-end reads. They are computed in the same way, with fragments (read pairs) counted instead of individual reads.
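
A minimal sketch of the RPKM arithmetic described above, in Python. The function name and the example numbers are made up for illustration; FPKM would use the same formula with fragment counts in place of read counts.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    kilobases = gene_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions)

# One read on a 1 kb region in a library of 1 million reads is 1 RPKM.
one_rpkm = rpkm(read_count=1, gene_length_bp=1_000, total_mapped_reads=1_000_000)

# Hypothetical gene: 500 reads, 2 kb long, library of 20 million mapped reads.
example = rpkm(read_count=500, gene_length_bp=2_000, total_mapped_reads=20_000_000)

print(one_rpkm, example)  # 1.0 12.5
```

Note that the gene length is the same in every sample, so when the same gene is compared between samples only the division by the library total matters.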

Normalized counts, instead, are floating-point numbers produced by a normalization method, such as the size factors in DESeq2, which then permits a comparison of these values between different samples. In this case the bias that affects RPKM is not there. This is a bit more complicated to understand: for every gene, the count in each sample is divided by the geometric mean of that gene's counts across all samples. The median of these ratios over all the genes in one sample determines the size factor of that sample, and the normalized counts are the raw counts divided by that size factor. More simply, imagine you have only two samples and in the second one you find double the counts for most of the genes (that is, the median fold change is 2). With this normalization you say: that probably means the second sample has twice the sequencing depth of the first one, so to compare the two I have to divide the counts of the second sample by 2.
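
The median-of-ratios idea can be sketched in a few lines of plain Python. This is only an illustration of the procedure described above, not DESeq2's actual code; the function name, the sample names and the counts are assumptions, and genes with a zero count in any sample are simply skipped here.

```python
import math
from statistics import median

def size_factors(counts):
    """counts: dict of sample name -> list of raw counts (same gene order)."""
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Log of the per-gene geometric mean across samples.
    # Genes with a zero in any sample are skipped (marked None).
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)

    # Size factor of a sample = median ratio of its counts to the geometric means.
    factors = {}
    for s in samples:
        log_ratios = [
            math.log(counts[s][g]) - lgm
            for g, lgm in enumerate(log_geo_means)
            if lgm is not None
        ]
        factors[s] = math.exp(median(log_ratios))
    return factors

# Hypothetical data: the second sample has double the counts for every gene.
raw = {"s1": [10, 20, 30, 40, 50], "s2": [20, 40, 60, 80, 100]}
sf = size_factors(raw)
normalized = {s: [c / sf[s] for c in raw[s]] for s in raw}

print(sf["s2"] / sf["s1"])  # about 2
```

The two size factors come out as roughly 0.71 and 1.41 rather than 1 and 2, because each sample is scaled toward the geometric mean. Their ratio is still 2, which is the "divide the second sample by 2" intuition, and the normalized counts of the two samples end up equal.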

I happen to have some slides also on this topic: https://www.dropbox.com/s/yewqwpfzl0ay3ta/normalization.pdf?dl=0

Every stack of the bar plot represents the number of counts of one gene. In this example condition B has 1.5 times the sequencing depth of condition A. Only one gene (red) is differentially expressed. With RPKM normalization all the other genes will look downregulated in the second sample (which is wrong), because the extra reads from the red gene inflate the total that every count is divided by. The size-factor normalization instead produces values that are stable for all the genes that are not differentially expressed.
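
A toy version of this situation can be reproduced in Python. The counts below are invented, not taken from the slides: four unchanged genes plus one "red" gene, 1.5 times the depth in B, and the red gene additionally up 4-fold. Only the division by the library total is shown for RPKM, since the gene length cancels when the same gene is compared between samples.

```python
import math
from statistics import median

# Hypothetical counts; the last entry is the "red" gene.
a = [100, 100, 100, 100, 100]
b = [150, 150, 150, 150, 600]

def per_million(counts):
    """Scale by the library total (the between-sample part of RPKM)."""
    total = sum(counts)
    return [c / total * 1_000_000 for c in counts]

def size_factor(sample, all_samples):
    """Median ratio of a sample's counts to the per-gene geometric mean."""
    log_ratios = []
    for g, c in enumerate(sample):
        values = [s[g] for s in all_samples]
        if min(values) > 0:  # skip genes with a zero count
            log_geo_mean = sum(math.log(v) for v in values) / len(values)
            log_ratios.append(math.log(c) - log_geo_mean)
    return math.exp(median(log_ratios))

scaled_a, scaled_b = per_million(a), per_million(b)
sf_a, sf_b = size_factor(a, [a, b]), size_factor(b, [a, b])
norm_a = [c / sf_a for c in a]
norm_b = [c / sf_b for c in b]

# Fold change B/A per gene under each normalization.
fold_total = [y / x for x, y in zip(scaled_a, scaled_b)]
fold_sf = [y / x for x, y in zip(norm_a, norm_b)]

print(fold_total)  # unchanged genes: 0.625, red gene: 2.5
print(fold_sf)     # unchanged genes: about 1, red gene: about 4
```

With total-count scaling the four unchanged genes appear to drop to 0.625 of their level in A and the red gene's change is understated, while the median-based size factor ignores the single outlier gene and leaves the unchanged genes at a fold change of 1.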

modified 4.0 years ago • written 4.0 years ago by Martombo (2.4k)

Hi Martombo, I liked the way you described normalization.

written 4.0 years ago by murali (90)