Question

Rank Normalization of genes in Expression Data

8.7 years ago

Ron ★ 1.2k

Hi all,

I want to do rank-based normalization of RNA-seq expression data: rank the genes within each sample by FPKM, then divide each rank by the total number of genes to get a normalized expression value between 0 and 1.

This has to be done across multiple samples. Is there a method for this?

I have been using the method below, which seems to work, but I would like to try a rank-based method.

# Per-gene min-max scaling: rows of df are genes, columns are samples
gene_min <- apply(df, 1, min)
gene_max <- apply(df, 1, max)
df_norm <- (df - gene_min) / (gene_max - gene_min)

Thanks,
Ron

RNA-Seq R • 15k views

ADD COMMENT • link updated 18 months ago by Ram 43k • written 8.7 years ago by Ron ★ 1.2k

Hi Ron, what have you tried? A very simple approach would be to apply a small threshold filter to the FPKM values, scale the genes to the 0-1 range, and then sort them, which is very easy to do in R. Let us know if you get stuck; I can write a function for you.
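
A minimal sketch of what the filter, scale, and sort steps might look like. The toy matrix `fpkm`, the threshold `min_fpkm`, and the helper `scale01` are all hypothetical names made up for illustration, and scaling each sample (column) rather than each gene is an assumption about what is meant here.

```r
# Hypothetical toy FPKM matrix: rows are genes, columns are samples.
fpkm <- matrix(
  c( 0.2,  0.1,  0.3,
     5.0,  7.5,  6.0,
    12.0,  9.0, 15.0,
    40.0, 35.0, 50.0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  c("s1", "s2", "s3"))
)

# Assumed threshold; choose a value that suits your data.
min_fpkm <- 1

# Keep genes that reach the threshold in at least one sample.
keep <- apply(fpkm, 1, max) >= min_fpkm
filtered <- fpkm[keep, , drop = FALSE]

# Min-max scale each sample (column) to the 0-1 range.
scale01 <- function(x) (x - min(x)) / (max(x) - min(x))
scaled <- apply(filtered, 2, scale01)

# Sort the genes of one sample from highest to lowest scaled value.
sorted_s1 <- sort(scaled[, "s1"], decreasing = TRUE)
```

Note that `scale01` divides by zero if every value in a column is identical, so real data may need a guard for that case.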

ADD REPLY • link 8.7 years ago by Sukhi Singh 11k

I have added the method I am using, which is a bit different, although it also scales values between 0 and 1.

ADD REPLY • link 8.7 years ago by Ron ★ 1.2k

score 4 · Accepted Answer · 2015-08-04

8.7 years ago

Kamil ★ 2.3k

Perhaps you're trying to do something like this?

> mat
          1        2        3        4        5
2  6.890809 6.744169 6.642575 6.649212 6.756785
9  4.303356 4.250599 4.245089 4.193621 4.471561
10 3.739968 3.823797 3.942015 3.850949 3.699985
12 8.237043 8.233598 8.315632 8.354951 8.472915
13 3.051626 2.983962 2.997821 3.017578 2.966767

> apply(mat, 2, function(y) rank(y) / length(y))

     1   2   3   4   5
2  0.8 0.8 0.8 0.8 0.8
9  0.6 0.6 0.6 0.6 0.6
10 0.4 0.4 0.4 0.4 0.4
12 1.0 1.0 1.0 1.0 1.0
13 0.2 0.2 0.2 0.2 0.2
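
FPKM tables often contain tied values (for example many zeros) and sometimes missing values, which the small example above does not show. The following is a sketch of one way to wrap the same `rank(y) / length(y)` idea in a function. The name `rank_normalize`, the toy matrix `fpkm`, and the choice to divide by the number of non-missing genes are assumptions for illustration, not part of the original answer.

```r
# Rank-normalize each sample (column) of a genes-by-samples matrix.
# Ties share a rank according to ties.method; NAs stay NA and are
# excluded from the denominator.
rank_normalize <- function(expr, ties.method = "average") {
  expr <- as.matrix(expr)
  apply(expr, 2, function(y) {
    rank(y, ties.method = ties.method, na.last = "keep") / sum(!is.na(y))
  })
}

# Hypothetical toy data with ties and one missing value.
fpkm <- matrix(
  c(0.0, 0.0, 3.2, 10.5,  NA,   # sample s1
    1.1, 0.0, 0.0,  8.0, 2.5),  # sample s2
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), c("s1", "s2"))
)

norm <- rank_normalize(fpkm)
print(norm)
```

With `ties.method = "average"`, tied genes receive the same normalized value. Other options accepted by `rank()`, such as `"min"` or `"first"`, change how ties are resolved.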