Quantile Normalization in R
3.8 years ago
KVC_bioinfo ▴ 550

Hello All,

I have read counts from RNA-seq data arranged in rows and columns, and I want to quantile normalize them in R. The following code gives me the normalized values. However, the output is a matrix without names. I want the output to keep the row names and column names so that I can perform PCA on it.

data <- read.csv("data.csv",header=T)
data_mat <- as.matrix(data[,-1])
data_norm <- normalize.quantiles(data_mat, copy = TRUE)


Could someone help me to get that? Thank you in advance.

normalization quantile R Bioconductor • 12k views

Are you implying that your data_norm object has no row or column names after you perform quantile normalisation? What about your data.csv file?

Yes, exactly. The data_norm object has no row or column names after I perform quantile normalization. However, data.csv has them.

3.8 years ago

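Try this (note the extra lines that set the row names and copy the names onto the result; also use data.matrix, not as.matrix):

library(preprocessCore)  # provides normalize.quantiles
data <- read.csv("data.csv", header = TRUE)
rownames(data) <- data[, 1]          # first column holds the row names
data_mat <- data.matrix(data[, -1])  # numeric matrix that keeps row and column names
data_norm <- normalize.quantiles(data_mat, copy = TRUE)
dimnames(data_norm) <- dimnames(data_mat)  # the returned matrix has no names, so copy them back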

It works. Thank you very much.

You're the best.

Hi Kevin, I have RNA-seq data for 3 samples of the same tissue, and I have the read counts for every gene from featureCounts, HTSeq, and Cufflinks. My question is: what should my data.csv file contain (only the counts, or the gene list plus the counts)? Thanks in advance.

featureCounts and HTSeq produce raw counts; Cufflinks would have produced normalised counts, most likely as FPKM.

My question is: what should my input data.csv file contain for quantile normalization (only the counts, or the gene list plus the counts)? Thanks in advance.

My data.csv looks like:

sample1  sample2  sample3  sample4  sample5
1000     250000   352      5425     5985
1533     54896    5482     6549     6464

It can be any numerical data, usually with samples as columns and genes/probes as rows. However, if you're attempting to normalise RNA-seq counts with a standard quantile normalisation function, then I would not do that. You should instead use the normalisation from one of the published methods, like edgeR or DESeq2.
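
As a rough illustration of how such count-based methods differ from quantile normalisation, the sketch below computes one scaling factor per sample using the median-of-ratios idea behind DESeq2's size factors. It is plain base R on made-up counts, not DESeq2's own code, and the object names are only for illustration; for a real analysis, use the package functions.

```r
# Made-up raw counts: genes in rows, samples in columns (hypothetical data)
counts <- matrix(
  c( 10,  20,  5,
    100, 200, 50,
     30,  60, 15,
      8,  16,  4),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  c("sample1", "sample2", "sample3"))
)

# Log of each gene's geometric mean across samples
# (this toy example assumes there are no zero counts)
log_geo_means <- rowMeans(log(counts))

# Size factor per sample: the median, over genes, of the ratio between
# the sample's count and the gene's geometric mean
size_factors <- apply(counts, 2, function(col) {
  exp(median(log(col) - log_geo_means))
})

# Divide each column by its size factor; sweep() keeps row and column names
norm_counts <- sweep(counts, 2, size_factors, "/")

print(size_factors)
print(norm_counts)
```

Unlike quantile normalisation, this only rescales each sample by a single factor, so the shape of each sample's count distribution is left alone.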

Hi Kevin,

I am having this problem too.

data=read.csv("bk.txt", sep="\t", header=T)
1      ABCG4                              1.17                               1.00
2 AP003391.1                              1.00                               1.00
3      ATP5L                            170.36                             200.45
4      BCL9L                             17.52                               1.74
5  BMPR1APS2                              1.04                               1.05
6     C2CD2L                              4.44                              11.20
rownames(data) <- data[,1]
data_mat <- data.matrix(data[,-1])
ABCG4                                   1.17                               1.00
AP003391.1                              1.00                               1.00
ATP5L                                 170.36                             200.45
BCL9L                                  17.52                               1.74
BMPR1APS2                               1.04                               1.05
C2CD2L                                  4.44                              11.20
data_norm <- normalize.quantiles(data_mat, copy = TRUE)
           [,1]       [,2]      [,3]       [,4]       [,5]      [,6]       [,7]       [,8]
[1,]   1.316610   1.002034  1.002034   1.006864   1.201017  1.000169   1.316610   1.001017
[2,]   1.003051   1.002034  1.002034   1.006864   1.002034  5.781186   1.002034   1.001017
[3,] 219.738136 219.738136 87.607966 219.738136 219.738136 87.607966 219.738136 219.738136
[4,]  12.947627   1.983136  5.781186   1.201017   4.649492 19.805254   2.767627   5.781186
[5,]   1.201017   1.133051  1.316610   1.006864   1.002034  1.092881   1.002034   1.001017
[6,]   2.767627  25.918475 16.030169   4.649492  25.918475  2.150000  16.030169   2.767627


There are no row or column names in the output. Can you figure out what is wrong with this? I appreciate your help.

I see that you have also posted here: Quantile Normalization in R and output data

The colnames and rownames of data_norm are the same as those of data_mat, so you can copy them over from data_mat.
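
To make that concrete, here is a minimal base-R sketch of quantile normalisation that carries the names through. The function name quantile_normalise and the inline CSV are made up for illustration, and ties are broken by order of appearance, so results can differ slightly from normalize.quantiles when a column contains tied values. The key step is the dimnames assignment at the end; with normalize.quantiles the equivalent is dimnames(data_norm) <- dimnames(data_mat).

```r
# Hypothetical helper: quantile normalisation in base R (assumes no NA values)
quantile_normalise <- function(x) {
  # Rank of each value within its column (ties broken by order of appearance)
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Sort each column, then average across columns to get the reference distribution
  reference <- rowMeans(apply(x, 2, sort))
  # Replace each value by the reference value at its rank
  out <- apply(ranks, 2, function(r) reference[r])
  # Row and column order is unchanged, so the input names still apply
  dimnames(out) <- dimnames(x)
  out
}

# Made-up input in the layout discussed above: gene IDs first, then one column per sample
csv_text <- "gene,sample1,sample2,sample3
geneA,5,4,3
geneB,2,1,4
geneC,3,4,6
geneD,4,2,8"

data <- read.csv(text = csv_text, header = TRUE, row.names = 1)  # first column becomes row names
data_mat <- data.matrix(data)
data_norm <- quantile_normalise(data_mat)

print(data_norm)
```

Because data_norm keeps its row and column names, it can be passed straight to functions such as prcomp for PCA.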
