Should ConsensusClusterPlus take normalized or raw gene counts?
7 months ago

Hi!

I am using the ConsensusClusterPlus package in R to find the optimal number of gene expression clusters. These are the steps I follow (note: the code is pseudocode, only meant to aid understanding):

1. Get the RNA-seq data (rows: genes, columns: samples/patients).

2. Keep only the top 30% most variable genes, ranked by MAD:

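# MAD of each gene (row); apply() keeps the gene names from rownames(data)
row_mads <- apply(data, MARGIN = 1, mad)
# rank genes from most to least variable
row_mads <- sort(row_mads, decreasing = TRUE)
top_percentage <- 0.3
num_rows_to_keep <- ceiling(top_percentage * length(row_mads))
# keep only the top 30% of genes
data <- data[names(row_mads)[seq_len(num_rows_to_keep)], ]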

3. Median-center the expression of each gene by subtracting that gene's median across samples:

data <- sweep(data, 1, apply(data, 1, median, na.rm = TRUE))

4. Apply the method:

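library(ConsensusClusterPlus)
title <- tempdir()  # directory where the output plots are written
results <- ConsensusClusterPlus(data.matrix(data),
  maxK = 6,             # evaluate k = 2..6
  reps = 50,            # number of resamplings
  pItem = 0.8,          # resample 80% of the samples (columns)
  pFeature = 1,         # use all genes in every resampling
  title = title,
  clusterAlg = "hc",    # hierarchical clustering
  distance = "pearson",
  seed = 1262118388.71279,
  plot = "png")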
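To make steps 2 to 4 easier to follow, here is a minimal base-R sketch on a toy matrix. It is not the ConsensusClusterPlus implementation: `filter_by_mad()`, `median_center()` and `consensus_matrix()` are hypothetical helpers, and the consensus step is simplified to a single k. It assumes (as we understand the `clusterAlg = "hc"`, `distance = "pearson"` settings) average-linkage hierarchical clustering on 1 minus the Pearson correlation between samples.

```r
# Minimal base-R sketch of steps 2-4. NOT the ConsensusClusterPlus code;
# all function names here are hypothetical.

# Step 2: keep the top `top_frac` most variable genes (rows) ranked by MAD.
filter_by_mad <- function(mat, top_frac = 0.3) {
  mads <- apply(mat, 1, mad)
  n_keep <- ceiling(top_frac * nrow(mat))
  mat[order(mads, decreasing = TRUE)[seq_len(n_keep)], , drop = FALSE]
}

# Step 3: subtract each gene's median so every gene is centered at zero.
median_center <- function(mat) {
  sweep(mat, 1, apply(mat, 1, median, na.rm = TRUE))
}

# Step 4 (simplified, single k): resample a fraction p_item of the samples,
# cluster them by average-linkage hclust on 1 - Pearson correlation, and
# record how often each pair of samples lands in the same cluster.
consensus_matrix <- function(mat, k, reps = 50, p_item = 0.8) {
  n <- ncol(mat)
  together <- matrix(0, n, n)  # times samples i and j were co-clustered
  sampled <- matrix(0, n, n)   # times samples i and j were both drawn
  for (r in seq_len(reps)) {
    idx <- sort(sample.int(n, ceiling(p_item * n)))
    d <- as.dist(1 - cor(mat[, idx, drop = FALSE]))
    cl <- cutree(hclust(d, method = "average"), k = k)
    together[idx, idx] <- together[idx, idx] + outer(cl, cl, "==")
    sampled[idx, idx] <- sampled[idx, idx] + 1
  }
  cons <- ifelse(sampled > 0, together / sampled, 0)
  dimnames(cons) <- list(colnames(mat), colnames(mat))
  cons
}

# Toy data: 200 genes x 20 samples. Genes 1-25 are higher in samples 1-10,
# genes 26-50 are higher in samples 11-20; the rest is pure noise.
set.seed(42)
toy <- matrix(rnorm(200 * 20), nrow = 200,
              dimnames = list(paste0("g", 1:200), paste0("s", 1:20)))
toy[1:25, 1:10] <- toy[1:25, 1:10] + 4
toy[26:50, 11:20] <- toy[26:50, 11:20] + 4

filtered <- filter_by_mad(toy, top_frac = 0.3)
centered <- median_center(filtered)
cons <- consensus_matrix(centered, k = 2)
round(mean(cons[1:10, 1:10]), 2)   # within-group consensus
round(mean(cons[1:10, 11:20]), 2)  # between-group consensus
```

With a clear two-group structure, within-group consensus should be close to 1 and between-group consensus close to 0. ConsensusClusterPlus itself builds one consensus matrix per k up to `maxK`.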

My question concerns step 1. I have RNA-seq counts normalized with the voom/limma pipeline (similar to DESeq2), which normalizes the data across samples so that samples can be compared with each other. Should I pass the raw RNA-seq counts or the normalized counts to ConsensusClusterPlus?
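For context, voom's normalized values are log2 counts per million (log2-CPM); in the limma pipeline they are the `E` component of the object returned by `limma::voom()`. The sketch below recomputes that transform in base R, assuming the offsets of 0.5 and 1 used by voom and normalization factors of 1 (in practice these would come from something like `edgeR::calcNormFactors()`). `log2_cpm_voom_style()` is a hypothetical helper, not a limma function.

```r
# Hypothetical helper: base-R version of the log2-CPM transform that
# limma::voom() stores in $E (offsets 0.5 and 1, as in voom).
# Normalization factors default to 1 (assumption: no TMM scaling).
log2_cpm_voom_style <- function(counts, norm_factors = rep(1, ncol(counts))) {
  lib_size <- colSums(counts) * norm_factors        # effective library sizes
  t(log2(t(counts + 0.5) / (lib_size + 1) * 1e6))   # scale each sample by its own size
}

# Toy counts: s2 has the same composition as s1 but twice the depth.
counts <- matrix(c(10, 30, 60,
                   20, 60, 120), nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))
logcpm <- log2_cpm_voom_style(counts)
round(logcpm, 3)
```

Because each sample is scaled by its own library size, the two samples end up with nearly identical log2-CPM values, whereas their raw counts differ by a factor of two.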

Best regards and thank you,
Manuel

R Gene-Expression ConsensusCluster DGE

7 months ago
bk11

You should pass normalized counts. The ConsensusClusterPlus tutorial uses the ALL dataset from the Bioconductor package ALL, which contains microarray data from 128 individuals with acute lymphoblastic leukemia. These data have already been normalized (using RMA); see page 2 of the ALL package manual:

https://bioconductor.org/packages/release/data/experiment/manuals/ALL/man/ALL.pdf
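One way to see why raw counts are a poor input here: the toy example below (hypothetical simulated data, not from the question) creates four samples with an identical expression profile but different sequencing depths, applies the per-gene median-centering from step 3, and computes Pearson correlations between samples. On raw counts, the samples correlate purely by depth; after a voom-style log2-CPM transform, the per-gene values of a 1x and a 4x sample differ only by sampling noise.

```r
# Hypothetical toy data: four samples with the SAME expression profile
# but different sequencing depths (two at 1x, two at 4x).
set.seed(7)
n_genes <- 500
profile <- rgamma(n_genes, shape = 2)   # relative expression per gene
profile <- profile / sum(profile)
depths <- c(1e6, 1e6, 4e6, 4e6)
raw <- sapply(depths, function(d) rpois(n_genes, profile * d))
colnames(raw) <- paste0("s", 1:4)

# Step 3 of the question: median-center each gene across samples.
center <- function(m) sweep(m, 1, apply(m, 1, median))

# Pearson correlation between samples after centering raw counts:
# s1/s2 (low depth) vs s3/s4 (high depth) separate by depth alone.
raw_cor <- cor(center(raw))
round(raw_cor, 2)

# voom-style log2-CPM (offsets 0.5 and 1, no normalization factors).
lib <- colSums(raw)
logcpm <- t(log2(t(raw + 0.5) / (lib + 1) * 1e6))
# Typical per-gene log2 difference between a 1x and a 4x sample.
median(abs(logcpm[, "s1"] - logcpm[, "s3"]))
```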
