[R] Unsupervised hierarchical clustering RNA-seq ?
1
0
Entering edit mode
2.0 years ago
ali ▴ 20

Hi everyone , I recently finished to replicate a work made on RNA-seq data on genes involved in Tumor Educated Platelet. Now at the very end of the article (from which I replicated the analysis) they mention the following : Unsupervised hierarchical clustering was performed by Ward clustering and Pearson distances. Non-random partitioning, and corresponding p value, of unsupervised hierarchical clustering was determined using a Fisher’s exact test. Now I was wondering how do I compute this final part? What I have is a count matrix ( non-normalized and normalized) with Cancer and Control samples (285 in total) as columns and 5000 genes as rows, and a factor of 2 ( 1 = tumor , 0 = HC ). I have another factor too of 6 that specifies even the specific tissue of the tumor. For what I got this is useful in order to give more significance to the clustering , but I really don't know how to do this in R.

RNA-seq R fisher Counts • 1.7k views
ADD COMMENT
0
Entering edit mode

Ali, check this tutorial on unsupervised gene expression clustering, you may find answers to most of your questions by walking through that post.

ADD REPLY
0
Entering edit mode

Tnx I will give it a look.

ADD REPLY
0
Entering edit mode
2.0 years ago

I can't help you with the second part, but this is at least some sample code which demonstrates a simple hierarchical clustering in R:

sampledata <- matrix(runif(1e3,0,100)^2,ncol=10,nrow=100)
colnames(sampledata) <- paste("Sample",LETTERS[1:10],sep="_")

  library("stats")

  thematrix <- as.matrix(t(sampledata))

  # calculate distance
  thematrix.dc <- dist(thematrix, method="euclidian")
  thematrix.hc <- hclust(as.dist(thematrix.dc), method="ward.D2")

  cpm_cluster <- list()
  cpm_cluster[["hc"]] <- thematrix.hc
  cpm_cluster[["ord"]] <- rownames(thematrix)[order.dendrogram(as.dendrogram(thematrix.hc))]

  library("ape")
  library("ggplot2")
  library("ggtree")
  library("grid")
  library("gridExtra")

  layout <- c('rectangular','slanted','fan','circular','radial','unrooted')

  mylayout <- layout[layout.style]
  if(is.na(mylayout)){mylayout <- layout[1]} # fallback in case invalid layout was chosen

  # convert to phylo class tree (ape)
  # build tree with layout
  myphylo <- as.phylo(cpm_cluster[["hc"]])
  mytree <- ggtree(myphylo,
                   layout=mylayout,
                   ladderize=FALSE,
                   color="black")

  mytree[["data"]][["label"]] <- gsub("_"," ",mytree[["data"]][["label"]])

  mytree <- mytree + geom_tiplab(aes(label=label),color="black", size=3, vjust=-0.2, hjust=1.15)
  mytree <- mytree + geom_tippoint(size=3.5,shape=15) + geom_nodepoint(color="black",size=2)
  mytree <- mytree + scale_color_brewer("Group",palette = "Set3") 

  print(mytree)

I have to admit that I only know Pearson correlation, but not Pearson distances, so in this example Euclidean distances are used. Have a look at tree annotation if you like to color your tips according to the tumor/HC groups.

ADD COMMENT
0
Entering edit mode

Tnx i will try it and let you know.

ADD REPLY

Login before adding your answer.

Traffic: 1812 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6