Question: Can't find a pattern in heatmap, is it okay?
0
gravatar for Tania
14 months ago by
Tania120
Tania120 wrote:

Hi all

Does this heatmap look okay to you? Why I can't see any pattern ? https://ibb.co/m8qaQx Here is a snapshot of my code:

cts <- txi$counts
group = factor(c(rep("Control", 10), rep("Tumor",10)) )
dge = DGEList(counts=cts, genes= rownames(cts), group=group)
countsPerMillion <- cpm(dge)  
summary(countsPerMillion)
countCheck <- countsPerMillion > 1
summary(countCheck)
keep <- which(rowSums(countCheck) >= 2)
dge <- dge[keep,]
summary(cpm(dge))
dge <- calcNormFactors(dge, method="TMM")
dge <- estimateCommonDisp(dge)
dge <- estimateTagwiseDisp(dge)
dge <- estimateTrendedDisp(dge)
et <- exactTest(dge, pair=c("Control", "Tumor"))
etp <- topTags(et, n= 100000, adjust.method="BH", sort.by="PValue")
#################################################################
###PLOTTING STARTS HERE
#################################################################

logCPM = countsPerMillion
o = rownames(etp$table[abs(etp$table$logFC)>1 & etp$table$PValue<0.05, ])
logCPM <- logCPM[o[1:100],]
colnames(logCPM) = labels
logCPM <- t(scale(t(logCPM)))
write.csv(logCPM, "ControlTumorCPM.csv")
require("RColorBrewer")
require("gplots")
myCol <- colorRampPalette(c("white", "darkgreen", "red"))(100)
myBreaks <- seq(-3, 3, length.out=101)
heatmap.2(logCPM, col=myCol, breaks=myBreaks, Rowv=TRUE,Colv=TRUE, main="Controls vs Tumors Heatmap", key=T, keysize=0.7,scale="none",trace="none", dendrogram="both", cexRow=0.2, cexCol=0.9, density.info="none",margin=c(10,9), lhei=c(2,10), lwid=c(2,6),reorderfun=function(d,w) reorder(d, w, agglo.FUN=mean),  distfun=function(x) as.dist(1-cor(t(x))), hclustfun=function(x) hclust(x, method="ward.D2"))
dev.off()

Thanks

heatmap rna-seq • 641 views
ADD COMMENTlink modified 14 months ago • written 14 months ago by Tania120
2

Your changes are imbalanced (there's almost no downregulated genes). Could you confirm that the data have been log-transformed and/or row-scaled. I'd strongly recommend you pick a colourscheme that will highlight differences between samples: I tend to avoid green-red anyway, but it seems a bit perverse to use green for unchanged, red for upregd and white for downregd - I'd have suggested using white as the midpoint

ADD REPLYlink written 14 months ago by russhh4.3k

Thanks so much. I think I did the logscale but not sure about the rowscale?, here is a snapshot of the code, we have down regulation, but most of the top significant genes are up regulated not down regulated, I am plotting the top 100?

logCPM = countsPerMillion
o = rownames(etp$table[abs(etp$table$logFC)>1 & etp$table$PValue<0.05, ])
logCPM <- logCPM[o[1:100],]
colnames(logCPM) = labels
logCPM <- t(scale(t(logCPM)))
write.csv(logCPM, "ControlTumorCPM.csv")
require("RColorBrewer")
require("gplots")
myCol <- colorRampPalette(c("white", "darkgreen", "red"))(100)
myBreaks <- seq(-3, 3, length.out=101)
heatmap.2(logCPM, col=myCol, breaks=myBreaks, Rowv=TRUE,Colv=TRUE, main="Controls vs Tumors Heatmap", key=T, keysize=0.7,scale="none",trace="none", dendrogram="both", cexRow=0.2, cexCol=0.9, density.info="none",margin=c(10,9), lhei=c(2,10), lwid=c(2,6),reorderfun=function(d,w) reorder(d, w, agglo.FUN=mean),  distfun=function(x) as.dist(1-cor(t(x))), hclustfun=function(x) hclust(x, method="ward.D2"))
dev.off()
ADD REPLYlink modified 14 months ago • written 14 months ago by Tania120
1

Could you put your code into your original question please, so it's a bit more readable

ADD REPLYlink written 14 months ago by russhh4.3k

I changed it, thanks

ADD REPLYlink written 14 months ago by Tania120

rerun it with the line logCPM <- log2(countsPerMillion)

ADD REPLYlink written 14 months ago by russhh4.3k

The above line throws the following error,

Error in hclust(x, method = "ward.D2") : NA/NaN/Inf in foreign function call (arg 11)

So I changed this line instead:

countsPerMillion <- cpm(dge, prior.count=2, log=TRUE)

Then it becomes like this: https://ibb.co/nCNsoH

Do you think it is fixed now? Thanks

ADD REPLYlink written 14 months ago by Tania120
1

Yes it looks ok.

I'd rewrite your script now that it works, so that your variable names reflect what they actually hold; so use logCPM instead of countsPerMillion if you're storing the logged values etc

Please note that you are filtering on p-values rather than FDR, so your results aren't very stringent

ADD REPLYlink written 14 months ago by russhh4.3k

Thanks russhh so much :)

ADD REPLYlink written 14 months ago by Tania120
1

Are you expecting to see one? Why?

Heatmaps are just a visualisation tool, not really an analysis tool.

Besides, there's a marked difference between your tumor and WT control as far as I can tell. YOu should probably do an ontology exploration to figure out the story behind your data.

ADD REPLYlink written 14 months ago by jrj.healey11k

Thanks so much. In the data most of the significant genes, are upregulated not down, I plot only the top 100 in the heatmap.

ADD REPLYlink written 14 months ago by Tania120
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1076 users visited in the last hour