Hello all
I have a dataset of cell viability z-scores derived from siRNAs targeting 792 genes in a panel of 4 p53 WT and 6 p53 mutant breast cell lines.
What I am trying to do is work out which genes are differentially required between the p53 WT and p53 mutant cell lines.
I want to find the top 50 genes that lead to a loss of viability in p53 mutant cell lines compared to the p53 WT cell lines and display these in a heatmap that clusters p53 WT cell lines together and p53 mutant cell lines together.
The data I have is in a CSV file and looks like this when opened in Excel:
Row 1 is p53 mutational status
Row 2 is cell line name
Row 3 onwards is gene name followed by the z-scores for each cell line
I have been trying to do this using heatmaps in R but as I have no background in this, I am getting nowhere. I have tried to make a data matrix but the problem I seem to have is that I have 2 column headers (p53 mutation status row 1, cell line name row 2).
Getting rid of the cell line names (row 2) might make it simpler, so that the columns are labelled either "p53 WT" or "p53 mutant".
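The two-header layout described above can be read without discarding the cell line names. The following is a minimal sketch in plain Python, assuming the first column holds gene names and that the first two rows are the p53 status and the cell line name. The miniature CSV, the function name `read_two_header_csv` and the `status | cell line` label format are all made up for illustration and are not taken from the real file.

```python
import csv
import io

# Hypothetical miniature of the described layout (made-up names and values):
# row 1 = p53 status, row 2 = cell line name, row 3 onwards = gene + z-scores.
EXAMPLE_CSV = """\
,p53 WT,p53 WT,p53 mutant,p53 mutant
,LINE_A,LINE_B,LINE_C,LINE_D
GENE1,0.1,-0.2,-1.5,-2.0
GENE2,0.3,0.4,0.2,0.1
"""


def read_two_header_csv(handle):
    """Split a two-header CSV into status, cell lines, labels, genes and values."""
    rows = list(csv.reader(handle))
    status = rows[0][1:]        # row 1 without the empty gene-name cell
    cell_lines = rows[1][1:]    # row 2 without the empty gene-name cell
    # Combine both headers so that every column keeps a unique label.
    labels = [f"{s} | {c}" for s, c in zip(status, cell_lines)]
    genes, matrix = [], []
    for row in rows[2:]:
        if not row or not row[0]:
            continue  # skip blank lines or rows without a gene name
        genes.append(row[0])
        matrix.append([float(x) for x in row[1:]])
    return status, cell_lines, labels, genes, matrix


status, cell_lines, labels, genes, matrix = read_two_header_csv(
    io.StringIO(EXAMPLE_CSV)
)
print(labels)
print(genes)
```

Keeping `status` as a separate list means the columns can still be grouped by p53 status later, while `labels` stays unique for plotting.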
Any comments on how best to determine the top 50 differentially required genes would be gratefully received. If anyone would be able to guide me through how to do this using R that would be super.
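One simple way to rank genes for this question is by the difference between the mean z-score in the p53 mutant lines and the mean z-score in the p53 WT lines, keeping the most negative differences. The sketch below assumes that criterion; it is only one possible choice, it ignores within-group variability, and a proper statistical test may be preferable with 4 versus 6 cell lines. The function name `top_differential_genes` and the toy values are hypothetical.

```python
from statistics import mean


def top_differential_genes(genes, matrix, status, n=50,
                           mutant_label="p53 mutant", wt_label="p53 WT"):
    """Return the n genes with the most negative (mutant mean - WT mean)."""
    mut_idx = [i for i, s in enumerate(status) if s == mutant_label]
    wt_idx = [i for i, s in enumerate(status) if s == wt_label]
    scored = []
    for gene, row in zip(genes, matrix):
        diff = mean(row[i] for i in mut_idx) - mean(row[i] for i in wt_idx)
        scored.append((gene, diff))
    # Most negative difference first: lower viability in mutant than in WT.
    scored.sort(key=lambda pair: pair[1])
    return scored[:n]


# Toy data (made-up values), one status entry per column.
status = ["p53 WT", "p53 WT", "p53 mutant", "p53 mutant"]
genes = ["GENE1", "GENE2", "GENE3"]
matrix = [
    [0.1, -0.2, -1.5, -2.0],
    [0.3, 0.4, 0.2, 0.1],
    [-1.0, -1.2, 0.5, 0.7],
]

top = top_differential_genes(genes, matrix, status, n=2)
for gene, diff in top:
    print(gene, round(diff, 2))
```

The rows of the selected genes could then be passed to whichever heatmap function is used, with the columns ordered or annotated by p53 status.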
Many thanks in advance.
Luke
Do you mean you have calculated the z-scores column-wise? If so, that does not make sense. You should have calculated the z-scores across the samples (row-wise z-scores). Correct me if I am wrong.
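To make the column-wise versus row-wise distinction concrete, here is a small sketch, assuming a matrix with genes as rows and samples as columns and using the sample standard deviation. The helper names and the toy numbers are made up. After row-wise scaling each gene has mean 0 across samples; after column-wise scaling each sample has mean 0 across genes.

```python
from statistics import mean, stdev


def zscore(values):
    """Standardise a list of numbers to mean 0 and sample SD 1."""
    m, sd = mean(values), stdev(values)
    return [(v - m) / sd for v in values]


def rowwise_z(matrix):
    """Z-score each gene (row) across the samples."""
    return [zscore(row) for row in matrix]


def columnwise_z(matrix):
    """Z-score each sample (column) across the genes."""
    scaled_cols = [zscore(list(col)) for col in zip(*matrix)]
    return [list(row) for row in zip(*scaled_cols)]


# Toy matrix (made-up values): 3 genes x 3 samples.
raw = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 9.0],
    [3.0, 9.0, 6.0],
]

by_row = rowwise_z(raw)
by_col = columnwise_z(raw)

# Row means are ~0 after row-wise scaling; column means are ~0 after column-wise scaling.
print([round(mean(r), 6) for r in by_row])
print([round(mean(c), 6) for c in zip(*by_col)])
```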
However, I plotted a density plot for each of your samples and they are all centred at 0. This suggests that the z-scores were calculated column-wise and not row-wise. See the plot below.
I also generated a heatmap using all genes with the values you gave in the file. I arbitrarily chose 10 clusters (k-means) and split them into 10 groups. Look at the code and heatmap and see if it makes sense.
Can you share a subset of the dataset? You can hide the gene names if you need to keep them confidential.
Dear Chirag
Many thanks for your speedy reply. Here is the data set:
https://ufile.io/fknis
The data you uploaded have duplicated gene names:
‘ATM’, ‘BMPR1A’, ‘BUB1B’, ‘CAMKK1’, ‘CDKN2A’, ‘CHEK2’, ‘FLJ23356’, ‘KIAA1811’, ‘MAP2K4’, ‘MGC4796’, ‘PIK3R1’, ‘PRKAR1A’, ‘SSTK’, ‘STK11’, ‘STK22D’
I wonder how you would differentiate between them.
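One possible way to handle this, sketched below, is to list the duplicated names and then append a running suffix so that every row label is unique. This is an assumption about how the duplicates might be treated, not something the data dictates; averaging the duplicated rows would be another option. The function names and the toy list are hypothetical, and the suffix scheme would need adjusting if a name such as `ATM_1` already existed in the file.

```python
from collections import Counter


def find_duplicates(names):
    """Return the sorted names that occur more than once."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


def make_unique(names, sep="_"):
    """Append a running number to every name that is duplicated."""
    totals = Counter(names)
    seen = Counter()
    unique = []
    for name in names:
        if totals[name] > 1:
            seen[name] += 1
            unique.append(f"{name}{sep}{seen[name]}")
        else:
            unique.append(name)
    return unique


# Toy list (illustrative only): GENE_X and GENE_Y are placeholders.
names = ["ATM", "GENE_X", "ATM", "CHEK2", "GENE_Y", "CHEK2"]

print(find_duplicates(names))
print(make_unique(names))
```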