Question

Determining differentially required genes between p53 WT or mutant cell lines from siRNA data and making a heatmap

0

Entering edit mode

5.8 years ago

lukewilliams22 • 0

Hello all

I have a dataset of z-scores of cell viability derived from siRNAs targetting 792 genes in a panel of 4x p53 WT and 6x p53 mutant breast cell lines.

What I am trying to do is work out which genes are differentially required between the p53 WT and p53 mutant cell lines.

I want to find the top 50 genes that lead to a loss of viability in p53 mutant cell lines compared to the p53 WT cell lines and display these in a heatmap that clusters p53 WT cell lines together and p53 mutant cell lines together.

The data I have is in a csv file and looks like this when in excel:

https://ibb.co/nbTMo0

Row 1 is p53 mutational status Row 2 is cell line name Row 3 onwards is gene name followed by the z-scores for each cell line

I have been trying to do this using heatmaps in R but as I have no background in this, I am getting nowhere. I have tried to make a data matrix but the problem I seem to have is that I have 2 column headers (p53 mutation status row 1, cell line name row 2).

Getting rid of the cell line names (row 2) might make it simpler so columns are labelled either "p53 WT" or "p53 mutant"

Any comments on how best to determine the top 50 differentially required genes would be gratefully recieved. If anyone would be able to guide me through how to do this using R that would be super.

Many thanks in advance.

Luke

R • 1.8k views

ADD COMMENT • link updated 5.7 years ago by Biostar 20 • written 5.8 years ago by lukewilliams22 • 0

1

Entering edit mode

followed by the z-scores for each cell line

Do you mean you have calculated column wise z-score ?? if yes, then it does not make sense. You should have calculated z-score across the samples (row-wise zscore). Correct me if I am wrong.

However, I plotted density plot for each of your sample and they all are centred at 0. Therefore, it suggest z-score calculation is column wise and not row wise. See the plot below.

library(tidyverse)
data <- read_csv("./data/breast-panel1.csv") %>% dplyr::rename( geneName = X1)

## I have change column names of your data see below.
data
# A tibble: 793 x 11
   geneName `WT_HTB-22` `WT_HTB-123` `WT_HTB-25` `WT_CRL-1897` MUT_HCC1187 MUT_HCC1419 MUT_HCC1954 MUT_HCC2157 MUT_HCC2998 `MUT_MFM-SS3`
   <chr>          <dbl>        <dbl>       <dbl>         <dbl>       <dbl>       <dbl>       <dbl>       <dbl>       <dbl>         <dbl>
 1 ZNFN1A1       0.876         0.573       1.04          0.897       2.49     1.51          1.74         1.89        1.64          2.38 
 2 ZAP70        -0.170        -0.704       1.14         -0.504      -2.97    -1.80         -1.43        -2.54       -1.92         -1.74 
 3 ZAK          -5.55         -2.59       -2.83         -6.57       -0.429   -0.977        -1.56         0.148      -0.616        -1.62 
 4 YES1         -1.67         -0.884      -1.34         -0.909      -0.328    0.0669        0.0935      -0.173       0.793        -0.603
 5 XYLB         -1.19         -2.39       -0.224        -1.30        0.488    0.586        -0.451        0.605       0.752         0.584
 6 XPC          -0.0855        0.149       0.979         1.95       -1.35    -1.41         -2.33        -1.55       -0.738        -0.320
 7 XPA          -0.550        -1.28       -3.33         -0.342       1.12    -0.000140      2.53         0.386       1.68          2.91 
 8 WT1          -0.509        -0.538      -0.383        -2.55        1.26     2.01          0.675        0.679       1.62         -0.215
 9 WRN          -2.75         -2.26       -4.36         -0.690      -0.270    0.240        -1.07        -0.635       0.420        -0.422
10 WNK4         -4.95         -1.38       -2.36         -3.72       -1.52    -0.0588       -1.71        -0.303       0.587        -1.08 
# ... with 783 more rows

## data cleaning and prepare it for heatmap and other plots ready format  
## if names are duplicated change it 
data2 <- data %>% group_by(geneName) %>% 
        mutate(n = row_number()) %>%
        mutate(geneName2 =  map(n  , function(.){ifelse( .  > 1 , paste(geneName , . ,sep = "_") , geneName)})) %>% unnest() %>% 
        ungroup() 
data2  %>% group_by(geneName2) 

data3 <- data2 %>% 
        mutate(geneName = geneName2) %>% 
        select(-geneName2 , -n ) %>% # remove un wanted columns 
        gather(key = "sampleName" , value = "z_score" , -geneName) %>% 
        separate(col = sampleName ,into = c("strain" , "cond") ,sep = "_") ## separate strain info(WT or MUT) from conditions 


## Now, data is ready. Proceed for plots 

## each sample density 
gp <- data3 %>% ggplot() + geom_density(aes(x = z_score , fill = cond), alpha = 0.5)  + theme_bw()
ggsave(gp ,filename = "biostar_query_density.png")

ADD REPLY • link 5.8 years ago by Chirag Parsania ★ 2.0k

1

Entering edit mode

I also generated heatmap using all genes with the values you gave in the file. randomly I chose 10 clusters (kmeans) and separate them in 10 groups. Look at the code and heatmap. See, if it make sense.

library(tidyverse)
data <- read_csv("./data/breast-panel1.csv") %>% dplyr::rename( geneName = X1)

## I have change column names of your data see below.
> data
# A tibble: 793 x 11
   geneName `WT_HTB-22` `WT_HTB-123` `WT_HTB-25` `WT_CRL-1897` MUT_HCC1187 MUT_HCC1419 MUT_HCC1954 MUT_HCC2157 MUT_HCC2998 `MUT_MFM-SS3`
   <chr>          <dbl>        <dbl>       <dbl>         <dbl>       <dbl>       <dbl>       <dbl>       <dbl>       <dbl>         <dbl>
 1 ZNFN1A1       0.876         0.573       1.04          0.897       2.49     1.51          1.74         1.89        1.64          2.38 
 2 ZAP70        -0.170        -0.704       1.14         -0.504      -2.97    -1.80         -1.43        -2.54       -1.92         -1.74 
 3 ZAK          -5.55         -2.59       -2.83         -6.57       -0.429   -0.977        -1.56         0.148      -0.616        -1.62 
 4 YES1         -1.67         -0.884      -1.34         -0.909      -0.328    0.0669        0.0935      -0.173       0.793        -0.603
 5 XYLB         -1.19         -2.39       -0.224        -1.30        0.488    0.586        -0.451        0.605       0.752         0.584
 6 XPC          -0.0855        0.149       0.979         1.95       -1.35    -1.41         -2.33        -1.55       -0.738        -0.320
 7 XPA          -0.550        -1.28       -3.33         -0.342       1.12    -0.000140      2.53         0.386       1.68          2.91 
 8 WT1          -0.509        -0.538      -0.383        -2.55        1.26     2.01          0.675        0.679       1.62         -0.215
 9 WRN          -2.75         -2.26       -4.36         -0.690      -0.270    0.240        -1.07        -0.635       0.420        -0.422
10 WNK4         -4.95         -1.38       -2.36         -3.72       -1.52    -0.0588       -1.71        -0.303       0.587        -1.08 



## data cleaning and prepare it for heatmap ready format  
## if names are duplicated change it 
data2 <- data %>% group_by(geneName) %>% 
        mutate(n = row_number()) %>%
        mutate(geneName2 =  map(n  , function(.){ifelse( .  > 1 , paste(geneName , . ,sep = "_") , geneName)})) %>% unnest() %>% 
        ungroup() 
data2  %>% group_by(geneName2) 

data3 <- data2 %>% 
        mutate(geneName = geneName2) %>% 
        select(-geneName2 , -n ) %>% # remove un wanted columns 
        gather(key = "sampleName" , value = "z_score" , -geneName) %>% 
        separate(col = sampleName ,into = c("strain" , "cond") ,sep = "_") ## separate strain info(WT or MUT) from conditions 


## Now, data is ready. Proceed for plots 

## preapare heatmap data. 
library(ComplexHeatmap)
for_hm <- data3 %>% 
        select(geneName, cond, z_score) %>% 
        spread(key = cond, value = z_score) %>%  
        as.data.frame(data3) %>%  
        column_to_rownames("geneName")

## heatmap column annotations 
hm_col_anno <- data3 %>% 
        select(strain , cond) %>% 
        unique() %>% 
        select(1) %>% 
        as.data.frame()
ha = HeatmapAnnotation(df = hm_col_anno)


## draw heatmap
set.seed(123)
hm <- Heatmap(for_hm , cluster_columns = F , 
              show_row_dend = T , 
              top_annotation = ha , 
              name = "Z_score" ,
              row_names_gp = gpar(fontsize = 7) , 
              column_names_gp  = gpar(fontsize = 35), 
              top_annotation_height = unit(30,"mm") , km = 10, 
              row_title_gp = gpar(fontsize = 40))

## save in the file 
png(file  = "biostar_query.png", height = 7000,width = 2000 ,res = 120)
draw(hm)
dev.off()

ADD REPLY • link 5.8 years ago by Chirag Parsania ★ 2.0k

0

Entering edit mode

can you share subset dataset ? you can hide genenames if you dont want to share confidentiality

ADD REPLY • link 5.8 years ago by Chirag Parsania ★ 2.0k

0

Entering edit mode

Dear Chirag

Many thanks for your speedy reply, here is the data set:

https://ufile.io/fknis

ADD REPLY • link 5.8 years ago by lukewilliams22 • 0

0

Entering edit mode

The data you uploaded have duplicated gene names ‘ATM’, ‘BMPR1A’, ‘BUB1B’, ‘CAMKK1’, ‘CDKN2A’, ‘CHEK2’, ‘FLJ23356’, ‘KIAA1811’, ‘MAP2K4’, ‘MGC4796’, ‘PIK3R1’, ‘PRKAR1A’, ‘SSTK’, ‘STK11’, ‘STK22D’ . I wonder how would you differentiate between them.

ADD REPLY • link 5.8 years ago by Chirag Parsania ★ 2.0k