Two questions about correlation calculation and plotting in R
6 weeks ago

Hello! I need to calculate and plot some data in R, but I am having trouble doing so, and any help would be really appreciated. I have two tables in R which look like this:

Tumor_Sample_Barcode ERBB3 ERBB4 ERCC2 ESR1
            10010212     0     0     0    0
            10010215     0     0     1    0
            10010219     0     0     0    0
            10010223     0     0     1    0
            10010228     0     0     0    0
            10010238     0     0     0    0
            10010244     0     0     0    0
            10010249     0     0     0    0


One table has information about the somatic variants found in different tumour samples, while the other has information about the germline variants (both use the same sample IDs in Tumor_Sample_Barcode). A 1 means that a variant has been found in that gene for that sample, while a 0 means that none has been found.

Firstly, I had to calculate the correlation between genes within each table (germline x germline and somatic x somatic) and had no problem with that. Now I need to calculate the correlation between germline and somatic variants, and I have no idea how to do that.
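To make the germline x somatic case concrete, here is a minimal sketch of one possible approach, not a tested solution. It relies on base R's cor(x, y), which correlates every column of x with every column of y when given two tables. The toy data frames `germline` and `somatic` and their values are made up for illustration, and the sketch assumes Tumor_Sample_Barcode is the first column of each table.

```r
# Toy stand-ins for the real tables (hypothetical data, for illustration only)
germline <- data.frame(
  Tumor_Sample_Barcode = c(10010212, 10010215, 10010219, 10010223, 10010228, 10010238),
  ERBB3 = c(0, 1, 0, 1, 0, 0),
  ERCC2 = c(0, 1, 0, 1, 0, 1)
)
somatic <- data.frame(
  Tumor_Sample_Barcode = c(10010238, 10010228, 10010223, 10010219, 10010215, 10010212),
  ERBB4 = c(0, 0, 1, 0, 1, 0),
  ESR1  = c(1, 0, 0, 1, 0, 0)
)

# Keep only the samples present in both tables, and put them in the same order
shared <- intersect(germline$Tumor_Sample_Barcode, somatic$Tumor_Sample_Barcode)
g <- germline[match(shared, germline$Tumor_Sample_Barcode), -1, drop = FALSE]
s <- somatic[match(shared, somatic$Tumor_Sample_Barcode), -1, drop = FALSE]

# Cross-correlation: rows are germline genes, columns are somatic genes
gs_spearmancorr <- cor(g, s, use = "complete.obs", method = "spearman")
print(gs_spearmancorr)
```

The result is a rectangular matrix (germline genes by somatic genes) rather than a square one, so it has no diagonal of self-correlations. It could presumably be passed to pheatmap in the same way as the within-table matrices.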

Secondly, I noticed that a lot of genes have 0 correlation with every other gene (only correlating with themselves). I would like to remove them from the correlation table to make the data cleaner. How can I achieve this?
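For the filtering step, one option is to ignore the diagonal and keep only genes that have at least one non-zero correlation with another gene. This is a minimal sketch under assumptions: the toy matrix `corr` and its values are invented and stand in for a real square correlation matrix, and "0 correlation" is taken to mean exactly zero after NA values have been replaced.

```r
# Toy correlation matrix (hypothetical values, for illustration only)
genes <- c("ERBB3", "ERBB4", "ERCC2", "ESR1")
corr <- matrix(c(1,   0, 0.6, 0,
                 0,   1, 0,   0,
                 0.6, 0, 1,   0,
                 0,   0, 0,   1),
               nrow = 4, byrow = TRUE, dimnames = list(genes, genes))

# Ignore the diagonal (each gene's correlation with itself)
off_diag <- corr
diag(off_diag) <- 0

# Keep genes with at least one non-zero correlation with another gene
keep <- rowSums(off_diag != 0, na.rm = TRUE) > 0
corr_filtered <- corr[keep, keep, drop = FALSE]
print(corr_filtered)
```

In this toy case only ERBB3 and ERCC2 remain. To also drop genes with only very weak correlations, the `!= 0` test could be swapped for something like `abs(off_diag) > 0.1`, where the cutoff is an arbitrary choice.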

This is the piece of code I am using to calculate and plot my correlations:

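library(pheatmap)  # provides pheatmap()
# Spearman correlation between every pair of genes
g_spearmancorr <- cor(genes_gvcount, use = "complete.obs", method = "spearman")
# NA correlations (e.g. from genes with no variation) are set to 0
g_spearmancorr[is.na(g_spearmancorr)] <- 0
pheatmap(g_spearmancorr, cluster_rows = FALSE, cluster_cols = FALSE,
         fontsize_col = 5, fontsize_row = 3)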


Sorry if my questions are too basic, but I am fairly new to R and to programming in general. Thanks in advance!

correlation rstudio spearman pheatmap