I'm looking to use some publicly available data from samples that have had both proteomics and bulk RNA-seq done on them. However, I'm having trouble computing the correlation between the two datasets and assigning the metadata.
For example, this is my workflow for the RNA file:
    # Load data as csv. First column is gene ID with gene names going down; every other
    # column has the sample name with normalized expression values.
    # Remove duplicates in gene name (ID)
    geneExpf="file.csv"
    rna=read.table(geneExpf, as.is=TRUE, header=TRUE, sep=',', check.names=FALSE)
    rna = distinct(rna, ID, .keep_all = TRUE)

    # Re-label data as matrix and make expression values numeric
    rna_matrix=as.matrix(as.numeric(rna[,2:ncol(rna)]))

    # Make new matrix of rna genes present in protein set and remove duplicates
    rna_common = as.matrix(rna[rna$ID %in% protein$ID,])

    # Make correlation table for downstream use (corrplot, etc) and ignore NA values
    cormatrix <- cor(rna, protein, use = "pairwise.complete.obs")
However, cor returns an error that reads:
'x' must be numeric
This happens even though I tried to make the expression values numeric. When I run typeof(rna[2,5]), which is an arbitrary gene expression value for one sample, it returns "character" for some files and "double" for others, and I can't figure out why.
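To make the type problem concrete, here is a minimal sketch of the kind of check I mean. It is not my real data: the inline CSV text, the sample names S1 and S2, and the "N/A" entry are all made up for illustration, on the assumption that a non-numeric token somewhere in a column could be what makes R read that whole column as character.

```r
# Toy stand-in for the real CSV: an ID column plus two sample columns.
# The "N/A" entry is a hypothetical example of a token that is not a number.
csv_text <- "ID,S1,S2\nGENE1,1.5,2.0\nGENE2,N/A,3.1\nGENE3,0.7,4.2"
rna <- read.csv(text = csv_text, check.names = FALSE, stringsAsFactors = FALSE)

# Report the class R assigned to every column.
col_classes <- sapply(rna, class)
print(col_classes)

# For each sample column, count entries that are present but cannot be
# converted to a number (these would explain a character column).
expr <- rna[, -1, drop = FALSE]
bad <- sapply(expr, function(col) {
  x <- as.character(col)
  sum(!is.na(x) & is.na(suppressWarnings(as.numeric(x))))
})
print(bad)

# Convert column by column (unparseable entries become NA) and keep the
# gene IDs as row names so that the matrix itself is purely numeric.
rna_matrix <- sapply(expr, function(col) suppressWarnings(as.numeric(as.character(col))))
rownames(rna_matrix) <- rna$ID
print(is.numeric(rna_matrix))
```

In this toy case the column containing the stray token comes back as character while the clean column is numeric, which looks like the mixed behaviour I see across files. If that is the cause, passing a matching na.strings value to read.table might avoid the problem at load time, but I have not confirmed that for my files.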
Does anyone have any suggestions for how to fix this issue? I've spent a long time looking on StackExchange and Biostars at posts from people with the same issue, but few of them were reading from CSV files, which I think might be where my mistake is.
Thanks for everyone's help. Greatly appreciated.