Question: Comparing RNAseq to proteomics data, coding help?
sako242 wrote (7 months ago):

Hi all,

I'm looking to use some publicly available data from samples that have had proteomics and bulk RNAseq done on them. However, I'm having issues resolving the actual correlation and assigning the metadata.

For example, this is my workflow for the RNA file:

#Load data as csv. First column is gene ID with gene names going down, every other column
#has the sample name with normalized expression values. Remove duplicates in gene name (ID)
geneExpf="file.csv"
rna=read.table(geneExpf, as.is=TRUE, header=TRUE, sep=',', check.names=FALSE)
rna = distinct(rna, ID, .keep_all = TRUE)

#Re-label data as matrix and make expression values numeric
rna_matrix=as.matrix(as.numeric(rna[,2:ncol(rna)]))

#Make new matrix of rna genes present in protein set and remove duplicates
rna_common = as.matrix(rna[rna$ID %in% protein$ID,])

#Make correlation table for downstream use (corrplot, etc) and ignore NA values
cormatrix <- cor(rna, protein, use = "pairwise.complete.obs")

However, cor returns back an error that reads:

 'x' must be numeric

Even though I tried to make the expression values numeric. When I run typeof(rna[2,5]), which is an arbitrary gene expression value for a sample, it returns "character" for some files and "double" for others, and I can't figure out why.

Does anyone have any suggestions for how to fix this issue? I've spent a long time looking on Stack Exchange and Biostars at people having the same issue, but few of them have been reading from csv files, which I think might be my mistake.

Thanks for everyone's help. Greatly appreciated.

modified 7 months ago • written 7 months ago by sako242

Can you show the result of str(rna) and str(rna_matrix)? You probably have some non-numeric entries in either one of the columns, e.g. "NA" or something else that will be read in as a string rather than a number.
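One way to hunt for such entries is sketched below. It is only an illustration on a made-up data frame: `rna_toy`, its column names, and the helper `find_bad_entries()` are hypothetical and not part of the original workflow.

```r
# Hypothetical toy data standing in for `rna`; column s1 was read as character
# because it contains a stray "n/a" string.
rna_toy <- data.frame(
  ID = c("g1", "g2", "g3"),
  s1 = c("1.5", "NA", "n/a"),
  s2 = c(0.1, 0.2, 0.3),
  stringsAsFactors = FALSE
)

# Which columns are not numeric? (The ID column is expected to show up here.)
non_numeric_cols <- names(rna_toy)[!vapply(rna_toy, is.numeric, logical(1))]

# Within a character column, list the distinct entries that do not parse as
# numbers, ignoring the literal string "NA".
find_bad_entries <- function(x) {
  parsed <- suppressWarnings(as.numeric(x))
  unique(x[is.na(parsed) & !is.na(x) & x != "NA"])
}

bad <- find_bad_entries(rna_toy$s1)
print(non_numeric_cols)
print(bad)
```

If the offending strings turn out to be missing-value codes, passing them to the `na.strings` argument of `read.table()` should let those columns be read as numeric in the first place.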

written 7 months ago by Friederike

str(rna) reads:

'data.frame':   23201 obs. of  262 variables:
 $ ID    : chr  "ENSG00000210049" "ENSG00000211459" "ENSG00000210082" "ENSG00000198888" ...
 $ 69984 : num  -0.872 7.594 9.455 7.537 7.317 ...
 $ 57064 : num  -1.17 7.78 9.24 7.1 7.31 ...

whereas str(rna_matrix) no longer works for me, because creating rna_matrix now fails with:

Error in as.matrix(as.numeric(rna[, 2:ncol(rna)])) : 
(list) object cannot be coerced to type 'double'
modified 7 months ago • written 7 months ago by sako242

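Just for completeness' sake, try to understand what's happening. A data frame is a list of columns, so as.numeric() cannot coerce it directly; convert it to a matrix first: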

> df
  name val1 val2
1    A    1    1
2    B    2    2
3    C    3    3
> as.numeric(df[, 2:ncol(df)])
Error: (list) object cannot be coerced to type 'double'
> as.numeric(as.matrix(df[, 2:ncol(df)]))
[1] 1 2 3 1 2 3
> as.matrix(df[, 2:ncol(df)])
     val1 val2
[1,]    1    1
[2,]    2    2
[3,]    3    3

EDIT: That being said, since str(rna) already indicated that all the relevant columns were already numeric, you wouldn't even need to enforce it via as.numeric().
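A minimal sketch of that idea, using the same toy `df` as above: drop the label column, convert the rest to a matrix in one step, and keep the labels as row names. The names `mat` and `df` are illustrative only.

```r
# Same toy data frame as in the console example above.
df <- data.frame(name = c("A", "B", "C"), val1 = 1:3, val2 = 1:3)

# Drop the label column and convert the remaining (already numeric) columns.
mat <- as.matrix(df[, -1])

# Keep the labels as row names so rows can still be identified.
rownames(mat) <- df$name

# Optional: store the values as doubles rather than integers.
storage.mode(mat) <- "double"

print(is.numeric(mat))
print(dim(mat))
```

Note that as.matrix() only yields a numeric matrix when every selected column is numeric; a single character column turns the whole matrix into character, which is one way to end up with the "'x' must be numeric" error from cor().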

modified 7 months ago • written 7 months ago by Friederike

sako242 wrote (7 months ago):

So, I ended up fixing the issue. I turned every csv into dataframes which ended up fixing everything. That changes the commands used to remove duplicates, find intersections, etc. But that fixes the error coming from cor!
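The exact commands used for the fix were not posted. The following is only a sketch of one way such a workflow could look, on made-up data: the toy `rna` and `protein` tables, the sample names `sA` and `sB`, and the helper `to_matrix()` are assumptions, and gene IDs are assumed to be unique within each table.

```r
# Hypothetical toy tables: first column is the gene ID, other columns are samples.
rna <- data.frame(ID = c("g1", "g2", "g3", "g4"),
                  sA = c(1, 2, 3, 4),
                  sB = c(2, 4, 6, 9),
                  check.names = FALSE)
protein <- data.frame(ID = c("g4", "g2", "g1", "g5"),
                      sA = c(4.1, 2.2, 0.9, 7),
                      sB = c(8.5, NA, 2.1, 3),
                      check.names = FALSE)

# Genes and samples present in both data sets.
common_genes   <- intersect(rna$ID, protein$ID)
common_samples <- intersect(names(rna)[-1], names(protein)[-1])

# Build a numeric matrix with rows in the same gene order for both tables.
to_matrix <- function(df, genes, samples) {
  m <- as.matrix(df[match(genes, df$ID), samples, drop = FALSE])
  rownames(m) <- genes
  m
}
rna_mat     <- to_matrix(rna, common_genes, common_samples)
protein_mat <- to_matrix(protein, common_genes, common_samples)

# Correlate RNA sample columns against protein sample columns across the
# shared genes, skipping missing values pair by pair.
sample_cor <- cor(rna_mat, protein_mat, use = "pairwise.complete.obs")
print(sample_cor)
```

The key points are that cor() receives numeric matrices without the ID column, and that both matrices have their rows in the same gene order; otherwise the correlation would pair up unrelated genes.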

written 7 months ago by sako242