Comparing RNAseq to proteomics data, coding help?
4.7 years ago
sako242 ▴ 20

Hi all,

I'm looking to use some publicly available data from samples that have had proteomics and bulk RNAseq done on them. However, I'm having issues resolving the actual correlation and assigning the metadata.

For example, this is my workflow for the RNA file:

#Load data as csv. First column is gene ID with gene names going down, every other column
#has the sample name with normalized expression values. Remove duplicates in gene name (ID)
geneExpf="file.csv"
rna=read.table(geneExpf, as.is=TRUE, header=TRUE, sep=',', check.names=FALSE)
rna = distinct(rna, ID, .keep_all = TRUE)

#Re-label data as matrix and make expression values numeric
rna_matrix=as.matrix(as.numeric(rna[,2:ncol(rna)]))

#Make new matrix of rna genes present in protein set and remove duplicates
rna_common = as.matrix(rna[rna$ID %in% protein$ID,])

#Make correlation table for downstream use (corrplot, etc) and ignore NA values
cormatrix <- cor(rna, protein, use = "pairwise.complete.obs")

However, cor returns back an error that reads:

 'x' must be numeric

This happens even though I tried to make the expression values numeric. When I run typeof(rna[2,5]), which is an arbitrary gene expression value for one sample, it returns "character" for some files and "double" for others, and I can't figure out why.

Does anyone have any suggestions for how to fix this issue? I've spent a long time looking on Stack Exchange and Biostars at posts from people with the same issue, but few of them were reading from csv files, which I think might be where my mistake is.

Thanks for everyone's help. Greatly appreciated.

RNA-Seq proteomics correlation r • 1.2k views

Can you show the result of str(rna) and str(rna_matrix)? You probably have some non-numeric entries in either one of the columns, e.g. "NA" or something else that will be read in as a string rather than a number.
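One way to check for such entries is sketched below. It is only an illustration: `rna_toy` and its values are made up to stand in for the real table, and "n/a" is just one example of a string that would force a column to be read as character. Note that read.table() already treats the literal string "NA" as missing by default (its `na.strings` argument), so other spellings are the more likely culprits.

```r
# Toy data frame standing in for the poster's `rna`; all values are invented.
rna_toy <- data.frame(
  ID = c("g1", "g2", "g3"),
  s1 = c("1.5", "n/a", "3.0"),  # read as character because of "n/a"
  s2 = c(0.2, 0.4, 0.6),
  stringsAsFactors = FALSE
)

# Which sample columns (everything except ID) are not numeric?
sample_cols <- setdiff(names(rna_toy), "ID")
is_num      <- vapply(rna_toy[sample_cols], is.numeric, logical(1))
non_numeric <- sample_cols[!is_num]

# For each such column, list the entries that fail numeric conversion.
bad_entries <- lapply(rna_toy[non_numeric], function(x) {
  converted <- suppressWarnings(as.numeric(x))
  unique(x[is.na(converted) & !is.na(x)])
})
print(bad_entries)
```

If the offending strings turn out to be missing-value markers, passing them to read.table() via `na.strings` (for example `na.strings = c("NA", "n/a")`) should let the column be read as numeric in the first place.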


str(rna) reads:

'data.frame':   23201 obs. of  262 variables:
 $ ID    : chr  "ENSG00000210049" "ENSG00000211459" "ENSG00000210082" "ENSG00000198888" ...
 $ 69984 : num  -0.872 7.594 9.455 7.537 7.317 ...
 $ 57064 : num  -1.17 7.78 9.24 7.1 7.31 ...

whereas str(rna_matrix) no longer works for me; it says:

Error in as.matrix(as.numeric(rna[, 2:ncol(rna)])) : 
(list) object cannot be coerced to type 'double'

Just for completeness' sake, try to understand what's happening:

> df
  name val1 val2
1    A    1    1
2    B    2    2
3    C    3    3
> as.numeric(df[, 2:ncol(df)])
Error: (list) object cannot be coerced to type 'double'
> as.numeric(as.matrix(df[, 2:ncol(df)]))
[1] 1 2 3 1 2 3
> as.matrix(df[, 2:ncol(df)])
     val1 val2
[1,]    1    1
[2,]    2    2
[3,]    3    3

EDIT: That being said, since str(rna) already indicated that all the relevant columns were already numeric, you wouldn't even need to enforce it via as.numeric().
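Applied to a table shaped like the str(rna) output above, a minimal sketch of building the numeric matrix could look like this. The data frame `rna_toy`, its shortened gene IDs, and its values are made up for illustration; only the column layout (an ID column followed by numeric sample columns) follows the thread.

```r
# Toy data frame mirroring the layout of str(rna); IDs and values are invented.
rna_toy <- data.frame(
  ID      = c("ENSG01", "ENSG02", "ENSG03"),
  `69984` = c(-0.872, 7.594, 9.455),
  `57064` = c(-1.17, 7.78, 9.24),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Drop the ID column and convert the remaining (already numeric) columns.
# as.matrix() on all-numeric columns yields a numeric matrix directly,
# so no as.numeric() call is needed.
rna_matrix <- as.matrix(rna_toy[, -1])

# Keep the gene IDs as row names so rows can be matched to the protein data.
rownames(rna_matrix) <- rna_toy$ID

str(rna_matrix)
```

Keeping the ID column out of the matrix matters: as.matrix() on a data frame that still contains a character column returns a character matrix, which cor() rejects with "'x' must be numeric".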

4.7 years ago
sako242 ▴ 20

So, I ended up fixing the issue. I turned every csv into a data frame, which ended up fixing everything. That changes the commands used to remove duplicates, find intersections, etc., but it fixes the error coming from cor!
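For anyone landing here later, the overall idea can be sketched as follows. This is not the exact code I ended up with: the matrices `rna_m` and `prot_m`, the gene names, and the sample names are invented, and it assumes both tables have already been turned into numeric matrices with genes as row names and samples as column names.

```r
# Made-up matrices: rows are genes, columns are samples.
rna_m <- matrix(c(1, 2, 3, 4,
                  2, 4, 6, 8,
                  5, 3, 1, 0),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("g1", "g2", "g3"),
                                c("s1", "s2", "s3", "s4")))
prot_m <- matrix(c(8, 6, 4, 2,
                   1, 2, 3, 4),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("g2", "g1"),
                                 c("s4", "s3", "s2", "s1")))

# Keep only genes and samples present in both, in the same order.
genes   <- intersect(rownames(rna_m), rownames(prot_m))
samples <- intersect(colnames(rna_m), colnames(prot_m))
rna_c   <- rna_m[genes, samples, drop = FALSE]
prot_c  <- prot_m[genes, samples, drop = FALSE]

# Per-gene correlation of RNA vs protein across the shared samples.
gene_cor <- vapply(genes, function(g) {
  cor(rna_c[g, ], prot_c[g, ], use = "pairwise.complete.obs")
}, numeric(1))

# cor(x, y) on two matrices correlates columns of x with columns of y,
# so this gives a sample-by-sample matrix computed across the shared genes.
sample_cor <- cor(rna_c, prot_c, use = "pairwise.complete.obs")
```

Indexing both matrices by the same `genes` and `samples` vectors is what guarantees the rows and columns line up; cor() itself matches by position, not by name.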
