Calculating correlation matrix with multiple observations per observable
bioneer, 17 months ago

Hi Biostars,

I have time course data in which each observable has been measured multiple times per time point. I want to compute the correlation matrix in R for my downstream analysis, but I ran into the problem illustrated in the two scenarios below.

In Scenario 1, there is only one observation per observable and time point, so the correlation matrix can be computed.

In Scenario 2, there are three observations per observable and time point, and with my current approach the correlation matrix cannot be computed.

library(tidyverse)
library(reshape2)

# Set seed
set.seed(23)


Scenario 1:

# Scenario 1: Two observables, 5 time points, one observation per time point

# Create columns for test data
time    <- rep(c(0,2,5,8,10), 2, each = 1)
observable <- c(rep("Obs1", 5), rep("Obs2", 5))
values  <- runif(10, min=0, max=1)

# Concatenate columns to test data with unique values in time column
test <- data.frame(time, observable, values)

# Transform test data into wide format
test_wide <- pivot_wider(test, names_from = observable, values_from = values) %>%
  unnest(c(Obs1, Obs2))

# First column to rownames
test_wide <- test_wide %>% remove_rownames %>% column_to_rownames(var="time")

# Calculate correlation matrix
cormat <- cor(test_wide, method = "spearman")

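A quick base-R check shows what separates the two scenarios: in Scenario 1 every (time, observable) pair occurs exactly once, while in Scenario 2 each pair occurs three times. This is a sketch; the data frames `s1` and `s2` and the helper `has_dup()` are hypothetical stand-ins that only reproduce the layouts of the two scenarios.

```r
set.seed(23)

# Hypothetical stand-in for the Scenario 1 layout: one row per (time, observable)
s1 <- data.frame(
  time       = rep(c(0, 2, 5, 8, 10), times = 2),
  observable = rep(c("Obs1", "Obs2"), each = 5),
  values     = runif(10)
)

# Hypothetical stand-in for the Scenario 2 layout: three rows per (time, observable)
s2 <- data.frame(
  time       = rep(rep(c(0, 2, 5, 8, 10), each = 3), times = 2),
  observable = rep(c("Obs1", "Obs2"), each = 15),
  values     = runif(30)
)

# TRUE if any (time, observable) combination appears more than once,
# i.e. a long-to-wide reshape has no unique cell for each value
has_dup <- function(df) any(duplicated(df[, c("time", "observable")]))

has_dup(s1)  # FALSE
has_dup(s2)  # TRUE
```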

Scenario 2:

# Scenario 2: Two observables, 5 time points, three observations per time point

# Create columns for test data
time    <- rep(c(0,2,5,8,10), 2, each = 3)  # 30 entries, matching observable and values
observable <- c(rep("Obs1", 15), rep("Obs2", 15))
values  <- runif(30, min=0, max=1)

# Concatenate columns to test data with non-unique values in time column
test  <- data.frame(time, observable, values)

# Transform test data into wide format
test_wide <- pivot_wider(test, names_from = observable, values_from = values) %>%
  unnest(c(Obs1, Obs2))


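Problem 1: The values in the time column are not unique, so pivot_wider() cannot tell which Obs1 value should be paired with which Obs2 value, and the pairing is ambiguous. pivot_wider() issues this warning: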

Warning message: Values are not uniquely identified; output will contain list-cols.
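If (and only if) the replicates are paired, i.e. the k-th measurement of Obs1 and the k-th measurement of Obs2 at a time point come from the same sample, one way around the ambiguity is to add an explicit replicate index so that every value has a unique (time, replicate) key. A minimal sketch under that assumption; the `replicate` column is a hypothetical helper, not part of the original data, and the values are freshly simulated.

```r
library(dplyr)
library(tidyr)

set.seed(23)

# Scenario 2 layout: 5 time points x 3 replicates x 2 observables
test <- data.frame(
  time       = rep(rep(c(0, 2, 5, 8, 10), each = 3), times = 2),
  observable = rep(c("Obs1", "Obs2"), each = 15),
  values     = runif(30, min = 0, max = 1)
)

# ASSUMPTION: the k-th Obs1 and k-th Obs2 value at a time point belong to the
# same sample. 'replicate' is a hypothetical index numbering the rows
# within each (time, observable) group as 1, 2, 3.
test_wide <- test %>%
  group_by(time, observable) %>%
  mutate(replicate = row_number()) %>%
  ungroup() %>%
  # (time, replicate) now identifies each cell uniquely, so no list-cols
  pivot_wider(id_cols = c(time, replicate),
              names_from = observable,
              values_from = values)

dim(test_wide)  # 15 rows; columns time, replicate, Obs1, Obs2
```

If the replicates are not paired, any such index is arbitrary, and the resulting correlation would depend on the order in which the rows happen to be stored.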

test_wide <- test_wide %>% remove_rownames %>% column_to_rownames(var="time")


Problem 2: The time column must be excluded from the correlation calculation, but the information about which observations belong to which time point must be preserved. Converting the time column to row names, as in Scenario 1, does not work because time contains duplicate values. The line above throws this error:

Error in `.rowNamesDF<-`(x, value = value) : duplicate 'row.names' are not allowed
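The time column does not have to become row names at all: it can stay in the wide table as an ordinary column and be dropped only from the data passed to cor(). A sketch, assuming a wide table shaped like the unnested Scenario 2 result (one row per replicate, time repeated three times); the values here are freshly simulated, not the ones from the post.

```r
library(dplyr)

set.seed(23)

# Hypothetical wide table shaped like the unnested Scenario 2 result
test_wide <- data.frame(
  time = rep(c(0, 2, 5, 8, 10), each = 3),
  Obs1 = runif(15),
  Obs2 = runif(15)
)

# 'time' stays in test_wide, so each row keeps its time point;
# it is only excluded from the numeric columns handed to cor()
cormat <- test_wide %>%
  select(-time) %>%
  cor(method = "spearman")

cormat
```

Because `time` is still in `test_wide`, the time-point membership of each row remains available for grouping or plotting later.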

# Calculate correlation matrix
cormat <- cor(test_wide, method = "spearman")


Can anyone suggest a suitable method for computing the correlation matrix from this kind of data? (One solution would be to average the observations per time point, but I would like to avoid that.) Thanks in advance!
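For comparison, the averaging approach mentioned above could look like the following sketch on simulated Scenario 2 data. It removes the duplicate keys, but only five values per observable (one per time point) enter the correlation, and the replicate-level variation is discarded.

```r
library(dplyr)
library(tidyr)

set.seed(23)

# Scenario 2 layout: 5 time points x 3 replicates x 2 observables
test <- data.frame(
  time       = rep(rep(c(0, 2, 5, 8, 10), each = 3), times = 2),
  observable = rep(c("Obs1", "Obs2"), each = 15),
  values     = runif(30)
)

# Collapse the three replicates to one mean per (time, observable),
# which makes every (time, observable) pair unique again
test_mean <- test %>%
  group_by(time, observable) %>%
  summarise(values = mean(values), .groups = "drop") %>%
  pivot_wider(names_from = observable, values_from = values)

# Only 5 rows (one per time point) enter the correlation
cormat_mean <- cor(select(test_mean, -time), method = "spearman")
```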
