time series metabolomic clustering
6.0 years ago

I have a set of patients.

Metabolomic measurements were made on the same patients at three time points.

So I have a matrix of patients by metabolite abundances for each time point.

I want to see how each individual patient's metabolomic profile changed across those three time points and cluster the patients that changed in a similar manner.

I imagine this is some kind of time series clustering. Does anyone have suggestions on how to approach this?

clustering time-series metabolomics • 2.4k views
6.0 years ago

Two standard options:
- use dynamic time warping (see the R package dtw) and apply a clustering algorithm to the resulting distance matrix
- treat the data as a 3D tensor and apply a tensor factorization algorithm (see for example my tutorial here)


Hi Jean,

Thanks. How would I apply dtw effectively in this instance? Would I need to create a matrix for each patient, holding each individual metabolite across the three time points? I'm a bit confused.

Currently my data looks like this:

dataframe1 is a data frame with patient names in rows and individual metabolite readings in columns, all for the same time point.

dataframe2 is a data frame with patient names in columns and readings for the same metabolites and patients at a later time point, x days after the first.

dataframe3 is a data frame with patient names in columns and readings for the same metabolites and patients at a third time point, a further x days later.


Is dataframe1 really transposed compared to dataframe2 and dataframe3, i.e. patients in rows vs. patients in columns?
Anyway, the dtw() function needs, for each patient, a matrix with time points in rows and metabolites in columns, so you need to reshape your data with something like the untested code below. It assumes that all data frames have patients in rows and the same metabolites in columns, in the same order, and that each data frame represents a different time point:

library(abind)
library(dtw)  # provides dtw()
# Build a 3D array of patients x metabolites x time points
Ary <- abind(as.matrix(dataframe1), as.matrix(dataframe2), as.matrix(dataframe3), along = 3)
# Reorder to time points x metabolites x patients
Ary <- aperm(Ary, c(3, 2, 1))
# Compute the dynamic time warping distance between every pair of patients
Npatients <- dim(Ary)[3]
D <- matrix(NA_real_, nrow = Npatients, ncol = Npatients)
for (i in seq_len(Npatients)) {
  for (j in seq_len(Npatients)) {
    # Each slice Ary[, , i] is one patient's time points x metabolites matrix
    aln <- dtw(Ary[, , i], Ary[, , j], dist.method = "cosine")
    D[i, j] <- aln$distance
  }
}
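
Once D is filled in, one possible next step is hierarchical clustering on it. The sketch below uses only base R and a made-up 4 x 4 matrix in place of the real D; the patient labels, the average linkage and k = 2 are illustrative assumptions, not choices made in this thread.

```r
# Toy symmetric distance matrix standing in for the DTW matrix D
# (hypothetical values: P1/P2 are close, P3/P4 are close)
labels <- paste0("P", 1:4)
D <- matrix(c(0,   1,   5,   6,
              1,   0,   5.5, 6.5,
              5,   5.5, 0,   0.8,
              6,   6.5, 0.8, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(labels, labels))

# hclust() expects a "dist" object, so convert the square matrix first
hc <- hclust(as.dist(D), method = "average")

# Cut the dendrogram into k groups; k = 2 is an arbitrary choice here
clusters <- cutree(hc, k = 2)
print(clusters)
```

With real data, plot(hc) shows the dendrogram, which can help when choosing the number of clusters.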

Hi Jean, thank you very much for this suggestion. Sorry, no, that was just a typo; it isn't transposed haha.

I successfully made the 3D array of patients x metabolites x time points.

However, I am struggling to understand conceptually what dtw is doing. Is it not taking an average of all the metabolites and then calculating the distance between patients?

I want to create clusters of patients that share changes in specific metabolites, not just an overall upregulation or downregulation over time.


You can think of dtw as sequence alignment for multivariate data. Instead of aligning sequences of characters, it aligns sequences of multidimensional points. Just like sequence alignment, it returns a score (a distance) telling you how similar the two time series are. So you can compute all pairwise distances between patients and apply a clustering algorithm to the resulting distance matrix. This will cluster the patients by how similar their profiles are. However, in this case, I would also explore the data using a tensor factorization approach, as this could identify clusters of patients that share changes in clusters of metabolites.
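
To make the alignment idea concrete, here is a simplified, textbook-style DTW written in base R. It is a sketch for intuition only and not the implementation in the dtw package, which uses a different default step pattern; the function name dtw_distance and the toy patients a and b are hypothetical. The local cost is computed between whole metabolite vectors at each pair of time points, so nothing is averaged across metabolites.

```r
# Simplified DTW: x and y are matrices with time points in rows
# and metabolites in columns (illustrative helper, not from the dtw package)
dtw_distance <- function(x, y) {
  n <- nrow(x)
  m <- nrow(y)
  # Local cost: Euclidean distance between the full metabolite vectors
  local_cost <- matrix(0, n, m)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      local_cost[i, j] <- sqrt(sum((x[i, ] - y[j, ])^2))
    }
  }
  # Accumulated cost with an extra border row/column set to Inf
  acc <- matrix(Inf, n + 1, m + 1)
  acc[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      best_prev <- min(acc[i, j], acc[i, j + 1], acc[i + 1, j])
      acc[i + 1, j + 1] <- local_cost[i, j] + best_prev
    }
  }
  acc[n + 1, m + 1]
}

# Two toy patients, 3 time points x 2 metabolites:
# in a, metabolite 1 rises; in b, metabolite 2 rises
a <- rbind(c(1, 1), c(2, 1), c(3, 1))
b <- rbind(c(1, 1), c(1, 2), c(1, 3))

same_means <- isTRUE(all.equal(rowMeans(a), rowMeans(b)))
d_ab <- dtw_distance(a, b)
d_aa <- dtw_distance(a, a)
```

Here the two patients have identical per-time-point averages, yet the distance between them is greater than zero because different metabolites change. The distance of a patient to itself is zero.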


Can I use the same array of time points x metabolites x patients constructed in the previous step for the dtw?

I tried running it but got this error:

Error in if (sum(tnsr@data == 0) == prod(tnsr@modes)) return(TRUE) : missing value where TRUE/FALSE needed

Is this because I have some NAs in my data?


I got the same error without NAs.


For the tensor factorization, you start from the same 3D array you built following my instructions above, but you need to turn it into a tensor object before using it with the rTensor package, i.e.:

Ary <- as.tensor(Ary)

Not doing this is probably what causes the error. You must also replace the NAs in the array with numerical values. If there are not too many NAs, replacing them with anything sensible should not affect the outcome. The factor matrices could also be used to predict the values of the NAs.
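
As one example of "anything sensible", the sketch below fills each NA with the mean of that metabolite over all patients and time points. It uses base R only and a made-up array in place of the real data; the dimensions and the choice of mean imputation are assumptions for illustration, and this step would come before the as.tensor() call.

```r
set.seed(1)
# Toy array: 3 time points x 4 metabolites x 5 patients (hypothetical data)
Ary <- array(rnorm(3 * 4 * 5), dim = c(3, 4, 5))
Ary[2, 1, 3] <- NA
Ary[1, 4, 5] <- NA

# Mean of each metabolite (second dimension), ignoring NAs
met_means <- apply(Ary, 2, mean, na.rm = TRUE)

# Replace the NAs of each metabolite with that metabolite's mean
for (m in seq_len(dim(Ary)[2])) {
  slice <- Ary[, m, ]
  slice[is.na(slice)] <- met_means[m]
  Ary[, m, ] <- slice
}

n_missing <- sum(is.na(Ary))
```

After this loop n_missing should be zero, which is worth confirming before factorizing.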


Hi again, I got this error after running the following code, so I believe I had already turned the array into a tensor object, and this is with a complete dataset with no NAs:

G <- as.tensor(Ary)  # this works
cpG <- cp(G, num_components = 4)  # this gives the error

Error in if (sum(tnsr@data == 0) == prod(tnsr@modes)) return(TRUE) : missing value where TRUE/FALSE needed


The error means that the test returns NA. Check each part separately. If sum(...) returns NA, you probably have some NAs left in the data. If prod(...) returns NA then there may be something wrong with the tensor object.
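
A minimal sketch of that check in base R, on a plain array with one deliberately planted NA (the toy array is hypothetical). For a tensor object G created with as.tensor(), the corresponding pieces in the error message are G@data and G@modes.

```r
# Toy 2 x 3 x 4 array with a single missing value
Ary <- array(as.numeric(1:24), dim = c(2, 3, 4))
Ary[1, 2, 3] <- NA

# Left-hand side of the failing test: becomes NA as soon as any cell is NA
zero_count <- sum(Ary == 0)

# Right-hand side: the total number of cells
n_cells <- prod(dim(Ary))

# Locate the offending cells by their array indices
na_pos <- which(is.na(Ary), arr.ind = TRUE)
print(na_pos)
```

If zero_count is NA while n_cells is a number, leftover NAs in the data are the likely cause, and na_pos shows where they are.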


Wow, you were totally right! I debugged my NA processing and now it works. I have successfully run the R script and have the cpG object. What is the best way to explore this in terms of plots etc.? I will do some reading to see what works best, but I am open to suggestions! Much appreciated!
