Your experimental design lacks biological replicates, which limits statistical power and prevents the use of interaction terms in DESeq2 for detecting patient-specific temporal responses. Established methods for differential expression over time, such as maSigPro or ImpulseDE2, typically require replicates to model variance accurately and compare trajectories between groups. Without replicates, these approaches are not directly applicable to identifying outlier patients or genes with atypical temporal patterns.
For detecting genes where one patient's expression differs at a specific timepoint, consider applying outlier detection methods to normalized counts per timepoint separately. OUTRIDER is suitable for this; it uses an autoencoder to model expected gene expression across samples and flags aberrant expression in individuals within a cohort. Install it via Bioconductor and apply it as follows:
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("OUTRIDER")
library(OUTRIDER)
# Assuming 'counts' is your raw count matrix and 'colData' has sample annotations
ods <- OutriderDataSet(countData = counts, colData = colData)
ods <- filterExpression(ods, minCounts = 10) # Filter low-expressed genes
ods <- estimateSizeFactors(ods)
ods <- controlForConfounders(ods) # Correct for batch effects if needed
ods <- fit(ods) # Fit the model
ods <- aberrance(ods) # Detect outliers
results <- results(ods, padjCutoff = 0.05)
This will identify genes with outlier expression per sample (patient-timepoint combination). You can then inspect these for patterns across timepoints manually, associating with metadata.
For overall temporal patterns, an ad-hoc approach is reasonable given the pilot nature. Normalize expression per gene to the pre-treatment timepoint (t0), create vectors of fold-changes (e.g., t1/t0, t2/t0, t3/t0), and compute pairwise Euclidean distances or Pearson correlations between patients' vectors. Genes with high variance in distances (e.g., largest standard deviation) may indicate divergent responses. Select top genes for visualization via heatmaps or line plots, and validate in your larger cohort.
Focusing on genes significant in multiple patient contrasts from your existing DESeq2 model is a valid alternative, but it may miss time-specific deviations.
Kevin
Without replicates or controls, I am not sure any of this data is really that useable. RNAseq is quite variable at the best of times, which is why you almost always use at least technical replicates if not biological replicates. Also, if you are looking to detect markers related to the treatment, why are you looking for genes that are different among patients?
But if you really just want to identify genes that are different among patients at each timepoint, this would be a simple R/Python script to write yourself using transcript counts.
This is a pilot analysis of a project with ~15 patients. Only 5 have been sequenced in this phase. Patients respond very differently to the treatment, with different prognoses and side effects/toxicity. That's why we are interested in finding some candidate genes showing a different expression profile through time in the currently sequenced patients. To later validate them in the larger sample using qPCR or further sequencing.
I have also run DESeq2 with a model using the polynomial of timepoints, to detect parabola-shaped genes. I'm wondering if there is some already established method that can detect the shape of the time-series for each gene x patient, instead of having to do that coarsely to check if a gene goes e.g., Up-Flat-Down, or Down-Flat-Flat.
The typical way to do this would be to group the samples by responsiveness, and compare the groups to each other.
I wish I could do that. However, these 5 samples were done in 2 batches. 2 males / 3 females. 3 tumor reduction / 1 NR / 1 tumor growth. Wildly different mean expression (across all timepoints) per patient of many genes (independent of batch)...
That's why I'm looking for a dirty method to get an idea of the "differences in shape" within each patient, and explore from there.
I could take each patient x gene, normalize the expression at each timepoint by t0, make a vector with c( t1-t0, t2-t1, t3-t1 ), calculate the pairwise distance between all patients. And then... what? Select the genes with the largest SD in their pairwise distances?