Finding differences between individual patients in a time-series RNA-seq?
1
0
Entering edit mode
3 days ago
txema.heredia ▴ 280

Hi,

I am analyzing the RNA-seq of a pilot project trying to detect markers in the blood of 5 treated patients.

The data I have is:

  • 5 patients (no controls)
  • 4 timepoints (1 pre-treatment + 3 post-treatment).
  • 1 replicate per patient ---> 20 samples total.

I have already analyzed the differences between timepoints using DESeq2 (design = ~timepoint + patient). Now we are interested in finding individual patients/genes whose (temporal) response is different from the other patients. As we only have 1 sample for each patient x timepoint, it is not possible to run interaction models in DESeq2.

I was wondering if there is any method suited to detect genes where one patient's expression (either at one specific timepoint or the overall temporal pattern) is different from the others. To later (manually) find association with the patient's metadata.

I know the statistical power of such analysis would be crap, but... is there any method that can help with this, or should I make up something ad-hoc? Or should I just focus on genes significant on 2/3+ contrasts in the patient factor of my DESeq2 model?

Thanks

RNAseq DESeq2 DGE • 443 views
ADD COMMENT
0
Entering edit mode

Without replicates or controls, I am not sure any of this data is really that useable. RNAseq is quite variable at the best of times, which is why you almost always use at least technical replicates if not biological replicates. Also, if you are looking to detect markers related to the treatment, why are you looking for genes that are different among patients?

But if you really just want to identify genes that are different among patients at each timepoint, this would be a simple R/Python script to write yourself using transcript counts.

ADD REPLY
0
Entering edit mode

This is a pilot analysis of a project with ~15 patients. Only 5 have been sequenced in this phase. Patients respond very differently to the treatment, with different prognoses and side effects/toxicity. That's why we are interested in finding some candidate genes showing a different expression profile through time in the currently sequenced patients. To later validate them in the larger sample using qPCR or further sequencing.

I have also run DESeq2 with a model using the polynomial of timepoints, to detect parabola-shaped genes. I'm wondering if there is some already established method that can detect the shape of the time-series for each gene x patient, instead of having to do that coarsely to check if a gene goes e.g., Up-Flat-Down, or Down-Flat-Flat.

ADD REPLY
0
Entering edit mode

The typical way to do this would be to group the samples by responsiveness, and compare the groups to each other.

ADD REPLY
0
Entering edit mode

I wish I could do that. However, these 5 samples were done in 2 batches. 2 males / 3 females. 3 tumor reduction / 1 NR / 1 tumor growth. Wildly different mean expression (across all timepoints) per patient of many genes (independent of batch)...

That's why I'm looking for a dirty method to get an idea of the "differences in shape" within each patient, and explore from there.

I could take each patient x gene, normalize the expression at each timepoint by t0, make a vector with c( t1-t0, t2-t1, t3-t1 ), calculate the pairwise distance between all patients. And then... what? Select the genes with the largest SD in their pairwise distances?

ADD REPLY
0
Entering edit mode
2 days ago
Kevin Blighe ★ 90k

Your experimental design lacks biological replicates, which limits statistical power and prevents the use of interaction terms in DESeq2 for detecting patient-specific temporal responses. Established methods for differential expression over time, such as maSigPro or ImpulseDE2, typically require replicates to model variance accurately and compare trajectories between groups. Without replicates, these approaches are not directly applicable to identifying outlier patients or genes with atypical temporal patterns.

For detecting genes where one patient's expression differs at a specific timepoint, consider applying outlier detection methods to normalized counts per timepoint separately. OUTRIDER is suitable for this; it uses an autoencoder to model expected gene expression across samples and flags aberrant expression in individuals within a cohort. Install it via Bioconductor and apply it as follows:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("OUTRIDER")

library(OUTRIDER)

# Assuming 'counts' is your raw count matrix and 'colData' has sample annotations
ods <- OutriderDataSet(countData = counts, colData = colData)
ods <- filterExpression(ods, minCounts = 10)  # Filter low-expressed genes
ods <- estimateSizeFactors(ods)
ods <- controlForConfounders(ods)  # Correct for batch effects if needed
ods <- fit(ods)  # Fit the model
ods <- aberrance(ods)  # Detect outliers
results <- results(ods, padjCutoff = 0.05)

This will identify genes with outlier expression per sample (patient-timepoint combination). You can then inspect these for patterns across timepoints manually, associating with metadata.

For overall temporal patterns, an ad-hoc approach is reasonable given the pilot nature. Normalize expression per gene to the pre-treatment timepoint (t0), create vectors of fold-changes (e.g., t1/t0, t2/t0, t3/t0), and compute pairwise Euclidean distances or Pearson correlations between patients' vectors. Genes with high variance in distances (e.g., largest standard deviation) may indicate divergent responses. Select top genes for visualization via heatmaps or line plots, and validate in your larger cohort.

Focusing on genes significant in multiple patient contrasts from your existing DESeq2 model is a valid alternative, but it may miss time-specific deviations.

Kevin

ADD COMMENT

Login before adding your answer.

Traffic: 3203 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6