Mass-spectrometry proteomics
1
0
Entering edit mode
26 days ago

I have mass-spectrometry-based proteome data of 6 control and 3 treated sample. There are random number of valid LFQ intensity per protein in each group. For example for a random protein 2 samples in control group and 1 sample in treated group have valid values. There are sometime more or less. There are cases also that per a specific protein, only one random sample from each group have valid value. And I am looking for differentially expressed proteins between control and treated. I don’t want to loose any of data. Could you please tell me what statistical method should I use for my analysis? How to transform and impute the data?

Proteomics analysis • 775 views
ADD COMMENT
2
Entering edit mode
26 days ago
Gordon Smyth ★ 8.2k

One possibility in the limpa package, described here: https://www.biorxiv.org/content/10.1101/2025.04.28.651125v1

Ideally, you would give peptide-level data to limpa. If all you have are LFQ values, then you could try this:

library(limpa)
y.imputed <- dpcImpute(y)
design <- model.matrix(~Treatment)
fit <- dpcDE(y.imputed, design)
fit <- eBayes(fit, robust=TRUE)
topTable(fit, coef=2)

Here, y is a matrix or EList of normalized LFQ values on the log-scale, including NAs, and Treatment is a factor indicating which samples are treated and which are controls. y.imputed is an EList containing imputed values and also information about how reliable each imputed value is.

This approach will avoid losing any data and will, where possible, infer DE from the missing value pattern. A protein may come as DE even if it is entirely NA in one group, provided the LFQ values are sufficiently high in the other group.

The above code uses a default detection probability curve (DPC), but you should explore that by typing:

dpcfit <- dpc(y)
plotDPC(dpcfit)
ADD COMMENT
1
Entering edit mode
ADD REPLY
0
Entering edit mode

Thank you for your reply. I would like to know if the package can handle batch effects and outlier samples too.

ADD REPLY
1
Entering edit mode

limpa shares the full capabilities of the limma package, which includes adjustment for batch effects and outliers. To adjust for batch effects, include the relevant factors or covariates in the design matrix, as you would do for limma or edgeR. To detect and downweight outlier samples, include the argument sample.weights=TRUE in the call to dpcDE.

ADD REPLY
0
Entering edit mode

Thank you for your answer. I conducted an analysis using limpa and found it quite interesting. As far as I understand, both ON and CN models are constructed per peptide (or per precursor) across multiple samples (please correct me if I’m wrong). Now, I would like to know whether the model is sensitive to group-specific parameters as well. For example, if a missing value belongs to the 'control' group, is the imputation based only on values from control samples, or from both control and treatment samples? Additionally, how would this apply in the context of single-cell proteomics data analysis? I would appreciate it if you could take a look at my code below and let me know whether I’m running it correctly (F023 in my genotype name).

lfq_data[lfq_data == 0] <- NA
lfq_data <- log2(lfq_data)
sample_names <- colnames(lfq_data)
F023_cols <- sample_names[grepl("F023_C|F023_T", sample_names)]
lfq_data_F023 <- lfq_data[, F023_cols]
F023_Treatment <- ifelse(grepl("F023_C", F023_cols), "LP",
                        ifelse(grepl("F023_T", F023_cols), "HP", NA))
F023_Treatment <- factor(F023_Treatment)
matched_indices <- !is.na(F023_Treatment)
lfq_data_F023 <- lfq_data_F023[, matched_indices]
F023_Treatment <- F023_Treatment[matched_indices]
F023_cols <- F023_cols[matched_indices]
if (any(is.na(F023_Treatment))) {
  stop("Unmatched samples remain after filtering!")
}
F023_Treatment <- relevel(F023_Treatment, ref = "HP")
design <- model.matrix(~ F023_Treatment) 
if (ncol(lfq_data_F023) != nrow(design)) {
  stop("Design matrix and data column count do not match!")
}
y <- new("EList", list(E = as.matrix(lfq_data_F023)))
y.imputed <- dpcImpute(y)
fit <- dpcDE(y.imputed, design, sample.weights = TRUE)
fit <- eBayes(fit, robust = TRUE)
results <- topTable(fit, coef = 2, adjust = "fdr", number = Inf)
dpcfit <- dpc(y)
plotDPC(dpcfit)
ADD REPLY
2
Entering edit mode

Your code looks ok (although I can't check the data manipulations that are specific to your own dataset). The DE analysis you have done is based entirely on a CN (complete normal) model. The dpc() function uses an ON (observed normal) model, but you have not yet input the DPC so estimated into your DE analysis. You would do that, if you want, by specifying dpc=dpcfit in the dpcImpute() call.

limpa does not use group information at all at the imputation step, as you can see from the dpcImpute() function and help page. Using group information at this stage would cause "double dipping" and potentially lead to failure to control the FDR rate correctly later on in the DE analysis.

Our published papers and bioRxiv preprint include analyses of single cell proteomics data. See the package documentation.

ADD REPLY

Login before adding your answer.

Traffic: 3241 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6