Question

Differential protein/biomarker expression using limma: is that possible?

2

3.4 years ago

Ridha ▴ 130

Hey there! Hope everyone is doing great :)

I have a question about using the limma package for data that is neither RNA-seq nor microarray. I have a dataset of protein/biomarker quantifications, and I would like to get log-fold changes (i.e. run a differential protein expression analysis) for my conditions of interest. However, the measurement technique used for my dataset (Proximity Extension Assay technology) does not provide absolute quantification, only normalized protein expression (NPX). NPX is an arbitrary unit on a log2 scale. These normalized values are normally distributed and are highly correlated with absolute protein quantification (Spearman's correlation can reach up to 0.85 for the same proteins). My understanding is that, since my data are normalized expression values, similar to normalized microarray intensities, they can be analyzed with the same pipelines used for microarrays. However, I could not find any mention in the limma user guide of whether limma can also be used for this kind of data.

My questions are :

1) Is it possible to use limma to find differentially expressed proteins in my case, and is it a valid approach for this kind of analysis?

2) If yes, should I also set trend=T and robust=T, or just use the default pipeline?

3) If that's not possible, do you have any suggestions for how to do differential protein expression analysis?

Thank you very much in advance for your help!

RNA-Seq NPX LIMMA PROTEOMICS microarray • 6.3k views

ADD COMMENT • link updated 3.1 years ago by tesic93 ▴ 40 • written 3.4 years ago by Ridha ▴ 130

1

Hi, I suggest you post this over at support.bioconductor.org. The limma authors monitor the support site and are very responsive, so this should get you the definitive expert answer you are looking for regarding limma and your data.

ADD REPLY • link 3.4 years ago by ATpoint 84k

0

Thank you very much for the suggestion. For anyone with a similar question in the future: I posted the question there and got an answer from Professor Gordon Smyth. https://support.bioconductor.org/p/9135581/#9135807

ADD REPLY • link 3.3 years ago by Ridha ▴ 130

0

See if this link serves your purpose: https://uclouvain-cbio.github.io/BSS2019/figs/cancer_3x3.html

ADD REPLY • link 3.4 years ago by cpad0112 21k

0

Dear Cpad, thanks for your suggestion. I read about the MSqRob package, and it seems to be intended only for proteomics data measured by mass spectrometry. In my case, the proteins were measured with a different technique.

ADD REPLY • link 3.4 years ago by Ridha ▴ 130

score 3 · Answer 1 · 2021-06-05

As you've probably seen, there was a recent paper, "Proteomic blood profiling in mild, severe and critical COVID-19 patients", with the analysis published as a supplementary file here. In that analysis they use a robust method, not by setting robust=T in eBayes but by setting method="robust" in lmFit. From what I've gathered online, the difference between the two is best described in this post by Gordon Smyth. On my data, robust=T in eBayes gives results very similar to robust=F, whereas method="robust" in lmFit returns many more statistically significant differentially expressed proteins. However, here is an email correspondence Gordon has published in which he says he can't say how reliable the p-values from method="robust" in lmFit are, and instead recommends using robust=T in eBayes to robustify.
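For readers who want to compare the two options side by side, here is a minimal sketch. It runs on simulated data, not on real NPX values: the matrix `npx`, the two-group design, and the five spiked-in proteins are all assumptions made up for illustration. Only `lmFit`, `eBayes` and `topTable` are actual limma functions. The counts it prints say nothing about which option is more reliable on real data.

```r
library(limma)

set.seed(1)
# Simulated stand-in for an NPX matrix (assumption): proteins in rows,
# samples in columns, values on a log2 scale.
n_prot <- 92
n_samp <- 20
npx <- matrix(rnorm(n_prot * n_samp, mean = 5, sd = 1), nrow = n_prot,
              dimnames = list(paste0("protein", seq_len(n_prot)),
                              paste0("s", seq_len(n_samp))))
group <- factor(rep(c("control", "case"), each = n_samp / 2),
                levels = c("control", "case"))
# Spike a shift into the first 5 proteins so there is something to detect.
npx[1:5, group == "case"] <- npx[1:5, group == "case"] + 3

design <- model.matrix(~ group)

# Option A: least-squares fit per protein, then robust empirical Bayes
# moderation of the variances (robust = TRUE in eBayes).
fit_a <- eBayes(lmFit(npx, design), trend = TRUE, robust = TRUE)
tab_a <- topTable(fit_a, coef = "groupcase", number = Inf, sort.by = "none")

# Option B: robust regression per protein (method = "robust" in lmFit,
# which relies on the MASS package), then standard moderation.
fit_b <- eBayes(lmFit(npx, design, method = "robust"))
tab_b <- topTable(fit_b, coef = "groupcase", number = Inf, sort.by = "none")

# Number of proteins below 5% FDR under each option.
n_sig <- c(eBayes_robust = sum(tab_a$adj.P.Val < 0.05),
           lmFit_robust  = sum(tab_b$adj.P.Val < 0.05))
print(n_sig)
```

The two settings act at different stages: option A changes how the variance prior is estimated across proteins, while option B changes how each protein's regression is fitted, so the two can in principle be combined or compared on the same data.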

I know it's been a while since you posted the question, so I hope that you no longer need this answer. If you've finished work on this, what did you settle on in the end with regard to robustifying?

BTW, looking at the density plot you shared in the Bioconductor thread, I see that you didn't perform quantile normalization. Or was the plot made before normalization (presumably with the normalizeBetweenArrays function)? I'm asking because I usually use it, so I'm interested to know if there are reasons to skip it.
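As a point of reference for the normalization step mentioned above, a minimal sketch of quantile normalization with limma's `normalizeBetweenArrays` follows. The matrix `npx` and the sample-wise shift are simulated assumptions, not data from the thread. Whether quantile normalization is appropriate for a targeted protein panel is a separate question that this sketch does not settle.

```r
library(limma)

set.seed(2)
# Simulated log2-scale matrix (assumption): 92 proteins x 12 samples.
npx <- matrix(rnorm(92 * 12, mean = 5), nrow = 92)
# Hypothetical sample-wise shift in the last 6 samples.
npx[, 7:12] <- npx[, 7:12] + 0.5

# Quantile normalization forces every sample (column) onto the same
# empirical distribution.
npx_q <- normalizeBetweenArrays(npx, method = "quantile")

# Check: after normalization the sorted values of each column agree.
sorted_q <- apply(npx_q, 2, sort)
max_dev <- max(abs(sorted_q - sorted_q[, 1]))
print(max_dev)
```

Calling `plotDensities(npx)` and `plotDensities(npx_q)` before and after gives the kind of density plot discussed in the thread.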

score 0 · Answer 2 · 2021-03-15

0

3.4 years ago

halo22 ▴ 300

Depending on your outcome, you can use linear regression (continuous outcome) or logistic regression (categorical outcome). limma should also work, but if you have problems running it, use the regression approach I mentioned. RNA-seq-specific models are designed to address the distribution of the predictor (gene).

ADD COMMENT • link 3.4 years ago by halo22 ▴ 300

0

Dear halo, thanks for your reply and suggestions. I don't think traditional regression methods are valid when you have a large number of multicollinear proteins. Do you perhaps know of a tutorial that uses limma for a case similar to mine, or at least an example where it has been done? Thanks again for your help!

ADD REPLY • link 3.4 years ago by Ridha ▴ 130

2

If you are interested in running a multivariable analysis and are concerned about multicollinearity, I'd suggest using something like random forest or lasso regression. limma will help with univariate analysis, where each protein is modeled separately. I work on proteomics data generated on platforms like Olink and SomaLogic and have used linear and logistic regression successfully.
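To illustrate the point that limma models each protein separately, here is a small sketch on simulated data (the matrix `npx` and the two-group design are assumptions for illustration). It shows that the coefficient `lmFit` reports for one protein matches an ordinary `lm` fitted to that protein alone. The moderated statistics from `eBayes` are where limma then borrows information across proteins.

```r
library(limma)

set.seed(3)
# Simulated log2-scale matrix (assumption): 50 proteins x 16 samples.
npx <- matrix(rnorm(50 * 16, mean = 5), nrow = 50,
              dimnames = list(paste0("protein", 1:50), NULL))
group <- factor(rep(c("control", "case"), each = 8),
                levels = c("control", "case"))
design <- model.matrix(~ group)

# limma: one linear model per row (protein), all fitted in one call.
fit <- lmFit(npx, design)

# Base R: the same model fitted to protein 1 on its own.
single <- lm(npx[1, ] ~ group)

limma_lfc <- unname(fit$coefficients[1, "groupcase"])
lm_lfc <- unname(coef(single)["groupcase"])
print(c(limma = limma_lfc, lm = lm_lfc))
```

Because each row is fitted independently, collinearity between proteins does not enter the per-protein models. It matters only when several proteins are used together as predictors of an outcome, which is the multivariable setting described above.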

ADD REPLY • link 3.4 years ago by halo22 ▴ 300