I have a protein expression data frame and a metadata data frame that includes age and sex:

nph_csf_metadata:

age sex bam tau
70  f   5   2
75  m   6   1
72  m   4   1
71  f   4   2

I want to keep the effects of bam and tau load but remove the effects of age and sex from my protein expression data using removeBatchEffect.

I have tried this:

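library(limma)

# log-transform, then transpose (assuming norm_prot has samples in rows) so that
# proteins are rows and samples are columns, the orientation removeBatchEffect expects
log_norm_prot <- log10(norm_prot)
log_norm_prot <- t(log_norm_prot)

# the design holds the effects to keep (bam and tau)
design = model.matrix(~bam + tau, nph_csf_metadata)

reg_log_norm_prot = removeBatchEffect(log_norm_prot, covariates = nph_csf_metadata$age, design = design)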

This helps me regress out age, but how can I regress out both age and sex at the same time using removeBatchEffect? I'd appreciate any help.
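One way to do this, sketched under assumptions: the limma package is installed, and the metadata and protein matrix below are hypothetical simulated stand-ins for nph_csf_metadata and log_norm_prot. It uses 8 samples rather than 4 because this model has 5 coefficients (intercept, bam, tau, sex, age). removeBatchEffect takes a factor through its batch argument and numeric variables through covariates, so sex and age can be removed in one call while design protects bam and tau.

```r
library(limma)

set.seed(1)
# Hypothetical metadata (stand-in for nph_csf_metadata, with more samples)
meta <- data.frame(
  age = c(70, 75, 72, 71, 68, 80, 77, 66),
  sex = factor(c("f", "m", "m", "f", "f", "m", "f", "m")),
  bam = c(5, 6, 4, 4, 3, 7, 5, 2),
  tau = c(2, 1, 1, 2, 3, 2, 1, 1)
)
# Hypothetical log-scale data: rows = proteins, columns = samples
log_norm_prot <- matrix(rnorm(100 * 8, mean = 5), nrow = 100,
                        dimnames = list(paste0("prot", 1:100), paste0("s", 1:8)))

# Design: the effects to keep (bam and tau)
design <- model.matrix(~ bam + tau, data = meta)

# Factor (sex) goes in 'batch', numeric variable (age) in 'covariates'
reg_log_norm_prot <- removeBatchEffect(log_norm_prot,
                                       batch = meta$sex,
                                       covariates = meta$age,
                                       design = design)
dim(reg_log_norm_prot)
```

The sex and age effects are estimated jointly with bam and tau, and only the sex and age parts are subtracted, so refitting the corrected values on the full model gives sex and age coefficients of (numerically) zero.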

Thank you for your help. I used covariates but was only able to regress out one variable. I will try the method you advised and also check covariates again.
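If covariates seemed to handle only one variable, note that it also accepts a numeric matrix with one column per variable. A sketch with hypothetical simulated data (limma assumed installed): building that matrix with model.matrix(~ age + sex) and dropping the intercept column removes both effects. The result matches the batch-plus-covariates call up to a constant per protein, because the two calls parameterize sex differently; centring each row, as is usual before a heatmap, makes them identical.

```r
library(limma)

set.seed(2)
# Hypothetical metadata and log-scale data (rows = proteins, columns = samples)
meta <- data.frame(
  age = c(70, 75, 72, 71, 68, 80, 77, 66),
  sex = factor(c("f", "m", "m", "f", "f", "m", "f", "m")),
  bam = c(5, 6, 4, 4, 3, 7, 5, 2),
  tau = c(2, 1, 1, 2, 3, 2, 1, 1)
)
y <- matrix(rnorm(50 * 8, mean = 5), nrow = 50)
design <- model.matrix(~ bam + tau, data = meta)

# 'covariates' as a numeric matrix; drop the intercept so only the
# age and sex columns are regressed out
cov_mat <- model.matrix(~ age + sex, data = meta)[, -1]

via_matrix <- removeBatchEffect(y, covariates = cov_mat, design = design)
via_batch  <- removeBatchEffect(y, batch = meta$sex,
                                covariates = meta$age, design = design)

# Subtract each protein's mean across samples
row_centre <- function(m) m - rowMeans(m)
all.equal(unname(row_centre(via_matrix)), unname(row_centre(via_batch)))
```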

Thank you, this worked. I appreciate your help so much.

Hello,

I am dealing with a somewhat similar question: I am using limma to look for DEGs between two genotypes, and I need to adjust for sex and RIN. I tried two methods and expected them to give the same results, but they turned out to be different.

My first method is to regress out the effects of sex and RIN first and then run lmFit with a design containing only genotype.

The second method is to put all the factors into the lmFit design and extract the statistics for the genotype coefficient.

I am wondering why the results from the two methods differ (the logFCs are the same, but the p-values are different). Many thanks.
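A sketch that reproduces this comparison, using hypothetical simulated data (12 samples, 200 genes; genotype, sex and rin mirror the post's variables, and limma is assumed installed). It shows the pattern described: identical logFCs, but different residual degrees of freedom, which is what drives the difference in p-values.

```r
library(limma)

set.seed(3)
# Hypothetical sample information and log2 expression (rows = genes)
n <- 12
meta <- data.frame(
  genotype = factor(rep(c("WT", "KO"), each = 6), levels = c("WT", "KO")),
  sex      = factor(rep(c("f", "m"), times = 6)),
  rin      = round(runif(n, 6, 9), 1)
)
y <- matrix(rnorm(200 * n, mean = 8), nrow = 200)

# Method 1 (not recommended): pre-correct, then fit genotype only
design_g <- model.matrix(~ genotype, data = meta)
y_corr <- removeBatchEffect(y, batch = meta$sex, covariates = meta$rin,
                            design = design_g)
fit1 <- eBayes(lmFit(y_corr, design_g))

# Method 2 (recommended): all factors in one linear model
design_full <- model.matrix(~ genotype + sex + rin, data = meta)
fit2 <- eBayes(lmFit(y, design_full))

tt1 <- topTable(fit1, coef = "genotypeKO", number = Inf, sort.by = "none")
tt2 <- topTable(fit2, coef = "genotypeKO", number = Inf, sort.by = "none")

all.equal(tt1$logFC, tt2$logFC)                # same genotype estimates
c(fit1$df.residual[1], fit2$df.residual[1])    # 10 vs 8 residual df
```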

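Batch correction should be done as part of the linear model, not before it, so the second method is right and the first is wrong. Pre-correcting the data hides from lmFit the degrees of freedom already used to estimate the sex and RIN effects: the logFCs come out identical, but the residual sum of squares is divided by too many degrees of freedom, so the variance is underestimated and the p-values are too small. There have been many posts about this over the years, especially on the Bioconductor support site.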
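A base-R sketch with a single simulated gene (all values hypothetical) makes the degrees-of-freedom point concrete. Subtracting the fitted sex and RIN effects and refitting genotype alone reproduces the same genotype estimate and the same residual sum of squares, but the refit counts 10 residual degrees of freedom instead of 8, so its residual standard error is smaller than it should be.

```r
set.seed(4)
# Hypothetical single-gene data: 12 samples
n <- 12
genotype <- factor(rep(c("WT", "KO"), each = 6), levels = c("WT", "KO"))
sex <- factor(rep(c("f", "m"), times = 6))
rin <- round(runif(n, 6, 9), 1)
y <- 8 + 0.5 * (genotype == "KO") + 0.3 * (sex == "m") + 0.2 * rin +
  rnorm(n, sd = 0.4)

# Method 2: one model with all terms
full <- lm(y ~ genotype + sex + rin)

# Method 1: subtract the fitted sex and RIN effects, then fit genotype only
b <- coef(full)
y_corr <- y - b["sexm"] * (sex == "m") - b["rin"] * rin
pre <- lm(y_corr ~ genotype)

# Same estimate and same residual sum of squares ...
all.equal(coef(pre)["genotypeKO"], coef(full)["genotypeKO"])
all.equal(deviance(pre), deviance(full))
# ... but different residual df, hence a smaller sigma for the pre-corrected fit
c(df.residual(pre), df.residual(full))
c(summary(pre)$sigma, summary(full)$sigma)
```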

The help page ?removeBatchEffect says that its output should not be used as input to lmFit. removeBatchEffect is only for plotting and exploration, not for linear modelling and differential expression.

Thank you so much. I used the second method to identify DEGs. But I would also like to get the expression data after removing the effects of sex and RIN, because I want to plot heatmaps showing the expression of selected genes as affected only by genotype.

Can I ask how to get expression data that are affected only by genotype? Many thanks.

Yes, that is what removeBatchEffect is for (as I think I said in my previous comment), for when you want to make a plot with the batch or covariate effects removed.
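A sketch of that plotting workflow, assuming limma is installed and using a hypothetical simulated matrix and gene selection in place of real data. The design protects genotype, sex and RIN are removed, and the corrected values go only to the heatmap (drawn here with base R's heatmap()), while the DE statistics still come from the full lmFit model.

```r
library(limma)

set.seed(5)
# Hypothetical sample information
n <- 12
meta <- data.frame(
  genotype = factor(rep(c("WT", "KO"), each = 6), levels = c("WT", "KO")),
  sex      = factor(rep(c("f", "m"), times = 6)),
  rin      = round(runif(n, 6, 9), 1)
)
# Hypothetical log-expression matrix; in practice use the normalized log values
y <- matrix(rnorm(30 * n, mean = 8), nrow = 30,
            dimnames = list(paste0("gene", 1:30), paste0("s", 1:n)))

# Keep the genotype effect, remove sex and RIN effects (for plotting only)
design_g <- model.matrix(~ genotype, data = meta)
y_plot <- removeBatchEffect(y, batch = meta$sex, covariates = meta$rin,
                            design = design_g)

selected <- paste0("gene", 1:10)   # hypothetical gene selection
heatmap(y_plot[selected, ], scale = "row", Colv = NA,
        ColSideColors = ifelse(meta$genotype == "KO", "red", "grey"))
```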

Yes, I just learned about the removeBatchEffect function from your previous comments 13 months ago. I wanted to confirm that my understanding was correct, so I compared the results from the two methods, thinking that if the results were the same then my understanding was correct. Now I see that this way of checking was wrong. Thanks a lot for the explanation.