How to regress out age and sex using limma removeBatchEffect
I have a protein expression data frame and a metadata data frame that includes age and sex:

age  sex  bam  tau
70   f    5    2
75   m    6    1
72   m    4    1
71   f    4    2

I want to keep the bam and tau load effects but remove the effects of age and sex from my protein expression data using removeBatchEffect.

I have tried this:

library(limma)

log_norm_prot <- log10(norm_prot)
log_norm_prot <- t(log_norm_prot)  # transpose so proteins are rows and samples are columns

design <- model.matrix(~ bam + tau, data = nph_csf_metadata)

reg_log_norm_prot <- removeBatchEffect(log_norm_prot, covariates = nph_csf_metadata$age, design = design)

This regresses out age, but how can I regress out both age and sex at the same time using removeBatchEffect?

Appreciate any help.

You just use removeBatchEffect with batch = sex, covariates = age and design = design.

As the help page ?removeBatchEffect explains, the batch argument should be a categorical factor (like sex) and the covariates argument is for numerical covariates (like age).
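A self-contained sketch of this on simulated data follows. The metadata, the 50-protein matrix and the injected age and sex effects are all made up for illustration and are not the asker's data. It passes sex as batch, age as covariates and the bam + tau design, then refits the full model to check that the corrected values no longer carry an age slope or a sex shift.

```r
library(limma)

# Hypothetical metadata for 20 samples (illustration only)
set.seed(1)
n <- 20
meta <- data.frame(
  age = round(runif(n, 65, 80)),
  sex = factor(rep(c("f", "m"), length.out = n)),
  bam = sample(3:7, n, replace = TRUE),
  tau = sample(1:3, n, replace = TRUE)
)

# Simulated log10 expression: 50 proteins (rows) x n samples (columns)
prot <- matrix(rnorm(50 * n, mean = 3, sd = 0.2), nrow = 50,
               dimnames = list(paste0("prot", 1:50), paste0("s", 1:n)))
# Inject an artificial age slope and sex shift into every protein
prot <- prot + outer(rep(0.02, 50), meta$age) +
  outer(rep(0.3, 50), as.numeric(meta$sex == "m"))

# bam and tau go in the design, so their effects are preserved
design <- model.matrix(~ bam + tau, data = meta)

# sex is categorical -> batch; age is numeric -> covariates
corrected <- removeBatchEffect(prot, batch = meta$sex,
                               covariates = meta$age, design = design)

# Check: refit the full model; age and sex coefficients should now be ~0
check <- lmFit(corrected, model.matrix(~ bam + tau + age + sex, data = meta))
max(abs(check$coefficients[, c("age", "sexm")]))
```

The design columns (intercept, bam, tau) are protected during the fit; only the fitted sex and age contributions are subtracted from the matrix.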

Since you are already familiar with the covariates argument, I wonder why you didn't do this already.

Thank you for your help. I used covariates but was only able to regress out one variable. I will try the method you advised and also check covariates again.

Thank you, this worked. I appreciate your help so much.

AFAIK, limma::removeBatchEffect can accept two batch factors, i.e. batch = (first batch) and batch2 = (second batch).

So would the first batch be batch = nph_csf_metadata$age and the second batch2 = nph_csf_metadata$sex?
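Passing age as a batch is probably not what is wanted here: batch and batch2 are treated as categorical, so every distinct age becomes its own group instead of contributing a single linear slope. A small base-R illustration using the four example rows from the question's metadata table:

```r
# The four example rows from the question's metadata table
meta <- data.frame(age = c(70, 75, 72, 71),
                   sex = c("f", "m", "m", "f"),
                   bam = c(5, 6, 4, 4),
                   tau = c(2, 1, 1, 2))

# As a batch, age becomes a factor with one level per distinct value
n_age_levels <- nlevels(factor(meta$age))
n_age_levels  # 4

# Parameters to fit: the bam + tau design plus (levels - 1) age-batch
# columns, before even counting sex as batch2
n_params <- ncol(model.matrix(~ bam + tau, data = meta)) + n_age_levels - 1
n_params  # 6, more than the 4 samples
```

On real data with more samples the fit may be estimable, but it still spends one parameter per distinct age rather than one for a linear age trend.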

I tried that, and it runs, but I don't think it is correct because it gives me many infinite values. Is there a way to use the covariates argument to regress out age and sex at the same time?
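One thing worth ruling out, independently of how age and sex are passed, is infinite values coming from the log step itself: log10(0) is -Inf, and because removeBatchEffect subtracts fitted effects from the input matrix, any -Inf already in the input stays -Inf in the output. A quick check, using a toy stand-in named after the question's norm_prot with made-up values:

```r
# Toy stand-in for norm_prot with made-up values:
# samples as rows, proteins as columns, one zero intensity
norm_prot <- data.frame(p1 = c(120, 0, 95), p2 = c(40, 52, 61))

log_norm_prot <- t(log10(norm_prot))  # log10(0) gives -Inf

# Count entries that are not finite (-Inf, Inf, NaN or NA)
n_bad <- sum(!is.finite(log_norm_prot))
n_bad  # 1

# Locate them as (protein, sample) index pairs
which(!is.finite(log_norm_prot), arr.ind = TRUE)
```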

Do you need the corrected assay? removeBatchEffect only corrects the assay used for visualization, so you would need to extract it from the variance-stabilized assay. Otherwise, you can:

1. use ComBat-seq to remove the effect of age or sex;
2. use a design of the form ~ age + sex + [the_rest_of_your_model];
3. check whether something like a likelihood ratio test, comparing a null model with the full model, is available for your case.
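Option 2 can be sketched directly in limma; the ?removeBatchEffect help page itself notes that the function is not intended to be used before linear modelling and that batch factors are better included in the linear model. The metadata and the 50-protein matrix below are simulated placeholders, not the asker's data; bam is tested while adjusting for age, sex and tau.

```r
library(limma)

# Hypothetical metadata and log10 expression (illustration only)
set.seed(2)
n <- 20
meta <- data.frame(age = round(runif(n, 65, 80)),
                   sex = factor(rep(c("f", "m"), length.out = n)),
                   bam = sample(3:7, n, replace = TRUE),
                   tau = sample(1:3, n, replace = TRUE))
prot <- matrix(rnorm(50 * n, mean = 3), nrow = 50,
               dimnames = list(paste0("prot", 1:50), NULL))

# age and sex enter the model as adjustment terms alongside bam and tau
design <- model.matrix(~ age + sex + bam + tau, data = meta)
fit <- eBayes(lmFit(prot, design))

# Per-protein bam association, adjusted for age, sex and tau
top_bam <- topTable(fit, coef = "bam", number = Inf)
head(top_bam)
```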

I will check that out. However, since these are proteomics data, a variance-stabilizing transformation is not the right way to normalize them. The data were normalized, then log-transformed to correct the skew, and now I just want to remove age and sex so that they do not affect the data.