I am trying to adjust my RNA data using ComBat-Seq
(from sva
R package) since I realised that there are 3 batches that I need to adjust for:
- Place (2 levels: place 1, place 2)
- Library Preparation Date (16 levels - different dates)
- Type of tube (2 levels: A, B)
I have 960 samples and around 62000 genes.
In my biological matrix, I have: Age, Sex, Group (cases,controls..) and WBC counts.
biol_mat = model.matrix(~Age + as.factor(Sex) + as.factor(Group) + LYMPH + MONO + NEUT, data=phenotype)
In the tutorial of Combat-Seq
appears how to adjust by 1 variable but it doesn't tell you how to adjust by more than 1.
I have seen a lot of posts using combat
that the only way is to adjust by 1 variable and then, with those results, adjust again by the 2nd variable and so on.
That would be:
### Adjust by library prep.
raw.cts_adjustedLibPrep <- ComBat_seq(counts = raw_cts_matrix, batch=batch_libraryprep, group=NULL, covar_mod = biol_mat)
### Adjust by library prep + type of tube.
raw.cts_adjusted_LibPrep_TypeTube <- ComBat_seq(counts = raw.cts_adjustedLibPrep, batch=batch_type_tube, group=NULL, covar_mod = biol_mat)
### Adjust by library prep + type of tube + place
raw.cts_adjusted_LibPrep_TypeTube_Place <- ComBat_seq(counts = raw.cts_adjusted_LibPrep_TypeTube, batch=batch_place, group=NULL, covar_mod = biol_mat)
For the first adjustment (library prep) it takes around 15min, but for the second... it has been running for more than 2 days.. I stopped it and launch it again, changing the adjustment but I am not sure if it will work..
Does anybody have an idea about how to fix the problem?
Thanks in advance
I do not think that this function has been developed with such large sample size in mind, nor am I sure that sequential adjustment makes sense. With 960 samples are you really going to use GLM approaches rather than just using limma (trend/voom). In any case, for a differential analysis you would add these factors into the design, and for corrected counts you could simply use
removeBatchEffect
from limma, possibly putting back to normal scale by using2^x
on the returned corrected matrix.First of all, thanks very much for your reply. In relation to your questions...
I need to run the adjustment for 1) Running
variancePartition
and see how many and which genes are explained by X variables. 2) Drawing some plots, do some cutoffs and do some downstream analyses.In relation to
variancePartition
there is one section in the vignette named as "4.2 Removing batch effects before fitting model" that it may work... but I have been trying with the example that they use in the tutorial and I lose the genes (now it doesn't appear the genes anymore and I need them).This is the example that I am using:
You should usually get the gene names as your rownames in
varPartResid2
as it gives you in this piece of code:Anyway, what I don't understand is that I can adjust 1 variable, but not for more... And I really need the adjusted counts to do other things, without having the effect of the batches...
I will have to try
removeBatchEffect
but it only allows you to adjust by 2 batches... so, I will have to use sequential adjustment to adjust by 3 variables... Does it work with rnaseq data? Sincelimma
was made for microarray... I am afraid to get wrong results because of this.removeBatchEffect
can take as many batches as you like: https://support.bioconductor.org/p/128357/limma was originally developed for microarrays but has underwent extensive work for RNA-seq / digital counts, so that will not be an issue. See limma voom and trend pipelines and the limma manual. The limma RNA-seq paper has > 4k citations.
By the way, you posted this in at least three communities. Nobody can stop you from doing that but etiquette would require that you at least mention the cross-posts to avoid double efforts.
https://bioinformatics.stackexchange.com/questions/19456/how-to-adjust-by-multiple-variables-using-combat-seq
https://support.bioconductor.org/p/9145644/
Sorry, I didn't know if I would get an answer, I will delete the post from the rest of the communities. Thanks very much and sorry for any inconvenience
Leave it, just add a link to the other communities. Different brains will give different answers. I do not pretend that I am the one genuine expert in anything. Better answers may/will exist.