18 months ago
Dominique
Good day,
I'm not used to R programming, but I'm trying to remove batch effects from median-normalized data generated on a protein microarray platform. The data were generated on 3 different days (28-Jul-2022, 30-Jul-2022 and 8-Aug-2022), and I'm using ComBat to remove the batch effects.
This is the code:
library(sva)
meta <- read.csv("C:/directory of file/metadata.csv")
modcombat = model.matrix(~1, data=meta)
median_combat <- ComBat(as.matrix(meta), batch = meta$date, mod = modcombat)
Extra details:
str(meta)
'data.frame': 192 obs. of 27 variables:
$ date :chr "28-Jul-22" "28-Jul-22" "28-Jul-22" "28-Jul-22"
$ Protein A : num 57.83 12.16 80.81 27.14 9.79 ...
$ Protein B : num 9.96 402.97 18.19 15.91 18.13 ...
$ Protein C : num 74.6 70.5 110.1 30.6 77.6 ...
$ Protein D : num 24.5 26.4 109.5 90.6 75.3 ...
$ Protein E : num 71.9 73.3 186.9 93.1 100.6 ...
$ Protein F : num 148 208 192 623 184 ...
$ Protein G : num 68.8 167.5 100.7 362.5 101.4 ...
$ Protein H : num 2.11 8.6 882.38 92.81 4.59 ...
$ Protein I : num 0 8.39 13.65 14.51 0 ...
$ Protein J : num 1714.73 3654.55 537.31 675.57 8.24 ...
$ Protein K : num 148 100 206 743 17 ...
$ Protein L : num 824 601 2597 968 582 ...
$ Protein M : num 650 296 999 358 478 ...
$ Protein N : num 616 294 1120 414 459 ...
$ Protein O : num 73.5 84.5 189.4 40.2 106.8 ...
$ Protein P : num 67.1 79.7 187.8 339.4 101.4 ...
$ Protein Q : num 67 26.4 179.7 29 81.9 ...
$ Protein R : num 57.1 25.2 103.1 25.9 40.2 ...
$ Protein S: : num 67.8 25.2 115.5 26.2 46.8 ...
$ Protein T : num 32587 37755 39304 40261 37440 ...
$ Protein U : num 50112 50112 50112 50112 50112 ...
$ Protein V : num 31805 26998 27058 27905 27555 ...
$ Protein W : num 4835 4380 4686 4672 4684 ...
$ Protein X : num 3105 2774 2834 2948 2928 ...
$ Protein Y : num 96.9 96.5 100.7 114.6 106.8 ...
$ Protein Z : num 140.4 87.5 79.9 32.3 42.7 ...
str(modcombat)
num [1:192, 1] 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:192] "1" "2" "3" "4" ...
..$ : chr "(Intercept)"
- attr(*, "assign")= int 0
And I got this error:
Error in dat[, batch == batch_level] : (subscript) logical subscript too long
Could anyone please explain what the problem is and how to fix the error?
Much appreciated
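The error can be reproduced in base R without sva, which may help show where it comes from. In the question, as.matrix(meta) is a 192 x 27 matrix (samples in rows), while meta$date has 192 entries. The error message shows ComBat subsetting columns with batch == batch_level, so a logical index of length 192 is applied to only 27 columns. The sketch below uses a small made-up data frame (the column names and sizes are assumptions, not the real data):

```r
# Toy stand-in for the data frame in the question: samples in rows,
# one batch column plus numeric protein columns (names are made up).
set.seed(1)
meta <- data.frame(
  date = rep(c("28-Jul-22", "30-Jul-22", "8-Aug-22"), each = 4),
  ProteinA = runif(12, 1, 100),
  ProteinB = runif(12, 1, 100)
)

dat <- as.matrix(meta)   # character matrix: 12 rows x 3 columns
batch <- meta$date       # length 12, one entry per ROW of dat

dim(dat)
length(batch)

# Subset COLUMNS by batch, as in the line quoted by the error message.
# A logical index of length 12 does not fit 3 columns, so this fails.
msg <- tryCatch(
  {
    dat[, batch == "28-Jul-22"]
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Note that including the date column also turns the whole matrix into character values, so the input is not numeric either.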
The input for the function must be a numeric matrix with genes as rows and samples as columns. Your code passes the whole data frame, which also contains the batch information and has the samples in rows, so that is incorrect. See the examples at
?ComBat
to learn about the required structure. Basically, remove date from the matrix, transpose it, save date as a separate vector and provide that to the function as batch. The design matrix would typically contain the covariates to preserve, i.e. the experimental design. With ~1 it does not add any information. From the magnitude of your numeric values, what are these data? Is this log2 scale?
There are no genes. I used a protein microarray. I checked the sera IgG and IgA antibody signals against each protein. The data are in relative fluorescence units, normalized using the median normalization method, not on a log2 scale.
Genes, probes, proteins...observations :)
I think sva expects roughly log2-scaled values, or at least values that are less heteroscedastic than yours. It was developed with microarrays in mind, and those data are typically on a log2 scale.
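Putting the suggestions in this thread together, a minimal sketch could look like the following. It uses simulated data in place of metadata.csv (the values, sample count and protein names are assumptions), and the +1 offset before log2 is also an assumption, added because the real data contain zeros. Whether a log2 transform suits median-normalized fluorescence units is a judgment call for the analyst.

```r
# Hypothetical stand-in for metadata.csv: 12 samples in rows, a date column
# and 26 protein columns on a raw, non-log fluorescence scale.
set.seed(42)
dates <- rep(c("28-Jul-22", "30-Jul-22", "8-Aug-22"), each = 4)
proteins <- matrix(
  rlnorm(length(dates) * 26, meanlog = 5, sdlog = 1),
  ncol = 26,
  dimnames = list(NULL, paste0("Protein", LETTERS))
)
meta <- data.frame(date = dates, proteins, check.names = FALSE)

# 1. Keep the batch labels as a separate vector, one entry per sample.
batch <- factor(meta$date)

# 2. Numeric columns only, transposed so that proteins are rows
#    and samples are columns.
expr <- t(as.matrix(meta[, setdiff(names(meta), "date")]))

# 3. log2 transform; the +1 offset is an assumption to avoid log2(0).
expr_log <- log2(expr + 1)

# 4. Intercept-only design with one row per sample (no covariates preserved).
mod <- model.matrix(~ 1, data = data.frame(batch = batch))

# 5. Run ComBat only if sva is installed.
if (requireNamespace("sva", quietly = TRUE)) {
  combat_log <- sva::ComBat(dat = expr_log, batch = batch, mod = mod)
  print(dim(combat_log))  # same shape as the input: proteins x samples
}
```

With real data, replace the simulated meta with the result of read.csv() and add any biological covariates you want to preserve to the model formula instead of ~ 1.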
Ooooohhhh :0
Thank you very much!!! Much appreciated!!!