Unfortunately, RPKM data is not ideal for the purposes of cross-sample differential expression analysis. Please read THIS. In their key points:

The Total Count and RPKM normalization methods, both of which are
still widely in use, are ineffective and should be definitively
abandoned in the context of differential analysis.

Log2-transforming RPKM data may make its distribution look more even, but the underlying issue persists. I suggest that you go back to the raw counts and re-normalise by TMM (edgeR) or using DESeq2's method.
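As a rough sketch of the TMM route, assuming you can get a raw count matrix with genes in rows and samples in columns: the `counts` matrix here is simulated purely for illustration, and the edgeR calls are guarded so the snippet still runs if the package is not installed.

```r
# Simulated raw counts: 200 genes x 6 samples (made-up values, not real data)
set.seed(3)
counts <- matrix(rnbinom(200 * 6, mu = 100, size = 5), nrow = 200,
                 dimnames = list(paste0("Gene", 1:200), paste0("Sample", 1:6)))

if (requireNamespace("edgeR", quietly = TRUE)) {
  dge <- edgeR::DGEList(counts = counts)               # wrap the raw counts
  dge <- edgeR::calcNormFactors(dge, method = "TMM")   # TMM scaling factors
  logcpm <- edgeR::cpm(dge, log = TRUE)                # normalised log2 counts per million
  print(dge$samples$norm.factors)
} else {
  message("edgeR is not installed; install it from Bioconductor to run this sketch")
}
```

The resulting `logcpm` values would then take the place of the logged RPKM values in the models further down.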

## ----------------------------------------------------------

Nevertheless, to 'adjust' for confounding factors / covariates, you should probably first check whether each factor is actually statistically associated with schizophrenia by testing each independently:

```
MyData
  Schizophrenia Gene1 ... Age Sex ...
1             Y  2.33 ...  45   M ...
2             Y  3.21 ...  43   M ...
3             N  1.21 ...  26   F ...
4             N  2.11 ...  35   T ...
...
```

Check encoding:

```
MyData$Schizophrenia <- factor(MyData$Schizophrenia, levels=c("N","Y"))
MyData$Age <- as.numeric(MyData$Age)
# et cetera for the remaining covariates
```

Check each in a logistic regression model:

```
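# family=binomial is what makes this a logistic regression; glm() defaults to gaussian
summary(glm(Schizophrenia ~ Age, data=MyData, family=binomial))
summary(glm(Schizophrenia ~ Sex, data=MyData, family=binomial))
summary(glm(Schizophrenia ~ Race, data=MyData, family=binomial))
summary(glm(Schizophrenia ~ SmokingStatus, data=MyData, family=binomial))
summary(glm(Schizophrenia ~ PostmortemInterval, data=MyData, family=binomial))
summary(glm(Schizophrenia ~ SamplepH, data=MyData, family=binomial))
summary(glm(Schizophrenia ~ RIN, data=MyData, family=binomial))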
```
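The separate calls above can also be collected into a single table of p-values. This is only a sketch: the data frame is simulated, only three of the covariates are included, and the likelihood ratio test against an intercept-only model is my choice here (it gives one p-value per covariate even when a factor has more than two levels).

```r
# Simulated stand-in for MyData; all values are made up for illustration
set.seed(1)
n <- 60
MyData <- data.frame(
  Schizophrenia = factor(sample(c("N", "Y"), n, replace = TRUE), levels = c("N", "Y")),
  Age = round(runif(n, 20, 80)),
  Sex = factor(sample(c("F", "M"), n, replace = TRUE)),
  RIN = round(runif(n, 5, 10), 1)
)

covariates <- c("Age", "Sex", "RIN")

# Intercept-only model to compare against
null_model <- glm(Schizophrenia ~ 1, data = MyData, family = binomial)

# Fit one single-covariate logistic model per covariate and compare it
# with the null model via a likelihood ratio (chi-squared) test
screen <- sapply(covariates, function(v) {
  fit <- glm(reformulate(v, response = "Schizophrenia"), data = MyData, family = binomial)
  anova(null_model, fit, test = "Chisq")[2, "Pr(>Chi)"]
})

print(round(screen, 3))
```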

If any of these is not statistically significant, you may consider leaving it out. You then test each gene independently and include the statistically significant factors in the model formula:

For example, if `Age`, `Sex`, `SmokingStatus`, and `PostmortemInterval` are statistically significant:

```
model <- glm(Schizophrenia ~ Gene1 + Age + Sex + SmokingStatus + PostmortemInterval, data=MyData, family=binomial)
summary(model)

model <- glm(Schizophrenia ~ Gene2 + Age + Sex + SmokingStatus + PostmortemInterval, data=MyData, family=binomial)
summary(model)
# et cetera
```

If you need to set this up as a loop over all genes:

```
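# the '...' is a placeholder: list your own gene column names here
genelist <- c("Gene1", "Gene2", ..., "GeneX")
for (i in seq_along(genelist)) {
  # build the per-gene formula, keeping the same covariates each time
  formula <- as.formula(paste("Schizophrenia ~ ", genelist[i], " + Age + Sex + SmokingStatus + PostmortemInterval", sep=""))
  model <- glm(formula, data = MyData, family = binomial)
  print(summary(model))
}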
```

This code could easily be done via `lapply`, and/or parallelised via `mclapply` or `foreach`.
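A minimal `lapply` sketch along those lines, assuming simulated data and only `Age` and `Sex` as covariates for brevity. The helper `fit_gene` is a hypothetical name of mine; it keeps just the gene's coefficient and p-value from each model rather than printing the full summary.

```r
# Simulated stand-in for MyData; gene names and values are made up
set.seed(2)
n <- 80
MyData <- data.frame(
  Schizophrenia = factor(sample(c("N", "Y"), n, replace = TRUE), levels = c("N", "Y")),
  Age = round(runif(n, 20, 80)),
  Sex = factor(sample(c("F", "M"), n, replace = TRUE)),
  Gene1 = rnorm(n), Gene2 = rnorm(n), Gene3 = rnorm(n)
)
genelist <- c("Gene1", "Gene2", "Gene3")

# Hypothetical helper: fit one logistic model and return the gene's row only
fit_gene <- function(gene, data) {
  f <- reformulate(c(gene, "Age", "Sex"), response = "Schizophrenia")
  fit <- glm(f, data = data, family = binomial)
  coefs <- coef(summary(fit))
  data.frame(gene = gene,
             log_odds = coefs[gene, "Estimate"],
             p_value = coefs[gene, "Pr(>|z|)"])
}

# One row per gene, bound into a single data frame
results <- do.call(rbind, lapply(genelist, fit_gene, data = MyData))
print(results)

# parallel::mclapply takes the same arguments plus mc.cores
# (it relies on forking, so it only runs in parallel on Unix-alikes):
# results <- do.call(rbind, parallel::mclapply(genelist, fit_gene, data = MyData, mc.cores = 2))
```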

Kevin