12 months ago
mthm
I have a table with 18 columns, and I want to test the linear relationship (regression) between one independent and one dependent variable at a time. Before that, I have to scale my data, and I want to do that for each pair of variables separately, because the columns are not in the same units (length, percentage, count, ...). So, something like this:
## select columns
run1 <- subset(file, select = c(coverage, contigs))
run2 <- subset(file, select = c(coverage, genes))
run3 <- subset(file, select = c(coverage, BUSCO))
.
.
## scale and center the data
scale1 <- scale(run1, center = TRUE, scale = TRUE)
scale2 <- scale(run2, center = TRUE, scale = TRUE)
scale3 <- scale(run3, center = TRUE, scale = TRUE)
.
.
## run the correlation test
cov.lm1 <- lm(coverage ~ contigs , data = as.data.frame(scale1))
cov.lm2 <- lm(coverage ~ genes , data = as.data.frame(scale2))
cov.lm3 <- lm(coverage ~ BUSCO , data = as.data.frame(scale3))
.
.
Is there a way to wrap this process in a loop to make it easier? If so, can you tell me how?
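One possible way to do this is with `lapply` over the column names, sketched below. This is only a rough sketch under some assumptions: the toy data frame is made up to stand in for the real `file` table, every column other than `coverage` is assumed to be numeric and to be a predictor, and the names `response`, `predictors`, `fits` and `results` are just illustrative.

```r
# Hypothetical toy data standing in for the real 18-column table `file`
set.seed(1)
file <- data.frame(
  coverage = rnorm(20, mean = 50, sd = 10),
  contigs  = rnorm(20, mean = 1000, sd = 200),
  genes    = rnorm(20, mean = 15000, sd = 3000),
  BUSCO    = runif(20, min = 80, max = 100)
)

# Assumption: every column except the response is a numeric predictor
response   <- "coverage"
predictors <- setdiff(names(file), response)

# Fit one model per predictor, scaling only the two columns involved
fits <- lapply(predictors, function(p) {
  scaled <- as.data.frame(scale(file[, c(response, p)], center = TRUE, scale = TRUE))
  lm(reformulate(p, response = response), data = scaled)
})
names(fits) <- predictors

# Collect slope, p-value and R-squared of each fit in one table
results <- do.call(rbind, lapply(predictors, function(p) {
  s <- summary(fits[[p]])
  data.frame(
    predictor = p,
    slope     = coef(s)[p, "Estimate"],
    p_value   = coef(s)[p, "Pr(>|t|)"],
    r_squared = s$r.squared
  )
}))
print(results)
```

Individual models stay available by name, e.g. `summary(fits[["genes"]])`. Since `scale()` standardizes each column independently, scaling the whole table once before the loop should give the same fits, and with both variables standardized the slope equals the Pearson correlation, so `cor.test()` may be all that is needed.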
Not really a bioinformatics-related question. Start by looking at an R tutorial; there are plenty of them online, e.g.: https://www.statmethods.net/r-tutorial/index.html
This post does not fit the theme of this forum. It will be deleted in a few hours.