Entering edit mode
3 months ago
Megan
▴
60
Hi,
I am doing a timecourse study looking at the differences between sexs and time. I had noticed from my PCA plot that there was a lot of large group variation within my biological replicates.
It looks to me that some of the points are clustering together. I performed a batch effect to account for unknown sources of variation and created a new PCA plot to examine the clustering within my samples. However, I am not sure whether I overcorrected for any possible unknown variation within my data.
My code is attached below.
```{r setup, include=FALSE}
Brown_Fat <- gene_fpkm %>%
select(gene_id,
MBF2_2, MBF2_3, MBF6_1, MBF6_2, MBF6_3, MBF10_1, MBF10_2, MBF10_3, MBF14_1, MBF14_2,
MBF14_3, MBF18_1, MBF18_2, MBF18_3, MBF22_1, MBF22_2, MBF22_3, FBF2_1, FBF2_2, FBF2_3, FBF6_1,
FBF6_2, FBF6_3, FBF10_1, FBF10_2, FBF10_3, FBF14_1, FBF14_2, FBF14_3, FBF18_1, FBF18_2,
FBF18_3, FBF22_1, FBF22_2, FBF22_3) %>%
column_to_rownames(var = 'gene_id')
keep <- rowMeans(Brown_Fat) > 0.5 # e.g., 1 or 0.5
BrownFat_filtered <- Brown_Fat[keep, ]
BrownFat_filtered_log2RPKM <- log2(BrownFat_filtered + 1)
samples <- c(
"MBF2_2", "MBF2_3", "MBF6_1", "MBF6_2", "MBF6_3", "MBF10_1", "MBF10_2", "MBF10_3",
"MBF14_1", "MBF14_2", "MBF14_3", "MBF18_1", "MBF18_2", "MBF18_3", "MBF22_1", "MBF22_2", "MBF22_3",
"FBF2_1", "FBF2_2", "FBF2_3", "FBF6_1", "FBF6_2", "FBF6_3", "FBF10_1", "FBF10_2", "FBF10_3",
"FBF14_1", "FBF14_2", "FBF14_3", "FBF18_1", "FBF18_2", "FBF18_3", "FBF22_1", "FBF22_2", "FBF22_3"
)
metadata <- data.frame(
Sample = samples,
Sex = ifelse(substr(samples, 1, 1) == "M", "Male", "Female"),
Time = as.integer(gsub("^.{3}(\\d+)_.*$", "\\1", samples)), # Extract digits after first 3 characters
stringsAsFactors = FALSE
)
metadata_v2 <- data.frame(
Sample = samples,
Sex = ifelse(substr(samples, 1, 1) == "M", "Male", "Female"),
Time = as.integer(gsub("^.{3}(\\d+)_.*$", "\\1", samples)), # Extract digits after first 3 characters
stringsAsFactors = FALSE
)
metadata_v2 <- metadata_v2 %>%
column_to_rownames(var = 'Sample')
metadata_v2$Time <- as.factor(metadata_v2$Time)
PCA_Uncorrected <- t(BrownFat_filtered_log2RPKM)
pca_res_uncorrected <- prcomp(PCA_Uncorrected, scale. = TRUE)
autoplot(pca_res_uncorrected, data = metadata, color = "Time", shape = 'Sex' )
cor_matrix <- cor(BrownFat_filtered_log2RPKM, method = "pearson") # or "spearman"
# Visualize with pheatmap
pheatmap(cor_matrix,
main = "Sample-to-Sample Correlation",
annotation_col = metadata_v2,
annotation_row = metadata_v2)
```
```{r setup, include=FALSE}
rownames(metadata) <- colnames(Brown_Fat)
metadata$Time <- as.factor(metadata$Time)
metadata$Replicate <- as.factor(metadata$Replicate)
form <- ~ (1 | Sex ) + (1 | Time) + (1 | Replicate) + (1 | Sex:Time)
varPart <- fitExtractVarPartModel(BrownFat_filtered, form, metadata)
vp <- sortCols(varPart)
library("limma")
plotVarPart(vp)
expression_data <- as.matrix(log2(BrownFat_filtered + 1))
mod <- model.matrix(~ Sex + as.factor(Time), data = metadata)
mod0 <- model.matrix(~1, data = metadata)
n.sv <- num.sv(expression_data,mod, method = c("be"), B = 20)
svobj = sva(expression_data,mod,mod0,n.sv= 3)
expression_corrected <- removeBatchEffect(expression_data, covariates = svobj$sv, design = mod)
dat_t <- t(expression_corrected)
pca_res <- prcomp(dat_t, scale. = TRUE)
autoplot(pca_res, data = metadata, color = "Time", shape = 'Sex' )
cor_matrix <- cor(expression_corrected, method = "pearson") # or "spearman"
# Visualize with pheatmap
pheatmap(cor_matrix,
main = "Sample-to-Sample Correlation",
annotation_col = metadata_v2,
annotation_row = metadata_v2)
```
It would help if you provided your code so others know what steps you took to batch-correct your data
Thank you so much for the suggestion. I have included my code in the post.