Unsure whether or not I overcorrected for batch effect in dataset
0
1
Entering edit mode
3 months ago
Megan ▴ 60

Hi,

I am doing a timecourse study looking at the differences between sexs and time. I had noticed from my PCA plot that there was a lot of large group variation within my biological replicates.

enter image description here

It looks to me that some of the points are clustering together. I performed a batch effect to account for unknown sources of variation and created a new PCA plot to examine the clustering within my samples. However, I am not sure whether I overcorrected for any possible unknown variation within my data.

enter image description here

My code is attached below.

```{r setup, include=FALSE}



Brown_Fat <- gene_fpkm %>%
  select(gene_id,
    MBF2_2, MBF2_3, MBF6_1, MBF6_2, MBF6_3, MBF10_1, MBF10_2, MBF10_3, MBF14_1, MBF14_2,
    MBF14_3, MBF18_1, MBF18_2, MBF18_3, MBF22_1, MBF22_2, MBF22_3, FBF2_1, FBF2_2, FBF2_3, FBF6_1,
    FBF6_2, FBF6_3, FBF10_1, FBF10_2, FBF10_3, FBF14_1, FBF14_2, FBF14_3, FBF18_1, FBF18_2,
    FBF18_3, FBF22_1, FBF22_2, FBF22_3) %>%
    column_to_rownames(var = 'gene_id') 

keep <- rowMeans(Brown_Fat) > 0.5 # e.g., 1 or 0.5
BrownFat_filtered <- Brown_Fat[keep, ]
BrownFat_filtered_log2RPKM <- log2(BrownFat_filtered + 1)

samples <- c(
  "MBF2_2", "MBF2_3", "MBF6_1", "MBF6_2", "MBF6_3", "MBF10_1", "MBF10_2", "MBF10_3",
  "MBF14_1", "MBF14_2", "MBF14_3", "MBF18_1", "MBF18_2", "MBF18_3", "MBF22_1", "MBF22_2", "MBF22_3",
  "FBF2_1", "FBF2_2", "FBF2_3", "FBF6_1", "FBF6_2", "FBF6_3", "FBF10_1", "FBF10_2", "FBF10_3",
  "FBF14_1", "FBF14_2", "FBF14_3", "FBF18_1", "FBF18_2", "FBF18_3", "FBF22_1", "FBF22_2", "FBF22_3"
)

metadata <- data.frame(
  Sample = samples,
  Sex = ifelse(substr(samples, 1, 1) == "M", "Male", "Female"),
  Time = as.integer(gsub("^.{3}(\\d+)_.*$", "\\1", samples)),  # Extract digits after first 3 characters
  stringsAsFactors = FALSE
)

metadata_v2 <- data.frame(
  Sample = samples,
  Sex = ifelse(substr(samples, 1, 1) == "M", "Male", "Female"),
  Time = as.integer(gsub("^.{3}(\\d+)_.*$", "\\1", samples)),  # Extract digits after first 3 characters
  stringsAsFactors = FALSE
)


metadata_v2 <- metadata_v2 %>%
  column_to_rownames(var = 'Sample')


metadata_v2$Time <- as.factor(metadata_v2$Time)


PCA_Uncorrected <- t(BrownFat_filtered_log2RPKM)
pca_res_uncorrected <- prcomp(PCA_Uncorrected, scale. = TRUE)

autoplot(pca_res_uncorrected, data = metadata, color = "Time", shape = 'Sex' )

cor_matrix <- cor(BrownFat_filtered_log2RPKM, method = "pearson")  # or "spearman"

# Visualize with pheatmap
pheatmap(cor_matrix,
         main = "Sample-to-Sample Correlation",
         annotation_col = metadata_v2,
          annotation_row = metadata_v2)


```

```{r setup, include=FALSE}



rownames(metadata) <- colnames(Brown_Fat) 

metadata$Time <- as.factor(metadata$Time)
metadata$Replicate <- as.factor(metadata$Replicate)


form <- ~  (1 | Sex ) + (1 | Time) + (1 | Replicate) + (1 | Sex:Time)


varPart <- fitExtractVarPartModel(BrownFat_filtered, form, metadata)

vp <- sortCols(varPart)

library("limma")

plotVarPart(vp)


expression_data <- as.matrix(log2(BrownFat_filtered + 1))

mod <- model.matrix(~ Sex + as.factor(Time), data = metadata)
mod0 <- model.matrix(~1, data = metadata)

n.sv <- num.sv(expression_data,mod, method = c("be"), B = 20)

svobj = sva(expression_data,mod,mod0,n.sv= 3)

expression_corrected <- removeBatchEffect(expression_data, covariates = svobj$sv, design = mod)

dat_t <- t(expression_corrected)
pca_res <- prcomp(dat_t, scale. = TRUE)

autoplot(pca_res, data = metadata, color = "Time", shape = 'Sex' )

cor_matrix <- cor(expression_corrected, method = "pearson")  # or "spearman"

# Visualize with pheatmap
pheatmap(cor_matrix,
         main = "Sample-to-Sample Correlation",
         annotation_col = metadata_v2,
         annotation_row = metadata_v2)

```
Quality Plot PCA Control • 619 views
ADD COMMENT
0
Entering edit mode

It would help if you provided your code so others know what steps you took to batch-correct your data

ADD REPLY
0
Entering edit mode

Thank you so much for the suggestion. I have included my code in the post.

ADD REPLY

Login before adding your answer.

Traffic: 4759 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6