Question

Comparing 3 Data Sets using DeSeq and Heatmaps

0

Entering edit mode

5 months ago

harter-ainsworth.izaak • 0

Hi all,

I am new to bioinformatics analysis, so I'd appreciate if someone could check my code for the goal I am trying to achieve.

I have 3 samples -

Wild Type (WT)
FoxP3-TCF-HEB (I have 3 replicates of this)
TCFKO

I have defined these in the sample information csv and loaded into a colData variable enter image description here

I want to merge (average) the three FoxP3 replicate FeatureCounts outputs for visualization in the heatmap.

This is what my FeatureCounts output looks like:

enter image description here

I have successfully got the DeSeq differential expression values for the 3 samples by using the contrast function to compare each of the experimental groups against the wild type based on the gene.

But where I think I may have gone wrong is in the construction of the heatmap.

Please could someone check the R Script below to see if they would do it in a similar way.

#Script for DeSeq Analysis 
library(DESeq2)
library(tidyverse)
library(dplyr)
library(pheatmap)
library(RColorBrewer)
library(ggplot2)
library(ggrepel)

#Load Count Data
counts_data <- read.csv('/Users/m300189/Downloads/countsText_3Samples.csv', header=TRUE, row.names = 1)

head(counts_data)

#Load Sample Info
colData <- read.csv ('/Users/m300189/Downloads/Data/SampleInfo.csv', header = TRUE, row.names = 1)


#Do the row names in ColData match the columns in count_data
all(colnames(counts_data) %in% rownames(colData))

#Are they in the same order 

all(colnames(counts_data) == rownames(colData))

#construct DeSeqDataSet 

dds <- DESeqDataSetFromMatrix(countData = counts_data,
                       colData = colData,
                       design = ~ Gene )

keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]


#Run DeSeq and Contrast analysis/volcano plot 

dds <- DESeq(dds)

res1 <- results(dds, contrast = c("Gene", "WT", "FoxP3-TCF-HEB"))
res2 <- results(dds, contrast = c("Gene", "WT", "TCFKO"))

summary(res1)
summary (res2)

plotMA(res1)
plotMA(res2)


normalized_counts <- counts(dds, normalized = TRUE)

#Generating a heatmap 
#1. Merging the FoxP3 samples 

# Identify the columns for FoxP3 samples
foxp3_cols <- grep("FoxP3", colnames(normalized_counts))

# Calculate the mean expression for FoxP3 samples
foxp3_mean <- rowMeans(normalized_counts[, foxp3_cols])

# Construct a new data frame for heatmap
heatmap_data <- data.frame(
  B6_WT = normalized_counts[, "Jasmin_WT"], 
  FoxP3_TCF_HEB = foxp3_mean,
  TCFKO = normalized_counts[, "Osman_TCFKO"]
)

# Select top differentially expressed genes from both res1 and res2 for the heatmap

# Use normalized counts of combined top genes for heatmap
combined_top_genes <- unique(c(top_genes_res1, top_genes_res2))
heatmap_data_selected <- heatmap_data[combined_top_genes,]

# Generating heatmap
pheatmap(heatmap_data_selected,
         scale = "row",
         clustering_distance_rows = "euclidean",
         clustering_distance_cols = "euclidean",
         clustering_method = "complete",
         color = colorRampPalette(brewer.pal(9, "Blues"))(255))

Thank you in advance for your help, and any suggestions on how I can improve would be greatly appreciated.

DESeq2 Normalization • 459 views

ADD COMMENT • link updated 4 months ago by Ram 43k • written 5 months ago by harter-ainsworth.izaak • 0

score 1 · Answer 1 · 2023-11-28

1

Entering edit mode

5 months ago

sure ▴ 100

Am I correct to understand that you have 3 conditions and have an imbalanced design (not the same number of replicates across all conditions)? My personal preference, If I have an imbalanced design or no biological replicate I try to use NOIseq in my analysis. https://www.bioconductor.org/packages/release/bioc/vignettes/NOISeq/inst/doc/NOISeq.pdf. There may be other options out there but I have found NOISeq suitable for my analysis.

ADD COMMENT • link 5 months ago by sure ▴ 100

0

Entering edit mode

Hi, Thanks for your reply. For this test analysis I was running an imbalanced design yes - not ideal I know.

But I am currently downloading all the data so I will have 3 replicates for each group.

I will have a look into NOISeq.

Does this code look OK for what I am trying to - see if there is any different between my 3 condition? I will add the other 4 datasets in due course.

ADD REPLY • link 5 months ago by harter-ainsworth.izaak • 0