Comparing 3 Data Sets using DeSeq and Heatmaps
1
0
Entering edit mode
12 weeks ago

Hi all,

I am new to bioinformatics analysis, so I'd appreciate if someone could check my code for the goal I am trying to achieve.

I have 3 samples -

  1. Wild Type (WT)
  2. FoxP3-TCF-HEB (I have 3 replicates of this)
  3. TCFKO

I have defined these in the sample information csv and loaded into a colData variable enter image description here

I want to merge (average) the three FoxP3 replicate FeatureCounts outputs for visualization in the heatmap.

This is what my FeatureCounts output looks like:

enter image description here

I have successfully got the DeSeq differential expression values for the 3 samples by using the contrast function to compare each of the experimental groups against the wild type based on the gene.

But where I think I may have gone wrong is in the construction of the heatmap.

Please could someone check the R Script below to see if they would do it in a similar way.

#Script for DeSeq Analysis 
library(DESeq2)
library(tidyverse)
library(dplyr)
library(pheatmap)
library(RColorBrewer)
library(ggplot2)
library(ggrepel)

#Load Count Data
counts_data <- read.csv('/Users/m300189/Downloads/countsText_3Samples.csv', header=TRUE, row.names = 1)

head(counts_data)

#Load Sample Info
colData <- read.csv ('/Users/m300189/Downloads/Data/SampleInfo.csv', header = TRUE, row.names = 1)


#Do the row names in ColData match the columns in count_data
all(colnames(counts_data) %in% rownames(colData))

#Are they in the same order 

all(colnames(counts_data) == rownames(colData))

#construct DeSeqDataSet 

dds <- DESeqDataSetFromMatrix(countData = counts_data,
                       colData = colData,
                       design = ~ Gene )

keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]


#Run DeSeq and Contrast analysis/volcano plot 

dds <- DESeq(dds)

res1 <- results(dds, contrast = c("Gene", "WT", "FoxP3-TCF-HEB"))
res2 <- results(dds, contrast = c("Gene", "WT", "TCFKO"))

summary(res1)
summary (res2)

plotMA(res1)
plotMA(res2)


normalized_counts <- counts(dds, normalized = TRUE)

#Generating a heatmap 
#1. Merging the FoxP3 samples 

# Identify the columns for FoxP3 samples
foxp3_cols <- grep("FoxP3", colnames(normalized_counts))

# Calculate the mean expression for FoxP3 samples
foxp3_mean <- rowMeans(normalized_counts[, foxp3_cols])

# Construct a new data frame for heatmap
heatmap_data <- data.frame(
  B6_WT = normalized_counts[, "Jasmin_WT"], 
  FoxP3_TCF_HEB = foxp3_mean,
  TCFKO = normalized_counts[, "Osman_TCFKO"]
)

# Select top differentially expressed genes from both res1 and res2 for the heatmap

# Use normalized counts of combined top genes for heatmap
combined_top_genes <- unique(c(top_genes_res1, top_genes_res2))
heatmap_data_selected <- heatmap_data[combined_top_genes,]

# Generating heatmap
pheatmap(heatmap_data_selected,
         scale = "row",
         clustering_distance_rows = "euclidean",
         clustering_distance_cols = "euclidean",
         clustering_method = "complete",
         color = colorRampPalette(brewer.pal(9, "Blues"))(255))

Thank you in advance for your help, and any suggestions on how I can improve would be greatly appreciated.

DESeq2 Normalization • 373 views
ADD COMMENT
1
Entering edit mode
12 weeks ago
sure ▴ 100

Am I correct to understand that you have 3 conditions and have an imbalanced design (not the same number of replicates across all conditions)? My personal preference, If I have an imbalanced design or no biological replicate I try to use NOIseq in my analysis. https://www.bioconductor.org/packages/release/bioc/vignettes/NOISeq/inst/doc/NOISeq.pdf. There may be other options out there but I have found NOISeq suitable for my analysis.

ADD COMMENT
0
Entering edit mode

Hi, Thanks for your reply. For this test analysis I was running an imbalanced design yes - not ideal I know.

But I am currently downloading all the data so I will have 3 replicates for each group.

I will have a look into NOISeq.

Does this code look OK for what I am trying to - see if there is any different between my 3 condition? I will add the other 4 datasets in due course.

ADD REPLY

Login before adding your answer.

Traffic: 2696 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6