Comparing 3 Data Sets using DeSeq and Heatmaps
Entering edit mode
12 weeks ago

Hi all,

I am new to bioinformatics analysis, so I'd appreciate if someone could check my code for the goal I am trying to achieve.

I have 3 samples -

  1. Wild Type (WT)
  2. FoxP3-TCF-HEB (I have 3 replicates of this)
  3. TCFKO

I have defined these in the sample information csv and loaded into a colData variable enter image description here

I want to merge (average) the three FoxP3 replicate FeatureCounts outputs for visualization in the heatmap.

This is what my FeatureCounts output looks like:

enter image description here

I have successfully got the DeSeq differential expression values for the 3 samples by using the contrast function to compare each of the experimental groups against the wild type based on the gene.

But where I think I may have gone wrong is in the construction of the heatmap.

Please could someone check the R Script below to see if they would do it in a similar way.

#Script for DeSeq Analysis 

#Load Count Data
counts_data <- read.csv('/Users/m300189/Downloads/countsText_3Samples.csv', header=TRUE, row.names = 1)


#Load Sample Info
colData <- read.csv ('/Users/m300189/Downloads/Data/SampleInfo.csv', header = TRUE, row.names = 1)

#Do the row names in ColData match the columns in count_data
all(colnames(counts_data) %in% rownames(colData))

#Are they in the same order 

all(colnames(counts_data) == rownames(colData))

#construct DeSeqDataSet 

dds <- DESeqDataSetFromMatrix(countData = counts_data,
                       colData = colData,
                       design = ~ Gene )

keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]

#Run DeSeq and Contrast analysis/volcano plot 

dds <- DESeq(dds)

res1 <- results(dds, contrast = c("Gene", "WT", "FoxP3-TCF-HEB"))
res2 <- results(dds, contrast = c("Gene", "WT", "TCFKO"))

summary (res2)


normalized_counts <- counts(dds, normalized = TRUE)

#Generating a heatmap 
#1. Merging the FoxP3 samples 

# Identify the columns for FoxP3 samples
foxp3_cols <- grep("FoxP3", colnames(normalized_counts))

# Calculate the mean expression for FoxP3 samples
foxp3_mean <- rowMeans(normalized_counts[, foxp3_cols])

# Construct a new data frame for heatmap
heatmap_data <- data.frame(
  B6_WT = normalized_counts[, "Jasmin_WT"], 
  FoxP3_TCF_HEB = foxp3_mean,
  TCFKO = normalized_counts[, "Osman_TCFKO"]

# Select top differentially expressed genes from both res1 and res2 for the heatmap

# Use normalized counts of combined top genes for heatmap
combined_top_genes <- unique(c(top_genes_res1, top_genes_res2))
heatmap_data_selected <- heatmap_data[combined_top_genes,]

# Generating heatmap
         scale = "row",
         clustering_distance_rows = "euclidean",
         clustering_distance_cols = "euclidean",
         clustering_method = "complete",
         color = colorRampPalette(brewer.pal(9, "Blues"))(255))

Thank you in advance for your help, and any suggestions on how I can improve would be greatly appreciated.

DESeq2 Normalization • 373 views
Entering edit mode
12 weeks ago
sure ▴ 100

Am I correct to understand that you have 3 conditions and have an imbalanced design (not the same number of replicates across all conditions)? My personal preference, If I have an imbalanced design or no biological replicate I try to use NOIseq in my analysis. There may be other options out there but I have found NOISeq suitable for my analysis.

Entering edit mode

Hi, Thanks for your reply. For this test analysis I was running an imbalanced design yes - not ideal I know.

But I am currently downloading all the data so I will have 3 replicates for each group.

I will have a look into NOISeq.

Does this code look OK for what I am trying to - see if there is any different between my 3 condition? I will add the other 4 datasets in due course.


Login before adding your answer.

Traffic: 2696 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6