Hello,
I'm using this workflow DGE analysis using LRT in DESeq2 to look for genes that are significantly differentially expressed over a time-series (2,7,14,53 days) of a RNA-Seq count matrix of two soil types (Forest vs. Plastic). However, I'm stuck after the DEG analysis using an LRT model and eventually identifying gene clusters with KEGGREST
function.
If you look at the figure, I am interested in Cluster 1, Cluster 8 and Cluster 13 since I want to identify genes that are significantly over-expressed in the early time-points in the plastic soil (blue).
So after DEG analysis with DESEQ (dds <- DESeqDataSetFromMatrix(countData = tab,
colData = cond,
design = ~ Soil+Time+Soil:Time)
), I performed clustering with degPatterns()
to identify genes that change over time :
cond$Time<- factor(cond$Time,levels=c("2d","7d","14d","53d"))
clusters <- degPatterns(cluster_rlog, metadata = cond, time="Time", col="Soil",plot=T)
clus=degPlotCluster(clusters$normalized,"Time","Soil")
Figure 1: Significant gene clusters *in a time-series in two different soil types
Now I extract genes from cluster 1 and corresponding KEGG ID's:
cluster_groups <- clusters$df
group1 <- clusters$df %>%
filter(cluster == 1)
#Get Kegg ID#s from clusters
library(KEGGREST)
library(KEGGgraph)
# Extract KEGG IDs for the genes in cluster 1
kegg_ids <- group1$genes
# Get pathway information for the KEGG IDs
pathway_info <- keggGet(kegg_ids)
`
Since keggGet() retrieves only the first 10 entries of a list, I iterate over each KEGG ID in order to fill my list with all 25 enriched KEGG ID's in cluster 1
# Empty list to store pathway information
pathway_info_list <- list()
# Iterate over each KEGG ID and fetch pathway information
for (id in group1$genes) {
pathway_info <- keggGet(id)
# Store pathway information in the list using KEGG ID as the list index
pathway_info_list[[id]] <- pathway_info
}
Now I need to know, how can I use the output of keggGet
to show if these enriched genes are actually over/under-expressed in my conditions?
How can I link my cluster genes back to DEG data and create a table to plot with e.g. gene name, pathway name and log2foldchange and pvalue etc.?
Thank you!!
I've updated the post