Question

DEGpatterns and functional analysis of clusters

0

Entering edit mode

3.1 years ago

giuly.lg.95 • 0

Good morning, I'm new in R programming (only 5 months) I have RNAseq data that I've analyzed with DESeq2. I wanted to perform a timecourse analysis to indentify gene module among different cell subpopulations. I performed LRT test and then I applied DEGpattern from DEGreport package. I obtained several clusters and now I want to perform functional analysis to explore associated functions of the gene groups of interest. But I had only the list of genes and the cluster. How can I exctract data (Fold Change, pvalue etc...) regarding specifical cluster genes from my DESeq2 results dataframe?

dds <- DESeqDataSetFromMatrix(countData = myMatrix,
                              colData = myMeta,
                              design = ~ celltype2)

dds_LRT <- DESeq(dds, test = "LRT", reduced= ~1)

res_LRT <- results(object = dds_LRT, )

# Subset results for faster cluster finding
sig_res_LRT <- res_LRT%>%

data.frame() %>%

rownames_to_column(var="gene") %>% 

as_tibble() %>% 

filter(padj < 10^(-5))

sigLRT_genes <- sig_res_LRT %>% 

pull(gene)

write.csv(sig_res_LRT, 
           file= "Timecourse/DEG_000001.csv")

rld <- rlogTransformation(dds_LRT)

rld_mat <- assay(rld)

# Obtain rlog values for those significant genes
cluster_rlog <- rld_mat[sigLRT_genes, ]

#`degPatterns` function from the 'DEGreport' package to show gene clusters across sample groups 

png(filename = "Timecourse/Clusters.png", width = 800, height = 600)

clusters <- degPatterns(cluster_rlog, metadata = myMeta, time = "celltype2", col = NULL, plot = T)

dev.off()
colnames(cluster_rlog) <- myMeta$celltype2

# Extract the Group 1 genes
cluster_groups <- clusters$df

group1 <- clusters$df %>%

filter(cluster == 1)

write.csv(group1, 
          file= "Timecourse/Clusters/Cluster_1_genelist.csv")

to perform GO analysis through clusterProfile, I need other data, like FC etc., in addition to the gene list of one specific cluster, doesn't it? Thanks in advance

RNA-Seq DEGreport DESeq2 R • 3.6k views

ADD COMMENT • link updated 3.1 years ago by Kevin Blighe 87k • written 3.1 years ago by giuly.lg.95 • 0

0

Entering edit mode

The Log2FCs reported for the LRT test are from the default contrast I believe, so before you use the FCs you should make sure this is what you want.

ADD REPLY • link 3.1 years ago by rpolicastro 13k

0

Entering edit mode

Yes, I understand. But if I want to perform GO for cluster 1 for example, I'd need Log2FC of cluster genes, doesn't it? I use clusterProfile with groupGO function

#Specific data for cluster1 genes from results dataframe
ID_g1 <- group1$genes

g1_FC <- res_LRT[ID_g1, "log2FoldChange"]

g1_pval <- res_LRT[ID_g1, "pvalue"]

g1_padj <- res_LRT[ID_g1, "padj"]

g1_df <- data_frame(ID_g1, g1_FC, g1_pval, g1_padj)

colnames(g1_df) <- c("ID", "log2FoldChange", "pvalue", "padj")

library(clusterProfiler)
library(DOSE)          
library(AnnotationHub)
library(org.Hs.eg.db)

g1_df <- bitr(g1_df$ID, fromType = "SYMBOL",

              toType = c("ENSEMBL", "ENTREZID"),

              OrgDb = org.Hs.eg.db)

ggo <- groupGO(gene = g1_df,

               OrgDb = org.Hs.eg.db,

               ont = "BP",

               level = 3,

               readable = TRUE)

But I have this error

Error in validObject(.Object) : 
  invalid class “groupGOResult” object: invalid object for slot "gene" in class "groupGOResult": got class "data.frame", should be or extend class "character"

ADD REPLY • link 3.1 years ago by giuly.lg.95 • 0

0

Entering edit mode

Can you show us what myMeta looks like?

ADD REPLY • link 3.1 years ago by rpolicastro 13k

0

Entering edit mode

                        name2.     celltype2  condition2.  expgroup2

21_T_R5                21_T_R5        R5       T                      T_R5

22_T_R6                22_T_R6        R6       T                      T_R6

23_T_R7                23_T_R7        R7       T                      T_R7

24_T_R8                24_T_R8        R8       T                      T_R8

25_T_R5                 25_T_R5        R5       T                      T_R5

26_T_R6                 26_T_R6        R6       T                      T_R6

27_T_R7                 27_T_R7        R7       T                      T_R7

28_T_R8                 28_T_R8        R8       T                      T_R8

29_T_R5                 29_T_R5        R5       T                      T_R5

30_T_R6                 30_T_R6        R6       T                      T_R6

31_T_R7                 31_T_R7        R7       T                      T_R7

32_T_R8                 32_T_R8        R8       T                      T_R8

I've difficult to write down and to be comprehinsible...but, for example, name2, celltype2, condition2 and exproug2 are 26_T_R6, R6, T and T_R6, respectively, whereas the first "26_T_R6" corresponds to the rowname. (This is valid for every rows)

ADD REPLY • link updated 3.1 years ago by rpolicastro 13k • written 3.1 years ago by giuly.lg.95 • 0

0

Entering edit mode

The FC values are from the R5 vs R6 comparison, but the pvalue of the LRT encompasses all values of celltype2. I'm not completely sure this is what you intended to get.

ADD REPLY • link 3.1 years ago by rpolicastro 13k