Entering edit mode
2.9 years ago
Lepomis_8
▴
30
Hello,
I'm having trouble generating a dotplot of functionally enriched pathways across different groups. Most of them are mapped as NA via
Warning message: In order(as.numeric(unique(result$Cluster))) : NAs introduced by coercion
I'm trying to figure out why this happens. I've included the code to the original code I am adapting it from, retrieved from https://f1000research.com/articles/9-709. However, even if I follow their code, I get NAs, so I wonder if someone else can replicate it without NAs. I appreciate any help I can get!
Here is my code below:
## gprofiler
# load the package
library(gprofiler2)
# installing additional packages
# if (!requireNamespace("BiocManager", quietly = TRUE))
# install.packages("BiocManager") BiocManager::install(c("clusterProfiler", "enrichplot", "DOSE"))
# loading the additional packages
library(clusterProfiler)
library(enrichplot)
library(DOSE) # needed to convert to enrichResult object
library(airway)
library(DESeq2)
library(gprofiler2)
# load the airway data data(airway)
# construct the DESeqDataSet object
ddsMat = DESeqDataSetFromMatrix(countData = assay(airway),
colData = colData(airway),
design = ~ cell + dex)
# run DESeq2 pipeline
dds = DESeq(ddsMat)
# get the results
results = results(dds, contrast = c("dex", "trt", "untrt"),
alpha = 0.05, lfcThreshold = 1)
# keep only the significant genes
results_sig = subset(results, padj < 0.05)
# get the significant up-regulated genes
up = subset(results_sig, log2FoldChange > 0)
# get the significant down-regulated genes
down = subset(results_sig, log2FoldChange < 0)
up_names = gconvert(row.names(up)) down_names = gconvert(row.names(down))
# enrichment analysis using gene names
multi_gp = gost(list("up-regulated" = up_names$name,
"down-regulated" = down_names$name), multi_query = FALSE, evcodes = TRUE)
# modify the g:Profiler data frame
gp_mod = multi_gp$result[,c("query", "source", "term_id",
"term_name", "p_value", "query_size",
"intersection_size", "term_size",
"effective_domain_size", "intersection")]
gp_mod$GeneRatio = paste0(gp_mod$intersection_size, "/", gp_mod$query_size)
gp_mod$BgRatio = paste0(gp_mod$term_size, "/", gp_mod$effective_domain_size) names(gp_mod) = c("Cluster", "Category", "ID", "Description", "p.adjust",
"query_size", "Count", "term_size", "effective_domain_size",
"geneID", "GeneRatio", "BgRatio")
gp_mod$geneID = gsub(",", "/", gp_mod$geneID)
row.names(gp_mod) = gp_mod$ID
# define as compareClusterResult object
gp_mod_cluster = new("compareClusterResult", compareClusterResult = gp_mod)
# define as enrichResult object
gp_mod_enrich = new("enrichResult", result = gp_mod)
enrichplot::dotplot(gp_mod_cluster)
Can't say without seeing your gene list, but it's likely some of your gene/cluster names can't be converted to numeric using
as.numeric()
, resulting in NAs. See here: https://statisticsglobe.com/warning-message-nas-introduced-by-coercion-in-rThanks. I noticed the formatting of my code was messed up, so I fixed it. The gene list came from the library(airway). If you'll notice there is a line
gp_mod$geneID = gsub(",", "/", gp_mod$geneID)
that was the solution in the article you linked, and it still didn't remove the NAs! I tried changing it togp_mod$geneID = gsub(",", "", gp_mod$geneID)
and still no change.The problem occurs because the
gp_mod_cluster
is not properly formatted forenrichplot::dotplot
. I'm not sure if any changes togprofiler2
orenrichplot::dotplot
since the publication of the paper you linked has caused this problem.dotplot
internally calls a function calledggplot2::fortify
, which, as far as I can tell, is intended to convertgp_mod_cluster
into a more tidy dataframe for plotting. The warning ("Warning message: In order(as.numeric(unique(result$Cluster))) : NAs introduced by coercion") occurs becausefortify()
is trying to perform:order(as.numeric(unique(result$Cluster)))
. However, this doesn't work because the$Cluster
can't be converted to numeric:Okay, thank you! I will try to get around this by just creating a dotplot manually.