Entering edit mode
16 months ago
Lepomis_8 ▴ 30
I'm having trouble generating a dotplot of functionally enriched pathways across different groups. Most of them are mapped as NA via
Warning message: In order(as.numeric(unique(result$Cluster))) : NAs introduced by coercion
I'm trying to figure out why this happens. I've included the code to the original code I am adapting it from, retrieved from https://f1000research.com/articles/9-709. However, even if I follow their code, I get NAs, so I wonder if someone else can replicate it without NAs. I appreciate any help I can get!
Here is my code below:
## gprofiler # load the package library(gprofiler2) # installing additional packages # if (!requireNamespace("BiocManager", quietly = TRUE)) # install.packages("BiocManager") BiocManager::install(c("clusterProfiler", "enrichplot", "DOSE")) # loading the additional packages library(clusterProfiler) library(enrichplot) library(DOSE) # needed to convert to enrichResult object library(airway) library(DESeq2) library(gprofiler2) # load the airway data data(airway) # construct the DESeqDataSet object ddsMat = DESeqDataSetFromMatrix(countData = assay(airway), colData = colData(airway), design = ~ cell + dex) # run DESeq2 pipeline dds = DESeq(ddsMat) # get the results results = results(dds, contrast = c("dex", "trt", "untrt"), alpha = 0.05, lfcThreshold = 1) # keep only the significant genes results_sig = subset(results, padj < 0.05) # get the significant up-regulated genes up = subset(results_sig, log2FoldChange > 0) # get the significant down-regulated genes down = subset(results_sig, log2FoldChange < 0) up_names = gconvert(row.names(up)) down_names = gconvert(row.names(down)) # enrichment analysis using gene names multi_gp = gost(list("up-regulated" = up_names$name, "down-regulated" = down_names$name), multi_query = FALSE, evcodes = TRUE) # modify the g:Profiler data frame gp_mod = multi_gp$result[,c("query", "source", "term_id", "term_name", "p_value", "query_size", "intersection_size", "term_size", "effective_domain_size", "intersection")] gp_mod$GeneRatio = paste0(gp_mod$intersection_size, "/", gp_mod$query_size) gp_mod$BgRatio = paste0(gp_mod$term_size, "/", gp_mod$effective_domain_size) names(gp_mod) = c("Cluster", "Category", "ID", "Description", "p.adjust", "query_size", "Count", "term_size", "effective_domain_size", "geneID", "GeneRatio", "BgRatio") gp_mod$geneID = gsub(",", "/", gp_mod$geneID) row.names(gp_mod) = gp_mod$ID # define as compareClusterResult object gp_mod_cluster = new("compareClusterResult", compareClusterResult = gp_mod) # define as enrichResult object gp_mod_enrich = new("enrichResult", result = gp_mod) enrichplot::dotplot(gp_mod_cluster)
Can't say without seeing your gene list, but it's likely some of your gene/cluster names can't be converted to numeric using
as.numeric(), resulting in NAs. See here: https://statisticsglobe.com/warning-message-nas-introduced-by-coercion-in-r
Thanks. I noticed the formatting of my code was messed up, so I fixed it. The gene list came from the library(airway). If you'll notice there is a line
gp_mod$geneID = gsub(",", "/", gp_mod$geneID)that was the solution in the article you linked, and it still didn't remove the NAs! I tried changing it to
gp_mod$geneID = gsub(",", "", gp_mod$geneID)and still no change.
The problem occurs because the
gp_mod_clusteris not properly formatted for
enrichplot::dotplot. I'm not sure if any changes to
enrichplot::dotplotsince the publication of the paper you linked has caused this problem.
dotplotinternally calls a function called
ggplot2::fortify, which, as far as I can tell, is intended to convert
gp_mod_clusterinto a more tidy dataframe for plotting. The warning ("Warning message: In order(as.numeric(unique(result$Cluster))) : NAs introduced by coercion") occurs because
fortify()is trying to perform:
order(as.numeric(unique(result$Cluster))). However, this doesn't work because the
$Clustercan't be converted to numeric:
Okay, thank you! I will try to get around this by just creating a dotplot manually.