clusterProfiler enrichGO gene size filter with minGSSize doesn't seem to work
4.5 years ago
gohtwae • 0

I'm doing GO enrichment analysis using clusterProfiler. I set the minGSSize filter to 10 (which appears to be the default anyway) to restrict gene set size, but I still got enriched terms with fewer than 10 genes.

I can filter them out manually, but I'm concerned about whether that is a proper way to handle this, and whether the statistics (p-values etc.) are wrong because the filter failed. I have more cases, but I've posted one with example code below.

require(clusterProfiler)
require(org.At.tair.db)

# sorry for this annoying input list, but my original input list is longer than this.
GOI <-  c("AT1G02920", "AT1G02930", "AT1G03620", "AT1G03760", "AT1G03880", "AT1G05675", "AT1G11610", "AT1G12790", "AT1G13520", "AT1G13990", "AT1G14540", "AT1G14950", "AT1G15125", "AT1G15670", "AT1G15920", "AT1G16030", "AT1G19250", "AT1G21110", "AT1G21120", "AT1G24100", "AT1G24330", "AT1G26250", "AT1G26380", "AT1G26390", "AT1G26410", "AT1G26420", "AT1G27565", "AT1G44130", "AT1G47540", "AT1G49000", "AT1G51920", "AT1G53950", "AT1G56240", "AT1G56250", "AT1G61970", "AT1G62130", "AT1G63860", "AT1G64400", "AT1G65486", "AT1G65845", "AT1G66500", "AT1G66700", "AT1G66920", "AT1G67270", "AT1G68230", "AT1G68862", "AT1G69280", "AT1G69920", "AT1G69930", "AT1G70140", "AT1G72060", "AT1G72900", "AT1G74360", "AT1G74590", "AT1G75000", "AT1G75335", "AT1G75830", "AT1G80840", "AT2G02010", "AT2G02930", "AT2G07698", "AT2G07719", "AT2G15220", "AT2G17040", "AT2G18370", "AT2G18660", "AT2G19910", "AT2G24600", "AT2G25470", "AT2G27389", "AT2G28450", "AT2G29330", "AT2G32190", "AT2G32830", "AT2G33580", "AT2G35980", "AT2G36950", "AT2G38860", "AT2G38870", "AT2G39210", "AT2G39350", "AT2G39400", "AT2G41010", "AT2G41280", "AT2G43000", "AT2G44370", "AT2G46430", "AT2G46650", "AT3G09405", "AT3G15340", "AT3G15518", "AT3G15590", "AT3G16020", "AT3G16030", "AT3G16530", "AT3G19470", "AT3G22800", "AT3G23150", "AT3G23250", "AT3G23570", "AT3G25900", "AT3G26170", "AT3G26210", "AT3G26830", "AT3G27870", "AT3G29100", "AT3G45420", "AT3G51450", "AT3G53160", "AT3G54150", "AT3G58930", "AT4G01010", "AT4G02280", "AT4G02520")


enrichGO_test <- 
  enrichGO(GOI, 
           maxGSSize = 500,
           minGSSize = 10,
           OrgDb = org.At.tair.db, 
           ont = 'BP',
           keyType = 'TAIR',
           pvalueCutoff = 0.01,
           pAdjustMethod = 'BH',
           qvalueCutoff = 0.01
           )


# plot image view
dotplot(enrichGO_test)


# table view
View(enrichGO_test@result)  
enrichGO_test@result[enrichGO_test@result$ID == 'GO:0071456',]

In the dotplot(enrichGO_test) result above, the term GO:0071456 ('cellular response to hypoxia') is plotted, but its GeneRatio is '4/88'.

Versions:

R version 3.6.1 (2019-07-05)
clusterProfiler version 3.12.0
RStudio 1.2.1335
Bioconductor version 3.9 (BiocManager 1.30.8)
  
4.5 years ago
Guangchuang Yu ★ 2.6k

The minGSSize and maxGSSize parameters restrict the size of the gene sets (the number of genes annotated to a term), not the number of genes from your gene list. If we look at the formula, https://yulab-smu.github.io/clusterProfiler-book/chapter2.html#over-representation-analysis, they apply to M.
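
To make the roles of the symbols concrete, here is a minimal base-R sketch of the over-representation test from the linked formula. Only n = 88 and k = 4 come from the GeneRatio in the question; N and M are made-up values for illustration, and the size check is an assumption about how the limits are compared, not a copy of the package's internal code.

```r
# Hypothetical numbers, chosen only to illustrate the roles of N, M, n and k.
N <- 20000  # background genes with an annotation (assumed value)
M <- 50     # genes annotated to the term, i.e. the gene set size (assumed value)
n <- 88     # annotated genes of interest (denominator of GeneRatio 4/88)
k <- 4      # genes of interest annotated to the term (numerator of GeneRatio)

min_gs_size <- 10
max_gs_size <- 500

# The size limits are compared against M, not against k.
term_is_tested <- M >= min_gs_size && M <= max_gs_size

# Over-representation p-value: P(X >= k) under the hypergeometric distribution.
p_value <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

print(term_is_tested)
print(p_value)
```

With these numbers the term passes the size filter because M = 50, even though only k = 4 genes from the list hit it.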


Thank you for correcting me. What I described above seems to be 'k' then, and what I wanted in the original question seems to be the gsfilter() function, as in the link below. https://github.com/YuLab-SMU/clusterProfiler/issues/46

So, I can filter my result as follows.

enrichGO_test_filter10 <- gsfilter(enrichGO_test, by = 'Count', min = 10)
View(enrichGO_test_filter10@result)

Looks great! Thanks again.
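
For anyone who wants to check the distinction on their own results, here is a small base-R sketch on a toy data frame. It borrows the ID, GeneRatio, BgRatio and Count column names from the @result table, but the rows and term names are invented. It assumes BgRatio is formatted as "M/N", so its numerator is the gene set size that minGSSize refers to, while Count is k.

```r
# Toy table using column names from the @result slot; all values are invented.
res <- data.frame(
  ID        = c("term_A", "term_B", "term_C"),
  GeneRatio = c("4/88", "12/88", "15/88"),
  BgRatio   = c("50/20000", "300/20000", "120/20000"),
  Count     = c(4L, 12L, 15L),
  stringsAsFactors = FALSE
)

# M is the numerator of BgRatio ("M/N").
res$M <- as.integer(sub("/.*", "", res$BgRatio))

# Every toy term satisfies a minimum gene set size of 10 in terms of M ...
print(all(res$M >= 10))

# ... while a Count threshold is a separate, post-hoc filter on k.
res_count10 <- res[res$Count >= 10, ]
print(res_count10$ID)
```

Subsetting rows this way only drops terms from the table; it does not recompute any p-values for the rows that remain.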
