Question

error withstatistic "t" and "ks" on topGO

0

Entering edit mode

4.3 years ago

francoise.le-bohec ▴ 10

Dear all,

I am struggeling with the statistics test using topGO.

I have searched on the net and questionned my colleagues but it looks like this kind of error doesn't occur so often so it's not really documented:

This test is ok : resultFisher_FAvsBHI_MF=runTest(GOdata_FAvsBHI_MF, algorithm = "classic", statistic = "fisher")

when I have the same message error for the two others:

resultWeigt01_FAvsBHI_MF=runTest(GOdata_FAvsBHI_MF, algorithm = "weight01", statistic = "t")

resultKS_elim_FAvsBHI_MF=runTest(GOdata_FAvsBHI_MF, algorithm = "elim", statistic = "ks")

" -- Weight01 Algorithm --

     the algorithm is scoring 986 nontrivial nodes
     parameters: 
         test statistic: t
         score order: increasing

 Level 14:  3 nodes to be scored    (0 eliminated genes)

 Level 13:  8 nodes to be scored    (0 eliminated genes)

 Level 12:  10 nodes to be scored   (10 eliminated genes)

 Level 11:  8 nodes to be scored    (14 eliminated genes)

 Level 10:  17 nodes to be scored   (20 eliminated genes)

 Level 9:   45 nodes to be scored   (34 eliminated genes)

 Level 8:   78 nodes to be scored   (52 eliminated genes)

 Level 7:   149 nodes to be scored  (276 eliminated genes)

 Level 6:   338 nodes to be scored  (345 eliminated genes)

 Level 5:   175 nodes to be scored  (524 eliminated genes)

 Level 4:   112 nodes to be scored  (868 eliminated genes)

 Level 3:   34 nodes to be scored   (1124 eliminated genes)

 Level 2:   8 nodes to be scored    (1197 eliminated genes)

 Level 1:   1 nodes to be scored    (1394 eliminated genes)

Error in t.test.default(x = x.G, y = x.NotG, var.equal = var.equal, alternative = ifelse(aa, : not enough 'y' observations"

I also noticed that some people perform all these test with Fisher (maybe to avoid this kind of problem?) but didn't explain their choice.

It looks like it could be a p-value format error since the Fisher test can be done but not the two other tests. So I've checked that there is no NA values (set at 1) and also test with/without using the scientific annotation.

Please let me know if you need more details,

Thanks for your help,

Best regards,

RNA-Seq R • 1.6k views

ADD COMMENT • link 4.3 years ago by francoise.le-bohec ▴ 10

0

Entering edit mode

...but how many input genes are in GOdata_FAvsBHI_MF?

ADD REPLY • link 4.3 years ago by Kevin Blighe 88k

0

Entering edit mode

2808 genes, it should be fine shouldn't it?

ADD REPLY • link 4.3 years ago by francoise.le-bohec ▴ 10

0

Entering edit mode

Should be - yes. Can you show the other commands that you ran before runTest?

Here is how I typically run GO enrichment via topGO (I had conveniently written this for a beginner, recently):

  require(topGO)
  require(org.Hs.eg.db)

# Now perform gene enrichment against GO
  # GO BP (Gene Ontology Biological Processes)
    rankedGenes # a named vector with p-values as values (all <0.05) and Entrez gene IDs as names

    # create a mapping of GO terms and their respective genes (as Entrez IDs)
    allGO2genes <- annFUN.org(
      whichOnto = 'BP',
      feasibleGenes = NULL,
      mapping = 'org.Hs.eg.db',
      ID = 'entrez')

    # This is just a filter function that can be applied to the input gene list (e.g. based on p-value).
    # Here, it is not necessary and is setup to just include all genes in our input list.
    # It will be used in the next function, 'new()'
    selection <- function(x) TRUE

    # Perform the enrichment. For a GO term to be returned, 10 of our genes must be present in it.
    GOdata <- new('topGOdata',
      ontology = 'BP',
      allGenes = rankedGenes,
      annot = annFUN.GO2genes,
      GO2genes = allGO2genes,
      geneSel = selection,
      nodeSize = 10)

    # With the enrichment, derive some test statistics
    # In order to make use of the rank information, use Kolmogorov-Smirnov (KS) test
    results.ks <- runTest(GOdata, algorithm = 'classic', statistic = 'ks')
    goEnrichment <- GenTable(GOdata, KS = results.ks, orderBy = 'KS', topNodes = length(results.ks@score))

    # filter the results by p-value
    # Here, I set it to 1 to include everything
    p <- 1.00
    goEnrichment <- goEnrichment[goEnrichment$KS < p,]

    # only retain relevant information
    goEnrichment <- goEnrichment[,c('GO.ID','Term','KS')]

ADD REPLY • link 4.3 years ago by Kevin Blighe 88k

0

Entering edit mode

In fact we roughly do the same thing, just with another database that didn't exist per se, explaining why I had to merge 2 tables in oder to have a database to work with .

I think the enrichment is not bad at the end, it more looks like a problem of statistic test?

ADD REPLY • link 4.3 years ago by francoise.le-bohec ▴ 10

0

Entering edit mode

There are many different combinations of algorithm and statistic, and I admit that I have not fully explored these. Is the problem that your enrichment p-values are not reaching statistical significance (p<0.05)?

ADD REPLY • link 4.3 years ago by Kevin Blighe 88k

0

Entering edit mode

Sorry I couldn't answer yesterday because I am new here and thus not allowed to post more than 6 messages per day...

Right, it is not clear for me which statistic should be used, the vignette says that ks and Fisher can be performed in that case... and it works with my data if I change the algorithm (weight01 or elim) but only if I choose the Fisher statistic, not the t nor the KS... And depending on the algorithm I use I got from 6 to 20 significative p-values < 0.05.

ADD REPLY • link 4.2 years ago by francoise.le-bohec ▴ 10

0

Entering edit mode

Each will have advantages in different contexts, just like p-value adjustment methods. If you are unsure, then just use the default values.

ADD REPLY • link 4.2 years ago by Kevin Blighe 88k

0

Entering edit mode

Thanks for your script Kevin, I'll study it now to try to find where the problem can be with mine ...

Here is the script I've used:

# table with genes and padj values

 FAvsBHI_topGO <-read.csv("FAvsBHI_topGO.csv", sep=";", stringsAsFactors=FALSE)
 GOterms<-read.csv("GOterms", sep=";", stringsAsFactors=FALSE)
 str(FAvsBHI_topGO)
# 'data.frame': 2808 obs. of  2 variables:
# $ WP.ID: chr  "WP_001290433.1" "WP_000969811.1" "WP_001789359.1" "WP_000775113.1" ...
# $ padj : num  0.287 0.378 0.81 0.719 0.247 ...
# => format OK

# Vectors creation

list_FAvsBHI_topGO=data.frame(FAvsBHI_topGO$"WP.ID",FAvsBHI_topGO$"padj")
dim(list_FAvsBHI_topGO)
list_FAvsBHI_topGO_ok=as.numeric(list_FAvsBHI_topGO[,2])
View(list_FAvsBHI_topGO_ok) # 1 column with pvalues

# NA replaced with the value 1
names(list_FAvsBHI_topGO_ok)=list_FAvsBHI_topGO[,1]
list_FAvsBHI_topGO_ok[is.na(list_FAvsBHI_topGO_ok)]=1

##  genes universes creation for MF, BP et CC

# Loading/check of the environments ( from GO.db, loaded with topGO library)
Library (topGO)
BPterms<- ls(GOBPTerm)
head (BPterms)
# [1] "GO:0000001" "GO:0000002" "GO:0000003" "GO:0000011" "GO:0000012" "GO:0000017"
MFterms<- ls(GOMFTerm)
head (MFterms)
# [1] "GO:0000006" "GO:0000007" "GO:0000009" "GO:0000010" "GO:0000014" "GO:0000016"
CCterms<- ls(GOCCTerm)
head (CCterms)

# Merge of teh table with genes names and their GO category  (2 columns)
GO2genes_CC <- merge(CCterms, GO2genes_tot, by.x = "GO", by.y="GO")
write.table(GO2genes_CC, "GO2genes_CC.xls",col=NA, sep="\t",dec=".")
GO2genes_BP <- merge(BPterms, GO2genes_tot, by.x = "GO", by.y="GO")
write.table(GO2genes_BP, "GO2genes_BP.xls",col=NA, sep="\t",dec=".")
GO2genes_MF <- merge(MFterms, GO2genes_tot, by.x = "GO", by.y="GO")
write.table(GO2genes_MF, "GO2genes_MF.xls",col=NA, sep="\t",dec=".")

#################################################################################################################
#### With linux: creation of vectors with the name of each gene and its corresponding GO categories
#################################################################################################################
#!/bin/sh

awk '
BEGIN{FS=";"; OFS=FS};
{ arr[$1] = arr[$1] == ""? $2 : arr[$1] "," $2 }
END {for (i in arr) print i, arr[i] }
' test_merge.xls>test_merge_OK.xls

####################################################
# Analyse FA vs BHI
####################################################

library(topGO)

GO2genes_MF_OK<-readMappings(file="GO2genes_MF_OK.csv",sep = ";")
#str(head(GO2genes_MF_OK))


View(FAvsBHI_topGO)

topDiffGenes <- function(allScore) {
  return(allScore <= 0.05)
}
FAvsBHI_topGO_tmp <-topDiffGenes(FAvsBHI_topGO)

sum(FAvsBHI_topGO_tmp) ## the number of selected genes : 322



list_FAvsBHI=list_FAvsBHI_topGO_ok

GOdata_FAvsBHI_MF=new("topGOdata",
                              description = "GO analysis on genes specific to BHI vs FA , MF (322 genes)",
                              ontology = "MF",
                              allGenes = list_FAvsBHI,
                              geneSel = topDiffGenes,
                              annot = annFUN.gene2GO,
                              gene2GO = GO2genes_MF_OK)



resultFisher_FAvsBHI_MF=runTest(GOdata_FAvsBHI_MF, algorithm = "classic", statistic = "fisher")

 resultWeigt01_FAvsBHI_MF=runTest(GOdata_FAvsBHI_MF, algorithm = "weight01", statistic = "t")
 resultKS_elim_FAvsBHI_MF=runTest(GOdata_FAvsBHI_MF, algorithm = "elim", statistic = "ks")

ADD REPLY • link updated 4.3 years ago by Kevin Blighe 88k • written 4.3 years ago by francoise.le-bohec ▴ 10

0

Entering edit mode

Sorry but the font is really erratic...

ADD REPLY • link 4.3 years ago by francoise.le-bohec ▴ 10

0

Entering edit mode

I have tidied it via the 101 010 button

ADD REPLY • link 4.3 years ago by Kevin Blighe 88k

1

Entering edit mode

Thanks for the trick!

ADD REPLY • link 4.3 years ago by francoise.le-bohec ▴ 10