TopGO gene enrichment analysis with predefined set of genes for non-model organism
0
0
Entering edit mode
3.6 years ago
AP ▴ 80

Hello everyone,

I am working with a non-model fungus and trying to do gene enrichment analysis for differentially expressed genes. For this I got the GO terms for a list of genes using InterProscan. Now I am using these list of genes with respective GO terms to do enrichment analysis and to make plots in R using TopGO package.

I am following thisworkshop from UCdavis to achieve my goal, however I have a small problem with the p-value. I used this code to list the gene with the p-values

tmp <- ifelse(DE$adj.P.Val < pcutoff, 1, 0)
geneList <- tmp

and I am getting the following final result:

GO.ID                                           Term Annotated Significant
1 GO:0008213                             protein alkylation         1           1
2 GO:0046903                                      secretion         1           1
3 GO:0042219 cellular modified amino acid catabolic process         1           1
4 GO:0006979                   response to oxidative stress         1           1
5 GO:0019438         aromatic compound biosynthetic process        77          77
6 GO:0015672          monovalent inorganic cation transport         2           2
  Expected raw.p.value
1        1           1
2        1           1
3        1           1
4        1           1
5       77           1
6        2           1

Instead of raw.p.value as 1 I need the actual p-values I provided in the file. Can you please help, if I am missing anything. I went over my codes multiple times but I have not done anything different as mentioned in the tutorial but my results are different from what I need.

I will appreciate your help. Thanks, Ambika

RNA-Seq TopGO Geneenrichment GOterm R • 2.4k views
ADD COMMENT
2
Entering edit mode

It seems that you are not providing all commands that you have used, so, it is difficult for us to debug this for you. Also, it would really help [really] to provide a minimal reproducible example as input data.

ADD REPLY
0
Entering edit mode

Kevin,

This is how my input files looks like:

GO_total_upregulated_genes.csv

    GeneID  Goterm
    FUN_004018  GO:0016491
    FUN_004018  GO:0046872
    FUN_004018  GO:0055114
    FUN_003797  GO:0000287
    FUN_003797  GO:0030976
    FUN_003797  GO:0030976
    FUN_003797  GO:0016831

Total_upregulated_genes_pvalue.csv

GeneID  padj
FUN_000233  0.013768183
FUN_000410  2.57E-10
FUN_000424  0.006996913
FUN_000451  0.028650184
FUN_000511  1.84E-12
FUN_000590  0.011309524
FUN_000668  0.030117801

My code looks like this

gene.go <- read.csv("GO_total_upregulated_genes.csv",header = T, stringsAsFactors = F, sep = ",")
gene2GO <- tapply(gene.go$Goterm, gene.go$GeneID, function(x)x)
head(gene2GO)

infile <- "Total_upregulated_genes_pvalue.csv"
pcutoff <- 0.05 # cutoff for defining significant genes
DE <- read.delim(infile, stringsAsFactors = F, header = T, sep=",")
head(DE)

tmp <- ifelse(DE$padj < pcutoff, 1, 0)
geneList <- tmp



names(geneList) <- (DE$GeneID)
head(geneList)


    GOdata_UP_BP <- new("topGOdata",ontology = "BP",allGenes = geneList,
                        geneSelectionFun = function(x)(x == 1),
                        annot = annFUN.gene2GO, gene2GO = gene2GO)

    resultFis_UP_BP <- runTest(GOdata_UP_BP,algorithm = "elim", statistic = "fisher")

    tab <- GenTable(GOdata_UP_BP, raw.p.value = resultFis_UP_BP, topNodes = length(resultFis_UP_BP@score),
                    numChar = 120)
    head(tab)

I am only providing the genes that are significantly upregulated. As you can see the annotated, significant and expected gene numbers also look same. Am I doing anything wrong here. Please suggest.

ADD REPLY
0
Entering edit mode

Are you sure that your gene2GO object is constructed correctly? - it should be a list, as elaborated in section 4.3 of: https://www.bioconductor.org/packages/devel/bioc/vignettes/topGO/inst/doc/topGO.pdf

Also, I cannot verify from your code, but geneList should be a vector of p- (or other) values, whose names are gene names, i.e., a named vector.

ADD REPLY
0
Entering edit mode

Kevin, I think I do have a list as mentioned in the protocol. The list looks like this

> head(gene2GO)
$`FUN_000109`
[1] "GO:0005515" "GO:0005515" "GO:0005515" "GO:0005515"

$FUN_000399
[1] "GO:0008374" "GO:0008374" "GO:0008374" "GO:0008374" "GO:0008374" "GO:0008374"

$FUN_000424
[1] "GO:0006950" "GO:0016021"

As for the p-values I am not sure. I am assigning the significant p-values as 1 and non-significant as 0. and the

head(geneList)
FUN_000233 FUN_000410 FUN_000424 FUN_000451 FUN_000511 FUN_000590 
         1          1          1          1          1          1

I don't know where did I do wrong, but I am kind of stuck and could not proceed further because I need those p-values to make plots.

ADD REPLY
0
Entering edit mode

As far as I understand, the code is running but you want 'raw' p-values for the GO terms (?) or the functional entities (FUN_*) (?)

You are eliminating all information about p-values with this line:

tmp <- ifelse(DE$adj.P.Val < pcutoff, 1, 0)

Perhaps I am not quite understanding 100% what is the problem. I provide an example with topGO and the Kolmogorov-Smirnov test here: A: GO analysis using topGO

ADD REPLY
0
Entering edit mode

Yes I want the p-values and the gene id (FUN__) in my results, which I am not getting. Thank you for the link, I will check that one.

ADD REPLY
0
Entering edit mode

The results would be of the form:

GO000XYX ... enrichment p-value ... genes identified as part of this GO ID
GO000XYX ... enrichment p-value ... genes identified as part of this GO ID
GO000XYX ... enrichment p-value ... genes identified as part of this GO ID

If you want to add the original p-value for each gene identified in each term, then that will require customised coding, I think.

ADD REPLY

Login before adding your answer.

Traffic: 2507 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6