Question

No enriched GO terms with more than 1,000 genes

3.9 years ago

Ruixuan • 0

Hi everyone, I'm running a GO enrichment analysis after finishing the statistical tests with edgeR.

Previously, I compared group1 vs group2, group1 vs group3, and group1 vs group4.

The problem appears in the group1 vs group4 comparison, where 1740 genes are significantly upregulated (overexpressed) in group4.

However, when I ran the code below, the result contained no enriched terms:

library(clusterProfiler)

# GO over-representation test (Biological Process) on genes up in group4 vs group1
enrich.go.BP <- enrichGO(gene = up_gene.4vs1$GeneID,
                         OrgDb = Acan.OrgDb,
                         keyType = "ENTREZID",
                         ont = "BP", pvalueCutoff = 0.01,
                         qvalueCutoff = 0.05, readable = TRUE)

The same code worked well when I compared the other groups to group1, so I don't think the code itself is the problem. Why do I get this result, and how can I fix it? Could it be that I have so many genes, spread across almost every category, that no term reaches statistical significance?
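For context on that last question: to my understanding, over-representation tools such as enrichGO score each term with a one-sided hypergeometric test, comparing the share of the gene list annotated to a term with the share in the background. Below is a minimal base-R sketch with made-up counts (none of these numbers come from the data in this post; `ora_pvalue` is a hypothetical helper). It illustrates how a long list that hits a term at roughly the background rate gives a large p-value, while a shorter, focused list gives a small one.

```r
# Over-representation p-value for one GO term (one-sided hypergeometric test).
#   k: genes in the list annotated to the term
#   n: size of the gene list (annotated genes only)
#   M: background genes annotated to the term
#   N: size of the background
ora_pvalue <- function(k, n, M, N) {
  # P(X >= k) when drawing n genes from N, of which M carry the term
  phyper(k - 1, M, N - M, n, lower.tail = FALSE)
}

# Hypothetical counts, for illustration only.
N <- 10000   # background genes with any GO annotation
M <- 200     # background genes annotated to the term (2% of the background)

# Long list that hits the term at about the background rate (21 of 1000).
p_broad <- ora_pvalue(k = 21, n = 1000, M = M, N = N)

# Short list strongly concentrated on the term (20 of 200).
p_focused <- ora_pvalue(k = 20, n = 200, M = M, N = N)

print(c(broad = p_broad, focused = p_focused))
```

So a large list is not a problem in itself; what matters is whether any term is hit more often than the background rate predicts.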

Thank you in advance.

Edited 2020-06-11: added more information on up_gene.4vs1 and Acan.OrgDb.

I loaded Acan.OrgDb through AnnotationHub because my target species, Acanthamoeba castellanii, is not a model organism.

hub <- AnnotationHub::AnnotationHub()
amoeba <- AnnotationHub::query(hub, "Acanthamoeba castellanii")
# title       
# AH73987 | Transcript information for Acanthamoeba castellanii str Neff
# AH73987 | Transcript information for Acanthamoeba castellanii str Neff
# AH74626 | Transcript information for Acanthamoeba castellanii str Neff
# AH81410 | org.Acanthamoeba_castellanii_Neff_strain.eg.sqlite          
# AH81411 | org.Acanthamoeba_castellanii_str._Neff.eg.sqlite            
# AH81412 | org.Acanthamoeba_castellanii_strain_Neff.eg.sqlite

I chose AH81410 because its Db type is OrgDb.

Acan.OrgDb <- hub[["AH81410"]]
> Acan.OrgDb
OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: NOSCHEMA_DB
| ORGANISM: Acanthamoeba castellanii_Neff_strain
| SPECIES: Acanthamoeba castellanii_Neff_strain
| CENTRALID: GID
| Taxonomy ID: 1257118
| Db type: OrgDb
| Supporting package: AnnotationDbi

From columns(Acan.OrgDb), we can see that it supports ENTREZID.

> columns(Acan.OrgDb)
[1] "ACCNUM"      "ALIAS"       "CHR"         "ENTREZID"    "EVIDENCE"    "EVIDENCEALL" "GENENAME"    "GID"         "GO"          "GOALL"      
[11] "ONTOLOGY"    "ONTOLOGYALL" "PMID"        "REFSEQ"      "SYMBOL"

Then I converted my list of significant genes to ENTREZID format. I built the table below by combining the ORFID, locus_tag, and annotation from files downloaded from NCBI.

The GeneID column holds the IDs in ENTREZID format.

> up_gene.4vs1
    Locus_tag    ORFID Name      Accession Start  Stop Strand   GeneID Locus Protein_product Length                          Protein_Name
1 ACA1_000790 gene5490   Un NW_004457578.1  5136  5699      + 14921342    NA  XP_004343320.1    187      hypothetical protein ACA1_000790
2 ACA1_001250 gene2057   Un NW_004457658.1  4004 11317      + 14924768    NA  XP_004353303.1   1925      hypothetical protein ACA1_001250
3 ACA1_001280 gene2060   Un NW_004457658.1 17392 18733      - 14924773    NA  XP_004353305.1    258      hypothetical protein ACA1_001280
4 ACA1_001300 gene2062   Un NW_004457658.1 20701 23681      - 14924770    NA  XP_004353306.1    599 fucose1-phosphate guanylyltransferase

You may notice that some entries are hypothetical proteins, which could blur the enrichment result. Although 691 entries are hypothetical proteins, 1049 of the 1740 entries remain.

So I'm confused that enrichGO reports no enriched GO terms.

Could you give me some advice? Thank you in advance.
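One check that may help here, as far as I understand how the test works: only input genes that actually carry a GO annotation in the OrgDb contribute, so it is worth counting how many of the 1740 IDs map to a BP term at all. The sketch below is an assumption-laden illustration, not a definitive diagnostic. `go_coverage` is a hypothetical helper, the toy GO assignments are invented, and the real-data branch only runs if `Acan.OrgDb` and `up_gene.4vs1` exist in the session.

```r
# Summarise how many input IDs carry at least one GO term of a given ontology.
# `annot` is a data frame with columns ENTREZID, GO and ONTOLOGY, e.g. the
# result of AnnotationDbi::select(); IDs without annotation have GO = NA.
go_coverage <- function(ids, annot, ont = "BP") {
  ids <- unique(as.character(ids))
  hit <- annot[!is.na(annot$GO) & !is.na(annot$ONTOLOGY) & annot$ONTOLOGY == ont, ]
  annotated <- intersect(ids, unique(as.character(hit$ENTREZID)))
  c(input = length(ids),
    annotated = length(annotated),
    fraction = length(annotated) / length(ids))
}

# Toy annotation table (hypothetical GO assignments, illustration only).
toy_annot <- data.frame(
  ENTREZID = c("14921342", "14924768", "14924768", "14924773", "14924770"),
  GO       = c(NA, "GO:0006412", "GO:0003735", NA, "GO:0009058"),
  ONTOLOGY = c(NA, "BP", "MF", NA, "BP"),
  stringsAsFactors = FALSE
)
toy_ids <- c(14921342, 14924768, 14924773, 14924770)
print(go_coverage(toy_ids, toy_annot))

# With the real objects from this thread (skipped if they are not loaded).
if (exists("Acan.OrgDb") && exists("up_gene.4vs1") &&
    requireNamespace("AnnotationDbi", quietly = TRUE)) {
  real_annot <- AnnotationDbi::select(
    Acan.OrgDb,
    keys    = as.character(up_gene.4vs1$GeneID),
    keytype = "ENTREZID",
    columns = c("GO", "ONTOLOGY")
  )
  print(go_coverage(up_gene.4vs1$GeneID, real_annot))
}
```

If the annotated fraction is very low, or select() reports that none of the keys are valid, the empty result is more likely an ID or annotation problem than a biological one.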

GO Clusterprofiler • 1.8k views

Cross-posted: https://support.bioconductor.org/p/131653/

3.9 years ago by Kevin Blighe 87k

score 1 · Accepted Answer · 2020-06-11

3.9 years ago

Issac ▴ 40

How many DEGs from your group1 vs group4 comparison overlap with the DEGs from the other comparisons? Are the lists similar to each other? You could try looser cutoffs to check whether you get any results at all, e.g. pvalueCutoff = 0.5 and qvalueCutoff = 0.5. Sometimes a specific gene set really does not yield any significant GO terms.
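Both suggestions can be sketched in a few lines of R. This is only an illustration under stated assumptions: `deg_overlap` is a hypothetical helper, the toy IDs are invented, and the enrichGO call only runs if `Acan.OrgDb` and `up_gene.4vs1` from the question are loaded.

```r
# Overlap between two DEG lists: shared count and Jaccard index.
deg_overlap <- function(a, b) {
  a <- unique(as.character(a))
  b <- unique(as.character(b))
  shared <- intersect(a, b)
  c(n_a = length(a), n_b = length(b), shared = length(shared),
    jaccard = length(shared) / length(union(a, b)))
}

# Toy IDs (hypothetical), standing in for up_gene.4vs1$GeneID and the
# corresponding list from another comparison.
toy_4vs1 <- c("101", "102", "103", "104")
toy_2vs1 <- c("103", "104", "105")
print(deg_overlap(toy_4vs1, toy_2vs1))

# Exploratory enrichment with loose cutoffs, using the objects from the
# question (skipped if they are not loaded in the session).
if (exists("Acan.OrgDb") && exists("up_gene.4vs1") &&
    requireNamespace("clusterProfiler", quietly = TRUE)) {
  loose.go.BP <- clusterProfiler::enrichGO(
    gene         = as.character(up_gene.4vs1$GeneID),
    OrgDb        = Acan.OrgDb,
    keyType      = "ENTREZID",
    ont          = "BP",
    pvalueCutoff = 0.5,
    qvalueCutoff = 0.5
  )
  # Adjusted p-values of the top terms show how far they are from significance.
  print(head(as.data.frame(loose.go.BP)))
}
```

Terms that only appear at such loose cutoffs are a first impression of the data, not evidence of enrichment.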

Thanks for your comment. Around 3400 genes are DEGs: about 1700 are upregulated and the rest are downregulated. If I change the cutoff here, I think I should also apply it to the other comparisons (group1 vs group2 and group1 vs group3), right? Should the cutoff be consistent across all comparisons?

3.9 years ago by Ruixuan • 0

I think it depends on your biological purpose; I don't know what question your research is trying to answer. In your case, I think it's better to use a somewhat looser cutoff so that you can get an initial impression of your data, even if the terms are not strictly significant. Alternatively, you can run GSEA, which may pick up something the over-representation test misses.
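A minimal sketch of the GSEA route, assuming clusterProfiler's gseGO is used. Unlike enrichGO, it takes every tested gene ranked by a score rather than a thresholded list. Here `make_ranked_list` is a hypothetical helper, the log fold changes are invented, and `all_genes.4vs1` is a hypothetical data frame (GeneID and logFC for every gene edgeR tested) that does not appear in the thread.

```r
# Build the ranked gene list that GSEA-style tools expect: a numeric vector
# of scores named by gene ID, sorted in decreasing order.
make_ranked_list <- function(ids, scores) {
  keep <- !is.na(ids) & !is.na(scores) & !duplicated(ids)
  ranked <- scores[keep]
  names(ranked) <- as.character(ids[keep])
  sort(ranked, decreasing = TRUE)
}

# Toy table with hypothetical log fold changes for ALL tested genes
# (not only the significant ones).
toy <- data.frame(GeneID = c(14921342, 14924768, 14924773, 14924770),
                  logFC  = c(1.2, -0.4, 3.1, -2.0))
gene_list <- make_ranked_list(toy$GeneID, toy$logFC)
print(gene_list)

# With real data (skipped unless the objects exist in the session).
# `all_genes.4vs1` is a hypothetical name, see the note above.
if (exists("Acan.OrgDb") && exists("all_genes.4vs1") &&
    requireNamespace("clusterProfiler", quietly = TRUE)) {
  gsea.go.BP <- clusterProfiler::gseGO(
    geneList     = make_ranked_list(all_genes.4vs1$GeneID, all_genes.4vs1$logFC),
    OrgDb        = Acan.OrgDb,
    keyType      = "ENTREZID",
    ont          = "BP",
    pvalueCutoff = 0.05
  )
  print(head(as.data.frame(gsea.go.BP)))
}
```

Ranking by logFC is one common choice; a signed statistic that also reflects the p-value is another, and which is better depends on the data.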