Hi everyone, I'm doing a GO analysis after finishing the statistical testing with edgeR.
Before, I did the comparison between
The problem came when I compared group4 with group1: 1740 genes were found to be significantly overrepresented in group 4.
However, when I used the code below
enrich.go.BP = enrichGO(gene = up_gene.4vs1$GeneID,
                        OrgDb = Acan.OrgDb,
                        keyType = "ENTREZID",
                        ont = "BP",
                        pvalueCutoff = 0.01,
                        qvalueCutoff = 0.05,
                        readable = TRUE)
There are no enriched terms in the result.
This code worked well when I compared the other groups to group1, so I don't think the problem is in the code. So why did I get this result, and how can I fix it? Could it be that I have so many genes, spread across almost every category, that no term reaches statistical significance?
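To make that last question concrete: as far as I understand, enrichGO is based on a hypergeometric over-representation test. Below is a minimal base-R sketch of that test for a single GO term. Apart from the list size of 1740, every count is hypothetical, and the function name ora_pvalue is my own, not part of clusterProfiler.

```r
# Hypothetical counts for one GO term, for illustration only (not from my data).
N <- 10000  # annotated background genes (the "universe")
M <- 200    # background genes annotated to this term
n <- 1740   # size of my gene list, assuming every gene is annotated

# Upper-tail hypergeometric probability: P(overlap >= k) under random sampling.
ora_pvalue <- function(k, M, N, n) {
  phyper(k - 1, M, N - M, n, lower.tail = FALSE)
}

expected   <- n * M / N               # overlap expected by chance
p_chance   <- ora_pvalue(35, M, N, n) # overlap close to the expectation
p_enriched <- ora_pvalue(70, M, N, n) # overlap about twice the expectation

print(c(expected = expected, p_chance = p_chance, p_enriched = p_enriched))
```

With these made-up numbers, an overlap near the chance expectation gives a large p-value and an overlap about twice the expectation gives a very small one, so a long gene list alone would not obviously rule out significant terms.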
Thank you in advance.
For more information on
Acan.OrgDb is the annotation database I loaded with AnnotationHub, because my target species, Acanthamoeba castellanii, is not a model organism.
hub <- AnnotationHub::AnnotationHub()
amoeba <- query(hub, "Acanthamoeba castellanii")
#           title
# AH73987 | Transcript information for Acanthamoeba castellanii str Neff
# AH73987 | Transcript information for Acanthamoeba castellanii str Neff
# AH74626 | Transcript information for Acanthamoeba castellanii str Neff
# AH81410 | org.Acanthamoeba_castellanii_Neff_strain.eg.sqlite
# AH81411 | org.Acanthamoeba_castellanii_str._Neff.eg.sqlite
# AH81412 | org.Acanthamoeba_castellanii_strain_Neff.eg.sqlite
Here I chose AH81410 because its Db type is OrgDb:
Acan.OrgDb <- hub[["AH81410"]]
> Acan.OrgDb
OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: NOSCHEMA_DB
| ORGANISM: Acanthamoeba castellanii_Neff_strain
| SPECIES: Acanthamoeba castellanii_Neff_strain
| CENTRALID: GID
| Taxonomy ID: 1257118
| Db type: OrgDb
| Supporting package: AnnotationDbi
From columns(Acan.OrgDb), we can see that it supports ENTREZID:
> columns(Acan.OrgDb)
"ACCNUM" "ALIAS" "CHR" "ENTREZID" "EVIDENCE" "EVIDENCEALL" "GENENAME" "GID" "GO" "GOALL"
"ONTOLOGY" "ONTOLOGYALL" "PMID" "REFSEQ" "SYMBOL"
Then I converted my list of significant genes to ENTREZID format. The table was generated by combining the ORFID, locus_tag and annotation from files downloaded from NCBI.
The GeneID column holds those IDs:
>up_gene.4vs1
    Locus_tag    ORFID Name      Accession Start  Stop Strand   GeneID Locus Protein_product Length Protein_Name
1 ACA1_000790 gene5490   Un NW_004457578.1  5136  5699      + 14921342    NA  XP_004343320.1    187 hypothetical protein ACA1_000790
2 ACA1_001250 gene2057   Un NW_004457658.1  4004 11317      + 14924768    NA  XP_004353303.1   1925 hypothetical protein ACA1_001250
3 ACA1_001280 gene2060   Un NW_004457658.1 17392 18733      - 14924773    NA  XP_004353305.1    258 hypothetical protein ACA1_001280
4 ACA1_001300 gene2062   Un NW_004457658.1 20701 23681      - 14924770    NA  XP_004353306.1    599 fucose1-phosphate guanylyltransferase
You may also notice that the list contains hypothetical proteins, which could blur the prediction. However, although 691 entries are hypothetical proteins, 1049 of the 1740 entries are not.
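One thing I could check is how many of my GeneIDs actually carry a BP annotation in the OrgDb, since genes without one presumably cannot contribute to any enriched term. Here is a minimal sketch of that check. The helper bp_coverage is my own, the toy IDs and GO terms are made up, and the commented-out AnnotationDbi::select() call is how I assume the real mapping table would be obtained.

```r
# Count how many gene IDs have at least one Biological Process (BP) annotation.
# go_map is expected to have ENTREZID and ONTOLOGY columns.
bp_coverage <- function(gene_ids, go_map) {
  gene_ids <- unique(as.character(gene_ids))
  bp <- go_map[!is.na(go_map$ONTOLOGY) & go_map$ONTOLOGY == "BP", ]
  annotated <- intersect(gene_ids, unique(as.character(bp$ENTREZID)))
  c(total = length(gene_ids), with_bp = length(annotated))
}

# Toy stand-in for the mapping table (hypothetical IDs and GO terms).
toy_map <- data.frame(
  ENTREZID = c("101", "102", "102", "103", "104"),
  GO       = c(NA, "GO:0000001", "GO:0000002", "GO:0000003", "GO:0000004"),
  ONTOLOGY = c(NA, "BP", "MF", "CC", "BP"),
  stringsAsFactors = FALSE
)
toy_cov <- bp_coverage(c(101, 102, 103, 104), toy_map)
print(toy_cov)

# With the real objects (assumed usage, needs AnnotationDbi and Acan.OrgDb):
# go_map <- AnnotationDbi::select(Acan.OrgDb,
#                                 keys = as.character(up_gene.4vs1$GeneID),
#                                 columns = c("GO", "ONTOLOGY"),
#                                 keytype = "ENTREZID")
# bp_coverage(up_gene.4vs1$GeneID, go_map)
```

If only a small fraction of the 1740 genes turned out to have BP terms, that might explain the empty result better than the size of the list does.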
So I'm a bit confused that enrichGO reports no enriched GO terms.
Could you give me some advice? Thank you in advance.