Hello Comunity, Using DeSeq2, I have obtained the list of genes which are differentially expressed across my samples. This analysis finished with 2 files: (1) ent_gene (or gene_1) and (2) ent_uni (or uni_1). The first one corresponds with the list of genes diff. expressed, and the second one is with the full list of genes (removing N/A). My transcriptome and gene information is from NCBI. I am trying to use ClusterProfiler. To start, my data are not coming from esemble and the species that I am studying is the horse. I read here that you need to generate your own OrgDb using
>AnnotationHub.library(AnnotationHub) >hub = AnnotationHub() >query(hub, c("equus caballus", "orgdb")) >eqCab = hub[["AH94124"]] >eqCab
OrgDb object: | DBSCHEMAVERSION: 2.1 | DBSCHEMA: NOSCHEMA_DB | ORGANISM: Equus caballus | SPECIES: Equus caballus | CENTRALID: GID | Taxonomy ID: 9796 | Db type: OrgDb | Supporting package: AnnotationDbi
Then I have tried to run enrichGo:
ego = enrichGO(gene = ent_gene, OrgDb = eqCab, keyType = 'SYMBOL', ont = "BP", universe = ent_uni)
My gene ID corresponds with the SYMBOLs. My FIRST issue comes here, my output only has 8 Description biological processes. I know that some of the genes sigf. expressed have got other biological functions that is not represented here. I am wondering if the issue is coming from my script, but my current knowledge does not allow me to see where.
ID Description GeneRatio BgRatio pvalue GO:0007399 GO:0007399 nervous system development 10/20 347/3657 4.046320e-06 GO:0007605 GO:0007605 sensory perception of sound 3/20 17/3657 9.063503e-05 GO:0048699 GO:0048699 generation of neurons 7/20 217/3657 9.382188e-05 GO:0050954 GO:0050954 sensory perception of mechanical stimulus 3/20 18/3657 1.083826e-04 GO:0007600 GO:0007600 sensory perception 4/20 50/3657 1.275744e-04 GO:0022008 GO:0022008 neurogenesis 7/20 235/3657 1.557383e-04 GO:0030182 GO:0030182 neuron differentiation 6/20 189/3657 3.712655e-04 GO:1901888 GO:1901888 regulation of cell junction assembly 3/20 32/3657 6.273457e-04 p.adjust qvalue geneID Count GO:0007399 0.002561321 0.002427792 APOD/ATP2B2/COL2A1/EFNB2/GPM6B/LAMC2/LRRN3/MME/NTRK2/POU3F4 10 GO:0007605 0.016150923 0.015308932 ATP2B2/COL2A1/POU3F4 3 GO:0048699 0.016150923 0.015308932 APOD/ATP2B2/EFNB2/GPM6B/LAMC2/MME/POU3F4 7 GO:0050954 0.016150923 0.015308932 ATP2B2/COL2A1/POU3F4 3 GO:0007600 0.016150923 0.015308932 ATP2B2/COL2A1/MME/POU3F4 4 GO:0022008 0.016430390 0.015573830 APOD/ATP2B2/EFNB2/GPM6B/LAMC2/MME/POU3F4 7 GO:0030182 0.033573006 0.031822754 APOD/ATP2B2/EFNB2/GPM6B/LAMC2/POU3F4 6 GO:1901888 0.049638726 0.047050925 APOD/GPM6B/LRRN3 3
I must say that I got 117 genes diff. expressed. If I do not misunderstand this, the third column is telling how many genes of my pull of genes participate in that biological process. I do not understand how is possible that I have got x/20 when I had 117. Why is enrichGO removing the rest of my genes from the analysis??
Then, I have tried to run enrichkegg to get the list of genes participating in the same biological process. I run the previous command, changing the sub and it gave me the next error:
ekg = enrichKEGG(gene = ent_gene, organism = 'ecb', keyType ='ncbi-geneid', universe = ent_uni)
Ouput: --> No gene can be mapped.... --> Expected input gene ID: 100147313,100057998,100072967,100056690,100065115,100055111 --> return NULL...
That makes sense because I select ncbi-geneid like KeyType and I have been using symbols. So I generated my gene and universe file with ncbi-geneid. NCBI gives you the ncbi-geneid with prex: "GENEID: 10478648", so I had to remove it from my vector using (str_remove_all).
I tried to run enrichkegg with my new vector of genes:
library(KEGGREST) ekg = enrichKEGG(gene = gene_1, organism = 'ecb', keyType ='ncbi-geneid', universe = uni_1)
However, my output is completely empty:
> ekg # # over-representation test # #...@organism ecb #...@ontology KEGG #...@keytype ncbi-geneid #...@gene chr [1:112] "100034030" "100034058" "100034081" "100050122" "100050231" "100050257" "100050640" ... #...pvalues adjusted by 'BH' with cutoff <0.05 #...0 enriched terms found #...Citation Guangchuang Yu, Li-Gen Wang, Yanyan Han and Qing-Yu He. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology 2012, 16(5):284-287 > summary(ekg)  ID Description GeneRatio BgRatio pvalue p.adjust qvalue geneID  Count <0 rows> (or 0-length row.names) Warning message: In summary(ekg) : summary method to convert the object to data.frame is deprecated, please use as.data.frame instead.
I am sure that I am doing something wrong, but I do not know what is it. Could please anybody help me?