Question: run gene enrichment analysis using metatranscriptomic data
2.2 years ago, flying dutchman wrote:

Hi All,

I have metatranscriptomic data (from fecal samples) and want to perform gene enrichment analysis. The package I am using is clusterProfiler.

I have been stuck on a few questions for several days and have not found the answers by searching online.

  1. After de novo assembly with Trinity, I got around 850,000 unigenes with IDs like "TRINITY_DN542091_c0_g2". I then mapped these Trinity unigene IDs to NCBI protein IDs and got around 800,000 of them. When I used the "mygene" package to convert the NCBI protein IDs to Entrez gene IDs, I got only around 20,000. As a result, although there are 3,000 differentially expressed genes, only 60 of them have an Entrez ID that can be used for enrichment analysis. Why were so few NCBI protein IDs converted?

  2. When I used these converted Entrez gene IDs for enrichment analysis with the "clusterProfiler" package, I passed the Entrez IDs as characters, but it still reported "Expected input gene ID: 284541,5213,29925,25796,3938,10449".


[1] "5328557" "851620" "31798232" "856371" "854405" "854229"

ekk <- enrichKEGG(gene=geneList,organism = "hsa",pAdjustMethod = "BH",pvalueCutoff=0.01)

--> No gene can be mapped....

--> Expected input gene ID: 284541,5213,29925,25796,3938,10449

--> return NULL...

  3. I also tried to run the enrichment analysis using NCBI protein IDs (keyType = "ncbi-proteinid"), and it likewise reported "Expected input gene ID: NP_002617,NP_000498,NP_787082,NP_061948,NP_055056,NP_002617". A few of my protein IDs have the form "NP_xxxxxx", but most do not (for example "CBK82693.1", "WP_026649001.1").


[1] "CBK82693.1" "CBL23100.1" "WP_025579028.1" "WP_022786881.1" "WP_026649001.1" "CDA70808.1"

ekk <- enrichKEGG(gene=prolist,organism = "hsa",keyType = "ncbi-proteinid", pAdjustMethod = "BH",pvalueCutoff=0.01)

No gene can be mapped....

Expected input gene ID: NP_002617,NP_000498,NP_787082,NP_061948,NP_055056,NP_002617

return NULL...

  4. Since the metatranscriptomic data come from a micro-environment (feces) rather than a model organism, which OrgDb should I choose?
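
On the OrgDb question: an over-representation test does not strictly require an OrgDb, only a table linking annotation terms to gene IDs. clusterProfiler's `enricher()` accepts such a table through its `TERM2GENE` argument. The sketch below shows the underlying hypergeometric test in base R, assuming the unigenes have already been assigned to terms (for example KEGG Orthology IDs) by some annotation step. The gene IDs, the term assignments, and the helper name `ora` are all made up for illustration; this is a minimal sketch under those assumptions, not a recommended pipeline.

```r
# Toy over-representation test that needs no OrgDb and no Entrez IDs.
# All IDs and term assignments below are invented for illustration.

# Hypothetical annotation table: one row per (term, gene) pair
term2gene <- data.frame(
  term = c("K00001", "K00001", "K00001", "K00002", "K00002", "K00003"),
  gene = c("g1", "g2", "g3", "g3", "g4", "g5"),
  stringsAsFactors = FALSE
)

universe <- paste0("g", 1:10)    # background: every unigene that was tested
de_genes <- c("g1", "g2", "g3")  # differentially expressed unigenes

# Hypothetical helper: hypergeometric test for each term
ora <- function(de, universe, term2gene) {
  de   <- intersect(de, universe)
  t2g  <- term2gene[term2gene$gene %in% universe, ]
  sets <- split(t2g$gene, t2g$term)
  res <- data.frame(
    term     = names(sets),
    set_size = vapply(sets, function(s) length(unique(s)), integer(1)),
    overlap  = vapply(sets, function(s) length(intersect(s, de)), integer(1)),
    stringsAsFactors = FALSE
  )
  # P(X >= overlap) when drawing length(de) genes from the universe
  res$pvalue <- phyper(res$overlap - 1, res$set_size,
                       length(universe) - res$set_size,
                       length(de), lower.tail = FALSE)
  res$p.adjust <- p.adjust(res$pvalue, method = "BH")
  res[order(res$pvalue), ]
}

result <- ora(de_genes, universe, term2gene)
print(result)
```

With real data, `term2gene` would hold one row per term-unigene pair and `universe` every unigene that was tested for differential expression. Whether it is appropriate to pool genes from many organisms into a single background is a separate question that this sketch does not address.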


24 months ago, cvu wrote:

Did you resolve this problem? Which OrgDb can be used?

19 months ago, pedrorodrigues wrote:

@flying dutchman I am facing similar issues when working with GO terms in a metatranscriptome. One thing I have been wondering is whether it makes sense to test for gene set enrichment when the genes come from many different organisms. Are there tools that account for community-level biases when doing GSEA? I am working with metatranscriptomes from microorganisms found in insect guts. Please let me know if you have found a solution to your question.

If anyone else in the community can give us input on these questions, please let us know.
