run gene enrichment analysis using metatranscriptomic data
6.6 years ago

Hi All,

I have metatranscriptomic data (from fecal samples) and want to perform gene enrichment analysis. The package I am using is clusterProfiler.

I have been stuck on a few questions for several days. I searched Google but haven't found an answer.

  1. After performing de novo assembly with Trinity, I got around 850,000 unigenes with IDs like "TRINITY_DN542091_c0_g2". I then mapped these Trinity unigenes to NCBI protein IDs and got around 800,000 of them. When I used the "mygene" package to convert the NCBI protein IDs to Entrez gene IDs, I got only around 20,000. As a result, although there are 3,000 differentially expressed genes, only 60 of them have an Entrez ID that can be used for enrichment analysis. Why were so few NCBI protein IDs converted?

  2. When I used these converted Entrez gene IDs for enrichment analysis with the "clusterProfiler" package, I passed the Entrez IDs as character strings, but it still reported "Expected input gene ID: 284541,5213,29925,25796,3938,10449".


[1] "5328557" "851620" "31798232" "856371" "854405" "854229"

ekk <- enrichKEGG(gene = geneList, organism = "hsa", pAdjustMethod = "BH", pvalueCutoff = 0.01)

--> No gene can be mapped....

--> Expected input gene ID: 284541,5213,29925,25796,3938,10449

--> return NULL...

  3. I also tried running the enrichment analysis with NCBI protein IDs, and it reported "Expected input gene ID: NP_002617,NP_000498,NP_787082,NP_061948,NP_055056,NP_002617". A few of my protein IDs have the form "NP_xxxxxx", but most do not (for example "CBK82693.1", "WP_026649001.1").


[1] "CBK82693.1" "CBL23100.1" "WP_025579028.1" "WP_022786881.1" "WP_026649001.1" "CDA70808.1"

ekk <- enrichKEGG(gene = prolist, organism = "hsa", keyType = "ncbi-proteinid", pAdjustMethod = "BH", pvalueCutoff = 0.01)

No gene can be mapped....

Expected input gene ID: NP_002617,NP_000498,NP_787082,NP_061948,NP_055056,NP_002617

return NULL...

  4. Since the metatranscriptomic data come from a micro-environment (feces) rather than a model organism, which OrgDb should I choose?


6.4 years ago
cvu ▴ 180

Did you resolve this problem? Which OrgDb can be used?

6.0 years ago

@flying dutchman I am facing similar issues when working with GO terms in a metatranscriptome. One thing I have been wondering is whether it makes sense to look for gene set enrichment when the genes come from many different organisms. Are there tools that account for community-level biases when doing GSEA? I am working with metatranscriptomes from microorganisms found in insect guts. Please let me know if you have found a solution to your question.

If anyone else in the community can give us input on these questions, please let us know.


