Question: run gene enrichment analysis using metatranscriptomic data
16 months ago, flying dutchman wrote:

Hi All,

I have metatranscriptomic data (from fecal samples) and want to perform gene enrichment analysis. The package I am using is clusterProfiler.

I have been stuck on a few questions for several days. I searched online but have not found answers.

  1. After de novo assembly with Trinity, I got around 850,000 unigenes with IDs like "TRINITY_DN542091_c0_g2". I then mapped these Trinity unigenes to NCBI protein IDs and got around 800,000 of them. When I used the "mygene" package to convert the NCBI protein IDs to Entrez gene IDs, I got only around 20,000. As a result, although there are 3,000 differentially expressed genes, only 60 of them have an Entrez ID that can be used for enrichment analysis. Why were so few NCBI protein IDs converted?

  2. When I used the converted Entrez gene IDs for enrichment analysis with the "clusterProfiler" package, I supplied the Entrez IDs as character strings, but it still reported "Expected input gene ID: 284541,5213,29925,25796,3938,10449".

head(geneList)

[1] "5328557" "851620" "31798232" "856371" "854405" "854229"

ekk <- enrichKEGG(gene = geneList, organism = "hsa", pAdjustMethod = "BH", pvalueCutoff = 0.01)

--> No gene can be mapped....

--> Expected input gene ID: 284541,5213,29925,25796,3938,10449

--> return NULL...
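
One quick way to read this message is to test whether the input IDs occur among the IDs the tool lists. The sketch below uses base R and only the six IDs shown above on each side. Note the assumptions: `expected` holds just the sample printed in the message, not the full set of IDs that `enrichKEGG` uses for `hsa`, and `mapping_rate` is a hypothetical helper name.

```r
# Input IDs from head(geneList) above.
gene_ids <- c("5328557", "851620", "31798232", "856371", "854405", "854229")

# Sample of expected IDs copied from the message (NOT the full hsa background).
expected <- c("284541", "5213", "29925", "25796", "3938", "10449")

# Hypothetical helper: fraction of input IDs present in a reference ID set.
mapping_rate <- function(ids, reference) {
  ids <- unique(as.character(ids))
  if (length(ids) == 0) return(NA_real_)
  mean(ids %in% as.character(reference))
}

shared <- intersect(gene_ids, expected)   # IDs found on both sides
rate <- mapping_rate(gene_ids, expected)  # proportion of input IDs found
```

Both vectors are character vectors and `shared` is empty for these samples, which suggests the mismatch lies in which IDs are present rather than in how they are stored. Running the same check against the complete ID set for the chosen organism would show how many input IDs can be mapped at all.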

  3. I also tried to run the enrichment analysis using NCBI protein IDs, and it also reported "Expected input gene ID: NP_002617,NP_000498,NP_787082,NP_061948,NP_055056,NP_002617". A few of my protein IDs have the form "NP_xxxxxx", but most do not (for example "CBK82693.1", "WP_026649001.1").

head(prolist)

[1] "CBK82693.1" "CBL23100.1" "WP_025579028.1" "WP_022786881.1" "WP_026649001.1" "CDA70808.1"

ekk <- enrichKEGG(gene = prolist, organism = "hsa", keyType = "ncbi-proteinid", pAdjustMethod = "BH", pvalueCutoff = 0.01)

No gene can be mapped....

Expected input gene ID: NP_002617,NP_000498,NP_787082,NP_061948,NP_055056,NP_002617

return NULL...
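
To see how many accessions even have the "NP_" form that the message lists, the IDs can be tallied by prefix. This is a minimal base-R sketch using the six IDs shown above; `classify_accession` is a hypothetical helper, and the "other" label simply means "neither NP_ nor WP_" and makes no claim about the source database.

```r
# Protein IDs from head(prolist) above.
prolist <- c("CBK82693.1", "CBL23100.1", "WP_025579028.1",
             "WP_022786881.1", "WP_026649001.1", "CDA70808.1")

# Hypothetical helper: label each accession by its prefix.
# NP_ and WP_ are RefSeq prefixes; anything else is labelled "other" here.
classify_accession <- function(ids) {
  ifelse(grepl("^NP_", ids), "NP_",
         ifelse(grepl("^WP_", ids), "WP_", "other"))
}

prefix <- classify_accession(prolist)

# Tally per prefix, keeping zero counts visible.
prefix_counts <- table(factor(prefix, levels = c("NP_", "WP_", "other")))
print(prefix_counts)
```

Applied to the full `prolist`, the same tally would show what share of the IDs could match an "NP_"-style background in principle.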

  4. Since the metatranscriptomic data come from a micro-environment (feces) rather than a single model organism, which OrgDb should I choose?
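
When no single OrgDb fits, one option is an over-representation test driven by a user-supplied term-to-gene table instead of an organism database; clusterProfiler offers `enricher()` with a `TERM2GENE` argument for that kind of input. The sketch below is not that function. It is a minimal base-R hypergeometric test under stated assumptions: `ora`, the toy `term2gene` table, and all gene and term labels are hypothetical, and the background is taken to be every annotated gene.

```r
# Hypothetical toy annotation: one row per (term, gene) pair.
term2gene <- data.frame(
  term = c("K1", "K1", "K1", "K2", "K2", "K3"),
  gene = c("g1", "g2", "g3", "g3", "g4", "g5"),
  stringsAsFactors = FALSE
)

# Hypothetical list of differentially expressed genes.
de_genes <- c("g1", "g2", "g3")

# Over-representation analysis with a one-sided hypergeometric test per term.
ora <- function(de_genes, term2gene, universe = unique(term2gene$gene)) {
  de <- intersect(unique(de_genes), universe)  # DE genes inside the background
  N <- length(universe)                        # background size
  n <- length(de)                              # number of DE genes drawn
  sets <- split(term2gene$gene, term2gene$term)
  res <- do.call(rbind, lapply(names(sets), function(term) {
    members <- intersect(unique(sets[[term]]), universe)
    M <- length(members)                       # genes annotated to the term
    k <- length(intersect(de, members))        # DE genes annotated to the term
    # P(X >= k), X ~ Hypergeometric(M in set, N - M outside, n draws)
    p <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)
    data.frame(term = term, set_size = M, overlap = k, pvalue = p,
               stringsAsFactors = FALSE)
  }))
  res$p.adjust <- p.adjust(res$pvalue, method = "BH")
  res[order(res$pvalue), ]
}

result <- ora(de_genes, term2gene)
print(result)
```

The choice of `universe` matters: restricting it to genes that were both annotated and tested for differential expression changes every p-value, so it should be stated explicitly in any real analysis.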

Thanks

14 months ago, cvu (India) wrote:

Did you resolve this problem? Which OrgDb can be used?

9 months ago, pedrorodrigues wrote:

@flying dutchman I am facing similar issues when working with GO terms in a metatranscriptome. One thing I have been wondering is whether it makes sense to look for gene set enrichment when working with genes from many different organisms. Are there tools that account for community-level biases when doing GSEA? I am working with metatranscriptomes from microorganisms found in insect guts. Please let me know if you have found a solution to your question.
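
To make the "many organisms" concern concrete, one possibility is to collapse transcripts to functional labels and record how many organisms contribute to each label before any enrichment step. This is only an illustration in base R, not an established answer to the question above; the table, its column names, and all labels are hypothetical.

```r
# Hypothetical toy table: one row per transcript, with the organism it was
# assigned to, a functional label, and a read count.
counts <- data.frame(
  transcript = c("t1", "t2", "t3", "t4", "t5"),
  organism   = c("orgA", "orgB", "orgA", "orgC", "orgB"),
  func       = c("F1", "F1", "F2", "F2", "F3"),
  count      = c(10, 5, 7, 3, 8),
  stringsAsFactors = FALSE
)

# Sum counts per function, ignoring which organism contributed them.
per_function <- aggregate(count ~ func, data = counts, FUN = sum)

# Number of distinct organisms contributing to each function.
n_organisms <- aggregate(organism ~ func, data = counts,
                         FUN = function(x) length(unique(x)))

print(merge(per_function, n_organisms, by = "func"))
```

Summing across organisms does not by itself correct for shifts in community composition; it only makes the unit of analysis explicit, so that a change in a function's total can be checked against the organisms behind it.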

If anyone else in the community can give us input on these questions, please let us know.
