Too many Terms after Enrichment analysis
3.2 years ago
biosjm • 0

Hi all, after DEG analysis with DESeq2 and GO enrichment analysis (Fisher's exact test, classic algorithm) with topGO, I obtained a long list of enriched terms. They come from several levels of the ontology, and there are too many (500) to explain all of them. I'm not sure how I should proceed. Is there a GO slim function I can use after topGO?

GO enrichment analysis Slim • 2.0k views

There are two ways of reducing the number of candidates. The first is to use the lfcThreshold argument of DESeq2::results to test against a fold-change threshold, so that only genes whose effect size exceeds it have a chance to be significant. With this you can exclude significant genes with tiny effect sizes. The second is to use something more specific than GO terms, such as KEGG or REACTOME pathways. GO terms are broad and unspecific: something like "signaling" in a GO term tells you very little. For more precise results you could use tools like gprofiler2 to actually test for pathways rather than just terms.

> GO terms are broad and unspecific.

This depends on the depth of the branch. Some GO terms can be very specific. When working with GO, one should ideally take the graph structure of the ontology into account. Pathways are easier to work with because they are defined as lists of genes. The downside is that pathways with the same name are defined differently in different resources (e.g. KEGG DNA damage has 124 genes, Reactome DNA damage has 314 proteins). Also, even Reactome pathways are connected in a graph, so you still need to decide on which level you want to operate.
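
To illustrate what taking the graph structure into account could look like, here is a minimal Python sketch of one simple option: drop every enriched term that is an ancestor of another enriched term, so that only the most specific terms remain. This is not what topGO or any particular tool does internally. The term IDs and the toy parent map are made up for illustration; in practice the parent relations would come from the ontology itself.

```python
# Toy GO-like DAG: child term -> list of parent terms.
# All term IDs are hypothetical and only serve as an illustration.
PARENTS = {
    "T:root": [],
    "T:signaling": ["T:root"],
    "T:stress_response": ["T:root"],
    "T:mapk_cascade": ["T:signaling"],
    "T:dna_repair": ["T:stress_response"],
    "T:double_strand_break_repair": ["T:dna_repair"],
}


def ancestors(term, parents):
    """Return every ancestor of `term`, excluding the term itself."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def most_specific(enriched, parents):
    """Drop enriched terms that are an ancestor of another enriched term."""
    enriched = set(enriched)
    redundant = set()
    for term in enriched:
        redundant |= ancestors(term, parents) & enriched
    return sorted(enriched - redundant)


enriched_terms = [
    "T:signaling",
    "T:mapk_cascade",
    "T:stress_response",
    "T:dna_repair",
    "T:double_strand_break_repair",
]
specific_terms = most_specific(enriched_terms, PARENTS)
print(specific_terms)
```

With a list of 500 terms this kind of filter can shorten the list considerably, but it throws away the broader terms entirely, so whether it is appropriate depends on the level at which you want to interpret the results.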

Thanks, very helpful! Can you recommend a package I can use for KEGG enrichment analysis? I used topGO for the GO enrichment analysis, but could not find anything similar for KEGG. I already have a custom annotation table (from a de novo transcriptome assembly) that contains, among other information, the KEGG/KO IDs for every gene.

Have a look at the clusterProfiler Bioconductor package but you can also easily compute statistics yourself (e.g. overrepresentation with the hypergeometric test or gene set enrichment analysis with the fgsea package).
EDIT: Here is a tutorial that you may find helpful.
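
As a rough illustration of computing the overrepresentation statistic yourself, here is a minimal Python sketch of a one-sided hypergeometric test run against a custom gene-to-pathway annotation. The gene IDs, the pathway IDs `path:A` and `path:B`, and the helper names are hypothetical. The sketch assumes that the universe is every gene in the annotation table, and it does not correct for multiple testing, which you would still need to do (e.g. Benjamini-Hochberg).

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K belong to the set."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def overrepresentation(de_genes, annotation):
    """Return {term: p-value} for every term in a gene -> set-of-terms mapping."""
    universe = set(annotation)          # assumption: all annotated genes
    de = set(de_genes) & universe
    term_to_genes = {}
    for gene, terms in annotation.items():
        for term in terms:
            term_to_genes.setdefault(term, set()).add(gene)
    return {
        term: hypergeom_pvalue(len(members & de), len(de), len(members), len(universe))
        for term, members in term_to_genes.items()
    }


# Hypothetical annotation for 20 genes: g1..g5 in path:A, g4..g13 in path:B.
annotation = {f"g{i}": set() for i in range(1, 21)}
for i in range(1, 6):
    annotation[f"g{i}"].add("path:A")
for i in range(4, 14):
    annotation[f"g{i}"].add("path:B")

de_genes = ["g1", "g2", "g3", "g4", "g11", "g12"]
pvalues = overrepresentation(de_genes, annotation)
for term, p in sorted(pvalues.items()):
    print(term, round(p, 4))
```

Here 4 of the 6 DE genes fall into the 5-gene `path:A`, which gives a small p-value, while `path:B` is not enriched. Because the function only needs a gene-to-term mapping, it does not care which organism the KO or pathway IDs originally came from.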

Yes, but the problem with clusterProfiler is that you have to use search_kegg_organism, i.e. you have to pick a single organism. Because I am working with a de novo assembly, my KEGG IDs come from several organisms.

I think you can use your own annotations with clusterProfiler. See chapter 3 of the clusterProfiler manual.
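
To my knowledge, the generic enrichment function in clusterProfiler, enricher(), takes a two-column term-to-gene table (its TERM2GENE argument) with the term in the first column and the gene in the second. As a sketch of how a custom annotation table could be reshaped into that layout, here is a small Python example. The file excerpt, the column names `gene_id` and `ko_ids`, the `;` separator and the gene IDs are all assumptions about what such a table might look like, not a description of your actual file.

```python
import csv
import io

# Hypothetical excerpt of a custom annotation table (tab-separated).
# Column names, separator and IDs are placeholders for illustration.
RAW = (
    "gene_id\tko_ids\n"
    "gene_0001\tK00001;K00002\n"
    "gene_0002\t\n"
    "gene_0003\tK00002\n"
)


def term_to_gene_rows(handle, gene_col="gene_id", term_col="ko_ids", sep=";"):
    """Turn one row per gene into one (term, gene) pair per annotation."""
    rows = []
    for record in csv.DictReader(handle, delimiter="\t"):
        for term in (record[term_col] or "").split(sep):
            term = term.strip()
            if term:  # skip genes without any annotation
                rows.append((term, record[gene_col]))
    return rows


rows = term_to_gene_rows(io.StringIO(RAW))

# Write the pairs as a two-column TSV that can be read into R as a data frame.
out = io.StringIO()
writer = csv.writer(out, delimiter="\t", lineterminator="\n")
writer.writerow(["term", "gene"])
writer.writerows(rows)
print(out.getvalue())
```

Genes without any KO ID simply produce no rows, so they drop out of the term-to-gene table but can still be kept in the background gene list.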

Thank you for your answer, ATpoint. Yes, I have already set the log2 fold change threshold to 1. I should add that I am working with a non-model organism and have a custom annotation table that also contains KEGG and KO IDs. In my experience, many tools/packages are only available for model organisms.

Please add comments via ADD REPLY/COMMENT.

Maybe you can use the homolog gene names from a closely-related model organism for the pathway analysis?

> tools/packages are only available for model organism.

Some tools allow you to use custom annotation files if you have them (see for example the topGO documentation section on custom annotations). A common approach to generate custom annotations is to transfer annotations by orthology from related organisms.
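
A minimal Python sketch of transferring annotations by orthology, assuming you already have an ortholog table (for example from a reciprocal best hit search) and the GO annotations of the related organism. The gene IDs are hypothetical. The output layout (gene ID, a tab, then comma-separated GO IDs) is meant to match the plain-text gene-to-GO mapping format that, as far as I know, topGO can read with readMappings().

```python
# Hypothetical ortholog table: gene in the new assembly -> gene in a model organism.
orthologs = {
    "gene_0001": "MODEL_A",
    "gene_0002": "MODEL_B",
    "gene_0003": "MODEL_C",  # ortholog without GO annotation
}

# GO annotations of the model organism genes (illustrative assignments).
model_go = {
    "MODEL_A": ["GO:0006281", "GO:0007165"],
    "MODEL_B": ["GO:0007165"],
}


def transfer_annotations(orthologs, model_go):
    """Copy the GO terms of each model gene to its ortholog in the assembly."""
    transferred = {}
    for gene, model_gene in orthologs.items():
        terms = model_go.get(model_gene, [])
        if terms:  # genes whose ortholog has no annotation are left out
            transferred[gene] = sorted(set(terms))
    return transferred


def to_gene2go_lines(gene_to_terms):
    """Format as 'gene<TAB>GO:..., GO:...' lines, one gene per line."""
    return [
        f"{gene}\t{', '.join(terms)}"
        for gene, terms in sorted(gene_to_terms.items())
    ]


gene2go = transfer_annotations(orthologs, model_go)
lines = to_gene2go_lines(gene2go)
print("\n".join(lines))
```

How reliable the transferred terms are depends on how the orthologs were called, so it is worth keeping the ortholog evidence (identity, coverage) next to the annotation.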

Also, if you are specifically interested in GO terms and have too many of them, there are tools that reduce and summarize redundant lists of terms, such as REViGO.
