Question

Functional enrichment of pathways

0

6.0 years ago

Joe Kherery ▴ 120

Hello everyone,

I have a list of differentially expressed genes that contains several annotated genes as well as some LOC and LINC entries. I intend to run functional enrichment on this list. Is it prudent to keep the LOC and LINC entries? I noticed that if I remove them, I get more enriched pathways. Is it correct to remove them, or would that bias my analysis?

Regards

enriched pathways • 1.7k views

ADD COMMENT • link 6.0 years ago by Joe Kherery ▴ 120

2

I'd normally remove any gene that has no pathway annotations at all before running functional similarity analysis.

ADD REPLY • link 6.0 years ago by russhh 5.7k

1

Presented like this, this seems the wrong thing to do. If this is OK, then why not remove genes with annotations you don't like? I hope that this is at least mentioned when you report the results, because on the face of it, this is fishing for significance. Removing genes from a list has to be well motivated and taken into account when interpreting the results of the analysis.

You may be interested in reading these papers:
- Multiple sources of bias confound functional enrichment analysis of global -omics data
- Annotation Enrichment Analysis: An Alternative Method for Evaluating the Functional Properties of Gene Sets
- Using predictive specificity to determine when gene set analysis is biologically meaningful.

ADD REPLY • link 6.0 years ago by Jean-Karim Heriche 27k

0

Dear Jean-Karim Heriche,

I am a bit confused now. Can I not remove the LOC, LINC and MIR entries from my list before running functional enrichment?

They do not have "GO functions", and keeping them can cause some canonical pathways not to reach a significant p-value.
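The effect described here can be illustrated with a small hypergeometric calculation. This is only a sketch: every count below is invented for illustration, the helper name `enrich_p` is hypothetical, and the background is assumed to stay fixed while the query list shrinks.

```r
# Illustrative numbers only (assumptions, not taken from a real dataset).
N <- 20000  # genes in the background universe
K <- 150    # background genes annotated to the pathway
k <- 8      # query genes that fall in the pathway

n_full     <- 400  # query list including LOC/LINC/MIR entries
n_filtered <- 250  # same list after dropping 150 unannotated entries

# One-sided hypergeometric p-value: P(overlap >= k) when drawing n genes.
enrich_p <- function(k, K, N, n) {
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)
}

p_full     <- enrich_p(k, K, N, n_full)
p_filtered <- enrich_p(k, K, N, n_filtered)

# The filtered list gives the smaller p-value for the same overlap.
print(c(full = p_full, filtered = p_filtered))
```

The overlap `k` is unchanged because unannotated entries cannot belong to any pathway, so shrinking the list alone is what lowers the p-value. Whether the background should be restricted in the same way is a separate choice that also affects the result.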

ADD REPLY • link 6.0 years ago by Joe Kherery ▴ 120

0

I do remove genes with annotations I don't like, from GO at least: if the only evidence code is IEA or IEP or similar, it's ditched. I'm afraid I don't agree with you on this issue, and I don't see it as p-hacking to remove genes that are unannotated across all genesets, although I can see how this may reduce hypergeometric p-values (if not GSEA). Provided you are transparent about the source of your annotations, the filtering of your genesets, etc., your functional similarity mining is perfectly defensible. However, it's very rare that these approaches provide any notable biological insight into a project.
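A minimal sketch of that kind of evidence-code filter in base R follows. The table layout, column names, gene names and GO identifiers are all hypothetical placeholders; a real annotation source will have its own structure.

```r
# Toy GO annotation table (hypothetical columns and placeholder values).
go_annot <- data.frame(
  gene     = c("geneA", "geneA", "geneB", "geneC", "geneD"),
  go_id    = c("GO:0000001", "GO:0000002", "GO:0000003",
               "GO:0000001", "GO:0000002"),
  evidence = c("IDA", "IEA", "IMP", "IEA", "IEP"),
  stringsAsFactors = FALSE
)

# Evidence codes to discard; adjust to your own criteria.
excluded_codes <- c("IEA", "IEP")

# Drop annotations supported only by an excluded evidence code.
kept <- go_annot[!go_annot$evidence %in% excluded_codes, ]

# Genes that still have at least one annotation after filtering.
annotated_genes <- unique(kept$gene)
print(annotated_genes)
```

In this toy table, geneC and geneD lose all their annotations and drop out, while geneA survives through its IDA annotation.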

ADD REPLY • link 6.0 years ago by russhh 5.7k

0

Dear russhh,

Do you do it manually, one by one in UniProt?

ADD REPLY • link 6.0 years ago by Joe Kherery ▴ 120

0

No. I use geneset definitions from GO or Reactome programmatically. How are you performing your GSEA or Fisher tests?

ADD REPLY • link 6.0 years ago by russhh 5.7k

0

I use MSigDB or enrich via the web; sometimes I use PANTHER DB too.

Can you give me an example of how to filter my list of genes?

ADD REPLY • link 6.0 years ago by Joe Kherery ▴ 120

0

What language do you use?

ADD REPLY • link 6.0 years ago by russhh 5.7k

0

Dear russhh, I only use R.

ADD REPLY • link 6.0 years ago by Joe Kherery ▴ 120

0

The approach depends on the data structure used.

The gene sets I use for fgsea come from Reactome. Suppose the genes in my experiment are stored in the vector `my_genes` (as Entrez IDs). I'd obtain Reactome annotations using `genesets <- fgsea::reactomePathways(my_genes)`; `genesets` is then a list of vectors of Entrez IDs.

Then I'd obtain the set of Reactome-annotated genes: `universe <- purrr::reduce(genesets, union)` (note that, by construction, `universe` is a subset of `my_genes`).

Then I'd subset my experimental data so that I only consider genes that have at least one annotation and are present in my experiment (this depends on how you've stored your experimental data, but can typically be done using `my_query %in% universe`-type syntax).
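The steps above can be sketched end to end as follows. To keep the example runnable without fgsea or purrr, the gene sets are a hand-made stand-in with the same shape (a named list of ID vectors) and `Reduce` replaces `purrr::reduce`. The IDs, the pathway names and the `filtered_query` variable are assumptions made for illustration.

```r
# Entrez IDs measured in the experiment (toy values, not real data).
my_genes <- c("1", "2", "3", "4", "5", "6")

# With fgsea installed, the gene sets would come from:
#   genesets <- fgsea::reactomePathways(my_genes)
# Hand-made stand-in with the same shape (named list of ID vectors):
genesets <- list(
  pathway_A = c("1", "2"),
  pathway_B = c("2", "4")
)

# Every gene with at least one annotation.
# Base-R equivalent of purrr::reduce(genesets, union).
universe <- Reduce(union, genesets)

# Differentially expressed genes, some of them unannotated.
my_query <- c("2", "3", "4", "6")

# Keep only the query genes that have at least one annotation.
filtered_query <- my_query[my_query %in% universe]
print(filtered_query)
```

If the experimental data live in a data frame rather than a vector, the same `%in%` test can be used as a row filter on the ID column.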

I can't really help any further without some sense of how your dataset is organised.

ADD REPLY • link 6.0 years ago by russhh 5.7k