Question: Duplicated genes in overrepresentation analysis (ORA)
5 weeks ago by
nickhir30 wrote:


I have gene lists from several biopsies. Each list contains every gene that carries a mutation (identified by WES). I want to run an overrepresentation analysis to check whether certain pathways are hit by mutations more often than others. However, many lists contain the same gene multiple times, because that gene happened to carry more than one mutation. By default, the ORA tool I am using (WebGestalt) removes duplicated genes, but I think in my case it might be useful to keep them.

I am very new to ORA/GSEA, so I am not sure whether this makes sense. I also have not yet found a tool that lets me keep duplicated genes. If somebody could tell me whether my idea makes sense, and point me to a tool or a way to analyze the gene lists, I would be very happy!


gsea ora • 101 views
modified 5 weeks ago by kelen160 • written 5 weeks ago by nickhir30
5 weeks ago by
London, UK
kelen160 wrote:

If I understand you correctly, you want an ORA tool that accepts a gene list containing duplicated genes? In that case it might be easier to input a gene list that contains only unique values, rather than guessing whether the tool does any filtering. You can reduce your gene list to unique values in bash, R, Python, or even Excel.
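
For example, a minimal Python sketch of that filtering step could look like the following. The gene symbols are made-up illustration data, and `unique_genes` is a hypothetical helper name, not part of any ORA tool:

```python
def unique_genes(genes):
    """Return the genes with duplicates removed, keeping first-seen order."""
    # Strip whitespace and skip empty entries before de-duplicating.
    cleaned = (g.strip() for g in genes if g.strip())
    # dict.fromkeys keeps insertion order (Python 3.7+), unlike a plain set.
    return list(dict.fromkeys(cleaned))


# Hypothetical input: one entry per mutation, so a gene can appear repeatedly.
mutated = ["TP53", "KRAS", "TP53", "BRCA2", "KRAS", "TP53"]
print(unique_genes(mutated))  # ['TP53', 'KRAS', 'BRCA2']
```

Keeping the original order is optional for ORA, which treats the input as a set, but it makes the filtered list easier to compare with the raw one.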

written 5 weeks ago by kelen160

I think you misunderstood me. I know that the tool I am using removes duplicated genes. However, I am not sure whether this is really best practice for my particular type of analysis. Specifically, I would like a tool that includes duplicates in its analysis.

written 5 weeks ago by nickhir30

This is just coming from what I know about ORA, as I thought that was what you were asking. If you are instead looking for something specific to WES and mutations, there might be other approaches. Best practice when checking for enrichment through ORA on a simple gene list is to use unique genes in the input (including each duplicated gene only once). As far as I know, no tool for this type of enrichment analysis benefits from 'including' duplicates (it doesn't really make sense either), which is why duplicates get removed or trigger a warning. That said, if you think the number of mutations (which is what duplicates gene names in your input) is important and biologically meaningful, you could try ranking your gene list by the number of mutations and using a tool like gprofiler2, which can test a simple ranked gene list.
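
As a rough sketch of the ranking idea, the duplicated entries can be counted and turned into a unique list ordered by mutation count. The gene symbols below are made up, `rank_by_mutation_count` is a hypothetical helper name, and breaking ties alphabetically is an arbitrary assumption:

```python
from collections import Counter


def rank_by_mutation_count(genes):
    """Return unique genes, most frequently mutated first."""
    counts = Counter(genes)  # gene -> number of times it appears in the list
    # Sort by descending count; ties are broken alphabetically (an assumption).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [gene for gene, _ in ranked]


# Hypothetical input: one entry per mutation.
mutated = ["TP53", "KRAS", "TP53", "BRCA2", "KRAS", "TP53"]
print(rank_by_mutation_count(mutated))  # ['TP53', 'KRAS', 'BRCA2']
```

The resulting list has each gene once, so it still satisfies the unique-input requirement, while the order carries the mutation counts. In gprofiler2 the ranked mode is, to my knowledge, switched on with the `ordered_query` argument of `gost`, but that is worth checking against the package documentation.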

modified 5 weeks ago • written 5 weeks ago by kelen160

Alright, thank you very much! I plan on comparing biopsy A against biopsy A + biopsy B to figure out whether some GO terms/pathways are significantly more mutated in biopsy A. Do you think this comparison, i.e. the choice of gene list and background, makes sense?
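
To make the roles of gene list and background concrete, here is a minimal sketch of the one-sided hypergeometric test that ORA is commonly based on, applied to one pathway. It is not WebGestalt's implementation; the gene symbols, the pathway, and the name `ora_p_value` are all hypothetical, and it says nothing about whether this design is statistically appropriate:

```python
from math import comb


def ora_p_value(gene_list, background, pathway):
    """One-sided hypergeometric p-value P(X >= k) for a single pathway."""
    background = set(background)
    # Sets enforce uniqueness; genes outside the background are ignored.
    genes = set(gene_list) & background
    pathway = set(pathway) & background
    N = len(background)       # background size
    K = len(pathway)          # pathway genes present in the background
    n = len(genes)            # size of the gene list
    k = len(genes & pathway)  # gene-list members that fall in the pathway
    # Sum the probabilities of observing k or more pathway genes by chance.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


# Hypothetical data: biopsy A as gene list, A + B as background.
biopsy_a = {"TP53", "KRAS", "BRCA2"}
biopsy_b = {"EGFR", "MYC", "PTEN", "KRAS"}
pathway = {"TP53", "BRCA2", "ATM"}
print(ora_p_value(biopsy_a, biopsy_a | biopsy_b, pathway))  # 0.2
```

With A + B as the background, every gene in the list is also in the background, which the test requires. In a real analysis the p-values would also need multiple-testing correction across all pathways tested.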

written 5 weeks ago by nickhir30