Question

Rare Disease Variant Pathway Analysis

6 months ago

The_PyPanda ▴ 10

Hello, apologies if this has been asked before, but most answers/tools I find seem to require some form of expression data to do pathway analysis.

My situation is that I have a set of long-read whole-genome-sequenced samples and some normal tissue samples. I called and filtered variants from these. The variants fall in genes, so essentially I have a list of genes (e.g., TP53, EGFR) that varies between the disease and normal samples.

I understand there are no hard and fast rules, but how can I perform pathway analysis on these datasets? If I use Reactome, for example, I end up with >1000 pathways, and I get hundreds of hits in GO. Any advice would be much appreciated.

score 0 · Answer 1 · 2024-04-25

Hi The_PyPanda,

First, a caveat: the information we most need in order to guide you to a successful conclusion is not provided in this post. We don't know, for instance, whether you have prioritized these variants, how you filtered them, what criteria you used, and so forth. Without this information, we cannot know whether you will obtain reasonable results; we can only assume (and hope) that these steps have been solid. If they haven't, the advice we give will be at best irrelevant and at worst lead you down a bad road. In addition, there may be a magic bullet contained in that info: for example, we might be able to easily figure out why you are generating so many hits and then help you reassess the approach.

Finally, you say expression data is needed, but then you tell us you have generated DE pathways (somehow). How did you do this? The answer to that question may well be the magic bullet described above.

OK, caveats aside, let's assume you are good to go. I have a few distinct suggestions here:

1. Consider approaches to manage the pathway results obtained.

I don't doubt that you've come up with many pathways and hits, but most of those will be marginal associations, and most of those pathways will relate to one another. So it is worth asking, "How many coherent and non-redundant gene programs do these 1000+ pathways represent?" To answer that, you could try the following:

- Before conducting pathway analysis, generate pairwise pathway similarity scores, then remove redundant pathways by eliminating one of each pair with a similarity score > 0.9 or some other threshold (there are hundreds of pathways that barely differ from one another).
- After conducting pathway analysis, trim the list to only those pathways that remain significant after family-wise error control.
- Using your own biological knowledge, manually curate your own pathways by including genes that pass criteria you have defined, then test these pathways (if you don't trust the huge number of "significant" results).
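As a rough illustration of the first two points, here is a minimal Python sketch using only the standard library. It assumes Jaccard overlap as the similarity score, a one-sided hypergeometric test for over-representation, and Holm's step-down procedure for family-wise error control; none of these choices come from your post, they are just common defaults. The pathway names, gene IDs, and background set are made-up toy data.

```python
from math import comb

def jaccard(a, b):
    """Jaccard similarity between two gene sets (assumed similarity score)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def prune_redundant(pathways, threshold=0.9):
    """Greedily keep pathways (largest first) whose similarity to every
    already-kept pathway does not exceed the threshold."""
    kept = {}
    ordered = sorted(pathways.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for name, genes in ordered:
        if all(jaccard(genes, other) <= threshold for other in kept.values()):
            kept[name] = set(genes)
    return kept

def enrichment_p(hits, pathway, background):
    """One-sided hypergeometric p-value: P(overlap >= observed overlap)."""
    background = set(background)
    pathway = set(pathway) & background
    hits = set(hits) & background
    n_bg, n_path, n_hits = len(background), len(pathway), len(hits)
    observed = len(pathway & hits)
    tail = sum(
        comb(n_path, i) * comb(n_bg - n_path, n_hits - i)
        for i in range(observed, min(n_path, n_hits) + 1)
    )
    return tail / comb(n_bg, n_hits)

def holm(pvalues, alpha=0.05):
    """Holm step-down: names that stay significant under FWER control."""
    ordered = sorted(pvalues.items(), key=lambda kv: kv[1])
    m = len(ordered)
    significant = []
    for rank, (name, p) in enumerate(ordered):
        if p > alpha / (m - rank):
            break  # once one test fails, all larger p-values fail too
        significant.append(name)
    return significant

# Toy data (hypothetical): 20 background genes, 3 pathways, 7 variant genes.
background = {f"g{i}" for i in range(1, 21)}
pathways = {
    "repair": {f"g{i}" for i in range(1, 11)},
    "repair_variant": {f"g{i}" for i in range(1, 12)},  # near-duplicate of "repair"
    "signalling": {f"g{i}" for i in range(12, 17)},
}
variant_genes = {f"g{i}" for i in range(1, 8)}

kept = prune_redundant(pathways, threshold=0.9)
pvals = {name: enrichment_p(variant_genes, genes, background)
         for name, genes in kept.items()}
significant = holm(pvals, alpha=0.05)
print(sorted(kept), significant)
```

Pruning before testing also shrinks the number of tests, so the multiple-testing correction is less punishing. Which member of a redundant pair to keep (here, the larger one) is an arbitrary choice in this sketch.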

2. Consider approaches in the published literature

Many manuscripts have dealt with this problem, yet there is little agreement on how best to do it. You could, for example, consider elements of the approach taken in this manuscript, even though its subject, RA, is not a rare disease (prevalence of ~1%). As you've said, there are no hard and fast rules: these authors used an ensemble of many methods, then ordered variants by how many of the methods identified the gene/variant.
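The "order by how many methods agree" idea can be sketched in a few lines of Python. This is only an illustration of vote counting, not the manuscript's actual pipeline; the method names and the gene sets each method flags are hypothetical placeholders.

```python
from collections import Counter

# Hypothetical input: each prioritization method maps to the genes it flagged.
method_calls = {
    "method_a": {"TP53", "EGFR", "BRCA2"},
    "method_b": {"TP53", "EGFR"},
    "method_c": {"TP53", "KRAS"},
}

def rank_by_support(calls):
    """Rank genes by the number of methods that flagged them
    (ties broken alphabetically)."""
    votes = Counter(gene for genes in calls.values() for gene in set(genes))
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))

ranking = rank_by_support(method_calls)
for gene, n_methods in ranking:
    print(gene, n_methods)
```

Genes supported by only one method are not necessarily wrong, but the ones near the top of such a ranking are a reasonable place to start reading.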

3. Use the published literature to reduce search space and decide which variants are most plausible

Because we don't know what the disease state is, I don't know offhand how rare this disease is or how well its pathophysiology is understood. Reading the published literature can help you decide:

- whether the genes nominated by your filtering process are reasonable
- whether the pathways nominated by the gene results are reasonable

4. Adapt a framework used for a related goal

You could consider the type of logic used, for example, to develop and justify gene burden testing, then adapt it to your question.
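To make the burden-testing logic concrete, here is a minimal Python sketch of one simple version: collapse the qualifying variants per gene, count how many disease and normal samples carry at least one, and compare the two counts with a one-sided Fisher's exact test. This is an assumed, simplified form of burden testing rather than any specific published method, and the gene labels and counts are invented for illustration.

```python
from math import comb

def fisher_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher's exact p-value: probability of seeing at least
    `case_carriers` carriers among cases, with table margins fixed."""
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    upper = min(carriers, n_cases)
    tail = sum(
        comb(n_cases, i) * comb(n_controls, carriers - i)
        for i in range(case_carriers, upper + 1)
    )
    return tail / comb(total, carriers)

# Hypothetical per-gene carrier counts: (disease carriers, normal carriers).
n_cases, n_controls = 10, 10
carrier_counts = {
    "GENE_A": (6, 1),
    "GENE_B": (2, 2),
}

burden_p = {
    gene: fisher_one_sided(cases, n_cases, controls, n_controls)
    for gene, (cases, controls) in carrier_counts.items()
}
for gene, p in sorted(burden_p.items(), key=lambda kv: kv[1]):
    print(gene, round(p, 4))
```

With small sample sizes the test has little power, and the per-gene p-values still need multiple-testing correction, so treat the output as a ranking aid more than a verdict.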