Question

Integrating gene expression (RNA-Seq), mutations and patient survival data

5.9 years ago

praveen.cbt • 0

Hello all,

My objective is to analyze RNA sequencing data for leukemia patients in the TCGA database and combine it with the available clinical data (overall survival or disease-free survival) and mutation data. A number of these leukemia patients have DNMT3A mutations. We want to evaluate whether the presence or absence of a DNMT3A mutation affects survival outcome. Our preliminary experimental data indicate that gene X interacts with DNMT3A to regulate its function. To test this, we would like to divide patients further into group 1 (DNMT3A mutated + gene X high expression) vs. group 2 (DNMT3A mutated + gene X low expression), and similarly group 3 (DNMT3A WT + gene X high) vs. group 4 (DNMT3A WT + gene X low), and ask whether this grouping affects patients' overall survival or disease-free survival (DFS).

I have come across a few workflows that integrate gene expression and patient clinical data, but I'm unsure how one would do such an analysis when another variable (mutations) is introduced. Please help if you know how it can be done.

Thanks,
PK

RNA-Seq R • 1.2k views

updated 5.9 years ago by Kevin Blighe 89k • written 5.9 years ago by praveen.cbt • 0

I'm assuming you've seen this paper that just came out this week (and are doing a follow-up on DNMT3A?). Have you considered exploring what they did? It's similar to what you want to do.

5.9 years ago by Brice Sarver ★ 3.8k

score 0 · Answer 1 · 2019-07-31

There is no standard approach for this, which is why you have not come across workflows. At the risk of coming across as condescending, the best approach is simply to use your brain and devise an analysis plan. Think about which hypotheses you want to prove or disprove. As a very simple example, it would be interesting to know how tumour mutation burden correlates with the expression of known oncogenes, and how this relates to survival. Generally, segregate your cohort into groups based on mutation and expression profiles, and then check the survival outcome of each group:

high mutation load + low TP53 expression = ? survival
high mutation load + high TP53 expression = ? survival
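
Applied to the DNMT3A / gene X question, the same idea can be sketched in a few lines. This is only a minimal illustration in plain Python, under several assumptions: the toy cohort is made up, the median of gene X expression is an arbitrary choice of cutoff for "high" vs. "low", and the helper names (`assign_group`, `kaplan_meier`, `median_survival`) are hypothetical, not taken from any package. The survival estimate is a basic Kaplan-Meier product-limit calculation.

```python
from statistics import median

# Hypothetical toy cohort, values invented for illustration only:
# (patient_id, dnmt3a_mutated, gene_x_expression, months_followed, event)
# event = 1 means death was observed, event = 0 means the patient was censored.
cohort = [
    ("P01", True, 9.2, 6.0, 1),
    ("P02", True, 8.7, 9.0, 1),
    ("P03", True, 3.1, 20.0, 0),
    ("P04", True, 2.5, 14.0, 1),
    ("P05", False, 9.9, 30.0, 0),
    ("P06", False, 7.8, 18.0, 1),
    ("P07", False, 2.2, 40.0, 0),
    ("P08", False, 1.9, 25.0, 1),
]


def assign_group(mutated, expression, cutoff):
    """Label a patient by mutation status and high/low expression."""
    mut = "DNMT3A-mut" if mutated else "DNMT3A-WT"
    expr = "GeneX-high" if expression > cutoff else "GeneX-low"
    return f"{mut}/{expr}"


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time."""
    curve = []
    surv = 1.0
    for t in sorted({u for u, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve


def median_survival(curve):
    """First time at which survival drops to 0.5 or below, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Assumption: split gene X expression at the cohort median.
cutoff = median(expr for _, _, expr, _, _ in cohort)

# Segregate the cohort into the four mutation x expression groups.
groups = {}
for pid, mutated, expr, months, event in cohort:
    groups.setdefault(assign_group(mutated, expr, cutoff), []).append((months, event))

# Check the survival outcome of each group.
for name in sorted(groups):
    times = [m for m, _ in groups[name]]
    events = [e for _, e in groups[name]]
    curve = kaplan_meier(times, events)
    print(name, "n =", len(times), "median survival =", median_survival(curve))
```

With real TCGA data you would replace the toy list with the merged clinical, mutation and expression tables, and compare the groups formally, for example with a log-rank test or a Cox model that includes a mutation by expression interaction term. The cutoff for high vs. low expression is a decision that belongs in your analysis plan.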

Integrative analysis pipelines are mostly a joke without any good planning involved.

Kevin