Question: Integrating gene expression (RNA-Seq), mutations and patient survival data
12 months ago, praveen.cbt wrote:

Hello all,

My objective is to analyze RNA sequencing data from leukemia patients in the TCGA database and combine it with the available clinical data (overall survival or disease-free survival) and mutation data. A number of these leukemia patients carry DNMT3A mutations. We want to evaluate whether the presence or absence of DNMT3A mutations affects survival outcome. Our preliminary experimental data indicate that gene X interacts with DNMT3A to regulate its function. To test this, we want to divide the patients further into group 1 (DNMT3A mutated + gene X high expression) vs. group 2 (DNMT3A mutated + gene X low expression), and similarly group 3 (DNMT3A WT + gene X high) vs. group 4 (DNMT3A WT + gene X low), and ask whether this grouping affects the patients' overall survival or disease-free survival (DFS).

I have come across a few workflows that integrate gene expression and patient clinical data, but I'm unsure how to do such an analysis when another variable (mutations) is introduced. Please help if you know how this can be done.

Thanks, PK

rna-seq R • 380 views

I'm assuming you've seen this paper that just came out this week (and are doing a follow-up on DNMT3A?). Have you considered exploring what they did? It's similar to what you want to do.

Comment written 12 months ago by Brice Sarver
12 months ago, Kevin Blighe wrote:

There is no standard approach for this, which is why you have not come across workflows. At the risk of coming across as condescending, the best approach is simply to use your brain and devise an analysis plan. Think about which hypotheses you want to test. As a very simple example, it would be interesting to know how tumour mutation burden correlates with expression of known oncogenes, and how this relates to survival. Generally, segregate your cohort into different groups based on mutation and expression profiles, and then check the survival outcome of each group, for example:

  • high mutation load + low TP53 = ? survival
  • high mutation load + high TP53 = ? survival
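
The "segregate, then check survival" idea can be sketched in a few lines. This is only a minimal illustration, not an established workflow: the toy cohort is invented, the names `PATIENTS`, `assign_groups` and `kaplan_meier` are hypothetical, and splitting expression at the cohort-wide median is an assumption (another cutoff, or a split within each mutation stratum, may suit the data better). It forms the four DNMT3A x gene X groups from the question and computes a hand-rolled Kaplan-Meier estimate per group.

```python
from collections import defaultdict
from statistics import median

# Hypothetical toy cohort, invented purely for illustration:
# (patient_id, dnmt3a_mutated, gene_x_expression, follow_up_months, death_observed)
PATIENTS = [
    ("P01", True, 12.1, 5, True),
    ("P02", True, 3.4, 20, False),
    ("P03", True, 9.8, 8, True),
    ("P04", True, 2.2, 30, True),
    ("P05", False, 11.0, 14, True),
    ("P06", False, 4.1, 40, False),
    ("P07", False, 10.5, 22, False),
    ("P08", False, 1.9, 36, True),
]


def assign_groups(patients):
    """Stratify by mutation status and a cohort-wide median split of expression."""
    cutoff = median(p[2] for p in patients)  # assumption: median as high/low cutoff
    groups = defaultdict(list)
    for _pid, mutated, expression, months, died in patients:
        mutation_label = "DNMT3A mut" if mutated else "DNMT3A WT"
        expression_label = "Gene X high" if expression > cutoff else "Gene X low"
        groups[f"{mutation_label} + {expression_label}"].append((months, died))
    return dict(groups)


def kaplan_meier(observations):
    """Return [(time, survival_probability)] at each time with at least one death.

    `observations` is a list of (time, event) pairs; event=False means censored.
    """
    curve = []
    survival = 1.0
    at_risk = len(observations)
    for t in sorted({time for time, _ in observations}):
        deaths = sum(1 for time, event in observations if time == t and event)
        leaving = sum(1 for time, _ in observations if time == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return curve


# Print the estimated survival steps for each of the four groups.
for label, obs in sorted(assign_groups(PATIENTS).items()):
    print(label, kaplan_meier(obs))
```

With real TCGA data the groups would then be compared formally, e.g. with a log-rank test or a Cox model that includes a mutation x expression interaction term (the R `survival` package and the Python `lifelines` package both provide these), rather than by eyeballing the curves.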

Integrative analysis pipelines are mostly a joke without any good planning involved.

