Hello everyone,
I am looking for conceptual help with a grad school genomic analysis project. Instructions:
Select one or more papers, articulate a new hypothesis, and use the dataset(s) to support (or refute) that hypothesis.
Perform at least two analyses/visualizations on two different data models (RNA-seq, microRNA, copy number, ChIP-seq, etc), or apply at least two different methods to test your hypothesis.
Integrate the different analyses to enhance your conclusion (such as combining microRNA and mRNA data or using pathway analysis/visualization algorithms to confirm functional similarity).
I had planned to take RNA-seq data from a mouse study and extrapolate the impact of intermittent fasting-induced differential gene expression to cancer survival in humans. I knew it would be a stretch and more of a fun, speculative approach. The plan was to get the mouse RNA-seq data, perform differential gene expression analysis, and convert the mouse gene IDs to human gene IDs, so as to pretend that this was a human study from the beginning. From there I would perform gene set enrichment analysis (GSEA) to find enriched biological pathways. If I found any, I would then perform survival analysis on cancer patient mRNA expression data to look for associations with the intermittent fasting up/downregulated genes from the enriched gene sets.
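To make the last step concrete, here is a minimal sketch of what I mean by the survival analysis. It is written in dependency-free Python only to show the logic; the same idea carries over to R. Everything in it is an assumption for illustration: the gene lists, the patient expression values, and the follow-up times are made up, and `signature_score` and `logrank_test` are hypothetical helper names, not functions from any package. Patients are scored by a fasting signature, split at the median score, and the two groups are compared with a two-group log-rank test.

```python
import math
from statistics import median


def signature_score(expr, up_genes, down_genes):
    """Mean expression of the 'up' genes minus mean expression of the 'down' genes."""
    up = [expr[g] for g in up_genes if g in expr]
    down = [expr[g] for g in down_genes if g in expr]
    up_mean = sum(up) / len(up) if up else 0.0
    down_mean = sum(down) / len(down) if down else 0.0
    return up_mean - down_mean


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value, 1 df)."""
    all_times = times_a + times_b
    all_events = events_a + events_b
    # Distinct times at which at least one event (not censoring) occurred.
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical signature (human symbols after ortholog conversion).
UP_GENES = ["PPARA", "PCK1"]
DOWN_GENES = ["FASN"]

# Toy patients: (expression, follow-up in months, event: 1 = death, 0 = censored).
PATIENTS = [
    ({"PPARA": 8.1, "PCK1": 7.9, "FASN": 4.0}, 60, 0),
    ({"PPARA": 7.5, "PCK1": 7.7, "FASN": 4.4}, 48, 1),
    ({"PPARA": 7.0, "PCK1": 7.2, "FASN": 5.0}, 55, 0),
    ({"PPARA": 5.0, "PCK1": 5.2, "FASN": 7.0}, 20, 1),
    ({"PPARA": 4.8, "PCK1": 5.1, "FASN": 7.5}, 14, 1),
    ({"PPARA": 4.5, "PCK1": 4.9, "FASN": 8.0}, 9, 1),
]

scores = [signature_score(expr, UP_GENES, DOWN_GENES) for expr, _, _ in PATIENTS]
cutoff = median(scores)
high = [(t, e) for s, (_, t, e) in zip(scores, PATIENTS) if s > cutoff]
low = [(t, e) for s, (_, t, e) in zip(scores, PATIENTS) if s <= cutoff]

chi2, p_value = logrank_test(
    [t for t, _ in high], [e for _, e in high],
    [t for t, _ in low], [e for _, e in low],
)
print(f"log-rank chi2 = {chi2:.2f}, p = {p_value:.3f}")
```

With real patient data I would expect to use an established survival package rather than a hand-written test, and a median split is only one of several possible ways to group patients.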
I presented my project idea to a professor and he was not the biggest fan. Although the biological implication might be a stretch, I assumed it would be acceptable, since I THINK it follows the instructions. I am hoping to get suggestions for how I could either improve my approach or adopt a general framework of analysis that follows the instructions provided, so that I can then look for one or more papers with datasets I could use. For my personal interests, I would like to focus on the metabolic side of human longevity/age-related diseases; I just lack the knowledge of how to do so within these instructions.
Link to the paper I initially wanted to use: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6764061/pdf/10.1177_1559325819876780.pdf
I appreciate any help, thanks in advance!
From what I understood, this project is just meant to show our knowledge of data analysis in R. He wasn't very clear, as we were both in a rush today; he just implied that he did not like it because the connection between the GSEA results and the patient cancer data would be weak. I agree it's a leap, but perhaps not too far-fetched, unless there is something very obvious that I am missing.
Oh, I agree that it might be overkill, but everything I have explained mirrors what we did in class or what I have worked on for my master's project, so most of the coding is done. The conversion to human gene IDs is to show technical ability in R and to pretend that the original study was done in humans, so that I could interrogate patient gene expression as well.
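For reference, the ID conversion I have in mind looks roughly like the sketch below (again plain Python just to show the logic, not my actual R code). It assumes an ortholog table of (mouse symbol, human symbol) pairs, for example one exported from a resource such as Ensembl BioMart, and keeps only one-to-one orthologs. The `ToyDupA`, `TOYDUP1`, `TOYDUP2`, and `ToyNoOrtholog` entries are invented placeholders, the log2 fold changes are made up, and `map_to_human` is a hypothetical helper name.

```python
from collections import Counter


def map_to_human(mouse_lfc, ortholog_pairs):
    """Map mouse log2 fold changes to human symbols via one-to-one orthologs.

    Returns (human symbol -> log2FC, sorted list of mouse genes that were dropped).
    """
    mouse_counts = Counter(m for m, _ in ortholog_pairs)
    human_counts = Counter(h for _, h in ortholog_pairs)
    # Keep a pair only if neither side appears in any other pair.
    one_to_one = {
        m: h
        for m, h in ortholog_pairs
        if mouse_counts[m] == 1 and human_counts[h] == 1
    }
    mapped = {one_to_one[m]: lfc for m, lfc in mouse_lfc.items() if m in one_to_one}
    dropped = sorted(m for m in mouse_lfc if m not in one_to_one)
    return mapped, dropped


# Toy ortholog table; the last two rows are invented to show a one-to-many case.
ORTHOLOGS = [
    ("Ppara", "PPARA"),
    ("Pck1", "PCK1"),
    ("Fasn", "FASN"),
    ("ToyDupA", "TOYDUP1"),
    ("ToyDupA", "TOYDUP2"),
]

# Made-up log2 fold changes (fasting vs. control) from the mouse analysis.
MOUSE_LFC = {
    "Ppara": 1.8,
    "Pck1": 2.1,
    "Fasn": -1.5,
    "ToyDupA": 0.9,
    "ToyNoOrtholog": -2.0,
}

human_lfc, dropped_genes = map_to_human(MOUSE_LFC, ORTHOLOGS)
print(human_lfc)
print("dropped:", dropped_genes)
```

Dropping genes without a clean one-to-one ortholog is a conservative choice; it loses some signal but avoids assigning one mouse fold change to several human genes.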
Yes, I agree, the GSEA would be the exploratory analysis used to formulate my hypothesis. As much as I would like to go through the literature, this assignment doesn't really warrant that amount of effort! I assumed what I proposed would suffice to show my skills while still making for an interesting presentation. I am just curious whether there is something blatantly wrong in my thought process that I am missing.
Thank you for the response!