I've got some data and no research question to answer, but some spare time and some students to explore it :-D In total, it covers 60 human individuals with a chronic disease (non-genetic, non-infectious, non-cancer):
- 38 x whole genome (derived from blood) + 0 controls
- 25 x RNAseq (derived from affected tissue biopsy) + 7 healthy controls (of which only one has no matched proteome)
- 29 x shotgun proteome (derived from affected tissue biopsy) + 16 healthy controls
14 patients have the complete set of all omics; some patients have only 2 matched omics, some only 1, some 0. Preprocessing and quality control went fairly well, and differential expression analyses were done on each expression set separately. Additionally, we've got an armada of clinical parameters for the patients.
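To make the overlap between layers explicit, here is a minimal sketch of how it could be tabulated. The patient IDs and the `layers` dictionary are made-up placeholders, not the real cohort, and the function name `omics_overlap` is purely illustrative.

```python
from collections import Counter


def omics_overlap(layers):
    """Count patients per combination of available omics layers.

    layers: dict mapping a layer name to the set of patient IDs measured in it.
    Patients that appear in no layer at all are not counted here.
    """
    all_ids = set().union(*layers.values())
    combos = Counter()
    for pid in all_ids:
        # Sorted tuple of the layers in which this patient was measured
        present = tuple(sorted(name for name, ids in layers.items() if pid in ids))
        combos[present] += 1
    return combos


# Hypothetical toy IDs, for illustration only
layers = {
    "genome": {"P01", "P02", "P03", "P04"},
    "rnaseq": {"P01", "P02", "P05"},
    "proteome": {"P01", "P03", "P05", "P06"},
}

overlap = omics_overlap(layers)
complete = overlap[("genome", "proteome", "rnaseq")]

for combo, n in sorted(overlap.items(), key=lambda kv: (-len(kv[0]), kv[0])):
    print(" + ".join(combo), n)
```

A table like this shows directly how many samples each pairwise or three-way analysis would have to work with (for example, genome + RNAseq pairs for eQTL).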
So far I have considered doing eQTL and pQTL analyses and have done some research on multi-omics integration and unsupervised disease subgroup detection, but so many tools were developed for, or at least only tested on, cancer data. Additionally, our data is so fat and short (p >>> n) that, although it's great to have it, analyses are likely to fail (?).
- Do you have any ideas, hints, or links on fruitful analyses and on how to make the best of the data?
- Do you have a pessimistic/optimistic opinion on whether analysing the data is meaningful at all?
- Do you have a strategic opinion regarding research on the set? (E.g. publish all results separately? Or as a whole? First in a data set journal, then the results? ...)
I'd appreciate any hint, opinion, help, guidance :-) Milena