Dear all,
I've got some data and no research question to answer, but some spare time and some students to explore it :-D In total there are 60 human individuals with a chronic disease (non-genetic, non-infectious, non-cancer):
- 38 x whole genome (derived from blood) + 0 controls
- 25 x RNAseq (derived from affected tissue biopsy) + 7 healthy controls (of which only one has no matched proteome)
- 29 x shotgun proteome (derived from affected tissue biopsy) + 16 healthy controls
14 patients have the complete set of all omics, some patients have only 2 matched omics, some only 1, some 0. Preprocessing and quality control went fairly well, and differential expression analyses were done on the expression sets separately. Additionally we've got an armada of clinical parameters for the patients.
So far I have considered doing eQTL and pQTL analysis and have done some reading on multi-omics integration and unsupervised disease subgroup detection, but so many tools were developed for, or at least only tested on, cancer data. Additionally, our data is now so fat and short (p >>> n) that, although it's great to have it, analyses are likely to fail (?).
- Do you have any ideas, hints, or links on fruitful analyses and how to make the best of the data?
- Do you have a pessimistic/optimistic opinion on whether analysing the data is meaningful at all?
- Do you have a strategic opinion regarding research on the set? (E.g. publish all results separately? Or as a whole? First in a data set journal, then results? ...)
I'd appreciate any hint, opinion, help, guidance :-) Milena
Am curious as to why this data was generated in the first place. Has some other analysis been done on it to answer the original question? (I assume the DE analysis may be part of it.)
Since this is now a fishing expedition, I suppose you could try to see if you can find correlations between the expressed component (RNAseq) and the proteome data. The latter is likely to be sparse, so it may be a difficult challenge. You may as well focus on the 14 patients that have the complete datasets, which removes unmatched samples as a variable.
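To make the correlation idea a bit more concrete, here is a minimal sketch in plain Python, assuming both tables have already been normalised and mapped to shared gene identifiers. The input layout (gene -> patient ID -> value), the function names, the toy values and the `min_samples` cutoff are all made up for illustration; they are not from your data or from any particular package. It computes a per-gene Spearman correlation between RNA and protein abundance, using only patients where both values are present, and skips genes with too few matched measurements (which is where the proteome sparsity will bite).

```python
from math import sqrt

def average_ranks(values):
    """Rank values starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return None  # one vector is constant, rho is undefined
    return cov / sqrt(vx * vy)

def rna_protein_correlation(rna, protein, min_samples=8):
    """Per-gene rho over patients that have both an RNA and a protein value.

    rna, protein: dict gene -> dict patient_id -> value (None or absent = missing).
    Returns dict gene -> (rho, number_of_matched_patients).
    """
    results = {}
    for gene in rna.keys() & protein.keys():
        shared = sorted(
            s for s, v in rna[gene].items()
            if v is not None and protein[gene].get(s) is not None
        )
        if len(shared) < min_samples:
            continue  # too few matched measurements to say anything
        rho = spearman([rna[gene][s] for s in shared],
                       [protein[gene][s] for s in shared])
        if rho is not None:
            results[gene] = (rho, len(shared))
    return results

# Toy, invented values purely to show the call; GENE_B is dropped for sparsity.
rna = {"GENE_A": {"P01": 5.1, "P02": 7.3, "P03": 6.0, "P04": 9.2},
       "GENE_B": {"P01": 2.0, "P02": 3.5, "P03": 1.0, "P04": 4.0}}
protein = {"GENE_A": {"P01": 10.0, "P02": 14.0, "P03": 12.5, "P04": 20.0},
           "GENE_B": {"P01": 8.0, "P02": None, "P03": 9.0}}
result = rna_protein_correlation(rna, protein, min_samples=3)
```

With only 14 fully matched patients the per-gene estimates will be noisy, and testing thousands of genes would need multiple-testing correction on top of this, so I would treat the output as a ranking for follow-up rather than as findings.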