Dear all,
A rather broad question:
- I have a case/control WGS data set
- The case phenotype is caused by mutations in a few known genes, but there are many more to discover!
I have run:
- RVtests/SAIGE-GENE to look at rare SNPs
- PLINK to look at common variation by doing a GWAS with PCA-derived covariates.
The cases carry a renal phenotype. I now have access to the age at which they developed end-stage renal disease (ESRD), as well as the stage of chronic kidney disease (CKD stage 1-5) for those who have not reached ESRD.
I have binned my cases into "types" of mutation: 1. Truncating mutation in a known causative gene, 2. Missense mutation in a known causative gene, 3. Truncating mutation in a likely causative gene, 4. Other type of mutation in a causative gene, 5. No mutation detected.
My question is how one could leverage this phenotypic data to deepen the analysis. I am sure there are genetic modifiers in this disease.
Thoughts are:
- Use age at ESRD or CKD stage as the outcome variable and perform rare and common variation analysis in the cases only, e.g. rather than giving 1/0 as case/control to PLINK or SAIGE, give a continuous age at ESRD or a CKD stage from 1-5.
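To make this concrete, here is a minimal sketch of how I imagine building the case-only phenotype table. The record fields (`age_esrd`, `ckd_stage`), the column names and the `NA` missing code are all assumptions for illustration; the exact phenotype file layout each tool expects would need checking against its documentation.

```python
# Hypothetical case records; field names and values are made up for illustration.
cases = [
    {"fid": "F1", "iid": "S1", "age_esrd": 45.0, "ckd_stage": None},
    {"fid": "F2", "iid": "S2", "age_esrd": None, "ckd_stage": 3},
    {"fid": "F3", "iid": "S3", "age_esrd": 61.5, "ckd_stage": None},
]

def phenotype_rows(records, missing="NA"):
    """Build (FID, IID, AGE_ESRD, CKD_STAGE) string rows.

    Cases who reached ESRD get an age and a missing CKD stage;
    cases who have not get a CKD stage and a missing age.
    """
    rows = []
    for r in records:
        age = missing if r["age_esrd"] is None else f"{r['age_esrd']:g}"
        stage = missing if r["ckd_stage"] is None else str(r["ckd_stage"])
        rows.append((r["fid"], r["iid"], age, stage))
    return rows

def to_table(rows, header=("FID", "IID", "AGE_ESRD", "CKD_STAGE")):
    """Join the header and rows into tab-separated text."""
    return "\n".join("\t".join(line) for line in [header, *rows])

print(to_table(phenotype_rows(cases)))
```

In this layout, age at ESRD and CKD stage end up as two separate phenotypes, each missing for part of the cases, which is one reason I am unsure this is the best approach.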
Would that be appropriate? Are there more sophisticated methods here that could also use the control data? Can one leverage the "type" of mutation data in any way (giving them arbitrary numbers doesn't seem to be the way forward here...)?
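One alternative to arbitrary numbers that I can think of is indicator (dummy) coding, where each mutation type becomes its own 0/1 covariate relative to a reference group. A minimal sketch, assuming shorthand labels I have invented here for the five types and taking "no mutation detected" as the reference:

```python
# Shorthand labels for the five mutation "types"; the names are illustrative only.
MUTATION_TYPES = [
    "known_truncating",
    "known_missense",
    "likely_truncating",
    "other",
    "none_detected",
]

def dummy_code(mutation_type, reference="none_detected"):
    """Return one 0/1 indicator per non-reference mutation type."""
    if mutation_type not in MUTATION_TYPES:
        raise ValueError(f"unknown mutation type: {mutation_type!r}")
    return {
        f"is_{t}": int(t == mutation_type)
        for t in MUTATION_TYPES
        if t != reference
    }

print(dummy_code("known_missense"))
```

Each indicator would then go in as a separate covariate column, so no ordering or spacing between the categories is imposed. Whether this is sensible for a modifier analysis is part of what I am asking.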
Many thanks for your thoughts