I'm using GCTA-GREML to estimate heritabilities in my sample, and I will also be using bivariate GREML to estimate genetic correlations. I have a few questions I'm struggling to find answers to, and I'd appreciate any input:
1) Should I remove the MHC and other high-LD regions?
2) Any advice on using LD-pruned data to speed up computation? According to these forum posts, pruning isn't really necessary or recommended, but it also shouldn't make much difference to the results. I'm still a bit unclear on this.
(I: https://gcta.freeforums.net/thread/281/ld-pruning-heritability-estimation ; II: https://gcta.freeforums.net/thread/109/ld-heritability-estimate ; III: https://gcta.freeforums.net/thread/44/sample-size-thin-snps-first)
3) Studies using GCTA-GREML tend to build the GRM from imputed data, but I can't find an explanation of why, as I'd assumed the estimated relatedness would be similar either way. The last two of the following posts report higher h2 estimates with genotyped than with imputed data. Is it worth running the analysis with each and, e.g., reporting the estimate with the lower SE in the main text (and the other in the supplementary material)?
(I: https://gcta.freeforums.net/thread/3/estimate-grm-genotyped-imputed-data ; II: https://gcta.freeforums.net/thread/133/imputed-data-input ; III: https://gcta.freeforums.net/thread/379/reml-results-genotyped-imputed-data ; IV: https://gcta.freeforums.net/thread/8/more-snps-explain-variance)
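If you do run GREML twice (once with a GRM from genotyped SNPs, once from imputed SNPs), a small script can put the two h2 estimates and their SEs side by side. This is a minimal sketch, assuming each run writes a univariate GCTA `.hsq` report with a whitespace-separated `Source / Variance / SE` table containing a `V(G)/Vp` row; the file names in the usage comment are hypothetical. It only tabulates and sorts by SE. A smaller SE means a more precise estimate, not necessarily a less biased one, so it doesn't settle which estimate to report.

```python
from __future__ import annotations

import sys
from pathlib import Path


def parse_hsq_h2(text: str) -> tuple[float, float]:
    """Return (h2, SE) from the 'V(G)/Vp' row of a univariate .hsq report.

    Assumption: rows are whitespace-separated as 'Source Variance SE'.
    """
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "V(G)/Vp":
            return float(fields[1]), float(fields[2])
    raise ValueError("no V(G)/Vp row found in .hsq text")


def compare_runs(runs: dict[str, str]) -> list[tuple[str, float, float]]:
    """Parse each labelled .hsq text; return (label, h2, SE) sorted by SE, smallest first."""
    rows = [(label, *parse_hsq_h2(text)) for label, text in runs.items()]
    return sorted(rows, key=lambda row: row[2])


if __name__ == "__main__":
    # Usage (hypothetical file names): python compare_hsq.py genotyped.hsq imputed.hsq
    runs = {Path(p).stem: Path(p).read_text() for p in sys.argv[1:]}
    for label, h2, se in compare_runs(runs):
        print(f"{label}\th2={h2:.4f}\tSE={se:.4f}")
```

Labels come from the file stems, so naming the output prefixes after the GRM used keeps the comparison readable.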