Very soon we will have DNA from a very large cohort (more than 30K individuals). We would like to do array-based genotyping on this cohort.
The problem is that we have no idea of the quality of this cohort, and we do not want to run thousands of arrays on a dataset that might contain sample mix-ups.
How can we quality-control the study samples in a cost-effective way?
I was thinking of genotyping a few high-frequency X-chromosome SNPs and running a sex check in PLINK.
Is it possible to do a power calculation (and if so, how) to determine how many SNPs I need to genotype to infer genotypic sex reliably?
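
To make the power question concrete, here is a rough sketch of the kind of calculation I have in mind. The assumptions are mine and may be too simple: the SNPs are independent (no LD), they are in Hardy-Weinberg equilibrium in females, they all have the same minor allele frequency, and there is no genotyping error. Under those assumptions a male can never appear heterozygous on the non-pseudoautosomal X, so a female is only mistaken for a male if she is homozygous at every genotyped SNP. The function names below are hypothetical, and this is not how PLINK itself decides sex (its check is based on an X-chromosome inbreeding estimate).

```python
import math


def prob_female_all_homozygous(maf, n_snps):
    """Chance that a female is homozygous at all n_snps X-linked SNPs.

    Assumes independent SNPs in Hardy-Weinberg equilibrium with a common
    minor allele frequency `maf` and no genotyping error (my assumptions).
    """
    het = 2 * maf * (1 - maf)          # expected female heterozygosity per SNP
    return (1 - het) ** n_snps         # homozygous at every SNP


def snps_needed(maf, max_miss_rate):
    """Smallest number of SNPs so that the chance above is <= max_miss_rate."""
    het = 2 * maf * (1 - maf)
    return math.ceil(math.log(max_miss_rate) / math.log(1 - het))


if __name__ == "__main__":
    # Illustrative values only, not taken from any real panel.
    for maf in (0.5, 0.4, 0.3, 0.2):
        print(maf, snps_needed(maf, max_miss_rate=0.001))
```

In practice genotyping error and LD between nearby SNPs would push the required number up, so I would treat any figure from this as a lower bound.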
Do you have any other ideas on how to approach this problem?
Any comments are very welcome.