Question: Sequence of steps in quality control of genetic data
5 months ago by
kl


I am performing quality control on my genetic data and have a simple, perhaps even silly, question. I want to make sure I am keeping the right number of people. Should I identify individuals with outlying genotype missingness rates before filtering for minor allele frequency (MAF)?

I tried it both ways: once on a dataset already filtered at a MAF of 1%, and once on the uncleaned dataset (which still includes low-MAF variants). In the MAF-filtered dataset, more people (about 30 extra) have outlying missingness. In the dataset not filtered for MAF, fewer people have outlying missingness, and thus there are fewer people to exclude.

It makes sense to me that more people have outlying missingness after filtering for MAF. Instinctively, though, it seems that filtering for MAF should be the last step in the process, since otherwise we would be falsely classifying some people as having an outlying missingness rate. I would greatly appreciate any advice!
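
To make the comparison concrete, here is a minimal sketch of the two orderings on a toy genotype matrix. Everything in it is an assumption for illustration, not my real pipeline: the function names (`minor_allele_freq`, `sample_missingness`, `compare_missingness`) are hypothetical, genotypes are coded as 0/1/2 alternate-allele counts with `NaN` for a missing call, and the toy data are made up. It shows one way the per-person missingness rate can go up after a MAF filter: if the removed low-MAF variants were fully genotyped, the denominator shrinks while the number of missing calls stays the same.

```python
import numpy as np

def minor_allele_freq(geno):
    """Per-variant MAF from a (people x variants) matrix of 0/1/2 counts.

    Missing calls (NaN) are ignored when computing the allele frequency.
    """
    alt_freq = np.nanmean(geno, axis=0) / 2.0
    return np.minimum(alt_freq, 1.0 - alt_freq)

def sample_missingness(geno):
    """Fraction of missing genotype calls for each person (row)."""
    return np.isnan(geno).mean(axis=1)

def compare_missingness(geno, maf_threshold=0.01):
    """Per-person missingness before and after dropping low-MAF variants."""
    keep = minor_allele_freq(geno) >= maf_threshold
    return sample_missingness(geno), sample_missingness(geno[:, keep])

# Toy data (assumed): 4 people x 4 variants.
# Variants 0 and 2 are monomorphic (MAF 0) and fully genotyped;
# person 0 has one missing call at variant 3.
geno = np.array([
    [0, 1, 0, np.nan],
    [0, 2, 0, 1],
    [0, 0, 0, 1],
    [0, 1, 0, 2],
], dtype=float)

before, after = compare_missingness(geno, maf_threshold=0.01)
print("missingness on all variants:      ", before)
print("missingness after the MAF filter: ", after)
```

In this toy case person 0 has one missing call out of four variants before the MAF filter and one out of two after it, so the same person looks worse once the two monomorphic variants are dropped. Whether this is what is happening in my real data is something I have not verified.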

Thank you!

