2.9 years ago
jun0914
Hi, I'm running QC on genetic data before an imaging genetics study.
I use PLINK version 1.09.
I calculate the heterozygosity rate so that I can exclude individuals whose rate is more than 3 SD from the mean.
I computed the heterozygosity rate, 'het', as (N(NM) - O(HOM)) / N(NM). The results are in the table below.
I then filtered out everyone more than 3 SD from the mean, which excluded 113 subjects.
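To make the filtering step concrete, here is a minimal Python sketch of the calculation described above. It assumes the O(HOM) and N(NM) counts have already been read from the PLINK .het output; the function names and the toy counts are made up for illustration and are not my real data.

```python
import statistics

def het_rate(o_hom, n_nm):
    """Heterozygosity rate: (N(NM) - O(HOM)) / N(NM)."""
    return (n_nm - o_hom) / n_nm

def flag_outliers(rates, n_sd=3.0):
    """Return the IDs whose rate lies more than n_sd sample SDs from the mean."""
    values = list(rates.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [iid for iid, rate in rates.items() if abs(rate - mean) > n_sd * sd]

# Toy (O(HOM), N(NM)) counts per individual, invented for illustration only.
counts = {f"S{i}": (650 + (i % 5), 1000) for i in range(20)}
counts["S_out"] = (900, 1000)  # one individual with unusually low heterozygosity

rates = {iid: het_rate(o_hom, n_nm) for iid, (o_hom, n_nm) in counts.items()}
outliers = flag_outliers(rates, n_sd=3.0)
print(outliers)
```

Changing `n_sd` makes it easy to see how many individuals a stricter or looser cutoff would exclude.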
But I realized that each population (White, Black, Hispanic, Asian, Others) forms its own cluster with a different distribution of heterozygosity rates, and about 80 of the excluded subjects were Asian.
Here are my questions.
- Do I have to separate the populations before performing any QC?
- If not, should I just remove the 80 Asian subjects, who make up about half of all Asian subjects in the sample?
Thank you.
Thanks a lot! I decided to go with a less strict cutoff, and found that I can keep 100 of those subjects (including the 80 Asian subjects) by applying a 4 SD cutoff instead of 3 SD. This sounds reasonable.