Question: Quality control filter by genotype counts in a GWAS?
User 7754 (United Kingdom) wrote 3.6 years ago:

Hi,

 

I am working with a study that has genotype data, and I am applying the following filters to clean it, keeping only the SNPs that have:

CALL_RATE > 0.95

HWE_P > 1e-6

MAF > 0.001

However, I am finding some SNPs with extremely low P-values, and when I look at these SNPs more closely I find that the genotype counts for homozygote major, heterozygote, and homozygote minor are very unequal. For example, in a sample of 500 they look like this:

N     N0    N1    N2

500   0     1     499

So I am tempted to also filter these cases out, for example requiring that

N0 | N1 | N2 >2

But this way I remove 1/3 of my data. I can't find this in the literature as a usual QC step in GWAS; is it not usually done? If it is, what is the minimum acceptable number for genotype counts?

Thank you very much for your help!

Fra

filter gwas • 1.7k views

Do you actually have 500 people in your sample? If so, a MAF > 0.001 filter won't be very useful.

written 3.6 years ago by Mitch Bekritsky
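To make the point about the MAF threshold concrete: 500 diploid samples carry 1000 alleles, so a single heterozygote already puts a SNP at MAF = 0.001. A minimal sketch of that arithmetic follows; the function name is hypothetical and not part of plink or any other tool.

```python
def maf_from_counts(n0, n1, n2):
    """Minor allele frequency from the three genotype counts.

    n0 and n2 are the two homozygote counts, n1 the heterozygote count.
    """
    total_alleles = 2 * (n0 + n1 + n2)
    if total_alleles == 0:
        raise ValueError("no called genotypes")
    # Frequency of the allele carried by the n0 homozygotes
    freq = (2 * n0 + n1) / total_alleles
    # The minor allele is whichever of the two alleles is rarer
    return min(freq, 1 - freq)


# The SNP from the question: 0 / 1 / 499 in 500 samples
print(maf_from_counts(0, 1, 499))  # 0.001
```

With this sample size, a threshold of 0.001 sits right at one minor-allele copy, so it cannot separate rare variants from near-monomorphic ones.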

Yes, I do. I realise that is a very low limit, but I am setting it the same for all studies in a meta-analysis.

Is that why I am getting 0 for some of the genotype counts? Is the genotype count usually set as a filter in addition to the MAF and the other filters I already have?

Thank you for your help

 

written 3.6 years ago by User 7754

chrchang523 (United States) wrote 3.6 years ago:

This is a standard GWAS step, frequently handled with plink's --geno flag (http://pngu.mgh.harvard.edu/~purcell/plink/thresh.shtml#miss1 ).  [edit: this is incorrect, I misread the original question; refer to the comment about --maf and --hwe instead]


Thank you chrchang523. I thought the --geno filter was used for missing genotypes, but is it actually filtering based on the three genotype counts? So if I applied --geno 0.1 in Plink, would this be equivalent to manually filtering out the SNPs with fewer than 50 individuals in any of the genotype groups (with N=500), so N0 | N1 | N2 >50?

written 3.6 years ago by User 7754
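For reference, --geno thresholds the per-SNP missing-genotype rate, so it is the counterpart of the CALL_RATE filter rather than a filter on N0, N1 and N2. The sketch below contrasts the two kinds of check; both function names are hypothetical and only illustrate the logic, they are not plink's implementation.

```python
def passes_missingness(n_called, n_total, max_missing=0.1):
    """Missingness-style check (the idea behind a threshold like --geno 0.1):
    keep the SNP if the fraction of missing genotypes is at most max_missing."""
    missing_rate = 1 - n_called / n_total
    return missing_rate <= max_missing


def passes_min_genotype_count(n0, n1, n2, min_count=50):
    """Count-style check from the question: every genotype group must
    contain at least min_count individuals."""
    return min(n0, n1, n2) >= min_count


# The SNP from the question is fully called, but has two nearly empty groups
print(passes_missingness(500, 500))          # True
print(passes_min_genotype_count(0, 1, 499))  # False
```

The two checks are independent: a SNP can be called in every sample and still have an empty genotype group.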

Oh, sorry, I misread your question.

A combination of --maf (or --mac, which is available in the plink 1.9 development build) and --hwe should work for your use case. Very low homozygote counts will be filtered out by --maf, while very low heterozygote counts when the homozygote counts are higher will be filtered out by --hwe.

written 3.6 years ago by chrchang523
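The division of labour between the two filters can be sketched on raw genotype counts. Note the assumptions: plink's --hwe uses an exact test, whereas this example substitutes a simple 1-df chi-square approximation (which is unreliable for small counts) purely for illustration. The function names and default thresholds are hypothetical, and at least one called genotype is assumed.

```python
import math


def maf(n0, n1, n2):
    """Minor allele frequency from the three genotype counts."""
    freq = (2 * n0 + n1) / (2 * (n0 + n1 + n2))
    return min(freq, 1 - freq)


def hwe_chisq_pvalue(n0, n1, n2):
    """Chi-square (1 df) HWE p-value; an approximation, NOT plink's exact test."""
    n = n0 + n1 + n2
    p = (2 * n0 + n1) / (2 * n)
    q = 1 - p
    # Genotype counts expected under Hardy-Weinberg equilibrium
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: nothing to test
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n0, n1, n2), expected))
    # Survival function of a chi-square distribution with 1 df
    return math.erfc(math.sqrt(chi2 / 2))


def keep_snp(n0, n1, n2, min_maf=0.01, min_hwe_p=1e-6):
    """Keep a SNP only if it passes both the MAF and the HWE threshold."""
    return maf(n0, n1, n2) > min_maf and hwe_chisq_pvalue(n0, n1, n2) > min_hwe_p


print(keep_snp(0, 1, 499))      # rare allele: removed by the MAF check
print(keep_snp(250, 0, 250))    # no heterozygotes: removed by the HWE check
print(keep_snp(125, 250, 125))  # balanced counts: kept
```

The 0 / 1 / 499 SNP is fully consistent with HWE, which is why only the frequency filter can remove it.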

Thank you very much for your clarification! To solve this, do you think I just need to make the thresholds I already have in place stricter? For example, instead of HWE_P > 1e-6 and MAF > 0.001, use HWE_P > 1e-4 and MAF > 0.01?

Using stricter thresholds indeed helps a lot! However, there will still be associations where a genotype group has only 1 individual (for example the SNP below, with 1 homozygote major), but maybe this could be considered a real signal?

SNP           N    N0  N1  N2   MAF      HWE_P  CALL_RATE  PVAL

12:112543881  500  1   16  483  0.01613  0.13   1          1e-07

 

A related question is whether to apply these same filters to all the studies regardless of sample size. This study is part of a pipeline applied to many studies in preparation for a meta-analysis, so we had decided that all the studies should have the same QC filters applied to them. In contrast to this approach, would you suggest using different filters for studies with a small sample size, such as this one? Thank you so much for your help!

written 3.6 years ago by User 7754
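One way to look at the sample-size question is in terms of minor allele counts, which is what the --mac option mentioned earlier thresholds. The sketch below counts minor-allele copies for the SNP in the table above and shows which MAF the same count corresponds to at other sample sizes. The function names are hypothetical, and the sample sizes other than 500 are illustrative assumptions, not values from this thread.

```python
def minor_allele_count(n0, n1, n2):
    """Copies of the rarer allele, from the three genotype counts."""
    return min(2 * n0 + n1, 2 * n2 + n1)


def maf_equivalent_to_mac(mac, n_samples):
    """MAF corresponding to `mac` minor-allele copies in n_samples diploid individuals."""
    return mac / (2 * n_samples)


# The SNP from the table above: 1 / 16 / 483
print(minor_allele_count(1, 16, 483))  # 18

# The same 18 copies correspond to very different MAFs at other sample sizes
for n in (500, 5000, 50000):  # 5000 and 50000 are illustrative only
    print(n, maf_equivalent_to_mac(18, n))
```

A fixed MAF cutoff therefore lets through far fewer minor-allele copies in a small study than in a large one, which is worth keeping in mind when sharing one threshold across a meta-analysis.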