Hi there!
I have a joint-called VCF with 10,000+ samples (all 30x coverage WGS, from multiple ancestries) and intend to carry out a GWAS. I used PLINK to convert the VCF to BED format and applied initial QC (--mind and --geno). After that I had a good genotyping rate (~95%) with 180M variants. However, when I applied a minor allele frequency filter (--maf 0.01), more than 95% of the variants were removed and only 900K were retained. My questions are:
- Is it normal to have more rare variants (MAF<0.01) than common variants in a large dataset like this?
- As we don't know the self-reported ancestry for all samples, I am going to use somalier to predict ancestry. Should I use these predictions to apply the MAF filter separately within each ancestry group?
I appreciate your input and suggestions.
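To make the second question concrete, here is a rough sketch of what I mean (Python/NumPy on toy data, not PLINK). It assumes genotypes coded as 0/1/2 alternate-allele dosages (missing = NaN) and ancestry labels already assigned, e.g. from somalier. The helper names, the toy 0.2 threshold, and the "keep a variant if it is common in at least one group" rule are all hypothetical illustrations. I'm not sure that is the right rule; the alternative would be to run the GWAS within each group on that group's own filtered variant set.

```python
import numpy as np

def minor_allele_freq(dosages):
    """MAF per variant from an (n_samples, n_variants) array of 0/1/2
    alt-allele dosages. Missing calls (np.nan) are ignored."""
    alt_freq = np.nanmean(dosages, axis=0) / 2.0
    # The minor allele may be REF or ALT, so take the smaller frequency.
    return np.minimum(alt_freq, 1.0 - alt_freq)

def maf_mask_by_group(dosages, groups, threshold=0.01):
    """Hypothetical rule: keep a variant if its MAF >= threshold
    in at least one ancestry group."""
    groups = np.asarray(groups)
    keep = np.zeros(dosages.shape[1], dtype=bool)
    for label in np.unique(groups):
        keep |= minor_allele_freq(dosages[groups == label]) >= threshold
    return keep

# Toy data: 6 samples in group A, 2 in group B, 3 variants.
G = np.array([
    # v0 v1 v2
    [0, 1, 1],  # group A
    [0, 0, 1],
    [0, 0, 1],
    [0, 0, 1],
    [0, 0, 1],
    [0, 0, 1],
    [1, 0, 1],  # group B
    [1, 0, 1],
], dtype=float)
groups = ["A"] * 6 + ["B"] * 2

pooled = minor_allele_freq(G) >= 0.2          # one filter on all samples
per_group = maf_mask_by_group(G, groups, 0.2)  # filter within each group
print(pooled.tolist())     # [False, False, True]
print(per_group.tolist())  # [True, False, True]
```

Variant v0 is common in the small group B (MAF 0.5) but only 0.125 in the pooled set, so a single pooled filter drops it while a per-group filter keeps it. This is the situation I'm worried about with a multi-ancestry cohort.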
Thank you!