More rare variants than common variants
4 months ago
RT ▴ 20

Hi there!

I have a joint-called VCF with 10,000+ samples (all 30x-coverage WGS, from multiple ancestries) and intend to carry out a GWAS. I used PLINK to convert the VCF to BED format and applied the initial QC (--mind and --geno), after which I had a good genotyping rate of ~95% with 180M variants. But when I applied the minor allele frequency filter --maf 0.01, more than 95% of the variants were removed and only 900K were retained. My questions are:

  1. Is it normal to have more rare variants (MAF<0.01) than common variants in a large dataset like this?
  2. As we don't know the self-reported ancestry for all samples, I am going to use somalier to predict ancestry. Should I then apply the MAF filter separately within each ancestry group? I appreciate your input and suggestions.

Thank you!
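For readers less familiar with these PLINK flags, here is a minimal pure-Python sketch of what the three filters compute on a variants x samples genotype matrix (alt-allele counts 0/1/2, None for a missing call): --mind drops samples whose missing-call rate exceeds the threshold, --geno does the same for variants, and --maf drops variants whose minor allele frequency is below the threshold. This is an illustration, not PLINK's implementation: the function names, the toy matrix and the example thresholds are hypothetical, and the order of the steps (samples, then variants, then MAF) is an assumption that may not match PLINK's internal order.

```python
from typing import List, Optional, Tuple

# Genotypes coded as alt-allele dosage 0/1/2; None marks a missing call.
Matrix = List[List[Optional[int]]]  # rows = variants, columns = samples


def minor_allele_freq(calls: List[Optional[int]]) -> float:
    """MAF from non-missing diploid calls (0.0 if nothing was called)."""
    called = [g for g in calls if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def qc_filter(geno: Matrix, mind: float = 0.1, geno_max: float = 0.1,
              maf_min: float = 0.01) -> Tuple[List[int], List[int]]:
    """Return (kept sample indices, kept variant indices).

    Hypothetical helper; thresholds are example values and the step order
    (samples, then variants, then MAF) is an assumption, not PLINK's code.
    """
    n_var, n_samp = len(geno), len(geno[0])
    # --mind analogue: drop samples whose missing-call rate exceeds `mind`.
    samples = [s for s in range(n_samp)
               if sum(row[s] is None for row in geno) / n_var <= mind]
    if not samples:
        return [], []
    variants = []
    for v, row in enumerate(geno):
        calls = [row[s] for s in samples]
        # --geno analogue: drop variants whose missing-call rate exceeds `geno_max`.
        if sum(g is None for g in calls) / len(calls) > geno_max:
            continue
        # --maf analogue: drop variants whose MAF is below `maf_min`.
        if minor_allele_freq(calls) < maf_min:
            continue
        variants.append(v)
    return samples, variants


# Hypothetical toy data: 4 variants x 5 samples.
toy: Matrix = [
    [0, 1, 2, 0, 1],     # common variant
    [0, 0, 0, 0, 1],     # MAF 0.1
    [0, 0, 0, 0, 0],     # monomorphic, removed by the MAF step
    [None, 0, 1, 0, 0],  # one missing call (in sample 0)
]

# qc_filter(toy, mind=0.3) -> ([0, 1, 2, 3, 4], [0, 1])
# qc_filter(toy)           -> ([1, 2, 3, 4], [0, 1, 3])
```

The two calls on the toy matrix show that the result depends both on the thresholds and on which samples survive the sample-level step: with the default `mind`, sample 0 is dropped, so the last variant no longer has a missing call and passes --geno.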

4 months ago
LChart 5.1k

Is it normal to have more rare variants (MAF<0.01) than common variants in a large dataset like this?

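Yes, this is the site frequency spectrum in action: in a large sample, most segregating sites are rare, so a --maf 0.01 filter removes the large majority of variants.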
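To see why rare variants dominate, here is a minimal sketch under the standard neutral model with constant population size, where the expected number of segregating sites at which the derived allele is carried by i of the n sampled chromosomes is proportional to 1/i. Folding that spectrum to minor-allele counts and summing gives the expected share of sites with MAF below a cutoff. The function name is hypothetical, and the model ignores demography and selection; human populations grew rapidly in recent history, which adds many more rare variants than this baseline predicts, so it understates what real WGS data show.

```python
def neutral_rare_fraction(n_samples: int, maf_cutoff: float = 0.01) -> float:
    """Expected fraction of segregating sites with MAF < maf_cutoff under the
    standard neutral, constant-size model (hypothetical helper for illustration)."""
    n = 2 * n_samples  # diploid samples -> number of sampled chromosomes
    total = 0.0
    rare = 0.0
    for i in range(1, n):  # derived-allele count at a segregating site
        weight = 1.0 / i   # neutral expectation is proportional to 1/i
        total += weight
        # Fold to the minor allele: MAF = min(i, n - i) / n.
        if min(i, n - i) / n < maf_cutoff:
            rare += weight
    return rare / total


if __name__ == "__main__":
    for n_samples in (100, 1_000, 10_000):
        print(n_samples, round(neutral_rare_fraction(n_samples), 3))
```

With 10,000 diploid samples (20,000 chromosomes), even this simple model puts more than half of all segregating sites below MAF 0.01, and the share grows with sample size because larger samples catch more rare alleles.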

As we don't know the self-reported ancestry for all samples, I am going to use somalier to predict ancestry. Should I then apply the MAF filter separately within each ancestry group? I appreciate your input and suggestions.

This is a circular question. If you don't have ancestry labels, you can't apply a filter within each ancestry group; the only thing you can do is apply it globally and then run Somalier. Edit: You may mean you have partial ancestry information. In general, ancestry inference works fine with a hard threshold on common variants, as there are plenty of frequency-divergent SNPs among them.
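Once ancestry labels exist (for example, inferred with Somalier after the global filter), per-group allele frequencies can be compared with the pooled one. The sketch below uses hypothetical group labels and a toy variant to show how a variant can fail a pooled 1% MAF cutoff while being common in a small group. The helper names are hypothetical, and `passes_any_group` (keep a variant if it reaches the cutoff in at least one group) is one possible rule assumed here for illustration, not something the answer prescribes.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def maf(calls: List[Optional[int]]) -> float:
    """Minor allele frequency from diploid 0/1/2 calls; None = missing."""
    called = [g for g in calls if g is not None]
    if not called:
        return 0.0
    alt = sum(called) / (2 * len(called))
    return min(alt, 1.0 - alt)


def maf_by_group(calls: List[Optional[int]], labels: List[str]) -> Dict[str, float]:
    """MAF computed separately within each (hypothetical) ancestry label."""
    groups: Dict[str, List[Optional[int]]] = defaultdict(list)
    for g, label in zip(calls, labels):
        groups[label].append(g)
    return {label: maf(gs) for label, gs in groups.items()}


def passes_any_group(calls: List[Optional[int]], labels: List[str],
                     cutoff: float = 0.01) -> bool:
    """Assumed rule: keep the variant if MAF >= cutoff in at least one group."""
    return any(f >= cutoff for f in maf_by_group(calls, labels).values())


# Toy variant: 100 heterozygotes among 500 samples in hypothetical group "A",
# absent from the 9,500 samples in hypothetical group "B".
labels = ["A"] * 500 + ["B"] * 9500
calls: List[Optional[int]] = [1] * 100 + [0] * 400 + [0] * 9500

# Pooled MAF is 100 / 20000 = 0.005, below a 0.01 cutoff,
# while the MAF within group "A" is 100 / 1000 = 0.1.
```

Whether such group-specific variants should be kept for the association analysis is a separate design decision; for ancestry inference itself, the globally filtered common variants are usually enough.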
