Question: Why filter out sites with a minor allele count below 3?
3 months ago by
mab658 wrote:

Hi everyone,

I came across an SNP filtering tutorial where the author used the flag --mac 3 to filter out SNPs that have a minor allele count of less than 3, that is:

vcftools --vcf input_file.vcf --mac 3 --recode --out filtered_file

Could someone explain why sites with a minor allele count below 3 are filtered out? By retaining only sites with a minor allele count of 3 or above, what exactly are we aiming for?

I applied the above command to my SNP data from cassava (diploid with 18 chromosomes), which has 359793 sites x 980 samples. After filtering, I have 147518 sites x 980 samples, which means a large number of sites were dropped. What is going on? I would appreciate a clearer explanation of --mac 3, because I also intend to filter on minor allele frequency later on.

Thanks

sequencing snp • 205 views
3 months ago by
EMBL Heidelberg, Germany
Jean-Karim Heriche wrote:

Alleles that are observed only rarely are more likely to be errors. Even when they are real, they are of little use, because they give very low statistical power to detect an association with anything. For both reasons, they are usually filtered out.
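To make the threshold concrete, here is a minimal Python sketch of what a minor allele count filter does. It is not how vcftools is implemented. It assumes biallelic sites and VCF-style GT strings, and the helper names (`minor_allele_count`, `mac_as_frequency`) and the three toy sites are made up for illustration. It also shows how a count threshold translates into a frequency: with 980 diploid samples and no missing calls there are 1960 allele copies per site, so a minor allele count of 3 corresponds to a frequency of 3/1960, roughly 0.0015.

```python
from collections import Counter


def minor_allele_count(genotypes):
    """Count copies of the less common allele at one biallelic site.

    `genotypes` is a list of VCF-style GT strings such as "0/1" or "1|1".
    Missing alleles (".") are ignored. Assumes at most two distinct alleles.
    """
    counts = Counter()
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele != ".":
                counts[allele] += 1
    if len(counts) < 2:
        return 0  # only one allele observed among the called genotypes
    return min(counts.values())


def mac_as_frequency(mac, n_samples, ploidy=2):
    """Allele frequency equivalent to `mac`, assuming no missing genotypes."""
    return mac / (n_samples * ploidy)


# Hypothetical toy sites, 10 diploid samples each.
sites = {
    "singleton": ["0/0"] * 9 + ["0/1"],  # minor allele seen once
    "doubleton": ["0/0"] * 9 + ["1/1"],  # seen twice, in a single sample
    "kept": ["0/0"] * 7 + ["0/1"] * 3,   # seen three times
}

# Same rule as --mac 3: keep sites whose minor allele count is at least 3.
kept = [name for name, gts in sites.items() if minor_allele_count(gts) >= 3]
print(kept)

# Frequency equivalent of a count of 3 in 980 diploid samples.
print(mac_as_frequency(3, 980))
```

If the later minor allele frequency cutoff is above about 0.0015 (for example 0.01 or 0.05), it would remove these sites anyway, so in a data set of this size the count filter mostly matters when no frequency filter is applied afterwards.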
