I came across an SNP filtering tutorial where the author used the flag
--mac 3 to filter out SNPs that have a minor allele count of less than 3. That is:
vcftools --vcf input_file.vcf --mac 3 --recode --out filtered_file
Could someone explain why sites with a minor allele count below 3 are filtered out? By retaining only sites with a minor allele count of 3 or more, what exactly are we aiming for?
I applied the above command to my SNP data from cassava (diploid with 18 chromosomes), which has 359793 sites x 980 samples. After filtering, I have 147518 sites x 980 samples, indicating that a large number of sites were dropped.
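To put numbers on this, here is a rough sketch of the arithmetic behind my data. It assumes every sample is diploid with no missing genotypes, so each site has 2 x 980 = 1960 allele copies; the variable names are only for illustration and are not part of vcftools.

```shell
#!/bin/sh
# Assumption: all 980 samples are diploid and fully genotyped,
# so every site carries samples * ploidy allele copies.
samples=980
ploidy=2
mac=3
sites_before=359793
sites_after=147518

alleles=$((samples * ploidy))
sites_dropped=$((sites_before - sites_after))

# Frequency that corresponds to the MAC threshold, and the share of sites removed.
maf_equiv=$(awk -v m="$mac" -v n="$alleles" 'BEGIN { printf "%.5f", m / n }')
pct_dropped=$(awk -v d="$sites_dropped" -v t="$sites_before" 'BEGIN { printf "%.1f", 100 * d / t }')

echo "allele copies per site: $alleles"
echo "MAF equivalent of --mac $mac: $maf_equiv"
echo "sites dropped: $sites_dropped ($pct_dropped%)"
```

If that assumption holds, --mac 3 corresponds to a minor allele frequency cutoff of roughly 0.0015 in my data, and about 59% of my sites fall below it.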
What is going on? I would appreciate a clearer explanation of --mac 3, because I also intend to filter on minor allele frequency later on.