Question: Why filter out sites with a minor allele count below 3?
15 days ago, mab658 wrote:

Hi everyone,

I came across a SNP filtering tutorial in which the author used the flag --mac 3 to filter out SNPs that have a minor allele count of less than 3. That is:

vcftools --vcf input_file.vcf --mac 3 --recode --out filtered_file

Could someone explain why sites with a minor allele count below 3 are filtered out? By retaining sites with a minor allele count of 3 and above, what exactly are we aiming at?

I applied the above command to my SNP data from cassava (diploid with 18 chromosomes), which had 359793 sites x 980 samples. After filtering, I have 147518 sites x 980 samples, so a large number of sites were dropped. What is going on? I would appreciate a clearer explanation of --mac 3, because I also intend to filter on minor allele frequency later on.

Thanks

sequencing snp • 112 views
14 days ago, Jean-Karim Heriche (EMBL Heidelberg, Germany) wrote:

Alleles that are observed infrequently are more likely to be errors. In any case, they would not be useful, because there is too little statistical power to detect an association between them and anything else, so they are usually filtered out.
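To relate --mac 3 to a later minor allele frequency filter, it may help to convert the count into a frequency. The following is a minimal sketch, not part of vcftools: the helper name mac_to_maf is hypothetical, and it assumes every sample has a called diploid genotype at the site (missing genotypes would shrink the denominator). The site and sample numbers are the ones reported in the question.

```python
def mac_to_maf(mac: int, n_samples: int, ploidy: int = 2) -> float:
    """Convert a minor allele count to a minor allele frequency.

    Assumes no missing genotypes, so each sample contributes
    `ploidy` allele copies at the site.
    """
    total_alleles = n_samples * ploidy
    return mac / total_alleles


# Numbers taken from the question: 980 diploid samples.
n_samples = 980
maf_equivalent = mac_to_maf(3, n_samples)

# Sites before and after applying --mac 3, as reported in the question.
sites_before = 359793
sites_after = 147518
sites_removed = sites_before - sites_after
fraction_removed = sites_removed / sites_before

print(f"--mac 3 over {n_samples * 2} allele copies ~ MAF {maf_equivalent:.5f}")
print(f"sites removed: {sites_removed} ({fraction_removed:.1%} of all sites)")
```

Under these assumptions, a minor allele count of 3 corresponds to a frequency of about 0.0015, so the sites that were dropped are those where the minor allele appears in at most 2 of the 1960 allele copies. Any stricter frequency threshold applied later would already exclude those sites, which is one reason the two filters are often considered together.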
