Why filter out sites with a minor allele count below 3?
5.3 years ago
mab658

Hi everyone,

I came across a SNP filtering tutorial in which the author used the flag --mac 3 to filter out SNPs with a minor allele count less than 3:

vcftools --vcf input_file.vcf --mac 3 --recode --out filtered_file

Could someone explain why sites with a minor allele count below 3 are filtered out? By retaining only sites where the minor allele is observed 3 or more times, what exactly are we aiming for?

I applied the command above to my SNP data from cassava (diploid, 18 chromosomes), which has 359793 sites x 980 samples. After filtering, I have 147518 sites x 980 samples, so a large number of sites were dropped. What is going on? I would appreciate a clearer explanation of --mac 3, because I also intend to filter on minor allele frequency later on.

Thanks
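To make the idea behind a minor allele count (MAC) threshold concrete, here is a minimal Python sketch of what "minor allele count" means at a single site. It is not vcftools' implementation: it assumes biallelic sites and diploid genotype strings such as "0/1" or "1|1", skips missing calls, and the function names are hypothetical.

```python
from collections import Counter


def allele_counts(genotypes):
    """Count alleles across diploid genotype calls like '0/1' or '1|1'.
    Missing alleles ('.') are skipped."""
    counts = Counter()
    for gt in genotypes:
        # Treat phased ('|') and unphased ('/') separators the same way.
        for allele in gt.replace("|", "/").split("/"):
            if allele != ".":
                counts[allele] += 1
    return counts


def minor_allele_count(genotypes):
    """Count of the less frequent allele at a (assumed biallelic) site.
    A monomorphic site has no minor allele, so it returns 0."""
    counts = allele_counts(genotypes)
    if len(counts) < 2:
        return 0
    return min(counts.values())


def passes_mac(genotypes, min_mac=3):
    """Keep the site only if the minor allele is seen at least min_mac times."""
    return minor_allele_count(genotypes) >= min_mac


# Hypothetical site: 5 samples, the alternate allele '1' is seen only twice.
site = ["0/0", "0/1", "0/0", "0/1", "./."]
print(minor_allele_count(site))  # 2
print(passes_mac(site))          # False: a MAC >= 3 filter would drop this site
```

Under these assumptions, every singleton (minor allele seen once) and doubleton (seen twice) fails the threshold, which is the class of sites a --mac 3 filter removes.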

5.3 years ago

Alleles that are observed only rarely (for example, once or twice across all samples) are more likely to be sequencing or genotyping errors. Even when they are real, they have too little statistical power to detect an association with anything, so they are usually filtered out.
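It can help to see how low a MAC of 3 is when expressed as a minor allele frequency (MAF). A minimal sketch, assuming all samples are diploid and, optionally, a hypothetical uniform missing-data rate that reduces the number of called alleles (the helper name is illustrative, not a vcftools option):

```python
def mac_to_maf(mac, n_samples, ploidy=2, missing_rate=0.0):
    """Approximate MAF equivalent of a minor-allele-count threshold.
    Assumes every sample has the given ploidy; missing_rate is a
    hypothetical uniform fraction of uncalled genotypes."""
    called_alleles = n_samples * ploidy * (1.0 - missing_rate)
    return mac / called_alleles


# 980 diploid samples, as in the question, with no missing data:
# 3 / (980 * 2) = 3 / 1960
print(round(mac_to_maf(3, 980), 5))  # 0.00153
```

So with 980 diploid samples, --mac 3 keeps any site whose minor allele frequency is at least roughly 0.15%. A later MAF filter at a typical threshold such as 0.01 or 0.05 would be much stricter. With little missing data, such a filter would also remove every site that the MAC filter removes. Heavy missingness at a site can break this relationship, because MAF is then computed over fewer called alleles.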
