Which R package to use for quality control of SNP data?
7.8 years ago
mms140130 ▴ 60

Hello,

I have a large SNP dataset, and I'm trying to remove the SNPs with minor allele frequency (MAF) < 5% and the ones that don't follow Hardy-Weinberg equilibrium. I'm using R and I don't know which package does that. Any help, please?

SNP R snp • 3.7k views
7.8 years ago
willgilks ▴ 360

Hi MMS,

You could try vcfR (https://cran.r-project.org/web/packages/vcfR/index.html), although, without meaning to sound vain, I think my graphs are better :) See https://f1000research.com/articles/5-2644/v3, with code available at https://zenodo.org/record/159272#.WKCKsBAnp7E. To visualise the QC in R, you can use the GATK VariantsToTable tool to make a readable table: https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_gatk_tools_walkers_variantutils_VariantsToTable.php

GATK also has a Hardy-Weinberg calculator, but I'm not sure whether it can filter variants directly: https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_gatk_tools_walkers_annotator_HardyWeinberg.php

If you don't use GATK, then programs like VCFtools and BCFtools could probably help; otherwise you have to write your own Perl/Bash/Python/whatever scripts. PLINK 1.9 is good too: https://www.cog-genomics.org/plink2. You have to convert your VCF into PLINK-format genotypes, and then it's easy to filter by MAF and HWE.
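
If you do end up writing your own script, here is a rough Python sketch of what a MAF and HWE filter computes from per-SNP genotype counts. Everything in it is an assumption for illustration: the function names, the example counts, and the HWE p-value cutoff of 1e-6 are made up (only the 5% MAF threshold comes from the question). It uses a simple chi-square goodness-of-fit test with 1 degree of freedom, not the exact test that dedicated tools typically offer, so treat it as a way to understand the filter rather than a replacement for PLINK or VCFtools.

```python
import math

def maf(n_aa, n_ab, n_bb):
    """Minor allele frequency from counts of the three genotypes."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    return min(p, 1 - p)

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of a chi-square test (1 df) against Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic site: nothing to test
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 df
    return math.erfc(math.sqrt(chi2 / 2))

def passes_qc(counts, maf_min=0.05, hwe_alpha=1e-6):
    """Keep a SNP if MAF >= maf_min and the HWE p-value is >= hwe_alpha.

    hwe_alpha is an assumed cutoff; choose one that suits your study.
    """
    return maf(*counts) >= maf_min and hwe_chi2_pvalue(*counts) >= hwe_alpha

# Hypothetical genotype counts (AA, AB, BB) for three made-up SNPs
snps = {
    "snp1": (360, 480, 160),  # common variant, in HWE proportions
    "snp2": (950, 49, 1),     # rare variant, MAF below 5%
    "snp3": (500, 0, 500),    # no heterozygotes, strong HWE deviation
}
kept = [name for name, counts in snps.items() if passes_qc(counts)]
print(kept)
```

With these made-up counts, only snp1 survives both filters: snp2 is dropped for low MAF and snp3 for deviating from HWE.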


Thank you, I appreciate your help.
