Question: Genetic QC checks in R - packages and errors
3 days ago, Silvia wrote:

Hi all, I need to perform the following quality control checks: 1) allele frequency (AF > 5% and < 95%), 2) INFO score > 0.8, 3) removal of all SNPs at duplicate positions, and 4) removal of multi-allelic SNPs.
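A minimal base-R sketch of all four checks, assuming a per-SNP data frame with hypothetical columns AF (allele frequency), INFO (imputation quality), CHR, BP, A1 and A2; rename them to match your file. It also assumes that a "duplicate position" is any CHR:BP seen on more than one row and that a multi-allelic site shows up either as several rows at one position or as a comma-separated allele field such as "A,G". Checks 3 and 4 are evaluated on the full table first, so a position is still caught if one of its copies would fail checks 1 and 2.

```r
# Sketch: four QC filters on a per-SNP data frame.
# Column names AF, INFO, CHR, BP, A1, A2 are assumptions; adjust to your data.
qc_filter <- function(df, af_min = 0.05, af_max = 0.95, info_min = 0.8) {
  # Checks 3 and 4, computed before any filtering
  pos <- paste(df$CHR, df$BP, sep = ":")
  # TRUE for every copy of a CHR:BP that appears more than once
  dup_pos <- duplicated(pos) | duplicated(pos, fromLast = TRUE)
  # TRUE if an allele field lists several alleles, e.g. "A,G"
  multi <- grepl(",", df$A1, fixed = TRUE) | grepl(",", df$A2, fixed = TRUE)

  # Checks 1 and 2: allele frequency window and imputation quality
  pass_af   <- df$AF > af_min & df$AF < af_max
  pass_info <- df$INFO > info_min

  keep <- pass_af & pass_info & !dup_pos & !multi
  df[which(keep), , drop = FALSE]  # which() treats NA as FALSE, dropping those rows
}

# Toy data: rows 3-4 share a position, row 5 is rare, row 6 has low INFO
snps <- data.frame(
  CHR  = c(1, 1, 1, 1, 2, 2),
  BP   = c(100, 200, 300, 300, 400, 500),
  A1   = c("C", "A", "G", "G", "T", "C"),
  A2   = c("T", "G", "A", "C", "C", "T"),
  AF   = c(0.30, 0.50, 0.20, 0.20, 0.01, 0.40),
  INFO = c(0.95, 0.90, 0.99, 0.99, 0.95, 0.50)
)
res <- qc_filter(snps)
print(res)  # only the rows at BP 100 and 200 remain
```

Dropping every copy of a repeated position is the conservative choice; if you would rather keep one copy of exact duplicates, use `!duplicated(pos)` instead.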

I am currently using R to do that but because this is not exactly my field I have got a bit stuck.

For checks 1 and 2, I found the snpReady package, which seems ideal and also produces a report.

The code I used is as follows:


df <- read.csv("somedf.csv", header = TRUE,  na.strings = "NA")
geno.ready <-, frame = "long", base = TRUE, sweep.sample = 0.8, call.rate = 0.95, maf = 0.05, imput = FALSE)

However, I can't get it to work; it gives me this error:

"Error in = as.matrix(geno), frame = "long", base = TRUE, : could not find function """

Do you know why this might be? Would you recommend any other packages or software that would let me do these checks more easily?
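A "could not find function" error usually means the name is not visible from where it is called: commonly the package was never attached with library(), failed to load because a dependency is missing, or the name is misspelt. A small diagnostic sketch using only base R; the helper name check_function is hypothetical, and you would pass the package and function you are actually calling.

```r
# Sketch: why can't R find a function? (check_function is a hypothetical helper)
check_function <- function(fun, pkg) {
  # Can the package (and its dependencies) be loaded at all?
  installed <- requireNamespace(pkg, quietly = TRUE)
  # Is the name visible on the search path without pkg:: ?
  visible <- exists(fun, mode = "function")
  # Is the function defined inside the package's own namespace?
  in_package <- installed &&
    exists(fun, envir = asNamespace(pkg), mode = "function", inherits = FALSE)
  c(installed = installed, visible = visible, in_package = in_package)
}

print(check_function("subset", "base"))           # all TRUE
print(check_function("no_such_fun_xyz", "base"))  # installed only
```

Reading the result: installed but not visible suggests calling library(pkg) first or writing pkg::fun(); not in_package suggests a typo or a function missing from the installed version; not installed points to a missing package or dependency.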

Thank you! Silvia

EDIT: I managed to fix the error: it worked on a different computer after updating all packages and installing the ones I needed individually. However, I can't use this script after all, because it only accepts a data frame with 4 columns and mine has more. I can't seem to find a suitable package. Do you have any to recommend?

My data has the following columns: ID, SNP (e.g., 1:15791:C:T), Allele 1 (i.e., the minor allele, e.g., C), Allele 2 (e.g., T), beta, pvalue, Chromosome (e.g., 1), Base-pair position (e.g., 15791).
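Since the SNP IDs encode position and alleles, duplicate-position and multi-allelic SNPs can be found directly from that column. A sketch, assuming every ID has exactly four colon-separated fields (chromosome:position:allele:allele) as in 1:15791:C:T; the helper split_snp_id and the second and third IDs are made up for illustration.

```r
# Sketch: split IDs like "1:15791:C:T" into columns (split_snp_id is hypothetical).
split_snp_id <- function(ids) {
  fields <- strsplit(ids, ":", fixed = TRUE)
  stopifnot(all(lengths(fields) == 4))  # fail loudly on unexpected ID formats
  parts <- do.call(rbind, fields)       # character matrix, one row per ID
  data.frame(CHR = parts[, 1],          # kept as text so "X" or "Y" also work
             BP  = as.integer(parts[, 2]),
             A1  = parts[, 3],
             A2  = parts[, 4],
             stringsAsFactors = FALSE)
}

ids <- c("1:15791:C:T", "1:15791:C:G", "2:20500:A:G")  # toy IDs
parsed <- split_snp_id(ids)
pos <- paste(parsed$CHR, parsed$BP, sep = ":")
# Every row whose position occurs more than once (here a multi-allelic site)
print(parsed[duplicated(pos) | duplicated(pos, fromLast = TRUE), ])
```

On your data you would call split_snp_id() on the SNP column and could compare the parsed BP against the base-pair position column as a consistency check.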

Modified 2 days ago • written 3 days ago by Silvia

Can you confirm that you've successfully installed snpReady? snpReady seems to depend on impute, which is on Bioconductor rather than CRAN and needs to be installed manually with:

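if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")  # get the Bioconductor installer if missing
BiocManager::install("impute")  # install impute from Bioconductor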

Modified 3 days ago • written 3 days ago by Sam

Hi Sam, thank you for your reply! Yes, I have. I noticed the impute dependency and installed it successfully, since I had come across the same code you posted. The rgl package, however, gave me an error: "package or namespace load failed for ‘rgl’ in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]): there is no package called ‘digest’". I also get warning messages such as "package built under R version 3.5.3". I am trying to sort these out, but I don't think they explain my "could not find function """ error, which is the one that concerns me most.

Written 3 days ago by Silvia

In that case, you will also need to install the digest package manually, which you can do with install.packages("digest").
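When several dependencies are missing, it can help to list which ones fail to load before installing anything. A sketch with a hypothetical helper missing_pkgs; requireNamespace() returns FALSE both for packages that are absent and for packages that are installed but fail to load, which is the rgl/digest situation above.

```r
# Sketch: which packages cannot be loaded? (missing_pkgs is a hypothetical helper)
missing_pkgs <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

needed <- c("digest", "rgl")  # packages mentioned in this thread; adjust as needed
to_install <- missing_pkgs(needed)
print(to_install)
# Then, for CRAN packages: if (length(to_install) > 0) install.packages(to_install)
```

The install step is left as a comment because Bioconductor packages such as impute go through BiocManager::install() rather than install.packages().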

Written 3 days ago by Sam

Thank you, I did manage to make it work, but that script only accepts a data frame with 4 columns and mine has more, so I can't use it. What scripts or software do you normally use to carry out similar checks?

Written 2 days ago by Silvia

I usually just read in the data.frame and do the filtering myself.

E.g., if my data.frame (df) has an INFO column and a MAF column and I want to keep rows with INFO > 0.8 and MAF > 0.05, I can do:

res <- subset(df, INFO > 0.8 & MAF > 0.05)

Written 1 day ago by Sam
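The original check is a two-sided window on allele frequency (AF > 5% and < 95%), while the filter above is one-sided on MAF. If your file has AF rather than MAF, folding it with pmin(AF, 1 - AF) gives the minor allele frequency, and the two filters then agree. The frequencies below are toy values, and the AF column name in the comment is an assumption.

```r
# Sketch: a two-sided AF window equals a one-sided MAF filter after folding.
af  <- c(0.01, 0.05, 0.30, 0.50, 0.94, 0.97)  # toy allele frequencies
maf <- pmin(af, 1 - af)                       # frequency of the less common allele

keep_af  <- af > 0.05 & af < 0.95
keep_maf <- maf > 0.05
print(identical(keep_af, keep_maf))  # TRUE for these values

# With a data frame that has AF and INFO columns (names assumed):
# res <- subset(df, INFO > 0.8 & pmin(AF, 1 - AF) > 0.05)
```

The equivalence is exact in real arithmetic; a value sitting exactly on a boundary (e.g. AF = 0.95) can be classified differently because 1 - 0.95 is not exactly 0.05 in floating point.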

Thank you, I think I am overcomplicating this then! I will try to do it manually too.

Written 16 hours ago by Silvia