Question: Genetic QC checks in R - packages and errors
7 months ago, Silvia wrote:

Hi all, I need to perform the following quality control checks: 1) allele frequency (AF > 5% and < 95%), 2) info > 0.8, 3) remove all SNPs at duplicate positions, and 4) remove multi-allelic SNPs.

I am currently using R to do that but because this is not exactly my field I have got a bit stuck.

For checks 1 and 2, I have found the "snpReady" package, which seems ideal and also gives me a report.

The code I used is as follows:


df <- read.csv("somedf.csv", header = TRUE,  na.strings = "NA")
geno.ready <- raw.data(data = as.matrix(geno), frame = "long", base = TRUE, sweep.sample = 0.8, call.rate = 0.95, maf = 0.05, imput = FALSE)

However, I can't seem to make it work as it gives me this error:

"Error in raw.data(data = as.matrix(geno), frame = "long", base = TRUE, : could not find function "raw.data""

Do you know why this might be? Would you recommend any other packages or software that would let me do these checks more easily?

Thank you! Silvia

EDIT: I managed to fix the error; it worked well on a different computer after updating all the packages and individually installing the ones I needed. However, I can no longer use this script, as it only accepts a data frame with 4 columns and mine has more than that. I can't seem to find a suitable package. Do you have any to recommend?

My data has the following headers: ID, SNP (e.g., 1:15791:C:T), Allele 1 (i.e., the minor allele, e.g., C), Allele 2 (e.g., T), beta, pvalue, Chromosome (e.g., 1), Base-pair position (e.g., 15791).

Tags: snp, qc, R, software, error
modified 7 months ago • written 7 months ago by Silvia

Can you confirm that you have successfully installed "snpReady"? snpReady seems to depend on "impute", which is on Bioconductor rather than CRAN and needs to be installed manually with

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("impute")

modified 7 months ago • written 7 months ago by Sam

Hi Sam, thank you for your reply! Yes, I have. I noticed the "impute" issue, but I installed it successfully as I came across the same code you've posted. The rgl package, however, gave me an error: "package or namespace load failed for ‘rgl’ in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]): there is no package called ‘digest’". In addition, I also get these warning messages: "package built under R version 3.5.3". I am trying to sort these out, but in general I don't think they can explain my "could not find function" error, which is the most concerning one.

written 7 months ago by Silvia

If that's the case, you will also need to install the digest package manually; install.packages("digest") is enough.

written 7 months ago by Sam
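
A minimal sketch for checking the installation state discussed above, assuming the package list is just the four names mentioned in this thread (adjust it to your own setup). A "could not find function" error usually means the package that provides the function was never installed or was not attached with library() in the current session; raw.data is the snpReady function that takes the frame, call.rate and maf arguments.

```r
# Packages mentioned in this thread (an assumption; edit as needed).
pkgs <- c("snpReady", "impute", "rgl", "digest")

# TRUE/FALSE per package, without loading anything.
installed <- pkgs %in% rownames(installed.packages())
names(installed) <- pkgs
print(installed)

# Only attach snpReady if it is actually installed, then confirm
# that its raw.data function is visible in the session.
if (installed[["snpReady"]]) {
  library(snpReady)
  print(exists("raw.data"))
}
```

Any package reported as FALSE needs installing before library(snpReady) can succeed.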

Thank you, I actually managed to make it work, but that script only accepts a data frame with 4 columns whereas mine has more, so I cannot use it. What scripts or software do you normally use to carry out similar checks?

written 7 months ago by Silvia

I usually just read in the data.frame and do the filtering manually.

For example, if my data.frame (df) has an INFO column and a MAF column and I want to filter by INFO > 0.8 and MAF > 0.05, I can do:

res <- subset(df, INFO > 0.8 & MAF > 0.05)
written 7 months ago by Sam
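
The same manual approach can be stretched to cover all four checks from the question. This is only a sketch under stated assumptions: the column names (CHR, BP, A1, A2, AF, INFO) and every value in the toy data are hypothetical, and multi-allelic sites are assumed to appear as several rows sharing one chromosome and position.

```r
# Toy summary statistics; column names and values are hypothetical
# stand-ins for the headers described in the question.
df <- data.frame(
  SNP  = c("1:15791:C:T", "1:15791:C:G", "1:16000:A:G",
           "2:500:G:A", "2:900:T:C", "2:950:C:A"),
  CHR  = c(1L, 1L, 1L, 2L, 2L, 2L),
  BP   = c(15791L, 15791L, 16000L, 500L, 900L, 950L),
  A1   = c("C", "C", "A", "G", "T", "C"),
  A2   = c("T", "G", "G", "A", "C", "A"),
  AF   = c(0.20, 0.10, 0.30, 0.02, 0.40, 0.25),
  INFO = c(0.95, 0.90, 0.85, 0.99, 0.60, 0.90),
  stringsAsFactors = FALSE
)

# Checks 1 and 2: keep 0.05 < AF < 0.95 and INFO > 0.8.
res <- subset(df, AF > 0.05 & AF < 0.95 & INFO > 0.8)

# Checks 3 and 4: drop every SNP whose chromosome:position occurs more
# than once. Positions are counted in the unfiltered data, so a site is
# still flagged when one of its rows already failed checks 1 or 2.
pos_key <- paste(df$CHR, df$BP, sep = ":")
dup_pos <- unique(pos_key[duplicated(pos_key)])
res <- res[!(paste(res$CHR, res$BP, sep = ":") %in% dup_pos), ]

print(res$SNP)
```

If multi-allelic sites are instead encoded in a single row (for example a comma-separated allele field), an extra filter on that field would be needed; that encoding is not covered here.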

Thank you, I think I am overcomplicating this then! I will try to do it manually too.

written 7 months ago by Silvia