Hi all, I need to perform the following quality control checks: 1) allele frequency (AF > 5% and < 95%), 2) info > 0.8, 3) remove all SNPs at duplicate positions, and 4) remove multi-allelic SNPs.
I am currently using R to do that, but because this is not exactly my field I have got a bit stuck.
For checks 1 and 2, I found the "snpReady" package, which seems ideal and also gives me a report: https://cran.r-project.org/web/packages/snpReady/vignettes/snpReady-vignette.html#QC
The code I used is as follows:
install.packages("snpReady")
library(snpReady)
library(impute)
library(Matrix)
library(matrixcalc)
library(stringr)
library(rgl)
df <- read.csv("somedf.csv", header = TRUE, na.strings = "NA")
head(df)
dim(df)
geno.ready <- raw.data(as.matrix(df), frame = "long", base = TRUE, sweep.sample = 0.8, call.rate = 0.95, maf = 0.05, imput = FALSE)
However, I can't seem to make it work as it gives me this error:
"Error in raw.data(data = as.matrix(geno), frame = "long", base = TRUE, : could not find function "raw.data""
Do you know why this might be? Would you recommend any other packages or software that would let me do these checks more easily?
Thank you! Silvia
EDIT: I managed to fix the error; it worked well on a different computer after updating all the packages and individually installing the ones I needed. However, I can no longer use this script, as it only accepts a data frame with 4 columns and mine has more than that. I can't seem to find a suitable package. Do you have any to recommend?
My data has the following headers: ID, SNP (e.g., 1:15791:C:T), Allele 1 (i.e., the minor allele, e.g., C), Allele 2 (e.g., T), beta, pvalue, Chromosome (e.g., 1), and Base-pair position (e.g., 15791).
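For a table of this shape, the four checks can probably be done in base R without an extra package. The following is only a minimal sketch under stated assumptions: the column names CHR, BP, AF and INFO and the function name qc_filter are hypothetical, and AF and INFO are not among the headers listed above, so checks 1 and 2 are applied only if such columns exist. It also assumes that a multi-allelic SNP appears as several rows sharing the same chromosome and position, so dropping every repeated position covers checks 3 and 4 together.

```r
# Toy data standing in for the real file. All column names are assumptions.
snps <- data.frame(
  SNP  = c("1:15791:C:T", "1:15791:C:G", "1:20000:A:G",
           "2:500:G:A", "2:900:T:C", "2:15791:A:G"),
  CHR  = c(1L, 1L, 1L, 2L, 2L, 2L),
  BP   = c(15791L, 15791L, 20000L, 500L, 900L, 15791L),
  AF   = c(0.20, 0.10, 0.30, 0.02, 0.40, 0.50),
  INFO = c(0.95, 0.90, 0.99, 0.97, 0.60, 0.85),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns the rows that pass all available checks.
qc_filter <- function(df, af_min = 0.05, af_max = 0.95, info_min = 0.8) {
  # Checks 3 and 4: flag every row whose chromosome + position occurs more
  # than once. This is done on the full table, before any other filter,
  # so a multi-allelic site is removed even if only one of its rows
  # would fail the AF or INFO thresholds.
  key  <- df[, c("CHR", "BP")]
  keep <- !(duplicated(key) | duplicated(key, fromLast = TRUE))

  # Check 1: allele frequency strictly between af_min and af_max,
  # applied only if an AF column is present.
  if ("AF" %in% names(df)) {
    keep <- keep & !is.na(df$AF) & df$AF > af_min & df$AF < af_max
  }

  # Check 2: imputation info score above info_min,
  # applied only if an INFO column is present.
  if ("INFO" %in% names(df)) {
    keep <- keep & !is.na(df$INFO) & df$INFO > info_min
  }

  df[keep, , drop = FALSE]
}

filtered <- qc_filter(snps)
print(filtered$SNP)
```

With real data, the table would be loaded with read.csv and its columns renamed to match the names assumed here. The position key uses both chromosome and base-pair position, because the same base-pair number can legitimately occur on different chromosomes.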