I'm having trouble finding which SNPs have been pruned after LD pruning. I want to be able to list those loci that are being removed so I can remove them from my dataset for future analyses. I'll be honest that I am very new to this stuff!
Here is my code (forgive me for any errors!):
library(gdsfmt) library(SNPRelate) ### File Conversion - VCF to GDS ### vcf.fn <- "H:PhD Research/UNEAK_Bat_West1+3+4/Filtered Data/CORA 50%.vcf" snpgdsVCF2GDS(vcf.fn, "CORA.gds", method="biallelic.only") snpgdsSummary("CORA.gds") setwd("H:/PhD Research/UNEAK_Bat_West1+3+4/SNPRelate/") ### Clean-up Linkage Disequilibrium ### CORA.gds <- snpgdsOpen("CORA.gds", readonly=FALSE, allow.duplicate=TRUE, allow.fork=FALSE) set.seed(1000) snpset <- snpgdsLDpruning(CORA.gds, ld.threshold=0.8) names(snpset) head(snpset$chr0) snp.id <- unlist(snpset) read.gdsn(index.gdsn(CORA.gds, "snpset"))
I'm trying to read the GDS field in the last line but can't seem to get an output. Instead I get an error:
Error in inherits(node, "gdsn.class") : No such GDS node "snpset"!
I'm also trying to calculate HWE equilibrium and have come across very little documentation on how to do this, but here is my code:
### Clean-up HWE ### CORA.gds <- snpgdsOpen("CORA.gds", readonly=FALSE, allow.duplicate=FALSE, allow.fork=FALSE) sample.id <- read.gdsn(index.gdsn(CORA.gds, "sample.id")) p <- snpgdsHWE(CORA.gds, sample.id=NULL, snp.id=NULL, with.id=TRUE) summary(p) plot(-log10((1:length(p))/length(p)), -log10(p[order(p)]), xlab="-log10(expected P)", ylab="-log10(observed P)", main="QQ Plot")
Any advice at all on any of these aspects would be much appreciated!