Problem in LD-based SNP pruning using SNPRelate
0
0
Entering edit mode
3.8 years ago
evelyn ▴ 230

Hello,

I have an expression data matrix and phenotype data. I started with SNPRelate to get the PCA and IBS analysis using:

library(SNPRelate)
library(gdsfmt)

#Convert vcf file to gds file
vcf.fn <- "data.vcf"
snpgdsVCF2GDS(vcf.fn, "test.gds", method="biallelic.only")
snpgdsSummary("test.gds")

#Data analysis
# Open the GDS file
genofile <- snpgdsOpen("test.gds")
pop_code <- read.gdsn(index.gdsn(genofile, "genotype"))

#LD based SNP pruning
set.seed(1000)
snpset <- snpgdsLDpruning(genofile, autosome.only=FALSE, ld.threshold=0.2)
names(snpset)
head(snpset$ch01)
snpset.id <- unlist(snpset)

#Principal Component Analysis (PCA)
pca <- snpgdsPCA(genofile, autosome.only=FALSE, snp.id=snpset.id, num.thread=2)
pc.percent <- pca$varprop*100
head(round(pc.percent, 2))
tab <- data.framesample.id = pca$sample.id,
    EV1 = pca$eigenvect[,1],    # the first eigenvector
    EV2 = pca$eigenvect[,2],    # the second eigenvector
    stringsAsFactors = FALSE)
head(tab)
plot(tab$EV2, tab$EV1, xlab="eigenvector 2", ylab="eigenvector 1")

#perform the IBS analysis
ibs <- snpgdsIBS(genofile, autosome.only=FALSE, num.thread=2) 
length(ibs$ibs) 
summary(as.vector(ibs$ibs))

set.seed(100)
ibs.hc <- snpgdsHCluster(snpgdsIBS(genofile, autosome.only=FALSE, num.thread=2))
rv <- snpgdsCutTree(ibs.hc)
plot(rv$dendrogram, leaflab="none", main="HapMap Phase II")

snpgdsClose(genofile)

LD-pruning showed some markers were dropped. But PCA and IBS analysis still had the original number of markers. I am not sure what is wrong in the code that the markers based on LD-pruning are not being dropped in further analysis.

Thank you in advance.

snp SNPRelate R • 934 views
ADD COMMENT

Login before adding your answer.

Traffic: 2544 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6