Entering edit mode
3.8 years ago
evelyn
▴
230
Hello,
I have an expression data matrix and phenotype data. I started with SNPRelate to get the PCA and IBS analysis using:
library(SNPRelate)
library(gdsfmt)
#Convert vcf file to gds file
vcf.fn <- "data.vcf"
snpgdsVCF2GDS(vcf.fn, "test.gds", method="biallelic.only")
snpgdsSummary("test.gds")
#Data analysis
# Open the GDS file
genofile <- snpgdsOpen("test.gds")
pop_code <- read.gdsn(index.gdsn(genofile, "genotype"))
#LD based SNP pruning
set.seed(1000)
snpset <- snpgdsLDpruning(genofile, autosome.only=FALSE, ld.threshold=0.2)
names(snpset)
head(snpset$ch01)
snpset.id <- unlist(snpset)
#Principal Component Analysis (PCA)
pca <- snpgdsPCA(genofile, autosome.only=FALSE, snp.id=snpset.id, num.thread=2)
pc.percent <- pca$varprop*100
head(round(pc.percent, 2))
tab <- data.framesample.id = pca$sample.id,
EV1 = pca$eigenvect[,1], # the first eigenvector
EV2 = pca$eigenvect[,2], # the second eigenvector
stringsAsFactors = FALSE)
head(tab)
plot(tab$EV2, tab$EV1, xlab="eigenvector 2", ylab="eigenvector 1")
#perform the IBS analysis
ibs <- snpgdsIBS(genofile, autosome.only=FALSE, num.thread=2)
length(ibs$ibs)
summary(as.vector(ibs$ibs))
set.seed(100)
ibs.hc <- snpgdsHCluster(snpgdsIBS(genofile, autosome.only=FALSE, num.thread=2))
rv <- snpgdsCutTree(ibs.hc)
plot(rv$dendrogram, leaflab="none", main="HapMap Phase II")
snpgdsClose(genofile)
LD-pruning showed some markers were dropped. But PCA and IBS analysis still had the original number of markers. I am not sure what is wrong in the code that the markers based on LD-pruning are not being dropped in further analysis.
Thank you in advance.