Question

SNPRelate LD Pruning List of Loci

1

Entering edit mode

5.3 years ago

jessivannatta ▴ 10

Hi there,

I'm having trouble finding which SNPs have been pruned after LD pruning. I want to be able to list those loci that are being removed so I can remove them from my dataset for future analyses. I'll be honest that I am very new to this stuff!

Here is my code (forgive me for any errors!):

library(gdsfmt)
library(SNPRelate)

### File Conversion - VCF to GDS ###
vcf.fn <- "H:PhD Research/UNEAK_Bat_West1+3+4/Filtered Data/CORA 50%.vcf" snpgdsVCF2GDS(vcf.fn, "CORA.gds", method="biallelic.only")
snpgdsSummary("CORA.gds") 

setwd("H:/PhD Research/UNEAK_Bat_West1+3+4/SNPRelate/")

### Clean-up Linkage Disequilibrium ###
CORA.gds <- snpgdsOpen("CORA.gds", readonly=FALSE, allow.duplicate=TRUE, allow.fork=FALSE) 
set.seed(1000) 
snpset <- snpgdsLDpruning(CORA.gds, ld.threshold=0.8) 
names(snpset) head(snpset$chr0) 
snp.id <- unlist(snpset) 
read.gdsn(index.gdsn(CORA.gds, "snpset"))

I'm trying to read the GDS field in the last line but can't seem to get an output. Instead I get an error:

Error in inherits(node, "gdsn.class") : No such GDS node "snpset"!

I'm also trying to calculate HWE equilibrium and have come across very little documentation on how to do this, but here is my code:

### Clean-up HWE ###
CORA.gds <- snpgdsOpen("CORA.gds", readonly=FALSE, allow.duplicate=FALSE, allow.fork=FALSE)
sample.id <- read.gdsn(index.gdsn(CORA.gds, "sample.id"))
p <- snpgdsHWE(CORA.gds, sample.id=NULL, snp.id=NULL, with.id=TRUE) 
summary(p) 
plot(-log10((1:length(p))/length(p)), -log10(p[order(p)]), xlab="-log10(expected P)", ylab="-log10(observed P)", main="QQ Plot")

Any advice at all on any of these aspects would be much appreciated!

snp SNPRelate • 2.4k views

ADD COMMENT • link updated 5.3 years ago by 15993216250 • 0 • written 5.3 years ago by jessivannatta ▴ 10

0

Entering edit mode

Hey, thank you for re-posting here. Just for the LD part, I think that you may in fact want to select ld.threshold=0.2 - 0.8 seems too high? This may not change anything, though.

How did you build this tutorial so far? - following this: http://corearray.sourceforge.net/tutorials/SNPRelate/ ?

The packages loaded correctly?

Have you performed any pre-processing on your VCF?; Did both snpgdsVCF2GDS and snpgdsSummary run okay?

ADD REPLY • link 5.3 years ago by Kevin Blighe 87k

0

Entering edit mode

Hi Kevin,

Thanks for your response. Yes, I was following the tutorial you mention to build my code, but it doesn't mention how to visualize the markers once they have been pruned.

All of the packages loaded correctly and all of the functions worked. It tells me I have 2,745 markers selected from my original 3,569, but I'm just not sure how to visualize them.How do I know which markers have been pruned so I can remove them from my dataset and use the cleaned set for future analyses? Is there a way to write the .gds to a .csv or something similar? Also, I did not perform any pre-processing on my VCF. I took it just as it is and converted it to .gds. Let me know your thoughts.

ADD REPLY • link 5.3 years ago by jessivannatta ▴ 10

0

Entering edit mode

Why not just get the difference between the list of 3569 markers and that of 2745? I have not used SNPRelate. I use PLINK (recommended by 15993216250) for this type of analysis.

ADD REPLY • link 5.3 years ago by Kevin Blighe 87k

0

Entering edit mode

Yes, I am trying to get the difference between the two lists but I'm not sure how to visualize that. How do I see which markers were removed from the list of 3569 to make it 2745? How can I get an output? If I can't figure out SNPRelate, I am going to give PLINK a try. Thank you for your input!

ADD REPLY • link 5.3 years ago by jessivannatta ▴ 10

score 0 · Answer 1 · 2018-12-31

0

Entering edit mode

5.3 years ago

15993216250 • 0

How about trying a function of plink called --indep-pairwise to do SNP pruning?

ADD COMMENT • link 5.3 years ago by 15993216250 • 0

0

Entering edit mode

Thank you for your suggestion. I'm not very familiar with plink, but I feel as if it might be too complex for someone who has very little experience with using the command line, like myself. However, I'm very open to giving it a try. Do you have a simple code you would recommend for SNP pruning in plink?

ADD REPLY • link 5.3 years ago by jessivannatta ▴ 10