Question: How to remove highly correlated SNPs in R
United States
zwang10

Hello all!

I have a data set (a matrix) for one gene: each row represents an individual, and each column holds a genotype score (0, 1, 2) for one SNP. How can I remove the highly correlated SNPs (threshold r = 0.8)?

I tried using SNPRelate, but it needs a GDS file, and my matrix has no column names.

16 months ago by
mbyvcm

Not an R solution, but you could try pruning your data based on LD in PLINK (see PLINK LD Prune). You would need to convert your matrix into PLINK files.
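If you would rather stay in R, here is a rough base-R sketch of the same idea. It is not what PLINK does (PLINK prunes on r^2 within sliding windows); this simply compares every pair of columns and greedily drops a SNP when its absolute Pearson correlation with an already-kept SNP exceeds the cutoff. The function name `prune_correlated` and the `snp1`, `snp2`, ... column names are made up for illustration.

```r
# Hypothetical helper: greedy correlation filter on a 0/1/2 genotype matrix.
prune_correlated <- function(geno, cutoff = 0.8) {
  # The matrix in the question has no column names, so invent some.
  if (is.null(colnames(geno))) {
    colnames(geno) <- paste0("snp", seq_len(ncol(geno)))
  }
  # Absolute pairwise Pearson correlation, ignoring missing genotypes.
  # Monomorphic SNPs give NA here (with a warning) and are kept as-is.
  r <- abs(cor(geno, use = "pairwise.complete.obs"))
  keep <- character(0)
  for (snp in colnames(geno)) {
    # Keep a SNP only if it is not too correlated with any SNP kept so far.
    if (length(keep) == 0 || all(r[snp, keep] <= cutoff, na.rm = TRUE)) {
      keep <- c(keep, snp)
    }
  }
  geno[, keep, drop = FALSE]
}

# Toy data: the third SNP is an exact copy of the first.
g1 <- c(0, 1, 2, 0, 1, 2, 0, 1)
g2 <- c(2, 2, 0, 0, 1, 1, 0, 2)
geno <- cbind(g1, g2, g1)
colnames(geno) <- NULL
pruned <- prune_correlated(geno, cutoff = 0.8)
```

Which SNP of a correlated pair survives depends on column order, so sort the columns first if you have a preference (for example by call rate).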

Can you tell me which tool I can use to convert the matrix into PLINK files?

You have a few options. With a little manipulation in R you could easily generate a map and a ped file (file specs are here). Another option would be to use the R package snpStats to (i) convert your matrix to a snpStats object and (ii) write out PLINK files from R.
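The first option could look roughly like the sketch below, in base R only. Several things here are assumptions, because the matrix carries no such information: the helper name `write_ped_map`, the SNP IDs `snp1`, `snp2`, ..., the placeholder alleles `A`/`B` (the score is taken to count copies of `B`), chromosome 1, and base-pair positions 1, 2, 3, .... Sex is written as 0 (unknown), the phenotype as -9 (missing), and a missing genotype as `0 0`.

```r
# Hypothetical helper: write a 0/1/2 genotype matrix as PLINK .ped/.map text files.
write_ped_map <- function(geno, file_base) {
  n_ind <- nrow(geno)
  n_snp <- ncol(geno)
  snp_ids <- colnames(geno)
  if (is.null(snp_ids)) snp_ids <- paste0("snp", seq_len(n_snp))  # made-up IDs

  # Map each score to an allele pair; the alleles are placeholders, not real bases.
  calls <- c("A A", "A B", "B B")
  alleles <- matrix(calls[as.vector(geno) + 1], nrow = n_ind)
  alleles[is.na(alleles)] <- "0 0"  # missing genotype

  # .ped columns: family ID, individual ID, father, mother, sex, phenotype, genotypes.
  ped <- data.frame(FID = seq_len(n_ind), IID = seq_len(n_ind),
                    PID = 0, MID = 0, SEX = 0, PHENO = -9,
                    alleles, stringsAsFactors = FALSE)
  write.table(ped, paste0(file_base, ".ped"),
              quote = FALSE, row.names = FALSE, col.names = FALSE)

  # .map columns: chromosome, SNP ID, genetic distance, base-pair position (all placeholders).
  map <- data.frame(chr = 1, id = snp_ids, cm = 0, bp = seq_len(n_snp))
  write.table(map, paste0(file_base, ".map"),
              quote = FALSE, row.names = FALSE, col.names = FALSE)
  invisible(file_base)
}

# Toy example: 4 individuals, 2 SNPs, one missing genotype.
geno <- matrix(c(0, 1, 2, NA, 2, 0, 1, 1), nrow = 4)
base <- file.path(tempdir(), "toy")
write_ped_map(geno, base)
```

If you know the real chromosome and positions, put them in the map file instead of the placeholders, since window-based pruning depends on SNP order and position.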

