Using SNPRelate on a 23andMe dataset
7.1 years ago
abims

I want to use 23andMe raw data in SNPRelate. My data looks like this, and there are 500 individuals:

# rsid    chromosome    position    genotype
rs4477212    1    82154    AA
rs3094315    1    752566    AA
rs3131972    1    752721    GG
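
For reference, a file in this layout can be read into R with base functions alone. This is only a minimal sketch: the inline text stands in for one individual's file, and the column names are simply taken from the commented header line. The chromosome column is read as character on the assumption that non-numeric labels such as X or MT may appear further down the file.

```r
# Stand-in for one individual's raw file (tab-separated, header commented out)
raw_text <- paste(c(
  "# rsid\tchromosome\tposition\tgenotype",
  "rs4477212\t1\t82154\tAA",
  "rs3094315\t1\t752566\tAA",
  "rs3131972\t1\t752721\tGG"
), collapse = "\n")

# Lines starting with '#' are skipped, so column names are supplied by hand
calls <- read.table(
  text = raw_text,
  sep = "\t",
  comment.char = "#",
  col.names = c("rsid", "chromosome", "position", "genotype"),
  colClasses = c("character", "character", "integer", "character"),
  stringsAsFactors = FALSE
)

print(calls)
```

For a real file, replace `text = raw_text` with the path to the file.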

But I first need to convert it to GDS format.

I have to define snp.id, sample.id, snp.position, etc., and also create the genotype node:

add.gdsn(newfile, "sample.id", sample.id)
add.gdsn(newfile, "snp.allele", c("A/G", "T/C", ...))

.....

var.geno <- add.gdsn(newfile, "genotype",
    valdim=c(length(snp.id), length(sample.id)), storage="bit2")

As I understand it, sample.id is the vector of all user IDs, snp.id is the vector of all SNP IDs, and so on. So, in the genotype part, how do I indicate that user x's genotype at SNP y is AA? What kind of matrix is it?
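
To make the matrix question concrete: SNPRelate documents the genotype node as holding the values 0, 1, 2 and 3, where 0 to 2 count copies of the first ("A") allele listed in snp.allele and 3 marks a missing call. With valdim = c(length(snp.id), length(sample.id)), rows are SNPs and columns are samples, so entry [y, x] is the code for user x at SNP y. The sketch below builds such a matrix in plain R. The helper `encode_genotype`, the allele pairs, and the second user are hypothetical and only for illustration.

```r
# Hypothetical helper: turn two-letter calls ("AA", "AG", "--") into
# counts of the first allele in an "A/B" pair; anything else becomes 3.
encode_genotype <- function(genotypes, alleles) {
  a  <- substr(alleles, 1, 1)    # counted allele
  b  <- substr(alleles, 3, 3)    # other allele
  g1 <- substr(genotypes, 1, 1)
  g2 <- substr(genotypes, 2, 2)
  valid <- !is.na(genotypes) & !is.na(alleles) & nchar(genotypes) == 2 &
    (g1 == a | g1 == b) & (g2 == a | g2 == b)
  codes <- (g1 == a) + (g2 == a)
  codes[!valid] <- 3L            # no-calls, indels, unexpected letters
  as.integer(codes)
}

snp.id     <- c("rs4477212", "rs3094315", "rs3131972")
snp.allele <- c("A/G", "A/G", "A/G")   # assumed allele pairs, not verified
user_x     <- c("AA", "AA", "GG")      # genotypes from the question
user_z     <- c("AG", "--", "GG")      # made-up second individual

# SNP-by-sample matrix: one row per SNP, one column per user
geno <- cbind(user_x = encode_genotype(user_x, snp.allele),
              user_z = encode_genotype(user_z, snp.allele))
rownames(geno) <- snp.id
print(geno)
```

So "user x is AA at SNP y" is stored as a 2 in row y, column x, provided A is the first allele listed for that SNP.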

My second question: how should I determine the reference alleles? Should I compute them from my population of 500 people, or should I look them up somewhere else? If the latter, which source do you suggest?

Thank you so much.

7.1 years ago
Neilfws

For the first part of the question: to convert to GDS, I would first try converting the 23andMe data to VCF. A few tools claim to do this; the best I've found is here.

Then you can try snpgdsVCF2GDS() in the SNPRelate package to convert VCF to GDS.
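
A rough sketch of that second step, assuming SNPRelate is installed. Since no converted 23andMe VCF is available here, it uses the small example VCF that ships with the package. The output path is a temporary file, and `method = "biallelic.only"` is just one of the function's options.

```r
library(SNPRelate)

# Example VCF bundled with SNPRelate; swap in your own converted file
vcf.fn <- system.file("extdata", "sequence.vcf", package = "SNPRelate")
gds.fn <- tempfile(fileext = ".gds")

# Convert VCF to GDS, keeping only biallelic SNPs
snpgdsVCF2GDS(vcf.fn, gds.fn, method = "biallelic.only")

# Print a short summary of what ended up in the GDS file
snpgdsSummary(gds.fn)

# Open the file and read back the IDs to confirm the conversion worked
genofile   <- snpgdsOpen(gds.fn)
sample.ids <- read.gdsn(index.gdsn(genofile, "sample.id"))
snp.ids    <- read.gdsn(index.gdsn(genofile, "snp.id"))
snpgdsClose(genofile)
```

With 500 individuals you would presumably need one multi-sample VCF (or a merge step) before this call. How best to do that depends on the converter you pick.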