I am writing code to count alleles from 23andMe genome text files. The code returns a factor whose levels correspond to the genotype symbols. I want to assign a number to each genotype so that each effect allele is scored as 1 and the other allele as 0. In this case AA=2, AG=1, GG=0. If I use the as.integer function, it simply returns the position of the genotype among the levels (see bottom of output), which is not what I want.
The alleles column (V4) has 19 different levels (corresponding to all the genotypes present in the genome), but I am interested in only 4 of them for each SNP. How do I assign a numeric value to each of the four genotypes?
> setwd("~/genomes")
> mydata=read.table("genome_003.txt")
> View(mydata)
> library(Hmisc)
> df=as.data.frame(mydata)
> rownumber=match('rs9375195', rs) #returns the first location of SNP
> df[rownumber,] #displays row corresponding to SNP
              V1 V2       V3 V4
224186 rs9375195  6 98562720 AA
> genotype=df[rownumber,]$V4
> genotype #displays alleles for corresponding SNP
[1] AA #genotype
Levels: -- A AA AC AG AT C CC CG CT DD DI G GG GT I II T TT
> number=as.integer(genotype)
> number
[1] 3
So what you want is genotype=df[rownumber,]$V4 to return 2 instead of AA?
Exactly so!
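One possible approach, as a minimal sketch: convert the factor to character, split each genotype into its single letters, and count how many match the effect allele. The helper name count_effect_allele is hypothetical (it is not from Hmisc or any other package), and treating the "--" no-call as NA is an assumption about how missing genotypes should be handled.

```r
# Hypothetical helper: count copies of the effect allele in each genotype.
count_effect_allele <- function(genotype, effect_allele) {
  # as.integer() on a factor returns the level index (3 for "AA" above),
  # so work on the character labels instead.
  g <- as.character(genotype)
  # Split each genotype into single letters and count matches.
  counts <- vapply(strsplit(g, ""),
                   function(alleles) sum(alleles == effect_allele),
                   integer(1))
  # Assumption: a no-call ("--") is missing, not zero copies.
  counts[g %in% "--"] <- NA_integer_
  counts
}

genotype <- factor(c("AA", "AG", "GG", "--"))
count_effect_allele(genotype, "A")
# [1]  2  1  0 NA
```

With the data above, count_effect_allele(df[rownumber,]$V4, "A") would give 2 for the AA genotype. Because the count is taken letter by letter, GA scores the same as AG, and a single-letter genotype such as A scores 1.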