Allele count from factor variable in R
5.9 years ago
pifferdavide ▴ 100

I am writing code to count alleles from 23andMe genome text files. The code returns a factor whose levels correspond to the genotype symbols. I want to assign a number to each genotype, scoring each copy of the effect allele as 1 and the other allele as 0; in this case AA = 2, AG = 1, GG = 0. If I use the as.integer function instead, it simply returns the position of the genotype among the levels (see the bottom of the output), which is not what I want.
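The scoring rule above amounts to counting how many copies of the effect allele a genotype contains. As a minimal sketch (not code from this thread), assuming genotypes are two-letter strings such as "AA" or "AG" and that anything else (e.g. "--", "DI") should be treated as missing, the count can be computed from the genotype labels; `count_effect_allele` is a hypothetical helper name:

```r
# Hypothetical helper: count copies of `effect_allele` in each genotype.
# Assumes genotypes are two-letter strings such as "AA", "AG", "GG";
# anything else ("--", "DI", "I", ...) is returned as NA.
count_effect_allele <- function(genotype, effect_allele) {
  g <- as.character(genotype)          # use factor labels, not level codes
  valid <- grepl("^[ACGT]{2}$", g)     # exactly two nucleotide letters
  # string length minus length with the effect allele removed = copies
  counts <- nchar(g) - nchar(gsub(effect_allele, "", g, fixed = TRUE))
  counts[!valid] <- NA_integer_
  counts
}

count_effect_allele(factor(c("AA", "AG", "GG", "--")), effect_allele = "A")
# [1]  2  1  0 NA
```

Counting letters avoids writing out every genotype by hand, but it only works when the effect allele is known for each SNP and the genotype is stored as two single-letter alleles.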

The alleles column (V4) has 19 different levels (all the genotypes present in the genome), but for each SNP I am only interested in four of them. How do I assign a numeric value to each of these four genotypes?

> setwd("~/genomes")
> mydata=read.table("genome_003.txt")
> View(mydata)
> library(Hmisc)
> df=as.data.frame(mydata)
> rownumber=match('rs9375195', rs) #returns the first location of SNP
> df[rownumber,] #displays row corresponding to SNP
              V1 V2       V3 V4
224186 rs9375195  6 98562720 AA
> genotype=df[rownumber,]$V4
> genotype #displays alleles for corresponding SNP
[1] AA
Levels: -- A AA AC AG AT C CC CG CT DD DI G GG GT I II T TT
> number=as.integer(genotype)
> number
[1] 3

So what you want is for genotype=df[rownumber,]$V4 to return 2 instead of AA?


Exactly so!

5.9 years ago
Steven Lakin ★ 1.5k

You can use a named vector in R the same way you would use a dictionary in another language:

myVector <- c(1,2,3)

names(myVector) <- c("vector", "of", "names")

myVector["vector"]  # returns the name and its value (key and value in a dictionary)
as.numeric(myVector["vector"])  # returns the value associated with the name

names(myVector)[myVector == 1]  # returns the name associated with the value of 1

You can also initialize the vector with its key/value pairs directly:

myVector <- c("vector" = 1, "of" = 2, "names" = 3)

So what you want to do is build a dictionary that translates your genotype letters into numbers, then pass your values through it.
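For the coding in the question (AA = 2, AG = 1, GG = 0), such a dictionary might look like the following sketch; `genotype_score` is an illustrative name, and genotypes that are not in the dictionary (here "CT") come back as NA:

```r
# Named vector used as a lookup table: names are genotypes, values are scores
genotype_score <- c("AA" = 2, "AG" = 1, "GG" = 0)

# Look up several genotypes at once; unname() drops the names from the result
genotypes <- c("AG", "GG", "AA", "CT")
scores <- unname(genotype_score[genotypes])
scores
# [1]  1  0  2 NA
```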

Check out this thread on Stack Overflow for more details:

http://stackoverflow.com/a/2865191/4872975


I am not sure. The process you describe works for assigning letters to numbers, but I do not know how to do it the other way round (assigning numbers to letters, i.e. to the levels of a factor). In this instance, R has already assigned values to the levels in sorted (alphabetical) order, as it does by default, so -- is 1, A is 2, AA is 3, AC is 4, etc.

Instead, I want to assign the values to the levels from scratch, ignoring the default R settings.
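A small illustration of this behaviour, using a made-up factor with a few of the levels from the question: as.integer() returns the position of the value among the levels, while as.character() returns the genotype label itself, which is what a lookup table needs.

```r
# Made-up factor using part of the level set shown in the question
genotype <- factor("AA", levels = c("--", "A", "AA", "AC", "AG"))

as.integer(genotype)    # position of "AA" among the levels
# [1] 3
as.character(genotype)  # the label itself
# [1] "AA"
```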

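# example data frame (stringsAsFactors = TRUE makes Alleles a factor, as in the question; needed since R 4.0)
df <- data.frame("Genotype" = c("Genotype1", "Genotype2", "Genotype3"), "Alleles" = c("AA", "AC", "GG"), stringsAsFactors = TRUE)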

Genotype Alleles
1 Genotype1      AA
2 Genotype2      AC
3 Genotype3      GG

# set the numbers equal to whatever you want for each allele
translate <- c("A" = 1, "AA" = 2, "AC" = 1, "AG" = 1, "AT" = 1, "C" = 0, "CC" = 0, "CG" = 0, "CT" = 0, "DD" = 0, "DI" = 0, "G" = 0, "GG" = 0, "GT" = 0, "I" = 0, "II" = 0, "T" = 0, "TT" = 0)

as.numeric(translate[as.character(df[rownumber, columnnumber])]) # get allele value
as.numeric(translate[as.character(df[rownumber, ]$columnname)]) # same thing but with column name

# Example for the above data frame:
> df[1, ]$Alleles
[1] AA
Levels: AA AC GG
> as.numeric(translate[as.character(df[1, ]$Alleles)])
[1] 2

One tricky thing with factors: if you call as.numeric() on a factor, it returns the integer code of its level, not its label. Convert it to character first to avoid this.
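The same trap applies when a factor is used directly as an index: R's `[` treats a factor index as its integer codes, not its labels. A short sketch with an illustrative translate vector and factor (both made up for this example):

```r
translate <- c("AA" = 2, "AG" = 1, "GG" = 0)
alleles <- factor(c("GG", "AA"), levels = c("--", "A", "AA", "AG", "GG"))

# Indexing with the factor uses its level codes (5 and 3), so this is wrong:
wrong <- unname(translate[alleles])
# Converting to character indexes by the labels "GG" and "AA":
right <- unname(translate[as.character(alleles)])

wrong  # [1] NA  0
right  # [1] 0 2
```

The character lookup is vectorized, so a whole column can be scored in one call, e.g. as.numeric(translate[as.character(df$Alleles)]) for the example data frame.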


Brilliant! It works!

5.9 years ago
PoGibas 4.9k
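# THIS IS NOT TESTED

library(data.table)

# Args
SNP <- "rs9375195"
file <- "genome_003.txt"

# Read the file; read.table() skips the "#" header lines of the 23andMe file
mydata <- as.data.table(read.table(file))

# 777 is a placeholder for genotypes other than GG/AG/AA
mydata[, V4Numbers := 777]
# Allele to number
mydata[V4=="GG", V4Numbers := 0]
mydata[V4=="AG", V4Numbers := 1]
mydata[V4=="AA", V4Numbers := 2]

# Get wanted SNP
# The SNP ID is in column V1 (see the df[rownumber,] output in the question)
mydata[V1==SNP, V4Numbers]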


I get various error messages, e.g. when running:
> mydata[V4=="GG", V4Numbers := 0]


Check the class of mydata (class(mydata)); it should be data.table.

# Try this
mydata <- as.data.table(mydata)
# Also check that there is a column named V4
"V4" %in% names(mydata)
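For comparison with the data.table answer, the same recoding can be sketched in base R without extra packages. The three-row data frame below is made up to mimic the V1 to V4 layout from the question (rsid, chromosome, position, genotype); only the rs9375195 row uses values from the question, and the other rsids and positions are hypothetical placeholders. With real data, mydata would come from read.table() as in the question. In this sketch, genotypes other than AA/AG/GG get NA rather than a placeholder such as 777.

```r
# Made-up rows mimicking the 23andMe layout: V1 = rsid, V2 = chromosome,
# V3 = position, V4 = genotype (rs_example_* rows are hypothetical)
mydata <- data.frame(
  V1 = c("rs9375195", "rs_example_1", "rs_example_2"),
  V2 = c(6, 1, 1),
  V3 = c(98562720, 1000, 2000),
  V4 = factor(c("AA", "AG", "CT"))
)

# Score every row at once; genotypes missing from `score` become NA
score <- c("GG" = 0, "AG" = 1, "AA" = 2)
mydata$V4Numbers <- unname(score[as.character(mydata$V4)])

# Score for one SNP, looked up by its rsid in V1
SNP <- "rs9375195"
mydata$V4Numbers[mydata$V1 == SNP]
# [1] 2
```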