Allele count from factor variable in R
9.0 years ago
pifferdavide ▴ 110

I am writing code to count alleles from 23andMe genome text files. The code returns a factor whose levels are the genotype symbols. I want to convert each genotype to a number so that each copy of the effect allele scores 1 and each copy of the other allele scores 0; in this case AA=2, AG=1, GG=0. If I use as.integer, it simply returns the position of the genotype among the factor levels (see the bottom of the output), which is not what I want.

The alleles column (V4) has 19 levels (all the genotype codes present in the genome), but I am interested in only 4 of them for each SNP. How do I assign a numeric value to each of the four genotypes?

setwd("~/genomes")

mydata=read.table("genome_003.txt")
View(mydata)

library(Hmisc)
df=as.data.frame(mydata)
rownumber=match('rs9375195', df$V1) #returns the first location of the SNP (V1 holds the rsIDs)
df[rownumber,] #displays row corresponding to SNP

              V1 V2       V3 V4
224186 rs9375195  6 98562720 AA

genotype=df[rownumber,]$V4
genotype #displays alleles for corresponding SNP

[1] AA
Levels: -- A AA AC AG AT C CC CG CT DD DI G GG GT I II T TT

number=as.integer(genotype)
number

[1] 3
genome SNP R • 4.9k views

So what you want is for genotype=df[rownumber,]$V4 to return 2 instead of AA?


Exactly so!

9.0 years ago
Steven Lakin ★ 1.8k

You can use a named vector in R the same way you would use a dictionary in another language:

myVector <- c(1,2,3)
names(myVector) <- c("vector", "of", "names")
myVector["vector"]  # returns the name and its value (key and value in a dictionary)

as.numeric(myVector["vector"])  # returns the value associated with the name
names(myVector)[myVector == 1]  # returns the name associated with the value of 1

You can also initialize the vector with its key/value pairs directly:

myVector <- c("vector" = 1, "of" = 2, "names" = 3)

So what you want to do is build a dictionary to translate your letters into the numbers. Build the dictionary, then pass your values through it.

Check out this thread on Stack Overflow for more details: http://stackoverflow.com/a/2865191/4872975
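As a minimal sketch of that idea applied to genotypes: a named vector can translate a whole vector of genotype strings in one step. The genotype values and the name `effect_count` are made up for illustration, and A is assumed to be the effect allele. Names that are missing from the lookup table come back as NA.

```r
# Hypothetical genotype calls; "--" appears among the levels in the question
# and is assumed here to mean "no call"
genotypes <- c("AA", "AG", "GG", "--", "AG")

# Lookup table for one SNP, assuming A is the effect allele
effect_count <- c("AA" = 2, "AG" = 1, "GA" = 1, "GG" = 0)

# Indexing by name translates every element at once;
# unname() drops the names so only the numbers remain.
# Genotypes without an entry in the table ("--") become NA.
scores <- unname(effect_count[genotypes])
print(scores)
```

If the genotypes are stored as a factor, wrap them in as.character() before indexing so the lookup is done by name.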


I am not sure. The process you describe works for assigning names to numbers, but I do not know how to do it the other way round (assigning numbers to letters or to the levels of a factor). In this instance, R pre-assigns the values to the levels in alphabetical order by default, so -- is 1, A is 2, AA is 3, AC is 4, etc.

Instead, I want to assign the values to the levels from scratch, ignoring the default R settings.

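# example data frame (stringsAsFactors = TRUE so that Alleles is a factor, as in the output below)
df <- data.frame("Genotype" = c("Genotype1", "Genotype2", "Genotype3"), "Alleles" = c("AA", "AC", "GG"), stringsAsFactors = TRUE)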

   Genotype Alleles
1 Genotype1      AA
2 Genotype2      AC
3 Genotype3      GG

# set the number you want for each genotype (here: copies of the effect allele A)
translate <- c("A" = 1, "AA" = 2, "AC" = 1, "AG" = 1, "AT" = 1, "C" = 0, "CC" = 0, "CG" = 0, "CT" = 0, "DD" = 0, "DI" = 0, "G" = 0, "GG" = 0, "GT" = 0, "I" = 0, "II" = 0, "T" = 0, "TT" = 0)

as.numeric(translate[as.character(df[rownumber, columnnumber])]) # get allele value
as.numeric(translate[as.character(df[rownumber, ]$columnname)])  # same thing but with column name

# Example for the above data frame:
df[1, ]$Alleles

[1] AA
Levels: AA AC GG

as.numeric(translate[as.character(df[1, ]$Alleles)])

[1] 2

One tricky thing with factors is that calling as.numeric() on a factor returns the internal integer code of each level, not the label. Convert the factor to character first to avoid this.
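A short sketch of that pitfall, using a made-up three-genotype factor. The lookup table is deliberately not in alphabetical order, so that indexing by integer code and indexing by name give visibly different results.

```r
# Hypothetical factor holding three genotypes
f <- factor(c("GG", "AA", "AG"))

lv <- levels(f)              # levels are sorted: "AA" "AG" "GG"
codes <- as.integer(f)       # positions within levels(f), not allele counts
labels <- as.character(f)    # the original strings

# Lookup table for effect allele A (an assumption for this example)
translate <- c("GG" = 0, "AG" = 1, "AA" = 2)

# A factor used as an index is treated as its integer codes,
# so this picks elements by position and gives the wrong scores
wrong <- unname(translate[f])

# Converting to character first makes the lookup go by name
right <- unname(translate[as.character(f)])

print(codes)
print(wrong)
print(right)
```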


Brilliant! It works!
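The lookup table in the reply is tied to A as the effect allele. When the effect allele differs between SNPs, one possible alternative is to count how many times it occurs in each genotype string. The sketch below is not from the thread: `count_effect_allele` is a hypothetical helper name, and treating "--" as a missing call is an assumption.

```r
# Hypothetical helper: counts copies of effect_allele in each genotype string
count_effect_allele <- function(genotype, effect_allele) {
  genotype <- as.character(genotype)  # guard against factor input
  # Split each genotype into single characters and count the matches
  counts <- vapply(strsplit(genotype, ""),
                   function(alleles) sum(alleles == effect_allele),
                   integer(1))
  counts[genotype == "--"] <- NA      # assumption: "--" means no call
  counts
}

res_a <- count_effect_allele(c("AA", "AG", "GG", "--"), "A")
res_t <- count_effect_allele(factor(c("CT", "TT")), "T")
print(res_a)
print(res_t)
```

Single-letter genotypes such as "A" score 1 under this scheme, which matches the lookup table in the reply.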

9.0 years ago
PoGibas 5.1k
# THIS IS NOT TESTED

# Load libraries
library(data.table)

# Args
SNP <- "rs9375195"
file <- "genome_003.txt"

# Read Data
mydata <- fread(file)

# Add dummy column
mydata[, V4Numbers := 777]
# Allele to number
mydata[V4=="GG", V4Numbers := 0]
mydata[V4=="AG", V4Numbers := 1]
mydata[V4=="AA", V4Numbers := 2]

# Get wanted SNP
# The output in the question shows the rsIDs in column V1
mydata[V1==SNP, V4Numbers]

I get various error messages. E.g.:

mydata[V4=="GG", V4Numbers := 0]
Error in eval(expr, envir, enclos) : object 'V4' not found

Check the class of mydata with class(mydata); it should be data.table.

# Try this
mydata <- as.data.table(mydata)
# Also check that there is a column named V4