Question: Allele count from factor variable in R
pifferdavide (Italy) wrote, 4.4 years ago:

I am writing code to count alleles from 23andMe genome text files. The code returns a factor whose levels are the genotype symbols. I want to assign a number to each genotype so that each effect allele is scored as 1 and the other allele as 0; in this case AA=2, AG=1, GG=0. If I use the as.integer function instead, it simply returns the position of the genotype among the levels (see the bottom of the output), which is not what I want.

The genotype column (V4) has 19 different levels (all the genotype codes present in the genome), but I am interested in only 4 of them for each SNP. How do I assign a numeric value to each of the four genotypes?

> setwd("~/genomes")
> mydata=read.table("genome_003.txt")
> View(mydata)
> library(Hmisc)
> df=as.data.frame(mydata)
> rownumber=match('rs9375195', rs) #returns the first location of SNP
> df[rownumber,] #displays row corresponding to SNP
              V1 V2       V3 V4
224186 rs9375195  6 98562720 AA
> genotype=df[rownumber,]$V4
> genotype #displays alleles for corresponding SNP
[1] AA #genotype
Levels: -- A AA AC AG AT C CC CG CT DD DI G GG GT I II T TT
> number=as.integer(genotype)
> number
[1] 3

Tags: snp, R, genome

So what you want is: genotype=df[rownumber,]$V4 to return 2 instead of AA?

written 4.4 years ago by PoGibas

Exactly so!

written 4.4 years ago by pifferdavide
Steven Lakin (Fort Collins, CO, USA) wrote, 4.4 years ago:

You can use a named vector in R the same way you would use a dictionary in another language:

myVector <- c(1,2,3)

names(myVector) <- c("vector", "of", "names")

myVector["vector"]  # returns the name and its value (key and value in a dictionary)
as.numeric(myVector["vector"])  # returns the value associated with the name

names(myVector)[myVector == 1]  # returns the name associated with the value of 1

You can also initialize the vector with its key/value pairs:

myVector <- c("vector" = 1, "of" = 2, "names" = 3)

 

So what you want to do is build a dictionary that translates your genotype letters into numbers. Build the dictionary, then pass your values through it.

Check out this thread on Stack Overflow for more details:

http://stackoverflow.com/a/2865191/4872975
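As a minimal sketch of that idea applied to a whole column rather than a single value: the names `score` and `genotypes` are made up for illustration, and the scoring assumes an A/G SNP with A as the effect allele, as in the question. Genotypes missing from the table come back as NA.

```r
# Lookup table: genotype string -> number of effect (A) alleles.
# Assumption: an A/G SNP with A as the effect allele, as in the question.
score <- c("AA" = 2, "AG" = 1, "GG" = 0)

# Hypothetical genotype calls stored as a factor
genotypes <- factor(c("AA", "GG", "AG", "--"))

# Convert to character first so the lookup is by name, not by level position;
# unname() drops the genotype labels from the result
counts <- unname(score[as.character(genotypes)])
print(counts)
```

Leaving unlisted genotypes as NA is a design choice here; it makes codes such as "--" stand out instead of silently scoring them as 0.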


I am not sure. The process you describe works for assigning names to numbers, but I do not know how to do it the other way round (assigning numbers to letters or to the levels of a factor). In this instance, the values are pre-assigned by R to each level in alphabetical order (the default), so -- is 1, A is 2, AA is 3, AC is 4, etc.

Instead, I want to assign the values to the levels from scratch, ignoring the default R settings.

written 4.4 years ago by pifferdavide
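# example data frame (stringsAsFactors = TRUE keeps Alleles a factor, which was the default before R 4.0)
df <- data.frame("Genotype" = c("Genotype1", "Genotype2", "Genotype3"), "Alleles" = c("AA", "AC", "GG"), stringsAsFactors = TRUE)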

   Genotype Alleles
1 Genotype1      AA
2 Genotype2      AC
3 Genotype3      GG

# set the numbers equal to whatever you want for each allele
translate <- c("A" = 1, "AA" = 2, "AC" = 1, "AG" = 1, "AT" = 1, "C" = 0, "CC" = 0, "CG" = 0, "CT" = 0, "DD" = 0, "DI" = 0, "G" = 0, "GG" = 0, "GT" = 0, "I" = 0, "II" = 0, "T" = 0, "TT" = 0)

 

as.numeric(translate[as.character(df[rownumber, columnnumber])]) # get allele value
as.numeric(translate[as.character(df[rownumber, ]$columnname)])  # same thing but with column name

 

# Example for the above data frame:

> df[1, ]$Alleles

[1] AA
Levels: AA AC GG

 

> as.numeric(translate[as.character(df[1, ]$Alleles)])

[1] 2
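A hand-typed table is tied to one effect allele. As a hedged alternative sketch, the score could be computed by counting how often the effect allele occurs in each genotype string. `count_effect_allele` is a hypothetical helper name, not a function from any package, and treating "--" as a no-call is an assumption about the file format.

```r
# Count occurrences of the effect allele in each genotype string.
# `count_effect_allele` is a hypothetical helper written for this sketch.
count_effect_allele <- function(genotype, effect_allele) {
  genotype <- as.character(genotype)   # guard against factor input
  alleles <- strsplit(genotype, "")    # split each genotype into single letters
  counts <- vapply(alleles, function(a) sum(a == effect_allele), integer(1))
  counts[genotype %in% "--"] <- NA     # assumption: "--" marks a no-call
  counts
}

print(count_effect_allele(c("AA", "AG", "GG", "--"), "A"))
```

With this approach the same function can be reused for a SNP whose effect allele is G, C, or T by changing the second argument.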

 

One tricky thing with factors is that calling as.numeric() on a factor returns the integer position of each value among the levels, not the label. Convert it to character first to avoid this.
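A small illustration of that pitfall, using a made-up three-value factor (the variable name `genotype` simply mirrors the question):

```r
# A factor stores integer codes that index into its sorted levels
genotype <- factor(c("AA", "GG", "AG"))

print(levels(genotype))        # the sorted labels
print(as.integer(genotype))    # position of each value among the levels
print(as.character(genotype))  # the labels themselves, usable as lookup keys
```

The integer codes depend only on the sort order of the labels, so they carry no information about allele counts.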

 

written 4.4 years ago by Steven Lakin

Brilliant! It works!

written 4.4 years ago by pifferdavide
PoGibas (Vilnius) wrote, 4.4 years ago:
# THIS IS NOT TESTED

# Load libraries
library(data.table)

# Args
SNP <- "rs9375195"
file <- "genome_003.txt"

# Read Data
mydata <- fread(file)

# Add dummy column
mydata[, V4Numbers := 777]
# Allele to number
mydata[V4=="GG", V4Numbers := 0]
mydata[V4=="AG", V4Numbers := 1]
mydata[V4=="AA", V4Numbers := 2]

# Get wanted SNP
# The output in the question shows the rsID in column V1
mydata[V1==SNP, V4Numbers]
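For comparison, the same recoding can be sketched in base R without data.table. The small data frame below is a made-up stand-in for the genome file (only the rs9375195 row comes from the question; the other rsIDs and values are invented), and it assumes the V1 to V4 column names that read.table() assigns to a headerless file.

```r
# Hypothetical stand-in for the 23andMe file; only the middle row is taken
# from the question, the other rows are invented for illustration
mydata <- data.frame(
  V1 = c("rs0000001", "rs9375195", "rs0000002"),
  V2 = c(1, 6, 6),
  V3 = c(1000, 98562720, 98570000),
  V4 = c("GG", "AA", "AG"),
  stringsAsFactors = FALSE
)

# Start from NA so that unmapped genotypes stay visibly missing
mydata$V4Numbers <- NA_integer_
mydata$V4Numbers[mydata$V4 == "GG"] <- 0L
mydata$V4Numbers[mydata$V4 == "AG"] <- 1L
mydata$V4Numbers[mydata$V4 == "AA"] <- 2L

# Look up the wanted SNP; the rsID is in V1 in the question's output
print(mydata$V4Numbers[mydata$V1 == "rs9375195"])
```

Using NA rather than a sentinel such as 777 is a choice made in this sketch so that missing scores cannot be mistaken for real counts in later arithmetic.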

 


I get various error messages. E.g.:

> mydata[V4=="GG", V4Numbers := 0]
Error in eval(expr, envir, enclos) : object 'V4' not found

written 4.4 years ago by pifferdavide

Check the class of mydata with class(mydata); it should be data.table.

# Try this
mydata <- as.data.table(mydata)
# Also check that there is a column named V4, e.g. with names(mydata)

written 4.4 years ago by PoGibas