Question

Counting the frequency of genotypes per row based on the calls of the first column in a data frame in R

1

5.5 years ago

Famf ▴ 30

I have a genotype data frame in R similar to this:

ID  P1  P2  in1 in2 in3 in4
M01 CC  GG  CC  GG  CC  GG
M02 TT  CC  TT  TT  CC  TT
M03 AA  GG  AA  GG  GG  GG
M04 CC  GG  CC  GG  CC  GG
M05 GG  AA  AA  GG  AA  AA
M06 CC  GG  CC  GG  CC  CC

I want to add a column that gives, for each row, how many times the genotype in column P1 occurs in the columns from in1 onward, as in the table below:

ID  P1  P2  in1 in2 in3 in4 frqP1
M01 CC  GG  CC  GG  CC  GG  2
M02 TT  CC  TT  TT  CC  TT  3
M03 AA  GG  AA  GG  GG  GG  1
M04 CC  GG  CC  GG  CC  GG  2
M05 GG  AA  AA  GG  AA  AA  1
M06 CC  GG  CC  GG  CC  CC  3

I tried the following code, but it doesn't work:

df$frqP1 <- rowSums(df[-1] == df$P1)

Any idea?

R genotype • 1.7k views

updated 5.5 years ago by ATpoint 82k • written 5.5 years ago by Famf ▴ 30

0

it doesn't work

Does it throw an error (if so, add the error/warning message), or does it give the wrong output?

5.5 years ago by zx8754 11k

score 2 · Answer 1 · 2018-11-06

2

5.5 years ago

ATpoint 82k

df$frqP1 <- rowSums(df[-c(1:3)] == as.character(df$P1))

You were almost right. Just convert the query (df$P1) from factor to character, and make sure the subject contains only the in-columns by dropping columns 1 to 3 (ID, P1 and P2).
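As a minimal sketch of the above, the example data from the question can be rebuilt and the count reproduced. The data frame is retyped by hand here, and `stringsAsFactors = TRUE` is an assumption about how the original data was read in. Selecting the in-columns by name with `grep()` is an illustrative alternative to `df[-c(1:3)]`, not part of the original answer.

```r
# Rebuild the example table from the question (assumed to hold factors).
df <- data.frame(
  ID  = c("M01", "M02", "M03", "M04", "M05", "M06"),
  P1  = c("CC", "TT", "AA", "CC", "GG", "CC"),
  P2  = c("GG", "CC", "GG", "GG", "AA", "GG"),
  in1 = c("CC", "TT", "AA", "CC", "AA", "CC"),
  in2 = c("GG", "TT", "GG", "GG", "GG", "GG"),
  in3 = c("CC", "CC", "GG", "CC", "AA", "CC"),
  in4 = c("GG", "TT", "GG", "GG", "AA", "CC"),
  stringsAsFactors = TRUE
)

# Pick the in-columns by name, so the result does not depend on column order.
in_cols <- grep("^in", names(df))

# Compare every in-column with P1 row by row, then count the matches per row.
df$frqP1 <- rowSums(df[in_cols] == as.character(df$P1))

print(df$frqP1)  # 2 3 1 2 1 3
```

The comparison recycles the P1 vector down each column, so it lines up row by row because its length equals the number of rows.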

5.5 years ago by ATpoint 82k

0

Indeed, that works! But I realized it returns NA instead of a count in the column frqP1 for rows that have at least one missing value (NA). Is there any way to avoid that?

5.5 years ago by Famf ▴ 30

0

Pass na.rm = TRUE to rowSums() to ignore NAs. Read the manuals.
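A small sketch of the difference, using a hypothetical two-row data frame (`df_na`) with one missing genotype; the data and the variable names are made up for illustration:

```r
# Two rows from the example, with in2 of M01 set to NA (hypothetical data).
df_na <- data.frame(
  ID  = c("M01", "M02"),
  P1  = c("CC", "TT"),
  P2  = c("GG", "CC"),
  in1 = c("CC", "TT"),
  in2 = c(NA, "TT"),
  in3 = c("CC", "CC"),
  in4 = c("GG", "TT"),
  stringsAsFactors = TRUE
)

# Logical matrix of matches; comparing with a missing genotype yields NA.
matches <- df_na[-c(1:3)] == as.character(df_na$P1)

with_na    <- rowSums(matches)                # M01 becomes NA
without_na <- rowSums(matches, na.rm = TRUE)  # NA cells are skipped

print(with_na)     # NA 3
print(without_na)  # 2 3
```

With `na.rm = TRUE` a missing genotype is simply not counted as a match, so the result is the number of matches among the non-missing calls.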

ADD REPLY • link 5.5 years ago by zx8754 11k