Question

R: How To Access The Content Of A Cell

11.8 years ago

darcangelo.elisa ▴ 10

Hello everyone,

I'm quite new to R and was wondering whether someone could help me with the following: I am interested in how to sort/modify/grep the content of a cell in a column of a data frame:

This is an example of the content of one cell in this column: P(0.826)KTVQAAP(0.296)P(0.296)AIP(0.645)GP(0.539)P(0.536)GAP(0.949)VNM(0.912)Y

I would like to be able to separate the letters from the numbers, but I don't know if there is any R command and/or special package I'd have to use. I don't know if this is possible in the first place. Any help is greatly appreciated!

Thank you!

Elisa

r • 2.2k views

Maybe you can clarify what you mean by 'separate the letters from the numbers.' I'm not sure how to interpret the contents of the cell. Do those letters represent amino acids? Do you want something like this?

P       0.826
KTVQAAP 0.296
P       0.296
AIP     0.645
GP      0.539
P       0.536
GAP     0.949
VNM     0.912
Y

ADD REPLY • link 11.8 years ago by dfornika ★ 1.1k

score 1 · Answer 1 · 2012-07-12

11.8 years ago

Zach Powers ▴ 340

There are lots of ways to manipulate strings in R. Check out this Stack Overflow post as a good place to start; the answers include the use of gsub and the stringr package, both of which might be useful for you.
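As a rough illustration of the gsub route, here is a minimal base-R sketch. It assumes every number in the cell is wrapped in parentheses, as in the example from the question; the variable names `letters_only` and `numbers` are made up for this example.

```r
x <- "P(0.826)KTVQAAP(0.296)P(0.296)AIP(0.645)GP(0.539)P(0.536)GAP(0.949)VNM(0.912)Y"

# Delete every "(...)" group, which leaves only the letters.
letters_only <- gsub("\\([^)]*\\)", "", x)

# Extract every run of digits and dots, then convert the matches to numeric.
numbers <- as.numeric(regmatches(x, gregexpr("[0-9.]+", x))[[1]])

print(letters_only)
print(numbers)
```

Here `letters_only` is a single string and `numbers` is a numeric vector with one entry per parenthesised value.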

good luck! zach cp


score 1 · Answer 2 · 2012-07-12

I would use a combination of gregexpr and strsplit to return all characters matching a pattern. Below is an example function you could use:

 my.function <- function(string, query) {
   # positions of every character that matches the pattern
   y <- unlist(gregexpr(query, string))
   # split the string into single characters
   z <- unlist(strsplit(string, ''))
   return(z[y])
 }

 > x <- "P(0.826)KTVQAAP(0.296)P(0.296)AIP(0.645)GP(0.539)P(0.536)GAP(0.949)VNM(0.912)Y"
 > my.function(x, "[[:alpha:]]")
 [1] "P" "K" "T" "V" "Q" "A" "A" "P" "P" "A" "I" "P" "G" "P" "P" "G" "A" "P" "V"
[20] "N" "M" "Y"
 > my.function(x, "[[:digit:]]")
 [1] "0" "8" "2" "6" "0" "2" "9" "6" "0" "2" "9" "6" "0" "6" "4" "5" "0" "5" "3"
[20] "9" "0" "5" "3" "6" "0" "9" "4" "9" "0" "9" "1" "2"

I hope that helps!

EDIT:

I didn't really think about the context of your question. Here is a better solution, given the context:

> x <- "P(0.826)KTVQAAP(0.296)P(0.296)AIP(0.645)GP(0.539)P(0.536)GAP(0.949)VNM(0.912)Y"
> unlist(lapply(strsplit(x, ")"), function(y){ strsplit(y, "[(]") }))
 [1] "P"       "0.826"   "KTVQAAP" "0.296"   "P"       "0.296"   "AIP"
 [8] "0.645"   "GP"      "0.539"   "P"       "0.536"   "GAP"     "0.949"
[15] "VNM"     "0.912"   "Y"
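If a table like the one dfornika sketched in the comment is what you are after, the same split can be arranged into a data frame. This is only a sketch under the assumption that each cell follows the `LETTERS(number)` pattern, possibly with a trailing run of letters that has no number; `split_cell` and the column names `residues` and `probability` are hypothetical names chosen for this example.

```r
# Hypothetical helper: turn one cell into a two-column data frame.
split_cell <- function(cell) {
  # Split on ")" first, then split each piece on "(".
  pieces <- strsplit(strsplit(cell, ")", fixed = TRUE)[[1]], "(", fixed = TRUE)
  data.frame(
    residues = vapply(pieces, `[`, character(1), 1),
    # A piece without a "(" has no second element, so this becomes NA.
    probability = as.numeric(vapply(pieces, `[`, character(1), 2)),
    stringsAsFactors = FALSE
  )
}

x <- "P(0.826)KTVQAAP(0.296)P(0.296)AIP(0.645)GP(0.539)P(0.536)GAP(0.949)VNM(0.912)Y"
result <- split_cell(x)
print(result)
```

To process a whole column, something like `lapply(df$peptide, split_cell)` should return one data frame per cell, where `df` and `peptide` stand in for your own data frame and column names.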

score 0 · Answer 3 · 2012-07-13

11.8 years ago

darcangelo.elisa ▴ 10

Thank you guys! That was really helpful! :)

elisa
