R: How To Access The Content Of A Cell
11.8 years ago

Hello everyone,

I'm quite new to R and was wondering whether someone could help me with the following. I would like to sort, modify, or grep the content of a single cell in a column of a data frame.

This is an example of the content of one cell in this column:

P(0.826)KTVQAAP(0.296)P(0.296)AIP(0.645)GP(0.539)P(0.536)GAP(0.949)VNM(0.912)Y

I would like to separate the letters from the numbers, but I don't know whether there is an R command or a special package I'd have to use, or whether this is possible in the first place. Any help is greatly appreciated!

Thank you!

Elisa
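For reference on the title question itself: a single cell of a data frame can be read out as an ordinary character string with standard indexing, and the string functions discussed in the answers then work on it directly. A minimal sketch, assuming a hypothetical data frame `df` with a column named `seq` (both names, and the second row, are made up for illustration):

```r
# Hypothetical data frame: the name `df`, the column `seq`, and the second row are invented.
# stringsAsFactors = FALSE keeps the column as character on R versions before 4.0.
df <- data.frame(
  seq = c("P(0.826)KTVQAAP(0.296)P(0.296)AIP(0.645)GP(0.539)P(0.536)GAP(0.949)VNM(0.912)Y",
          "M(0.731)KLV(0.250)E"),
  stringsAsFactors = FALSE
)

# Two equivalent ways to read the first cell of the `seq` column
cell <- df[1, "seq"]
cell_too <- df$seq[1]

identical(cell, cell_too)  # TRUE
nchar(cell)                # 78 characters
```

Once `cell` is a plain character string, it can be passed to gsub(), strsplit(), gregexpr() and similar functions.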


Maybe you can clarify what you mean by 'separate the letters from the numbers.' I'm not sure how to interpret the contents of the cell. Do those letters represent amino acids? Do you want something like this?

P       0.826
KTVQAAP 0.296
P       0.296
AIP     0.645
GP      0.539
P       0.536
GAP     0.949
VNM     0.912
Y
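If that is the goal, here is a minimal base-R sketch that builds that two-column table with regmatches() and gregexpr(). It assumes each number sits in parentheses right after the letters it belongs to, and a trailing run with no number (the final Y) gets NA; the column names `residues` and `score` are my own choice.

```r
x <- "P(0.826)KTVQAAP(0.296)P(0.296)AIP(0.645)GP(0.539)P(0.536)GAP(0.949)VNM(0.912)Y"

# Each piece is a run of capital letters, optionally followed by "(number)"
pieces <- regmatches(x, gregexpr("[A-Z]+(\\([0-9.]+\\))?", x))[[1]]

# Letters: drop the "(number)" suffix from each piece
residues <- sub("\\(.*\\)$", "", pieces)

# Numbers: keep what is inside the parentheses; pieces without one become NA
scores <- ifelse(grepl("\\(", pieces),
                 sub("^[A-Z]+\\(([0-9.]+)\\)$", "\\1", pieces),
                 NA)
scores <- as.numeric(scores)

result <- data.frame(residues = residues, score = scores, stringsAsFactors = FALSE)
print(result)
```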
11.8 years ago
Zach Powers

There are lots of ways to manipulate strings in R. Check out this Stack Overflow post as a good place to start; the answers include the use of gsub and the stringr package, both of which might be useful for you.
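As a minimal sketch of that idea in base R (no extra packages): gsub() can delete each "(number)" group to leave only the letters, and regmatches() with gregexpr() can collect the numbers. The regular expressions assume the numbers always look like 0.826, i.e. digits, a dot, and more digits.

```r
x <- "P(0.826)KTVQAAP(0.296)P(0.296)AIP(0.645)GP(0.539)P(0.536)GAP(0.949)VNM(0.912)Y"

# gsub(): delete every "(number)" group, leaving only the letters
letters_only <- gsub("\\([0-9.]+\\)", "", x)
letters_only
# [1] "PKTVQAAPPAIPGPPGAPVNMY"

# regmatches() + gregexpr(): pull out every decimal number as a string, then convert
numbers <- as.numeric(regmatches(x, gregexpr("[0-9]+\\.[0-9]+", x))[[1]])
numbers
# [1] 0.826 0.296 0.296 0.645 0.539 0.536 0.949 0.912
```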

Good luck! zach cp

11.8 years ago

I would use a combination of gregexpr and strsplit to return every character that matches a pattern. Below is an example function you could use (the string comes first, the pattern second):

 my.function <- function(string, query) {
   # start positions of every match of `query` (assumed to match one character at a time)
   y <- gregexpr(query, string)[[1]]
   # gregexpr() returns -1 when nothing matches
   if (y[1] == -1) return(character(0))
   # split the string into individual characters and keep the matched ones
   z <- strsplit(string, "")[[1]]
   return(z[y])
 }

 > x <- "P(0.826)KTVQAAP(0.296)P(0.296)AIP(0.645)GP(0.539)P(0.536)GAP(0.949)VNM(0.912)Y"
 > my.function(x, "[[:alpha:]]")
  [1] "P" "K" "T" "V" "Q" "A" "A" "P" "P" "A" "I" "P" "G" "P" "P" "G" "A" "P" "V"
 [20] "N" "M" "Y"
 > my.function(x, "[[:digit:]]")
  [1] "0" "8" "2" "6" "0" "2" "9" "6" "0" "2" "9" "6" "0" "6" "4" "5" "0" "5" "3"
 [20] "9" "0" "5" "3" "6" "0" "9" "4" "9" "0" "9" "1" "2"
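Because my.function() returns one character per match, the numbers come out split into single digits. A hypothetical variant, `extract.runs()` (the name is made up), uses regmatches() to return each whole match instead, so a pattern ending in `+` keeps runs of letters or complete numbers together:

```r
# Hypothetical variant of my.function(): returns whole matches, not single characters.
# regmatches() pairs the start positions from gregexpr() with their match lengths;
# it returns character(0) when nothing matches.
extract.runs <- function(string, query) {
  regmatches(string, gregexpr(query, string))[[1]]
}

x <- "P(0.826)KTVQAAP(0.296)P(0.296)AIP(0.645)GP(0.539)P(0.536)GAP(0.949)VNM(0.912)Y"

# Runs of letters: "P" "KTVQAAP" "P" "AIP" "GP" "P" "GAP" "VNM" "Y"
extract.runs(x, "[[:alpha:]]+")

# Runs of digits and dots: "0.826" "0.296" ... "0.912"
extract.runs(x, "[[:digit:].]+")
```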

I hope that helps!

EDIT:

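I didn't really think about the context of your question. Here is a better solution for your data: split the string on each closing parenthesis, then split each piece on the opening parenthesis, so every run of letters stays next to its number: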

> x <- "P(0.826)KTVQAAP(0.296)P(0.296)AIP(0.645)GP(0.539)P(0.536)GAP(0.949)VNM(0.912)Y"
> # split on ")" first, then split each resulting piece on "("
> unlist(lapply(strsplit(x, "[)]"), function(y){ strsplit(y, "[(]") }))
 [1] "P"       "0.826"   "KTVQAAP" "0.296"   "P"       "0.296"   "AIP"
 [8] "0.645"   "GP"      "0.539"   "P"       "0.536"   "GAP"     "0.949"
[15] "VNM"     "0.912"   "Y"
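To do this for every cell in a column rather than for a single string, the same two-step split can be wrapped in a small helper and applied with lapply(). A sketch, assuming a hypothetical data frame `df` whose sequences live in a column called `seq` (the helper name `split.cell` and the second row are also made up):

```r
# Hypothetical data frame; stringsAsFactors = FALSE matters on R versions before 4.0,
# because strsplit() refuses factor input
df <- data.frame(
  seq = c("P(0.826)KTVQAAP(0.296)P(0.296)AIP(0.645)GP(0.539)P(0.536)GAP(0.949)VNM(0.912)Y",
          "M(0.731)KLV(0.250)E"),
  stringsAsFactors = FALSE
)

# Split one cell: first on ")", then each piece on "("
split.cell <- function(cell) {
  unlist(lapply(strsplit(cell, "[)]")[[1]], function(y) strsplit(y, "[(]")[[1]]))
}

# Apply to every cell in the column; the result is a list with one character vector per row
parts <- lapply(df$seq, split.cell)
parts[[2]]
# "M" "0.731" "KLV" "0.250" "E"
```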
ADD COMMENT
11.8 years ago

Thank you guys! That was really helpful! :)

elisa
