Question: R programming: string split n number of characters and cbind them in n number of new columns
MAPK (United States) wrote, 3.8 years ago:

Hi guys,

I have this table (mydf) with the column ALT. I want to split the string in each cell of this column so that there is only one character per cell, and cbind the pieces to mydf as new columns ALT1, ALT2, ALT3 ... ALTn, where n is the number of characters in the cell, to get the result table below. How can I do this in R? Thank you for your help.

mydf

REF  ALT     GENE
A    GC      MAPK
T    GCA..n  MAP2K

result

REF  ALT     ALT1  ALT2  ALTn           GENE
A    GC      G     C                    MAPK
T    GCA..n  G     C     nth character  MAP2K


Nice solution! Also have a look at the solution I posted. It might be useful and more efficient for a huge data set.

(comment by Veerendra Gadekar, 3.7 years ago)
MAPK (United States) wrote, 3.8 years ago:

I think I figured it out myself:

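library(plyr)
# Split each ALT value into single characters (as.character guards against a factor column)
sst <- strsplit(as.character(mydf[, "ALT"]), "")
# Turn each character vector into a one-row matrix and stack them, padding short rows with NA
additional.cols <- rbind.fill.matrix(lapply(sst, rbind))
# Name the new columns ALT1, ALT2, ..., ALTn
colnames(additional.cols) <- paste0("ALT", seq_len(ncol(additional.cols)))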
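
The split-and-pad idea in this answer can also be sketched in base R, including the final cbind step the question asks for. This is only an illustration under stated assumptions, not the poster's method: the helper name split_chars_to_cols and the sample mydf are made up for the example, and at least one ALT value is assumed to be non-empty.

```r
# Hypothetical sample data modelled on the question (not the poster's real table)
mydf <- data.frame(REF = c("A", "T"),
                   ALT = c("GC", "GCAGA"),
                   GENE = c("MAPK", "MAP2K"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: one column per character of x, short strings padded with NA
split_chars_to_cols <- function(x, prefix = "ALT") {
  chars <- strsplit(as.character(x), "")
  n <- max(lengths(chars))
  # Indexing past the end of a vector yields NA, which pads every row to length n
  mat <- do.call(rbind, lapply(chars, function(v) v[seq_len(n)]))
  colnames(mat) <- paste0(prefix, seq_len(n))
  as.data.frame(mat, stringsAsFactors = FALSE)
}

# Bind the new ALT1..ALTn columns onto the original table
result <- cbind(mydf, split_chars_to_cols(mydf$ALT))
print(result)
```

Note that cbind appends the new columns after GENE; if they should sit right after ALT, the columns of result can be reordered by name afterwards.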

Veerendra Gadekar (Italy) wrote, 3.7 years ago:

Here is another approach

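library(dplyr)
library(splitstackshape)

# Make sure ALT is character, not factor
mydf$ALT = as.character(mydf$ALT)
# Row by row, join the characters of ALT with commas (e.g. "GC" -> "G,C"),
# then let cSplit() spread the comma-separated values into ALTn_1, ALTn_2, ...
cSplit(data.frame(mydf %>% rowwise() %>%
       mutate(ALTn = paste(unlist(strsplit(ALT, "")),
       collapse = ','))), "ALTn", ",")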

# result:
#   REF   ALT  GENE ALTn_1 ALTn_2 ALTn_3 ALTn_4 ALTn_5
#1:   A    GC  MAPK      G      C     NA     NA     NA
#2:   T GCAGA MAP2K      G      C      A      G      A

Here is some reproducible sample data (just copy and paste the text below into R to check the code above):

mydf = structure(list(REF = structure(1:2, .Label = c("A", "T"), class = "factor"),
    ALT = c("GC", "GCAGA"), GENE = structure(c(2L, 1L), .Label = c("MAP2K",
    "MAPK"), class = "factor")), .Names = c("REF", "ALT", "GENE"
), row.names = c(NA, -2L), class = "data.frame")