remove/replace unwanted character from a column in R
24 months ago
mthm ▴ 50

I have a table with an unwanted repeated string, "Motif:", in column 5:

xyz.00290   1565    1575    Target   Motif:TART_DV-Dmon-A 743 795   10
xyz.00291   1576    1617    Target   Motif:Maverick-5_Dmon 256 297  41
xyz.00292   1619    1632    Target   Motif:Jockey-18_Dmon 1702 1771 13

I tried to remove it in R with gsub(df, "Motif:", ""), but it ruins the whole table by merging all the columns into one!

I also tried str_remove(df, "Motif:") but it does the same.

How can I make these work? And why does it take so long for R to run and finish? Is there a faster way, considering that my files are big (~12000 lines)?

gsub R stringr • 3.6k views
> df <- data.frame(A="xyz.00290", B="Motif:foo-123")
> df
          A             B
1 xyz.00290 Motif:foo-123
> df$B=stringr::str_split_fixed(df$B,":",2)[,2]  # [,2] takes the whole second column, so it also works with several rows
> df
          A       B
1 xyz.00290 foo-123

or

> df$B=sub(".*:","",df$B)
> df
          A       B
1 xyz.00290 foo-123
24 months ago
ATpoint 82k

Run it on the particular column, not the entire df:

> df <- data.frame(A="xyz.00290", B="Motif:foo-123")
> df
          A             B
1 xyz.00290 Motif:foo-123

> df$B <- gsub("Motif:", "", df$B)
> df
          A       B
1 xyz.00290 foo-123

By the way, 12000 rows is in no way big. If simple things like string substitution take more than a second, then in most cases the command is flawed.
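
As a minimal sketch on the three rows from the question: this assumes the file is read with read.table(), which names the columns V1..V8 by default, so the "Motif:" column would be V5. The column name and the way the data is loaded are assumptions, not something the question states.

```r
# Rebuild the three example rows from the question. read.table() assigns the
# default names V1..V8, so the "Motif:" column is V5 here (an assumption).
txt <- "xyz.00290 1565 1575 Target Motif:TART_DV-Dmon-A 743 795 10
xyz.00291 1576 1617 Target Motif:Maverick-5_Dmon 256 297 41
xyz.00292 1619 1632 Target Motif:Jockey-18_Dmon 1702 1771 13"
df <- read.table(text = txt, stringsAsFactors = FALSE)

# fixed = TRUE treats "Motif:" as a literal string rather than a regex.
# sub() replaces only the first match in each element, which is enough here.
df$V5 <- sub("Motif:", "", df$V5, fixed = TRUE)
df$V5
```

Only column V5 is touched, so the other seven columns keep their types and the table keeps its shape.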

24 months ago
Basti ★ 2.0k

Look at the documentation of these two functions before applying them to an entire data.frame or table:

gsub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
     fixed = FALSE, useBytes = FALSE)

x, text : a character vector where matches are sought, or an object which can be coerced by as.character to a character vector. Long vectors are supported.

str_remove(string, pattern)

string : Input vector. Either a character vector, or something coercible to one.

You are applying two functions that require a vector as input to a data.frame/table.
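
To illustrate why the columns appear "merged", here is a small sketch on a toy data frame (the values are hypothetical, modelled on the question). It relies on the documented behaviour quoted above: a non-character x is coerced with as.character, and for a data frame that gives one string per column rather than one per cell.

```r
# Toy data frame (hypothetical values modelled on the question).
df <- data.frame(id = c("xyz.00290", "xyz.00291"),
                 motif = c("Motif:TART_DV-Dmon-A", "Motif:Maverick-5_Dmon"),
                 stringsAsFactors = FALSE)

# as.character() on a data frame returns one string per column,
# each holding a text representation of that whole column.
collapsed <- as.character(df)
length(collapsed)   # equals ncol(df), not nrow(df)

# gsub() applies the same coercion, so the result is a plain character
# vector with one element per column, no longer a table.
res <- gsub("Motif:", "", df, fixed = TRUE)
is.data.frame(res)
length(res)
```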

Using lapply, you can apply these functions to every column:

df[] <- lapply(df, function(x) gsub("Motif:", "", x, fixed = TRUE))  # base R
df[] <- lapply(df, function(x) stringr::str_remove(x, "Motif:"))     # stringr alternative
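
One caveat with looping over every column: gsub and str_remove always return character vectors, so numeric columns come back as text. A possible variant, sketched on a toy data frame with hypothetical column names, restricts the substitution to the character columns:

```r
# Toy data frame with mixed column types (names and values are hypothetical).
df <- data.frame(id = c("xyz.00290", "xyz.00291"),
                 start = c(1565L, 1576L),
                 motif = c("Motif:TART_DV-Dmon-A", "Motif:Maverick-5_Dmon"),
                 stringsAsFactors = FALSE)

# Logical index of the columns that hold character data.
is_chr <- vapply(df, is.character, logical(1))

# Substitute only in those columns; numeric columns are left untouched.
df[is_chr] <- lapply(df[is_chr], function(x) gsub("Motif:", "", x, fixed = TRUE))
str(df)
```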

Your explanation of why the OP's code didn't work is great, but this does not need lapply when they can use df$column instead.


Sure, but I hoped to provide a general approach for when you need to remove a pattern from an entire data frame and not only a single column.
