How can I separate 3 different pieces of information in a column?
2.2 years ago
logbio ▴ 30

For example, one row of my HGVS.Consequence column contains Ser25Phe. I want to split that column so this value becomes three parts: Ser, 25 and Phe.

Programming regex split R gsub • 1.2k views
2.2 years ago

Since you tagged it as R, I'll add an R answer.

Example data.

df <- data.frame(HGVS.Consequence=c("Met1?", "Phe12Ser", "Ala2Glu"))
> df
  HGVS.Consequence
1            Met1?
2         Phe12Ser
3          Ala2Glu

Tidyverse answer.

library("tidyr")

extract(
  df, HGVS.Consequence, into=c("aa1", "pos", "aa2"),
  regex="(^[A-Z][a-z]+|\\?)([[:digit:]]+)([A-Z][a-z]+|\\?)")

  aa1 pos aa2
1 Met   1   ?
2 Phe  12 Ser
3 Ala   2 Glu
2.2 years ago
With GNU sed (\t in the replacement is a GNU extension):

sed 's/^\([^0-9]*\)\([0-9]*\)\([^0-9]*\)$/\1\t\2\t\3/' < in > out
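The sed command above rewrites whole lines, so it assumes the input holds nothing but the HGVS.Consequence values. If that column sits inside a wider tab-separated table, a minimal awk sketch like the one below could split just that column. It assumes a header row and a column named exactly HGVS.Consequence; the function name split_hgvs_column and the new column names aa1, pos and aa2 are made up for illustration.

```shell
# Hypothetical helper: split the HGVS.Consequence column of a tab-separated
# table (read on stdin) into three columns aa1, pos and aa2.
# Assumes a header row and a column named exactly "HGVS.Consequence".
split_hgvs_column() {
  awk -F'\t' -v OFS='\t' '
    NR == 1 {
      # Locate the HGVS.Consequence column by its header name.
      for (i = 1; i <= NF; i++) if ($i == "HGVS.Consequence") col = i
      if (!col) { print "HGVS.Consequence column not found" > "/dev/stderr"; exit 1 }
      $col = "aa1" OFS "pos" OFS "aa2"
      print
      next
    }
    {
      v = $col
      # The first run of digits is the position; text before and after it are the residues.
      if (match(v, /[0-9]+/)) {
        aa1 = substr(v, 1, RSTART - 1)
        pos = substr(v, RSTART, RLENGTH)
        aa2 = substr(v, RSTART + RLENGTH)
      } else {
        # No digits at all: keep the value in aa1 and leave pos and aa2 empty.
        aa1 = v; pos = ""; aa2 = ""
      }
      $col = aa1 OFS pos OFS aa2
      print
    }'
}
```

Usage would be split_hgvs_column < in.tsv > out.tsv. Unlike the sed line, which leaves a line unchanged when it has more than one run of digits, this sketch puts everything after the first run of digits into aa2, so such values still need checking by hand.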
2.2 years ago
supertech ▴ 180

Here is a Perl one-liner solution:

echo "Met1?" | perl -pe 's/(\D+)(\d+)(\D+)/$1 $2 $3/'

Met 1 ?

$  echo "Met1?" | perl -pe 's/(\D+)(\d+)(\D+)/$1\t$2\t$3/'

Met 1   ?
2.2 years ago
 $ echo "Ser25Phe" | sed -r 's/^([^[:digit:]]+)([[:digit:]]+)([^[:digit:]]+)$/\1\t\2\t\3/'

Ser 25  Phe

 $ echo "Ser25Phe" | while read -r line; do printf '%s\t%s\t%s\n' "${line%%[0-9]*}" "${line//[^0-9]/}" "${line##*[0-9]}"; done

Ser 25  Phe

With R:

> df <- data.frame(HGVS.Consequence=c("Met1?", "Phe12Ser", "Ala2Glu","?1Met"))
> library(stringr)
> library(magrittr)
> str_extract_all(df$HGVS.Consequence, "\\D+|\\d+", simplify = TRUE) %>%
+     set_colnames(c("Before", "Position", "After"))

     Before Position After
[1,] "Met"  "1"      "?"  
[2,] "Phe"  "12"     "Ser"
[3,] "Ala"  "2"      "Glu"
[4,] "?"    "1"      "Met"

AsAsp line/record will not be parsed this way.
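Rather than letting records like AsAsp be split wrongly, one option is to list them first. The sketch below is a hypothetical shell helper (report_unparsed is a made-up name) that prints, with line numbers, every value that is not exactly one run of non-digits, one run of digits and one run of non-digits, which is the shape all the answers here assume. It reads one value per line on stdin.

```shell
# Hypothetical helper: print (with line numbers) the values that do NOT have
# the shape <non-digits><digits><non-digits>, i.e. the records that the
# split-on-digits approaches cannot break into exactly three parts.
report_unparsed() {
  local status=0
  # -E: extended regex, -v: print non-matching lines, -n: prefix line numbers.
  grep -nEv '^[^0-9]+[0-9]+[^0-9]+$' || status=$?
  # grep exits 1 when every line matched (nothing to report); only >1 is an error.
  [ "$status" -le 1 ]
}
```

For a tab-separated file, something like cut -f N in.tsv | tail -n +2 | report_unparsed would check the column, where N is the column number of HGVS.Consequence (an assumption about your file); with the header skipped, the reported numbers count data rows rather than file lines.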

ADD COMMENT

