Question: How to fill with dots empty fields in a dataframe in R when splitting a column by a separator
0
gravatar for giusdalt95
3 months ago by
giusdalt950
giusdalt950 wrote:

Hi everybody! I'm new to Bioinformatics and recently I came across a problem. I have a column ("name") with comma delimited values:

                                          name
name=1,name2=11,name3=111,name4=1111,name5=11111
name=2,name2=22,name4=2222,name5=22222
name=3,name2=33,name3=333,name4=3333,name5=33333
name=4,name2=44,name3=444,name4=4444,name5=44444
name=5,name2=55,name3=555,name4=5555,name5=55555
name=6,name3=666,name4=6666,name5=66666
name=7,name2=77,name3=777,name5=77777

I want to split this column and print values in cells of a dataframe with this tab delimited header: "name name2 name3 name4 name5" as you can see, "header" has 5 columns, whilst "name" has different numbers of comma separated values (5 in the first row, 4 in the second), and when i use the "separate()" function I have a frameshift. How can I fill missing values with a dot or something else? Thank you so much

R • 252 views
ADD COMMENTlink modified 3 months ago by GenoMax96k • written 3 months ago by giusdalt950

Thank you so much, that was literaly what I was looking for

ADD REPLYlink written 3 months ago by giusdalt950

please mark the best answer as accepted, upvote other useful answers and write any thanks into comments and not as an answer

ADD REPLYlink written 3 months ago by Ido Tamir5.1k
4
gravatar for rpolicastro
3 months ago by
rpolicastro3.9k
Bloomington, IN
rpolicastro3.9k wrote:

The data.

df <- structure(list(name = c("name=1,name2=11,name3=111,name4=1111,name5=11111", 
"name=2,name2=22,name4=2222,name5=22222", "name=3,name2=33,name3=333,name4=3333,name5=33333", 
"name=4,name2=44,name3=444,name4=4444,name5=44444", "name=5,name2=55,name3=555,name4=5555,name5=55555", 
"name=6,name3=666,name4=6666,name5=66666", "name=7,name2=77,name3=777,name5=77777"
)), class = "data.frame", row.names = c(NA, -7L))

> df
                                              name
1 name=1,name2=11,name3=111,name4=1111,name5=11111
2           name=2,name2=22,name4=2222,name5=22222
3 name=3,name2=33,name3=333,name4=3333,name5=33333
4 name=4,name2=44,name3=444,name4=4444,name5=44444
5 name=5,name2=55,name3=555,name4=5555,name5=55555
6          name=6,name3=666,name4=6666,name5=66666
7            name=7,name2=77,name3=777,name5=77777

A tidyverse answer.

library("tidyverse")

df <- df %>%
  rowid_to_column %>%
  separate_rows(name, sep=",") %>%
  separate(name, into=c("name_id", "value"), sep="=", convert=TRUE) %>%
  pivot_wider(names_from=name_id, values_from=value) %>%
  select(!rowid)

> df
# A tibble: 7 x 5
   name name2 name3 name4 name5
  <int> <int> <int> <int> <int>
1     1    11   111  1111 11111
2     2    22    NA  2222 22222
3     3    33   333  3333 33333
4     4    44   444  4444 44444
5     5    55   555  5555 55555
6     6    NA   666  6666 66666
7     7    77   777    NA 77777
ADD COMMENTlink modified 3 months ago • written 3 months ago by rpolicastro3.9k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 970 users visited in the last hour
_