Question

expanding a table by duplicating rows of a specific pattern

0

Entering edit mode

18 months ago

Assa Yeroslaviz ★ 1.8k

I have a huge matrix with intensity values. In the first column some of the identifier have multiple entries, separated by a ;. I would like to expand these rows and duplicate them into separate rows, as shown in the example below.

this is my input data frame:

Names   100 101 102 103 104
gene1   21.07819176 23.59656334 22.82142258 22.45487976 22.58430672
gene2   22.06554222 22.30852509 22.00048065 19.69557953 22.39509964
gene3;gene13    28.92616272 27.95604324 27.27648544 28.94528198 28.76357269
gene4   26.13991928 27.39280319 26.18173409 26.68411636 26.81592178
gene5   18.65508842 25.58026314 24.83395576 24.77213287 23.68795204
gene6;gene9 23.71098518 24.2517643  23.48701668 24.06730843 24.05034447
gene7   24.33637047 23.93660927 25.7705574  22.52026558 23.99619293
gene8   29.45690346 28.30786705 29.16029167 28.49213028 28.90692902
gene10  22.11131859 22.68044853 21.56625557 23.16993523 22.79750633
gene12  21.12464714 22.55580902 19.1800499  21.61175728 21.85718918
gene11;gene17;gene22    18.44805717 21.8158741  22.70597649 21.2684803  23.24526405

The rows with two or more entries i would like to duplicate and put each of these identifier in a separate row like this:

Names   100 101 102 103 104
gene1   21.07819176 23.59656334 22.82142258 22.45487976 22.58430672
gene2   22.06554222 22.30852509 22.00048065 19.69557953 22.39509964
gene3   28.92616272 27.95604324 27.27648544 28.94528198 28.76357269
gene13  28.92616272 27.95604324 27.27648544 28.94528198 28.76357269
gene4   26.13991928 27.39280319 26.18173409 26.68411636 26.81592178
gene5   18.65508842 25.58026314 24.83395576 24.77213287 23.68795204
gene6   23.71098518 24.2517643  23.48701668 24.06730843 24.05034447
gene9   23.71098518 24.2517643  23.48701668 24.06730843 24.05034447
gene7   24.33637047 23.93660927 25.7705574  22.52026558 23.99619293
gene8   29.45690346 28.30786705 29.16029167 28.49213028 28.90692902
gene10  22.11131859 22.68044853 21.56625557 23.16993523 22.79750633
gene12  21.12464714 22.55580902 19.1800499  21.61175728 21.85718918
gene11  18.44805717 21.8158741  22.70597649 21.2684803  23.24526405
gene17  18.44805717 21.8158741  22.70597649 21.2684803  23.24526405
gene22  18.44805717 21.8158741  22.70597649 21.2684803  23.24526405

Is there a way to do it using the tidyverse (dplyr) packages?

thanks Assa

df <- structure(list(Names = c("gene1", "gene2", "gene3;gene13", "gene4", 
"gene5", "gene6;gene9", "gene7", "gene8", "gene10", "gene12", 
"gene11;gene17;gene22"), X100 = c("21.07819176", "22.06554222", 
"28.92616272", "26.13991928", "18.65508842", "23.71098518", "24.33637047", 
"29.45690346", "22.11131859", "21.12464714", "18.44805717"), 
    X101 = c("23.59656334", "22.30852509", "27.95604324", "27.39280319", 
    "25.58026314", "24.2517643", "23.93660927", "28.30786705", 
    "22.68044853", "22.55580902", "21.8158741"), X102 = c("22.82142258", 
    "22.00048065", "27.27648544", "26.18173409", "24.83395576", 
    "23.48701668", "25.7705574", "29.16029167", "21.56625557", 
    "19.1800499", "22.70597649"), X103 = c("22.45487976", "19.69557953", 
    "28.94528198", "26.68411636", "24.77213287", "24.06730843", 
    "22.52026558", "28.49213028", "23.16993523", "21.61175728", 
    "21.2684803"), X104 = c("22.58430672", "22.39509964", "28.76357269", 
    "26.81592178", "23.68795204", "24.05034447", "23.99619293", 
    "28.90692902", "22.79750633", "21.85718918", "23.24526405"
    )), class = "data.frame", row.names = c(NA, -11L))

tidyyverse dplyr R data.frame • 543 views

ADD COMMENT • link 18 months ago by Assa Yeroslaviz ★ 1.8k

score 3 · Accepted Answer · 2022-10-07

3

Entering edit mode

18 months ago

Basti ★ 2.0k

You can use separate_rows(df,Names)

ADD COMMENT • link 18 months ago by Basti ★ 2.0k

0

Entering edit mode

Thanks, this does the work. a very easy and efficient solution.

ADD REPLY • link 18 months ago by Assa Yeroslaviz ★ 1.8k