Question: Split and recombine columns with R
Sissi wrote:

Hi guys,

This is probably an easy task, but my R knowledge is not good enough. I have a column of COG annotation categories, and some rows contain multiple categories:

A
A
B|Q
B|Q
B|Q
R|P|G|E
R|P|G|E
R|P|G|E

I would like to split them (removing the | separator, which I managed to do using awk) and then concatenate all of the resulting columns (four in this example) into a single one, so that I can count the total frequency of each category. I mentioned R only because I'm going to make a plot afterwards, but awk or similar solutions are also very welcome. Thanks a lot, S

modified 8 weeks ago by bioinformatics2020 • written 8 weeks ago by Sissi

Can you post what the data.frame looks like currently?

written 8 weeks ago by bioinformatics2020

Oh, sorry. I edited the previous post. That is one column (called COG_CATEGORY) of a CSV file with many more columns and thousands of rows; I copied just a few to give an idea. This is what I would like:

A
A
B
B
B
R
R
R
Q
Q
Q
P
P
P
G
G
G
E
E
E

To finally have:

Category Frequency
A   2
B   3
R   3 
..
written 8 weeks ago by Sissi

Input:

$ cat test.txt 
A
A
B|Q
B|Q
B|Q
R|P|G|E
R|P|G|E
R|P|G|E

Output:

$ tr -s '|' '\n' < test.txt | sort | uniq -c

      2 A
      3 B
      3 E
      3 G
      3 P
      3 Q
      3 R
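
To apply the same pipeline to the COG_CATEGORY column of the full CSV rather than to a one-column file, the column has to be extracted first. Below is a minimal sketch, assuming a plain comma-separated file with a header row and no quoted fields containing commas. The function name count_categories, the GENE column, and the sample file are made up for illustration.

```shell
# Hypothetical helper: count pipe-separated categories in a named CSV column.
# Assumes a simple CSV (header row, no quoted fields containing commas).
count_categories() {
    file=$1
    column=$2
    awk -F',' -v col="$column" '
        NR == 1 {
            # locate the index of the requested column in the header
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            next
        }
        # print the selected column for every non-empty data row
        idx && $idx != "" { print $idx }
    ' "$file" |
        tr -s '|' '\n' |   # one category per line
        sort |             # group identical categories together
        uniq -c            # count each group
}

# Usage example with a small made-up file
tmp=$(mktemp)
printf '%s\n' 'GENE,COG_CATEGORY' 'g1,A' 'g2,A' 'g3,B|Q' 'g4,R|P|G|E' > "$tmp"
count_categories "$tmp" COG_CATEGORY
rm -f "$tmp"
```

If the file has quoted fields or embedded commas, a real CSV parser (such as read_csv in R) is the safer choice.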
modified 8 weeks ago • written 8 weeks ago by cpad0112

That's awesome! I should learn and use those three commands more often. Thanks a lot!

written 8 weeks ago by Sissi
bioinformatics2020 wrote:
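# Install tidyverse on first use, then load it
if (!require("tidyverse")) install.packages("tidyverse")
library(tidyverse)
# Read the CSV; COG_CATEGORY holds the pipe-separated categories
x <- read_csv("data.csv")
# Split each entry on the literal "|" (escaped because "|" is a regex operator)
cats <- str_split(x$COG_CATEGORY, "\\|")
# Flatten the list of character vectors into a single vector
cats <- purrr::reduce(cats, c)
# Count how often each category occurs
table(cats)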
written 8 weeks ago by bioinformatics2020

If you are using tidyverse, you could also do:

df %>% separate_rows(COG_CATEGORY, sep="\\|") %>% count(COG_CATEGORY)

written 7 weeks ago by rpolicastro

That's perfect! What is purrr::reduce doing?

Thank you very much guys!

written 8 weeks ago by Sissi