Split and recombine columns with R
18 months ago
Sissi ▴ 60

Hi guys,

I have what is probably an easy task, but my R knowledge is not good enough. I have a column with COG annotation categories, with some rows having multiple categories:

A
A
B|Q
B|Q
B|Q
R|P|G|E
R|P|G|E
R|P|G|E


I would like to split them (removing the | separator, which I managed using awk) and then concatenate all the resulting columns (here, 4) into a single one, so I can count the total frequency of each category. I said R only because I'm going to make a plot afterwards, but awk or similar solutions are also very welcome. Thanks a lot, S


Can you post what the data.frame looks like currently?


Oh, sorry. I edited the previous post. That's one column (called COG_CATEGORY) of a CSV file with many more columns and thousands of rows; I copied just a few to give an idea. And this is what I would like:

A
A
B
B
B
R
R
R
Q
Q
Q
P
P
P
G
G
G
E
E
E


To finally have:

Category Frequency
A   2
B   3
R   3
..


Input:

$ cat test.txt
A
A
B|Q
B|Q
B|Q
R|P|G|E
R|P|G|E
R|P|G|E

Output:

$ tr -s '|' '\n' < test.txt | sort -k1 | uniq -c

2 A
3 B
3 E
3 G
3 P
3 Q
3 R


That's awesome! I should learn and use those three commands more often, thanks a lot!

18 months ago
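# Install tidyverse if it is missing, then load it
if (!require("tidyverse")) install.packages("tidyverse")
library(tidyverse)
# x is the data frame read from the CSV; replace every "|" with a space
x <- str_replace_all(x$COG_CATEGORY, pattern = "\\|", replacement = " ")
# Split each entry on spaces: gives a list with one character vector per row
x <- str_split(x, " ")
# Concatenate all the vectors in the list into a single character vector
x <- purrr::reduce(x, c)
# Count how often each category occurs
table(x)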
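For anyone who would rather not install tidyverse, here is a minimal base R sketch of the same split, flatten and count idea. It is only an illustration under assumptions: the data frame name `df` and the toy values (copied from the question) stand in for the real CSV, and `parts`, `categories`, `freq` and `freq_df` are names made up for this example.

```r
# Toy data copied from the question; in practice this column would come
# from the CSV file (the data frame name `df` is an assumption).
df <- data.frame(
  COG_CATEGORY = c("A", "A", "B|Q", "B|Q", "B|Q",
                   "R|P|G|E", "R|P|G|E", "R|P|G|E"),
  stringsAsFactors = FALSE
)

# Split each entry on the literal "|" (fixed = TRUE avoids regex escaping)
parts <- strsplit(df$COG_CATEGORY, "|", fixed = TRUE)

# Flatten the list of character vectors into one character vector
categories <- unlist(parts)

# Count the occurrences of each category
freq <- table(categories)
print(freq)

# Convert the counts to a two-column data frame, convenient for plotting
freq_df <- as.data.frame(freq, stringsAsFactors = FALSE)
names(freq_df) <- c("Category", "Frequency")
print(freq_df)
```

The `freq_df` data frame has one row per category, which is usually the shape a plotting function expects.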


If you are using tidyverse you could also do:

df %>% separate_rows(COG_CATEGORY, sep = "\\|") %>% count(COG_CATEGORY)
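A self-contained way to try the separate_rows() approach, as a sketch only: the toy data frame `df` is an assumption built from the rows in the question, `counts` is a made-up name, and the tidyr and dplyr packages are assumed to be installed.

```r
suppressPackageStartupMessages({
  library(tidyr)  # provides separate_rows()
  library(dplyr)  # provides count() and the %>% pipe
})

# Toy data copied from the question (stand-in for the real CSV)
df <- data.frame(
  COG_CATEGORY = c("A", "A", "B|Q", "B|Q", "B|Q",
                   "R|P|G|E", "R|P|G|E", "R|P|G|E"),
  stringsAsFactors = FALSE
)

counts <- df %>%
  # One output row per category; "\\|" is the regex for a literal "|"
  separate_rows(COG_CATEGORY, sep = "\\|") %>%
  # Count the rows per category; the counts end up in a column called n
  count(COG_CATEGORY)

print(counts)
```

Unlike the vector-based approach, this keeps any other columns of the data frame alongside each split category until count() collapses them.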


That's perfect!! What is purrr::reduce doing??

Thank you very much guys!
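On the purrr::reduce question: str_split() returns a list with one character vector per row, and reduce(x, c) combines the elements pairwise with c() until a single vector is left. A minimal sketch of that idea follows, using base R's Reduce() as a stand-in so that it runs without extra packages. The toy list `parts` and the names `flat_reduce`, `flat_unlist` and `steps` are made up for this example.

```r
# A list shaped like the output of str_split(): one character vector per row
parts <- list(c("A"), c("B", "Q"), c("R", "P", "G", "E"))

# Reduce() applies a two-argument function cumulatively, left to right:
# first c("A", c("B", "Q")), then c(<that result>, c("R", "P", "G", "E"))
flat_reduce <- Reduce(c, parts)

# accumulate = TRUE keeps every intermediate result, which shows the steps
steps <- Reduce(c, parts, accumulate = TRUE)

# For a plain list of character vectors, unlist() gives the same flat vector
flat_unlist <- unlist(parts)

print(steps)
print(flat_reduce)
print(identical(flat_reduce, flat_unlist))
```

So in the answer above, the reduce step just flattens the list into one long vector that table() can count.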