Split and recombine columns with R
18 months ago
Sissi

Hi guys,

I probably have an easy task, but my R knowledge is not good enough. I have a column with COG annotation categories, and some rows have multiple categories:

A
A
B|Q
B|Q
B|Q
R|P|G|E
R|P|G|E
R|P|G|E

I would like to split them (thus removing the | separator, which I managed using awk) and then concatenate all the resulting columns (here 4) into a single one, so I can count the total frequency of each category. I said R only because I'm going to make a plot afterwards, but awk or similar tools are also very welcome. Thanks a lot, S

Can you post what the data.frame looks like currently?

Oh sorry, I edited the previous post. That's one column (called COG_CATEGORY) of a CSV file with many more columns and thousands of rows; I copied just a few to give an idea. This is what I would like:

A
A
B
B
B
R
R
R
Q
Q
Q
P
P
P
G
G
G
E
E
E

To finally have:

Category  Frequency
A         2
B         3
R         3
..
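For reference, a minimal base-R sketch of this split-and-count step that does not need the tidyverse. It assumes the column is already available as a character vector; the hard-coded `cog` vector below is only the sample from the question, standing in for something like `read.csv("data.csv")$COG_CATEGORY`.

```r
# Sample COG_CATEGORY values from the question (hypothetical stand-in for
# e.g. read.csv("data.csv")$COG_CATEGORY)
cog <- c("A", "A", "B|Q", "B|Q", "B|Q", "R|P|G|E", "R|P|G|E", "R|P|G|E")

# Split on the literal "|" (fixed = TRUE, since "|" is a regex metacharacter)
# and flatten the resulting list into one character vector
categories <- unlist(strsplit(cog, "|", fixed = TRUE))

# Count each category and turn the table into a two-column data frame
counts <- as.data.frame(table(categories), stringsAsFactors = FALSE)
names(counts) <- c("Category", "Frequency")
print(counts, row.names = FALSE)
```

`table()` lists categories alphabetically; `sort(table(categories), decreasing = TRUE)` would rank them by frequency instead, which may be handier for the plot.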

Input:

$ cat test.txt 
A
A
B|Q
B|Q
B|Q
R|P|G|E
R|P|G|E
R|P|G|E

output:

$ tr -s '|' '\n' < test.txt | sort | uniq -c

      2 A
      3 B
      3 E
      3 G
      3 P
      3 Q
      3 R

That's awesome! I should learn and use those three commands more often. Thanks a lot!

18 months ago
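if (!require("tidyverse")) install.packages("tidyverse")  # install tidyverse only if it is missing
library(tidyverse)
x <- read_csv("data.csv")                                  # read the whole CSV as a tibble
x <- str_replace_all(x$COG_CATEGORY, pattern = "\\|", replacement = " ")  # "B|Q" -> "B Q"
x <- str_split(x, " ")                                     # list of character vectors, e.g. c("B", "Q")
x <- purrr::reduce(x, c)                                   # concatenate all list elements into one vector
table(x)                                                   # frequency of each category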

If you are using the tidyverse, you could also do:

df %>% separate_rows(COG_CATEGORY, sep = "\\|") %>% count(COG_CATEGORY)
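Unlike the `table()` approach, `separate_rows()` keeps the other columns of each row, which matters here because the real CSV has many more columns. A rough base-R equivalent of that reshaping, sketched on a tiny made-up data frame (the `gene_id` column is hypothetical, standing in for the other CSV columns), might look like this:

```r
# Hypothetical toy data: COG_CATEGORY plus one invented column (gene_id)
# standing in for the many other columns of the real CSV
df <- data.frame(
  gene_id      = c("g1", "g2", "g3"),
  COG_CATEGORY = c("A", "B|Q", "R|P|G|E"),
  stringsAsFactors = FALSE
)

# Split each entry on the literal "|" separator
parts <- strsplit(df$COG_CATEGORY, "|", fixed = TRUE)

# Repeat each original row once per category it contains, then overwrite the
# column with the individual categories (roughly what separate_rows() does)
long <- df[rep(seq_len(nrow(df)), lengths(parts)), , drop = FALSE]
long$COG_CATEGORY <- unlist(parts)
rownames(long) <- NULL
print(long)

# Counting afterwards, similar in spirit to count(COG_CATEGORY)
print(table(long$COG_CATEGORY))
```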

That's perfect! What is purrr::reduce doing?

Thank you very much guys!
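`purrr::reduce(x, c)` is a fold: it takes the list returned by `str_split()` and repeatedly applies `c()` to the running result and the next element, so all the per-row vectors end up concatenated into one character vector that `table()` can count. A small sketch using base R's `Reduce()`, which performs the same left-to-right fold (the list below is a hand-made stand-in for the `str_split()` output):

```r
# A list shaped like what str_split() returns for the question's data
x <- list("A", "A", c("B", "Q"), c("R", "P", "G", "E"))

# Reduce() folds from the left: c(c(c(x[[1]], x[[2]]), x[[3]]), x[[4]]),
# so every element ends up concatenated into one character vector.
# purrr::reduce(x, c) performs the same left fold.
folded <- Reduce(c, x)

# accumulate = TRUE keeps each intermediate result, showing how it grows
steps <- Reduce(c, x, accumulate = TRUE)
for (s in steps) print(s)

# For a flat list of atomic vectors, unlist() gives the same result directly
stopifnot(identical(folded, unlist(x)))
```

For this particular use, `unlist(x)` is the more common idiom; `Reduce(c, ...)` copies the growing vector at every step, which only starts to matter for very long lists.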
