Question: Dividing Taxonomy Table in R
3.7 years ago by
fionnuala.mm0 wrote:


I'm trying to divide a taxonomy table, which has a few hundred rows and which looks something like this:


I'm trying to make a new column for each rank prefix (e.g., "d", "p", "c", etc. as column names) and have Bacteria, Firmicutes, Clostridia, etc. in their respective columns, with the associated values in brackets retained.

I've tried using a variety of methods:

colsplit(taxa, split=",", names) (where names is a vector of the col names I want and taxa is the input data frame) 
split <- strsplit(as.character(taxa, ",", fixed=TRUE)

and a variety of other split methods, but it keeps returning errors like "argument "split" is missing, with no default".

Any suggestions on how I might achieve this?

Thanks for any help!

taxonomy-table R • 1.9k views
3.7 years ago by
Erik Wright360 wrote:

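It looks like you have a parenthesis problem: the as.character() call is never closed, so the "," and fixed=TRUE end up inside as.character() and strsplit() never receives its split argument. Try this: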

split <- strsplit(as.character(taxa), ",", fixed=TRUE)
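One caveat if taxa is a data frame rather than a character vector: as.character() on a whole data frame deparses each column into a single string, so it is usually safer to split the column itself. A minimal sketch, assuming the strings live in a column named taxa (the column name and the shortened two-rank strings are illustrative, not taken from the original table):

```r
# Hypothetical input: one taxonomy string per row, in a column called `taxa`
taxa <- data.frame(
  taxa = c("d:Bacteria(96.3),p:Firmicutes(70.8)",
           "d:Bacteria(93.3),p:Firmicutes(60.8)"),
  stringsAsFactors = FALSE
)

# Split the column, not the whole data frame; the result is a list
# with one character vector of "rank:Taxon(value)" fields per row
parts <- strsplit(as.character(taxa$taxa), ",", fixed = TRUE)

parts[[1]]
# [1] "d:Bacteria(96.3)"   "p:Firmicutes(70.8)"
```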
3.7 years ago by
Brice Sarver3.5k wrote:

You're not really that clear at all with what you want as the end result, but this ought to get you 95% of the way there and then you can tweak it yourself.

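a <- "d:Bacteria(96.3),p:Firmicutes(70.8),c:Clostridia(69.2),o:Clostridiales(69.2),f:Lachnospiraceae(63.3),g:Roseburia(48.4)"

# Split the string into one "rank:Taxon(value)" field per element
b <- unlist(strsplit(a, ",", fixed=TRUE))

# The rank prefix (d, p, c, ...) becomes the column name
col_names <- sapply(strsplit(b, ":", fixed=TRUE), "[[", 1L)

# The taxon name plus its bracketed value becomes the cell value
vals <- sapply(strsplit(b, ":", fixed=TRUE), "[[", 2L)

# Build a one-row data frame and label its columns
final <- data.frame(t(vals), stringsAsFactors=FALSE)
colnames(final) <- col_names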

> final (I've truncated the results because the formatting on Biostars can be tricky for generating tables this way)

               d                p                c                   o
1 Bacteria(96.3) Firmicutes(70.8) Clostridia(69.2) Clostridiales(69.2)
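Since the real table has a few hundred rows, the same idea can be applied row by row. The following is only a sketch under assumptions: the column name taxa, the helper split_ranks, and the shortened three-rank strings are all made up for illustration. It looks values up by rank name, so a rank missing from one row becomes NA instead of shifting the other columns.

```r
# Illustrative input; the column name `taxa` is an assumption
taxa <- data.frame(
  taxa = c(
    "d:Bacteria(96.3),p:Firmicutes(70.8),c:Clostridia(69.2)",
    "d:Bacteria(93.3),p:Firmicutes(60.8),c:Bacilli(59.2)"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn one taxonomy string into a named vector
# whose names are the rank prefixes and whose values are "Taxon(value)"
split_ranks <- function(s) {
  fields <- strsplit(s, ",", fixed = TRUE)[[1]]
  kv <- strsplit(fields, ":", fixed = TRUE)
  setNames(vapply(kv, `[[`, character(1), 2L),
           vapply(kv, `[[`, character(1), 1L))
}

rows <- lapply(taxa$taxa, split_ranks)

# Collect every rank seen, then index each row by rank name
ranks <- unique(unlist(lapply(rows, names)))
final <- as.data.frame(
  do.call(rbind, lapply(rows, function(r) unname(r[ranks]))),
  stringsAsFactors = FALSE
)
colnames(final) <- ranks
final
```

The helper assumes every field contains a ":"; a field without one would make the `[[` lookup fail, so malformed rows would need to be filtered out first.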
3.7 years ago by
Chris S.290 wrote:

If you have a table, try using separate() from the tidyr package:

x <- read.csv(text='id,taxa

x %>% separate(taxa, c("domain", "phylum", "class", "order", "family", "genus"), ",[a-z]:")

  id           domain           phylum            class               order                family           genus
1  1 d:Bacteria(96.3) Firmicutes(70.8) Clostridia(69.2) Clostridiales(69.2) Lachnospiraceae(63.3) Roseburia(48.4)
2  2 d:Bacteria(93.3) Firmicutes(60.8)    Bacilli(59.2)    Bacillales(59.2)     Bacillaceae(53.3)  Bacillus(38.4)
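Note that the domain column keeps its "d:" prefix, because the separator pattern ",[a-z]:" only matches prefixes that follow a comma. One way to drop it afterwards, sketched here in base R on a standalone vector whose values are copied from the output above:

```r
# Values taken from the `domain` column in the output above
domain <- c("d:Bacteria(96.3)", "d:Bacteria(93.3)")

# Remove a single lowercase letter plus colon at the start of each string
sub("^[a-z]:", "", domain)
# [1] "Bacteria(96.3)" "Bacteria(93.3)"
```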