Question

How to create a label variable from a binary matrix with multiple columns

0

2.7 years ago

biogamer.31 • 0

Hello! I am trying to create a label variable for a binary matrix (df). The df matrix has many columns (n=981), and I want a categorical variable with 3 specific labels covering all the df columns.

First, I created a small matrix and tried the following code:

           X543.4       X543.5     X543.6      X543.7      X543.8      X543.9      X543.10     X543.11      X543.12 
ab         1            1           1           1           1           1           1           1           1
afc        0            0           0           0           0           1           0           0           1
dhaa       1            1           1           1           0           0           0           0           0
cafa       1            1           0           1           0           1           0           0           0
ashs       1            0           1           1           1           1           1           1           1
ahsa       1            1           1           1           1           1           1           1           1


my_sample_col <- data.frame(sample = rep(c("Bacteria A", "Bacteria B", "Bacteria C"), c(2,3,4)))
row.names(my_sample_col) <- colnames(df)
my_sample_col

Output:

sample
X543.4  Bacteria A
X543.5  Bacteria A
X543.6  Bacteria B
X543.7  Bacteria B
X543.8  Bacteria B
X543.9  Bacteria C
X543.10 Bacteria C
X543.11 Bacteria C
X543.12 Bacteria C

However, these results are not the correct labels for each column of the df matrix.

The correct labels should be as follows:

    sample
X543.4  Bacteria B
X543.5  Bacteria C
X543.6  Bacteria B
X543.7  Bacteria C
X543.8  Bacteria C
X543.9  Bacteria A
X543.10 Bacteria C
X543.11 Bacteria A
X543.12 Bacteria C
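
For what it is worth, the mismatch seems to come from how rep() works: given a vector of counts as its second argument, it repeats each label in consecutive blocks, so the labels follow column position rather than species. A small sketch of the difference, where `cols` is a hypothetical stand-in for colnames(df) and the bare letters stand for the three species:

```r
# Stand-in for colnames(df) from the small example above
cols <- c("X543.4", "X543.5", "X543.6", "X543.7", "X543.8",
          "X543.9", "X543.10", "X543.11", "X543.12")

# rep() with a counts vector yields consecutive blocks: A A B B B C C C C
blocked <- rep(c("A", "B", "C"), c(2, 3, 4))

# The intended labels, written out in column order
wanted <- c("B", "C", "B", "C", "C", "A", "C", "A", "C")

# Side-by-side comparison; 'agree' marks columns where both happen to match
comparison <- data.frame(sample = cols, blocked = blocked, wanted = wanted,
                         agree = blocked == wanted)
print(comparison)
```

So rep() only fits when the columns are already sorted by species, which is not the case here.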

I was wondering if someone could help me fix this error.

Ultimately, I would like to get a categorical variable with 3 specific labels based on all the df columns (n=981). I have 981 columns coded and know which code groups belong to a specific label.

I would be very grateful for your help.

dataframe R • 1.6k views

ADD COMMENT • link updated 2.7 years ago by Ram 43k • written 2.7 years ago by biogamer.31 • 0

0

I can see that this could have biological context. It would help if you provided it, so that we understand the problem and so that others with a similar problem can benefit in the future. That aside, on what basis are the 981 columns assigned a sample label? By what criteria are X543.4 and X543.6 both assigned to B? They differ in your original matrix, so what puts them in the same category?

ADD REPLY • link 2.7 years ago by seidel 11k

0

Thank you for your response, seidel. I have a binary matrix with 981 columns (named with codes). The columns represent samples obtained from 3 bacterial species. I want to get a categorical variable with 3 specific labels based on all the columns (n=981).

ADD REPLY • link 2.7 years ago by biogamer.31 • 0

1

How do you know that X543.7 is C for example? Do you have some sort of table that has this information?

ADD REPLY • link 2.7 years ago by rpolicastro 13k

0

I do not have a table, but I know which codes belong to which bacterial species. In the example, the codes correspond to the following labels:

          sample
X543.4  Bacteria B
X543.5  Bacteria C
X543.6  Bacteria B
X543.7  Bacteria C
X543.8  Bacteria C
X543.9  Bacteria A
X543.10 Bacteria C
X543.11 Bacteria A
X543.12 Bacteria C

However, I do not know how to create a label variable like the one above for a binary matrix with many columns (n=981), where each code is known to belong to a specific bacterial species.

ADD REPLY • link 2.7 years ago by biogamer.31 • 0

0

Ok, so the labels do not depend on the content of the binary matrix; rather, you "know which code groups belong to a specific label". And you stated this a second time: "I do not have a table, but I know which codes belong to which bacterial species." So the mapping between sample ID (X543.n) and Bacteria type (A, B, C) is something in your brain (again, because you have no table). If that's the case, it seems to me the only way to get it out of your brain is to type it into a vector, in the order of the sample IDs. You could create a dataframe as follows:

# Labels for the nine example columns, in column order. For the full matrix
# you keep typing, since there's no pattern and you just know these.
data.frame(SampleID = colnames(df),
           Bacteria = c("B","C","B","C","C","A","C","A","C"))

Or you could just create a named vector:

Bacteria <- c("B","C","B","C","C","A","C","A","C")  # etc. for the remaining columns
names(Bacteria) <- colnames(df)
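
If the labels are needed in the same shape as my_sample_col in the question (one column, sample IDs as row names), the named vector could be wrapped roughly as below. This is only a sketch on the nine-column example: `cols` is a stand-in for colnames(df), and `annotation` is a name chosen here for illustration.

```r
# Stand-in for colnames(df) from the small example
cols <- c("X543.4", "X543.5", "X543.6", "X543.7", "X543.8",
          "X543.9", "X543.10", "X543.11", "X543.12")

# Hand-typed labels, in the same order as the columns
Bacteria <- c("B", "C", "B", "C", "C", "A", "C", "A", "C")

# Guard against a length mismatch before attaching the names
stopifnot(length(Bacteria) == length(cols))
names(Bacteria) <- cols

# One-column annotation table with sample IDs as row names
annotation <- data.frame(
  sample = factor(paste("Bacteria", Bacteria),
                  levels = paste("Bacteria", c("A", "B", "C"))),
  row.names = names(Bacteria)
)
print(annotation)

# Quick sanity check on how many samples fall under each label
print(table(annotation$sample))
```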

I'm not convinced of this solution however, as I doubt you can remember a pattern of 981 bacteria types for 981 samples, and the information is not written down anywhere or in any other accessible form.
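
One way to reduce the typing, assuming the code groups can at least be written down once per species: store them as a named list, flatten the list into a lookup table, and align it to the column order with match(). The group memberships below are taken from the nine-column example only; `groups`, `lookup`, `idx`, and `cols` are names invented for this sketch, and `cols` stands in for colnames(df).

```r
# Stand-in for colnames(df) from the small example
cols <- c("X543.4", "X543.5", "X543.6", "X543.7", "X543.8",
          "X543.9", "X543.10", "X543.11", "X543.12")

# Each species lists the sample codes that belong to it
groups <- list(
  "Bacteria A" = c("X543.9", "X543.11"),
  "Bacteria B" = c("X543.4", "X543.6"),
  "Bacteria C" = c("X543.5", "X543.7", "X543.8", "X543.10", "X543.12")
)

# stack() flattens the list into a data frame with the columns
# 'values' (sample code) and 'ind' (species label, as a factor)
lookup <- stack(groups)

# For each column name, find its row in the lookup table
idx <- match(cols, lookup$values)

# An NA here would mean a column that was never assigned to a group
stopifnot(!anyNA(idx))

# Labels in column order, with sample IDs as row names
my_sample_col <- data.frame(sample = lookup$ind[idx], row.names = cols)
print(my_sample_col)
```

Because match() aligns by name, the order in which codes are typed within each group does not matter, and a forgotten code is caught by the NA check rather than silently shifting every later label.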

ADD REPLY • link 2.7 years ago by seidel 11k

0

Please do not paste screenshots of plain text content; it is counterproductive. You can copy and paste the content directly here (using the code formatting option), or use a GitHub Gist if the content exceeds the length allowed here.

ADD REPLY • link 2.7 years ago by Ram 43k

0

Thank you, Ram. I have already updated the question. I would be very grateful for your help.

ADD REPLY • link 2.7 years ago by biogamer.31 • 0

0

Did the comments from seidel help? I don't want to interfere while you're working on a solution with them.

ADD REPLY • link 2.7 years ago by Ram 43k