Converting this matrix to a binary matrix
4.7 years ago
zizigolu ★ 4.3k

Hi

I have this table: one column with the GO term, one with the genes enriched for that term, and one with the log2 fold change of each gene.

GO_term                                Gene_Name  Log2FC
cell adhesion                          IGFBP7       1.38
cell adhesion                          PVRL4       -1.40
cell adhesion                          NCAM1       -1.35
cell-matrix adhesion                   ITGA7       -1.20
cell-matrix adhesion                   ITGA4        0.75
positive regulation of cell migration  ITGA5       -1.36
positive regulation of cell migration  RRAS2       -0.59
cellular oxidant detoxification        FABP1        2.35
cellular oxidant detoxification        LTC4S       -0.59
muscle contraction                     ACTA2       -1.21
muscle contraction                     VCL         -1.06

How can I convert my table to something like this?

> head(chord)
      heart development phosphorylation vasculature development
PTK2                  0               1                       1
GNA13                 0               0                       1
LEPR                  0               0                       1
APOE                  0               0                       1
CXCR4                 0               0                       1
RECK                  0               0                       1
      blood vessel development tissue morphogenesis cell adhesion
PTK2                         1                    0             0
GNA13                        1                    0             0
LEPR                         1                    0             0
APOE                         1                    0             0
CXCR4                        1                    0             0
RECK                         1                    0             0
      plasma membrane      logFC
PTK2                1 -0.6527904
GNA13               1  0.3711599
LEPR                1  2.6539788
APOE                1  0.8698346
CXCR4               1 -2.5647537
RECK                1  3.6926860
>

That is, a binary matrix of genes by GO terms, with the corresponding logFC for each gene.


Basically, it's a pivot-table transformation: remove the 'logFC' column, pivot the table, then add the 'logFC' column back.
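A minimal base R sketch of those three steps, using a few rows from the question. The object names (`go`, `membership`, `chord`) are only illustrative, and the last step assumes each gene has a single logFC across all of its GO terms, so the first value found per gene is used.

```r
# A few rows copied from the question (long format: one row per gene/term pair)
go <- data.frame(
  GO_term   = c("cell adhesion", "cell adhesion", "cell adhesion",
                "cell-matrix adhesion", "cell-matrix adhesion"),
  Gene_Name = c("IGFBP7", "PVRL4", "NCAM1", "ITGA7", "ITGA4"),
  Log2FC    = c(1.38, -1.40, -1.35, -1.20, 0.75),
  stringsAsFactors = FALSE
)

genes <- unique(go$Gene_Name)
terms <- unique(go$GO_term)

# Steps 1-2: ignore Log2FC and pivot to a gene x term matrix of zeros,
# then set 1 wherever a gene/term pair occurs in the long table
membership <- matrix(0, nrow = length(genes), ncol = length(terms),
                     dimnames = list(genes, terms))
membership[cbind(go$Gene_Name, go$GO_term)] <- 1

# Step 3: add logFC back as the last column
# (assumption: one logFC per gene, so the first match is taken)
chord <- cbind(membership,
               logFC = go$Log2FC[match(rownames(membership), go$Gene_Name)])

print(chord)
```

The result is a numeric matrix with genes as row names, one 0/1 column per GO term, and a final `logFC` column, which is the layout shown in the question.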

4.7 years ago
fracarb8 ★ 1.6k

You can convert a long table to a wide one with dcast:

library(data.table)
# dcast() from data.table expects a data.table, so convert the data frame first
dcast(as.data.table(yourDF), Gene_Name ~ GO_term, value.var = "Log2FC", fun.aggregate = mean)

You may want to change the aggregate function depending on how you want to handle duplicated values.
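As a rough illustration of how the aggregate function changes the result, the sketch below runs dcast twice on a few rows from the question: once with `length` and once with `mean`. The names `go`, `membership`, and `per_term_fc` are made up for the example, and it assumes the data.table package is installed.

```r
library(data.table)

# A few rows copied from the question, as a data.table
go <- data.table(
  GO_term   = c("cell adhesion", "cell adhesion",
                "cell-matrix adhesion", "cell-matrix adhesion"),
  Gene_Name = c("IGFBP7", "PVRL4", "ITGA7", "ITGA4"),
  Log2FC    = c(1.38, -1.40, -1.20, 0.75)
)

# length counts the rows for each gene/term pair, so pairs that are
# present get 1 (if not duplicated) and absent pairs are filled with 0
membership <- dcast(go, Gene_Name ~ GO_term,
                    value.var = "Log2FC", fun.aggregate = length)

# mean keeps the term-specific logFC; absent pairs become NaN
per_term_fc <- dcast(go, Gene_Name ~ GO_term,
                     value.var = "Log2FC", fun.aggregate = mean)

print(membership)
print(per_term_fc)
```

With `length`, a gene/term pair that appears more than once would give a count above 1, so the table is only strictly binary if the long table has no duplicated pairs.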


Thank you

But logFC should be in a separate column, whereas I ended up with

> head(a)
  Gene.Name cell-cell adhesion cell-matrix adhesion cell adhesion
1      AASS                NaN                  NaN           NaN
2     ABCC4                NaN                  NaN           NaN
3    ABI3BP                NaN                  NaN           NaN
4    ABLIM1                NaN                  NaN           NaN
5    ABLIM3                NaN                  NaN           NaN
6     ACTA1                NaN                  NaN           NaN

With dcast(GOplot, Gene.Name ~ GO.term)

I ended up with

> head(a)
  Gene.Name cell-cell adhesion cell-matrix adhesion cell adhesion
1      AASS                  0                    0             0
2     ABCC4                  0                    0             0
3    ABI3BP                  0                    0             0
4    ABLIM1                  0                    0             0
5    ABLIM3                  0                    0             0
6     ACTA1                  0                    0             0

But I don't know how to relate logFC to each gene in each term.


dcast fills the table with whatever is passed to value.var. The data in your question has only 3 columns, so the only value that can populate the table is Log2FC. Which value do you want the table to be filled with? What are the 0/1 values in your last table?


1 means the gene belongs to that GO term and 0 means it does not.


The problem with that is that you won't be able to tell which GO term that log2FC is associated with. If you look at your example, PTK2 has multiple 1s, but only a single log2FC. With dcast you get NaN if the gene is not enriched in a term and a logFC value (specific to that GO term) if it is.
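One way to see whether a single logFC column would lose information is to count the distinct logFC values per gene before pivoting. This is only a sketch: the rows below reuse PTK2 and GNA13 with the (term, logFC) pairs implied by the target matrix in the question, and `n_fc` and `ambiguous` are illustrative names.

```r
# Long-format rows implied by the target matrix in the question:
# PTK2 is in two of the listed terms but carries a single logFC
go <- data.frame(
  GO_term   = c("phosphorylation", "vasculature development",
                "vasculature development"),
  Gene_Name = c("PTK2", "PTK2", "GNA13"),
  Log2FC    = c(-0.6527904, -0.6527904, 0.3711599),
  stringsAsFactors = FALSE
)

# Number of distinct logFC values observed for each gene
n_fc <- tapply(go$Log2FC, go$Gene_Name, function(x) length(unique(x)))

# Genes whose logFC differs between terms; for these, a single
# logFC column next to the 0/1 matrix would be ambiguous
ambiguous <- names(n_fc)[n_fc > 1]

print(n_fc)
print(ambiguous)
```

If `ambiguous` is empty, every gene has one logFC regardless of term, and attaching a single logFC column to the binary matrix is unambiguous; otherwise the per-term table from dcast keeps the extra detail.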
