Co-Occurrence Network Graph & Statistics
2.4 years ago

I am trying to make a co-occurrence network graph from presence/absence data of genes per genome, but I am unsure how to go about it. I'm hoping to end up with something like the first image below, where each gene is linked to another gene if both are present in the same genomes, possibly with a larger circle indicating a higher-frequency gene.

I originally tried the widyr and tidygraph packages, but I am unsure whether my data are compatible with them (see second image), as the data have the BGCs as rows and the individual genomes as columns. I am examining the presence/absence pattern of each gene pair to determine whether the pair has a coincident relationship; basically, whether gene i and gene j are observed together or apart in the input genomes more often than would be expected by chance.

1) Are there any suggestions on what packages/code I could use that would work with my data set, or how I could adapt my data set to work with these packages?

2) Are there any statistical tests you would recommend specifically to establish whether or not a gene pair has a coincident relationship?

# Example of data set
# rows = genes
# cols = genomes
set.seed(2222)
df <- matrix(sample(c(TRUE, FALSE), 50, replace = TRUE), 5)
colnames(df) <- letters[1:10]


R networks dataframe • 4.2k views
2.4 years ago

To address question 1, I would suggest using the R igraph package. There's an excellent tutorial here. Starting from a square binary matrix A that can be treated as the adjacency matrix of the graph, you can do something like:

library(igraph)
# Build the graph from the square adjacency matrix A
G <- graph_from_adjacency_matrix(A)
plot(G)


In your case, however, the graph is bipartite (genes and genomes) and your matrix is not square, so it is not an adjacency matrix but can be treated as an incidence matrix. You can expand it to a full adjacency matrix and use the above, or you can do:

G <- graph_from_incidence_matrix(A)


Then you just need to style the graph to your liking.
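As a rough sketch of the "expand it to a square matrix" route and of the styling step, assuming the toy matrix from the question (the gene names gene1 to gene5 are made up for illustration): multiplying the matrix by its transpose gives gene-by-gene co-occurrence counts, which can be used as a weighted adjacency matrix, with vertex size scaled by the number of genomes carrying each gene.

```r
library(igraph)

# Same toy data as in the question: rows = genes, columns = genomes
set.seed(2222)
A <- matrix(sample(c(TRUE, FALSE), 50, replace = TRUE), 5)
rownames(A) <- paste0("gene", 1:5)   # hypothetical gene names
colnames(A) <- letters[1:10]

# Gene-by-gene co-occurrence counts: entry [i, j] is the number of genomes
# carrying both gene i and gene j; the diagonal holds each gene's frequency
co   <- tcrossprod(A * 1)
freq <- diag(co)

# Weighted, undirected graph; diag = FALSE drops the self-loops
G <- graph_from_adjacency_matrix(co, mode = "undirected",
                                 weighted = TRUE, diag = FALSE)

plot(G,
     vertex.size = 10 + 3 * freq,   # larger circle = more frequent gene
     edge.width  = E(G)$weight)     # thicker edge = more shared genomes
```

The scaling constants in `vertex.size` are arbitrary and would need tuning for a real data set. Note that raw counts only show how often two genes appear together, not whether that is more often than expected by chance.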

EDIT: Re-reading the question, I see you mean co-occurrence in question 2. A number of R packages from different fields can do co-occurrence analysis from binary matrices, such as EcoSimR from ecology (see the co-occurrence analysis vignette) or quanteda from text analysis (tutorial).

1

I have found the following R package quite useful, and I have been able to adapt my data to it:

The "cooccur" R package: https://cran.r-project.org/web/packages/cooccur/index.html

The algorithm calculates the observed and expected frequencies of co-occurrence between each pair of species; in my case, the species correspond to the GCFs.

Thanks once again for your help; I thought I would leave this here for other users.
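To illustrate what comparing observed and expected co-occurrence can look like, here is a minimal base-R sketch on the toy matrix from the question. It is not the cooccur package's own code: it assumes independence between genes for the expected count and uses hypergeometric tail probabilities for the p-values, and the names (gene1 to gene5, `p_lt`, `p_gt`) are chosen for illustration.

```r
# Toy data from the question: rows = genes (or GCFs), columns = genomes
set.seed(2222)
A <- matrix(sample(c(TRUE, FALSE), 50, replace = TRUE), 5)
rownames(A) <- paste0("gene", 1:5)   # hypothetical names
N   <- ncol(A)                       # number of genomes
inc <- rowSums(A)                    # genomes carrying each gene

pairs <- t(combn(nrow(A), 2))        # all unordered gene pairs
res <- data.frame(
  gene_i = rownames(A)[pairs[, 1]],
  gene_j = rownames(A)[pairs[, 2]],
  obs    = apply(pairs, 1, function(p) sum(A[p[1], ] & A[p[2], ]))
)
n_i <- inc[pairs[, 1]]
n_j <- inc[pairs[, 2]]

# Expected co-occurrences if the two genes were distributed independently
res$exp  <- n_i * n_j / N
# Probability of seeing at most / at least the observed number of
# co-occurrences under the hypergeometric distribution
res$p_lt <- phyper(res$obs,     n_i, N - n_i, n_j)
res$p_gt <- phyper(res$obs - 1, n_i, N - n_i, n_j, lower.tail = FALSE)
res
```

A small `p_gt` would suggest the pair co-occurs more often than expected, and a small `p_lt` that it co-occurs less often. With many gene pairs, some correction for multiple testing is probably needed.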

0

As you mentioned, my binary matrix is non-square. Is it possible to convert it to a square matrix? The first method gave me an error when I tried to run it:

Error in graph.adjacency.dense(adjmatrix, mode = mode, weighted = weighted,  :
At structure_generators.c:274 : Non-square matrix, Non-square matrix


The second line of code, for the incidence matrix, ran without any issues, apart from a small warning message.

For question 2 of my post, do you think it is possible to obtain something like a p-value, that is, a value representing the association between GCFs in this case?

I will definitely go through the tutorials provided for the statistical analysis. However, is it possible to conduct tests of this kind on non-square matrices, or would it be necessary to convert them to square ones? If converted, they would most likely introduce NA values; can these be replaced with some other value?
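On the square-versus-non-square point, one possible approach (an assumption here, not something the answer above prescribes) is to run a pairwise test directly on two rows of the rectangular matrix. Each pair of genes yields a 2 x 2 presence/absence table, so no conversion to a square matrix and no NA filling should be needed. A minimal sketch with Fisher's exact test and Benjamini-Hochberg correction, using the toy data and made-up gene names:

```r
# Toy data from the question: rows = genes, columns = genomes
set.seed(2222)
A <- matrix(sample(c(TRUE, FALSE), 50, replace = TRUE), 5)
rownames(A) <- paste0("gene", 1:5)   # hypothetical names

# 2 x 2 table for one pair; fixed factor levels keep the table 2 x 2
# even if a gene is present in all or none of the genomes
lv  <- c(TRUE, FALSE)
tab <- table(gene1 = factor(A["gene1", ], levels = lv),
             gene2 = factor(A["gene2", ], levels = lv))
fisher.test(tab)$p.value

# The same test for every gene pair, then adjusted for multiple testing
pairs <- combn(rownames(A), 2)
p <- apply(pairs, 2, function(g) {
  fisher.test(table(factor(A[g[1], ], levels = lv),
                    factor(A[g[2], ], levels = lv)))$p.value
})
p_adj <- p.adjust(p, method = "BH")
data.frame(gene_i = pairs[1, ], gene_j = pairs[2, ], p = p, p_adj = p_adj)
```

The default `fisher.test` is two-sided, so a small p-value flags an association in either direction (together or apart); the `alternative` argument can restrict it to one direction.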