Hello, I am currently working with shotgun metagenomic sequencing data, and I aim to construct a co-occurrence network.
I used SparCC (Sparse Correlations for Compositional Data) to calculate the co-occurrence between microbial species.
From the literature, |r| > 0.2 with FDR < 0.05 appears to be a common threshold for filtering out spurious or weak correlations. In my data, all correlation coefficients were below 0.1, which would make it difficult to construct a network.
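For concreteness, this is roughly how I understand the filtering step. It is only a minimal sketch: it assumes a correlation matrix and a matching matrix of p-values are already available (for example from a permutation procedure), and the function names `bh_adjust` and `filter_edges` and the toy numbers are hypothetical, not part of SparCC.

```python
import numpy as np

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values for a 1-D array."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

def filter_edges(corr, pvals, r_min=0.2, fdr_max=0.05):
    """Keep species pairs with |r| > r_min and BH-adjusted p < fdr_max."""
    rows, cols = np.triu_indices_from(corr, k=1)  # each pair once, no diagonal
    r = corr[rows, cols]
    q = bh_adjust(pvals[rows, cols])
    keep = (np.abs(r) > r_min) & (q < fdr_max)
    return [(int(i), int(j), float(v))
            for i, j, v in zip(rows[keep], cols[keep], r[keep])]

# Toy 3-species example (made-up values, for illustration only).
corr = np.array([[1.0, 0.5, 0.05],
                 [0.5, 1.0, -0.3],
                 [0.05, -0.3, 1.0]])
pvals = np.array([[0.0, 0.001, 0.001],
                  [0.001, 0.0, 0.04],
                  [0.001, 0.04, 0.0]])
edges = filter_edges(corr, pvals)
print(edges)
```

With coefficients that are all below 0.1, as in my relative-abundance run, a filter like this would return an empty edge list.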
However, I found that SparCC takes an absolute count table as input rather than a relative abundance table. Using a relative abundance table seems to have been the cause of the low coefficients, since they increased when I used the absolute count table instead.
I wonder what causes this difference. As I understand it, SparCC works on log-transformed abundances and estimates linear (Pearson) correlations between them. It relies on log(xi/xj) = log(wi/wj), where x is the relative abundance and w is the absolute abundance. Shouldn't the results from the two types of tables then be the same?
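To check that my reasoning about the log-ratios is at least numerically sound, here is a minimal sketch with a made-up count table (random values, not my real data). The helper `pairwise_log_ratios` is hypothetical and is not taken from the SparCC code; it only computes log(xi/xj) for every species pair in every sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical absolute counts: 5 samples x 4 species.
# Values are strictly positive so that log() is defined.
counts = rng.integers(1, 1000, size=(5, 4)).astype(float)

# Relative abundances: divide each sample (row) by its total.
rel = counts / counts.sum(axis=1, keepdims=True)

def pairwise_log_ratios(table):
    """Return an array where result[s, i, j] = log(table[s, i] / table[s, j])."""
    logs = np.log(table)
    return logs[:, :, None] - logs[:, None, :]

# The per-sample total cancels in every ratio, so both tables should agree.
same = np.allclose(pairwise_log_ratios(counts), pairwise_log_ratios(rel))
print(same)
```

If the log-ratios really are identical, as this suggests, then the difference I see would presumably have to come from a step that uses the input values themselves rather than their ratios. I have not been able to work out which step that is.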
SparCC references:
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002687
https://github.com/dlegor/SparCC