Question

Closed: Co-Occurrence between Bacterial Gene Clusters

4.2 years ago

biohacker_tobe ▴ 80

I made a similar post a couple of weeks ago, but I wanted some fresh input on this :)

I am trying to study the co-occurrence of gene cluster (GC) pairs across genomes and to support it with visualizations (heatmaps and networks). Basically, I'm trying to find associations (GC pairs that tend to be present together in the same genome) and dissociations (GC pairs that tend to avoid each other, i.e. are rarely present in the same genome).

I want to support this with statistics rather than visualizations alone. So far I have been considering a pairwise analysis. From the literature, the phi coefficient or a chi-squared test seemed adequate, but I'm not sure whether this is the correct approach.

I started with a presence/absence table in long format, one row per gene cluster and genome (first example table):

GeneCluster  Genome
--------------------
GCF1           S1
GCF1           S2
GCF3           S3
GCF2           S4
GCF2           S5
GCF4           S6

I was able to convert it into a binary table (see the example below) for visualizations and clustering:

      S1 S2 S3 S4 S5 S6
-----------------------------
GCF1   0  0  1  1  0  0
GCF2   0  1  0  1  1  1
GCF3   1  1  1  0  0  0
GCF4   0  0  0  0  0  1
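The conversion from the long table to a binary matrix can be sketched roughly as follows. This is a minimal sketch using only the Python standard library; the function name `to_binary_matrix` and the variable names are placeholders of mine, and the rows are copied from the first example table, so the resulting matrix reflects that table rather than the illustrative binary table above.

```python
# Long-format rows copied from the first example table: (gene cluster, genome).
rows = [
    ("GCF1", "S1"), ("GCF1", "S2"), ("GCF3", "S3"),
    ("GCF2", "S4"), ("GCF2", "S5"), ("GCF4", "S6"),
]


def to_binary_matrix(rows):
    """Return (clusters, genomes, matrix); matrix[i][j] is 1 if cluster i occurs in genome j."""
    clusters = sorted({gc for gc, _ in rows})
    genomes = sorted({genome for _, genome in rows})
    present = set(rows)  # set lookup makes the membership test cheap
    matrix = [
        [1 if (gc, genome) in present else 0 for genome in genomes]
        for gc in clusters
    ]
    return clusters, genomes, matrix


clusters, genomes, matrix = to_binary_matrix(rows)

# Print the matrix with genomes as columns and gene clusters as rows.
print("      " + " ".join(genomes))
for gc, row in zip(clusters, matrix):
    print(gc + "   " + "  ".join(str(v) for v in row))
```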

Based on this binary table, I want to test the co-occurrence statistically. Would it be appropriate to use something as simple as a chi-squared test, or a pairwise test such as Fisher's exact test, to measure the statistical significance of each GC pair?
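To make the question concrete, this is roughly the pairwise analysis I have in mind: for every GC pair, build a 2x2 table over the genomes, then compute the phi coefficient (direction and strength) and a two-sided Fisher's exact p-value. It is only a sketch under my own assumptions, not a validated pipeline. The helper names (`contingency`, `fisher_two_sided`, `phi_coefficient`) are made up for illustration, the test is implemented directly from the hypergeometric distribution with the standard library, and the data are the example binary table above. For a 2x2 table the chi-squared statistic equals n times phi squared, so phi and chi-squared carry the same information here.

```python
from itertools import combinations
from math import comb, sqrt

# Binary presence/absence matrix copied from the example table (columns S1..S6).
presence = {
    "GCF1": [0, 0, 1, 1, 0, 0],
    "GCF2": [0, 1, 0, 1, 1, 1],
    "GCF3": [1, 1, 1, 0, 0, 0],
    "GCF4": [0, 0, 0, 0, 0, 1],
}


def contingency(x, y):
    """Counts over genomes: a = both present, b = only x, c = only y, d = neither."""
    a = sum(1 for i, j in zip(x, y) if i and j)
    b = sum(1 for i, j in zip(x, y) if i and not j)
    c = sum(1 for i, j in zip(x, y) if not i and j)
    d = sum(1 for i, j in zip(x, y) if not i and not j)
    return a, b, c, d


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value: sum of all tables no more likely than the observed one."""
    n, row1, col1 = a + b + c + d, a + b, a + c

    def prob(k):  # hypergeometric probability of k co-occurrences with fixed margins
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_obs = prob(a)
    total = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table: positive = association, negative = dissociation."""
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else float("nan")


# One result per unordered GC pair: (phi, p-value).
results = {}
for gc1, gc2 in combinations(presence, 2):
    table = contingency(presence[gc1], presence[gc2])
    results[(gc1, gc2)] = (phi_coefficient(*table), fisher_two_sided(*table))

for (gc1, gc2), (phi, p) in results.items():
    print(f"{gc1} vs {gc2}: phi = {phi:+.3f}, Fisher p = {p:.3f}")
```

Here a negative phi would indicate a pair that tends to avoid each other and a positive phi a pair that tends to occur together. With many GC pairs the p-values would presumably need a multiple-testing correction.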

genome clustering sequencing statistics • 70 views

updated 4.2 years ago by Ram 43k • written 4.2 years ago by biohacker_tobe ▴ 80