I have a set of wrapper scripts around a Pearson r correlation Rscript, which is called on
N(N-1) pairwise combinations of files, for each chromosome (or all chromosomes), with jobs scheduled on a Slurm-based cluster.
The main script is called
makefile showing how it is used:
--by-chr option does pairwise comparisons on a per-chromosome basis. This can be replaced with
--all to run a distance or similarity metric on whole-genome signal.
It goes without saying that the signal in pairs of your input files should be binned identically, so that correlation is run on vectors of equal length.
If you don't want Pearson r, you could edit line 117 of https://resources.altius.org/~areynolds/public/cor/src/chromCor2 to replace
cor() with another distance or similarity metric function call.
The call to
pearson.py relies on a metadata table file, containing label and path information in the second and fourth columns.
Here's an example of what that file looks like:
The second column should contain a unique label for the file specified in the fourth column.
We use Starch files to save disk space, but you can specify gzipped files in the fourth column.
If you use gzip files, you must:
Modify lines 54-80 in
pearson.py to no longer test if the input files are genuine Starch archives.
Modify the call to
-s with the
-z option, in order to specify gzipped input, instead of Starch input.
Once you run things and have score results, a summary file called
scores.mtx is created in the
For visualization, I have a script called
mtx2json.py (https://resources.altius.org/~areynolds/public/cor/src/mtx2json.py) that converts a
scores.mtx file to a
You can call this by hand, e.g.:
$ ./mtx2json.py < /path/to/scores.mtx > /path/to/scores.json
This JSON file can be brought into my checkerboard tool: https://tools.altiusinstitute.org/checkerboard/
In the checkerboard tool, you paste in a JSON file of pairwise comparison scores, import it, select the desired categories ("labels" from the second column of the metadata table), and render the selection to an SVG-formatted figure.
You could also take the
scores.mtx file and use that directly with any number of custom visualization scripts. The
mtx2json.py step is for convenience.