Question: Correlation Plot for bed files
christoph.neu wrote, 5 weeks ago:

Hey,

I am looking for a tool that calculates a correlation coefficient for a set of BED files. Preferably it would also provide a visualization, similar to what the combination of multiBamSummary and plotCorrelation from deepTools does for BAM files.

e.g. between all BED files from this Roadmap Epigenomics folder:

wget ftp://ftp.ncbi.nlm.nih.gov/pub/geo/DATA/roadmapepigenomics/by_sample/H1_cell_line/H3K4me3/*.bed.gz

I could use

bedtools jaccard

but I thought there might already be a tool that handles the looping and visualization.
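For reference, the looping itself is straightforward. Below is a minimal, dependency-free sketch of the statistic that bedtools jaccard reports, restricted to a single chromosome with intervals as (start, end) tuples; the real tool parses BED files, requires sorted input, and handles multiple chromosomes, so treat this only as an illustration of the computation.

```python
# Sketch of the Jaccard statistic over genomic intervals:
# J = |intersection| / |union|, measured in base pairs over merged intervals.

def merge(intervals):
    """Merge overlapping or adjacent intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def total_len(intervals):
    """Total number of base pairs covered by a disjoint interval list."""
    return sum(end - start for start, end in intervals)

def intersection_len(a, b):
    """Total overlap between two merged interval lists (two-pointer sweep)."""
    i = j = overlap = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            overlap += hi - lo
        # advance whichever interval ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return overlap

def jaccard(a, b):
    """Jaccard similarity of two interval sets, in [0, 1]."""
    a, b = merge(a), merge(b)
    inter = intersection_len(a, b)
    union = total_len(a) + total_len(b) - inter
    return inter / union if union else 0.0
```

Looping this (or bedtools jaccard itself) over all pairs of files fills an N×N matrix that any heatmap function can then plot, e.g. `jaccard([(0, 10), (20, 30)], [(5, 15)])` gives 0.2 (5 bp shared out of 25 bp covered).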

Tags: visualization, bed

Can you provide example data?

— ATpoint
A. Domingues (Dresden, Germany) wrote, 5 weeks ago:

I think Intervene might do the job. Look at the options of the intervene pairwise module: https://intervene.readthedocs.io/en/latest/examples.html#pairwise-module-examples

Intervene provides an easy and automated interface for effective intersection and visualization of genomic region sets or lists of items, thus facilitating their analysis and interpretation. Intervene contains three modules:

- venn to compute Venn diagrams of up to 6 sets
- upset to compute UpSet plots of multiple sets
- pairwise to compute and visualize intersections of genomic sets as a clustered heatmap

https://intervene.readthedocs.io/

Intervene’s pairwise module provides several metrics to assess intersections, including number of overlaps, fraction of overlap, Jaccard statistics, Fisher’s exact test, and distribution of relative distances. Moreover, the user can choose from different styles of heat maps and clustering approaches. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1708-7

It is also available via a website so you can play with it before installing: https://asntech.shinyapps.io/intervene/
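A hypothetical driver for the pairwise module might look like the sketch below. The flag names (`-i`, `--compute`, `--output`) follow the Intervene documentation, but verify them against `intervene pairwise --help` for your installed version; the actual run line is left commented out.

```python
# Build (and optionally run) an `intervene pairwise` command over all
# .bed.gz files in the current directory. Flag names are taken from the
# Intervene docs -- check `intervene pairwise --help` on your system.
import glob
import subprocess

beds = sorted(glob.glob("*.bed.gz"))
cmd = ["intervene", "pairwise",
       "-i", *beds,
       "--compute", "jaccard",          # other metrics: count, frac, fisher, reldist
       "--output", "intervene_results"]  # output directory (assumed name)

print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment once intervene is installed
```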

I am not in any way associated with this tool, I just find it very useful.


works great, thanks a lot

— christoph.neu
Alex Reynolds (Seattle, WA, USA) wrote, 5 weeks ago:

I have a set of wrapper scripts around a Pearson r correlation Rscript, which is called on N(N-1) pairwise combinations of files, for each chromosome (or all chromosomes), with jobs scheduled on a Slurm-based cluster.

The main script is called pearson.py:

https://resources.altius.org/~areynolds/public/cor/src/pearson.py

Here's the makefile showing how it is used:

https://resources.altius.org/~areynolds/public/cor/src/makefile

The --by-chr option does pairwise comparisons on a per-chromosome basis. This can be replaced with --all to run a distance or similarity metric on whole-genome signal.

It goes without saying that the signal in pairs of your input files should be binned identically, so that correlation is run on vectors of equal length.
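To make that requirement concrete, here is a small illustration (not part of the scripts above): two hypothetical sets of scored regions are projected onto the same fixed-width bins, which yields two equal-length vectors that can then be correlated. The interval data and bin width are made up for the example.

```python
# Project per-interval scores onto a shared fixed-width binning, then
# compute Pearson's r on the two equal-length vectors.
from math import sqrt

BIN = 100          # bin width in bp (toy value)
CHROM_LEN = 1000   # toy chromosome length

def binned_signal(records, chrom_len=CHROM_LEN, bin_size=BIN):
    """records: (start, end, score) tuples -> per-bin scores on a fixed grid."""
    bins = [0.0] * (chrom_len // bin_size)
    for start, end, score in records:
        for b in range(start // bin_size,
                       min((end - 1) // bin_size + 1, len(bins))):
            # credit the score proportionally to the overlap with each bin
            lo, hi = b * bin_size, (b + 1) * bin_size
            overlap = min(end, hi) - max(start, lo)
            bins[b] += score * overlap / bin_size
    return bins

def pearson(x, y):
    """Pearson's r of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

a = binned_signal([(0, 200, 1.0), (500, 700, 2.0)])
b = binned_signal([(0, 200, 2.0), (500, 700, 4.0)])
print(round(pearson(a, b), 3))  # same shape, scaled 2x -> r == 1.0
```

If the two files were binned on different grids, the vectors would differ in length and the correlation would be undefined, which is exactly the dimension mismatch to avoid.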

If you don't want Pearson r, you could edit line 117 of https://resources.altius.org/~areynolds/public/cor/src/chromCor2 to replace cor() with another distance or similarity metric function call.

The call to pearson.py relies on a metadata table file, containing label and path information in the second and fourth columns.

Here's an example of what that file looks like:

https://resources.altius.org/~areynolds/public/cor/data/set1.table.txt

The second column should contain a unique label for the file specified in the fourth column.
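Reading such a table is a one-liner per row; the sketch below assumes tab-delimited columns (check the linked set1.table.txt for the exact delimiter) and uses made-up labels and paths.

```python
# Parse a metadata table with the label in column 2 and the file path in
# column 4 (1-indexed), as described above. Tab delimiter is an assumption.
import csv
import io

# Stand-in for the real table file, with hypothetical entries.
table = io.StringIO(
    "1\tsampleA\tx\t/data/sampleA.starch\n"
    "2\tsampleB\tx\t/data/sampleB.starch\n"
)

label_to_path = {}
for row in csv.reader(table, delimiter="\t"):
    label, path = row[1], row[3]   # columns 2 and 4, zero-indexed
    label_to_path[label] = path

print(label_to_path)
```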

We use Starch files to save disk space, but you can specify gzipped files in the fourth column.

If you use gzip files, you must:

  1. Modify lines 54-80 in pearson.py to no longer test if the input files are genuine Starch archives.

  2. Modify the call to chromCor2 in pearson_slurm.sh, replacing -s with the -z option, in order to specify gzipped input, instead of Starch input.

Once you run things and have score results, a summary file called scores.mtx is created in the ../results directory.

For visualization, I have a script called mtx2json.py (https://resources.altius.org/~areynolds/public/cor/src/mtx2json.py) that converts a scores.mtx file to a scores.json file.

You can call this by hand, e.g.:

$ ./mtx2json.py < /path/to/scores.mtx > /path/to/scores.json

This JSON file can be brought into my checkerboard tool: https://tools.altiusinstitute.org/checkerboard/

In the checkerboard tool, you paste in a JSON file of pairwise comparison scores, import it, select the desired categories ("labels" from the second column of the metadata table), and render the selection to an SVG-formatted figure.

You could also take the scores.mtx file and use that directly with any number of custom visualization scripts. The mtx2json.py step is for convenience.


Thanks for the answer. However, I don't have a Slurm cluster, and running chromCor2 directly on Starch or BED files gives me a dimension error for rds.

— christoph.neu

Signal vectors need to be of equal length for correlation to work. Maybe the other solution can preprocess files to make them fit, but if you run into a similar error there, take a look at the pair of files causing the problem.

— Alex Reynolds
Powered by Biostar version 2.3.0