Re-creating a published figure

1

Entering edit mode

9.6 years ago

Bioradical ▴ 60

I am attempting to re-create this figure from this link http://figshare.com/articles/_Enrichment_of_RNA_polII_on_the_exon_sides_of_exon_intron_and_intron_exon_junctions_/372514 (Figure A and B, the junction figures). I currently use a combination of ngsplot and deepTools for most of my metagene plots, but I have no idea how I would be able to plot something like this.

Any ideas?

ChIP-Seq Metagene • 3.1k views

ADD COMMENT • link updated 2.9 years ago by Ram 45k • written 9.6 years ago by Bioradical ▴ 60

2

Entering edit mode

9.6 years ago

Fidel ★ 2.0k

I have not done something like that, but using deepTools this may work (assuming you have bigwigs for the sense and antisense):

get a a list of exon genomic positions, be sure that they contain the orientation.
For a figure similar to A, use computeMatrix reference-point, select as reference point the end of the region and 150 bp before and 150bp after the reference point.
For a figure similar to B do the same, except that you choose as reference point the start of the region.

ADD COMMENT • link updated 2.9 years ago by Ram 45k • written 9.6 years ago by Fidel ★ 2.0k

0

Entering edit mode

This is actually similar to what I had planned to do, I am just working on generating the exon positions with orientation. I'll post back here to let you know how it goes.

ADD REPLY • link 9.6 years ago by Bioradical ▴ 60

2

Entering edit mode

9.6 years ago

Alex Reynolds 36k

The way I have made these kinds of plots is:

Generate positions of a set of junctions (or other features of interest: TF binding sites, etc.) and separating into strand-specific subsets, if necessary
Calculate windows around feature positions with bedops --range or similar
"Melt" the window intervals into per-base elements, using Perl or Python etc.
Score the per-base elements against signal files (here, normalized reads) with bedmap --echo-map-score or similar, replacing NAs with zeroes, if appropriate, and applying any strand-specific signal reorientation or shift adjustments, if needed
Merge per-base element scores into a matrix file, one row per windowed feature, using scripting language of choice

Once I have this matrix, I can plot in R:

Import matrix into R variable
Calculate column means with colMeans() (or calculate other columnar statistic: medians, etc. using apply())
Create empty plot with plot()
Use polygon() to generate shaded region around bases of interest
Render axes with axis()
Draw columnar means (or whatever statistic) with lines()

Once the overall process is set up, these steps are pulled into a makefile for pipelining and putting on a cluster.

If you have a specific question about any part of this, feel free to follow up and I can see how much detail I can provide.

Generating windows around BED elements

To generate flanking regions around elements, I use bedops --range with some chosen size value for flanking regions. For instance, to generate a 150 bp window (300 bp in total) around a set of single-base junctions:

$ bedops --everything --range 150 junctions.bed > 150bp_padded_junctions.bed

This operation preserves all columns, including ID and strand data. I can then melt this padded file into per-base elements, score the per-base elements, and build a matrix for statistical analysis or heatmaps, etc.

Melting a BED file

To "melt" a BED6 (six-column BED) file into per-base elements, I use the following script:

	#!/usr/bin/env perl

	use strict;
	use warnings;

	my $cnt = 0;
	while (<STDIN>) {
	chomp;
	my ($chr, $start, $stop, $id, $score, $strand) = split("\t", $_);
	for (my $meltIdx = $start; $meltIdx < $stop; $meltIdx += 1) {
	print STDOUT join("\t", ($chr, $meltIdx, $meltIdx + 1, $id."!$cnt", $score, $strand))."\n";
	}
	$cnt++;
	}

view raw melt.pl hosted with ❤ by GitHub

To use it:

$ melt.pl < elements.bed > per-base.elements.bed

This "melts" the BED file into single-base bins. You could modify this script to generate any bin size you like, but you'd want to think about the edge case where an interval is not divided evenly by a chosen bin size, and how that might affect calculating statistics.

The ID field is changed. It becomes originalID!index, where originalID is the ID field of the original element, and index is a unique integer value that identifies and groups the per-base elements.

I use ! as a delimiter that is unlikely to be found as a character in the names of genes or other elements I am melting. The idea here is that I can group per-base elements after scoring with bedmap, generate my matrix file, and recover the original ID, if needed.

You can also use bedops --chop here:

$ bedops --chop 1 elements.bed > per-base.elements.bed

However, this will strip fourth and subsequent columns, removing ID and strand information. I find it useful - for the purposes of this specific pipeline, for making these types of plots - to create a custom ID that groups per-base elements made from the original element, which is easier to do with a scripted approach.

ADD COMMENT • link updated 5.6 years ago by Ram 45k • written 9.6 years ago by Alex Reynolds 36k

0

Entering edit mode

I might be bothering you for a bit heh. In step 3, when you say "melt" do you mean divide into bins? If so, how can this be done using bedops? (I'm attempting to learn the tool)

ADD REPLY • link 9.6 years ago by Bioradical ▴ 60

0

Entering edit mode

Please see edit.

ADD REPLY • link 9.6 years ago by Alex Reynolds 36k

0

Entering edit mode

Using a pre-obtained sample of exon-intron junctions I was able to plot this successfully. My question now is how would I generate my own exon-intron junction database using my available data? I have access to all ChIP-Seq files that are obtained from MACS2 peak calling as well as bigWig files, bam files, sam files etc. I am working with hg19 human genome.

ADD REPLY • link 9.6 years ago by Bioradical ▴ 60

Login before adding your answer.