I have not done something like that, but using deepTools this may work (assuming you have bigwigs for the sense and antisense):
get a a list of exon genomic positions, be sure that they contain the orientation.
For a figure similar to A, use computeMatrix reference-point, select as reference point the end of the region and 150 bp before and 150bp after the reference point.
For a figure similar to B do the same, except that you choose as reference point the start of the region.
This is actually similar to what I had planned to do, I am just working on generating the exon positions with orientation. I'll post back here to let you know how it goes.
Generate positions of a set of junctions (or other features of interest: TF binding sites, etc.) and separating into strand-specific subsets, if necessary
Calculate windows around feature positions with bedops --range or similar
"Melt" the window intervals into per-base elements, using Perl or Python etc.
Score the per-base elements against signal files (here, normalized reads) with bedmap --echo-map-score or similar, replacing NAs with zeroes, if appropriate, and applying any strand-specific signal reorientation or shift adjustments, if needed
Merge per-base element scores into a matrix file, one row per windowed feature, using scripting language of choice
Once I have this matrix, I can plot in R:
Import matrix into R variable
Calculate column means with colMeans() (or calculate other columnar statistic: medians, etc. using apply())
Create empty plot with plot()
Use polygon() to generate shaded region around bases of interest
Render axes with axis()
Draw columnar means (or whatever statistic) with lines()
Once the overall process is set up, these steps are pulled into a makefile for pipelining and putting on a cluster.
If you have a specific question about any part of this, feel free to follow up and I can see how much detail I can provide.
Generating windows around BED elements
To generate flanking regions around elements, I use bedops --range with some chosen size value for flanking regions. For instance, to generate a 150 bp window (300 bp in total) around a set of single-base junctions:
This operation preserves all columns, including ID and strand data. I can then melt this padded file into per-base elements, score the per-base elements, and build a matrix for statistical analysis or heatmaps, etc.
Melting a BED file
To "melt" a BED6 (six-column BED) file into per-base elements, I use the following script:
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This "melts" the BED file into single-base bins. You could modify this script to generate any bin size you like, but you'd want to think about the edge case where an interval is not divided evenly by a chosen bin size, and how that might affect calculating statistics.
The ID field is changed. It becomes originalID!index, where originalID is the ID field of the original element, and index is a unique integer value that identifies and groups the per-base elements.
I use ! as a delimiter that is unlikely to be found as a character in the names of genes or other elements I am melting. The idea here is that I can group per-base elements after scoring with bedmap, generate my matrix file, and recover the original ID, if needed.
However, this will strip fourth and subsequent columns, removing ID and strand information. I find it useful - for the purposes of this specific pipeline, for making these types of plots - to create a custom ID that groups per-base elements made from the original element, which is easier to do with a scripted approach.
I might be bothering you for a bit heh. In step 3, when you say "melt" do you mean divide into bins? If so, how can this be done using bedops? (I'm attempting to learn the tool)
Using a pre-obtained sample of exon-intron junctions I was able to plot this successfully. My question now is how would I generate my own exon-intron junction database using my available data? I have access to all ChIP-Seq files that are obtained from MACS2 peak calling as well as bigWig files, bam files, sam files etc. I am working with hg19 human genome.
This is actually similar to what I had planned to do, I am just working on generating the exon positions with orientation. I'll post back here to let you know how it goes.