One simplistic approach could be to bedmap
per-base signal across the interval, binning it every 50 bp regardless of whether the bin came from one or more subintervals. This treats your interval as contiguous, even if it isn't.
First, "melt" your 550bp interval into per-base elements:
$ awk '{ \
chr = $1; \
start = $2; \
stop = $3; \
for (binStart = start; binStart < stop; binStart++) { \
print chr"\t"binStart"\t"(binStart + 1)"\tidx-"NR; \
} \
}' interval.bed > per_base.bed
Then map per-base tag counts (signal) to bases (this assumes you have a per-base tag "pile-up" file called tags.bed
):
$ bedmap --echo --echo-map-score --delim '\t' per_base.bed tags.bed > per_base_tags.bed
This gives you a mapping of a base to the number of tags at that base. If there are no tags, there is no score value (assume a score of 0 tags at that base, or post-process with awk
to put in a zero).
Now sum the score column from this file, 50 contiguous rows at a time, to get the number of tags over each of your 50 bp bins that span the interval.
This demo only works with one interval. In the real world, you will need to generalize this process to deal with multiple intervals that may overlap.
One way to do this is to add a unique per-interval ID to each subinterval in interval.bed
and modify the awk
script to print this ID to the per-base output. You can then use UNIX sort
on the ID column to group your per-base bedmap
results by the original interval they came from. When you loop over the ID-sorted results, you can grab 50 contiguous rows at a time and sum up the tags to get your 11 bins.
Thank you very much for your useful reply. I tried to make per-base tag "pile-up" as follow, but I think it is not correct, would you please have a look at my command:
But for the first interval it returns
Honestly, I'm not too familiar with using
samtools
to calculate pile-up tracks. Definitely seems like a good question to ask the site, though!Do you an easy way how I could make a tabular matrix (with 1 column and 20 rows) using bashSripts, instead of perl/R that i use to make it now.
RequiredOutputFormat