If you're amenable to alternatives, you could use BEDOPS closest-features
on strand-separated genes.
For instance, sort the genes and repeats, if not sorted:
$ sort-bed genes.bed > sorted-genes.bed
$ sort-bed repeats.bed > sorted-repeats.bed
Then separate your genes by strand with awk
(or your tool of choice):
$ awk '($4=="+")' sorted-genes.bed > sorted-genes-forward.bed
$ awk '($4=="-")' sorted-genes.bed > sorted-genes-reverse.bed
Then run closest-features --dist
on each gene set, against the repeats. The --dist
operand reports the distances between the gene and the nearest upstream and downstream repeats.
In the forward strand case, we use awk
to print out the first and second fields separated by the pipe character (|
). The first field is the gene. The second field is the repeat element upstream from the forward-strand gene element:
$ closest-features --dist sorted-genes-forward.bed sorted-repeats.bed \
| awk 'BEGIN {FS='|'} {print $1"|"$2}' - \
> genes-repeats-forward-with-distances.bed
In the reverse strand case, we use awk
to print out the first and third fields separated by the pipe character. The first field is the gene. The third field is the repeat element upstream from the reverse-strand gene element:
$ closest-features --dist sorted-genes-reverse.bed sorted-repeats.bed \
| awk 'BEGIN {FS='|'} {print $1"|"$3}' - \
> genes-repeats-reverse-with-distances.bed
If you want to put results back together at the end, just use BEDOPS bedops
with the --everything
operand to do a multiset union:
$ bedops --everything genes-repeats-*-with-distances.bed > final-answer.bed
Like this example shows, one nice thing about BEDOPS bedops
is that it can do operations on many input files at once, not just two BED files.
thanks, But what does -s or -S does? Shall i ignore it?
-s
and-S
are about overlaps. If you want something upstream, then it shouldn't be overlapping (unless you want to count an overlap that extends upstream as a hit, which is also possible). For that, though, you would need strand information for everything inbar.bed
(this, of course, wouldn't make any sense for repeats, which have no strand).