I need to extract 10 bp flanking regions on each side of every motif, using only exonic sequence that is not covered by any motif. Given:
exons.bed:
chr1 100 200 . . +
chr1 200 250 . . -
motifs.bed (RBP binding sites):
chr1 110 120 . . +
chr1 118 128 . . +
chr1 130 140 . . +
chr1 140 150 . . +
chr1 210 220 . . -
exons_minus_motifs.bed (exonic regions without motifs):
chr1 100 110 . . +
chr1 128 130 . . +
chr1 150 200 . . +
chr1 200 210 . . -
chr1 220 250 . . -
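For reference, exons_minus_motifs.bed looks like a strand-aware subtraction of motifs.bed from exons.bed, which `bedtools subtract -a exons.bed -b motifs.bed -s` should produce. The sketch below shows the same logic in plain Python so the table above can be reproduced without bedtools. The function name `subtract` is hypothetical, and BED rows are reduced to (chrom, start, end, strand) tuples with 0-based, half-open coordinates.

```python
from typing import List, Tuple

# (chrom, start, end, strand) with 0-based, half-open BED coordinates
Interval = Tuple[str, int, int, str]


def subtract(exons: List[Interval], motifs: List[Interval]) -> List[Interval]:
    """Return the parts of each exon not covered by a motif on the same chrom and strand."""
    result: List[Interval] = []
    for chrom, start, end, strand in exons:
        # Motifs on the same chromosome and strand that overlap this exon, sorted by start
        hits = sorted(
            (m_start, m_end)
            for m_chrom, m_start, m_end, m_strand in motifs
            if m_chrom == chrom and m_strand == strand and m_start < end and m_end > start
        )
        cursor = start  # first exonic base not yet assigned to a gap or a motif
        for m_start, m_end in hits:
            if m_start > cursor:
                result.append((chrom, cursor, m_start, strand))
            # max() handles overlapping motifs such as 110-120 and 118-128
            cursor = max(cursor, m_end)
        if cursor < end:
            result.append((chrom, cursor, end, strand))
    return result


exons = [("chr1", 100, 200, "+"), ("chr1", 200, 250, "-")]
motifs = [
    ("chr1", 110, 120, "+"),
    ("chr1", 118, 128, "+"),
    ("chr1", 130, 140, "+"),
    ("chr1", 140, 150, "+"),
    ("chr1", 210, 220, "-"),
]
for row in subtract(exons, motifs):
    print(*row, sep="\t")
```

Overlapping motifs are merged implicitly, because the cursor always jumps to the furthest motif end seen so far.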
Expected output (flanks.bed):
chr1 100 110 motif1_up . +
chr1 128 130 motif1_down . +
chr1 150 158 motif1_down . +
chr1 100 110 motif2_up . +
chr1 128 130 motif2_down . +
chr1 150 158 motif2_down . +
chr1 102 110 motif3_up . +
chr1 128 130 motif3_up . +
chr1 150 160 motif3_down . +
chr1 102 110 motif4_up . +
chr1 128 130 motif4_up . +
chr1 150 160 motif4_down . +
chr1 200 210 motif5_down . -
chr1 220 230 motif5_up . -
Is there a simpler way to code this, or a specific bedtools command that would make the process more straightforward? I feel like I'm going in circles and not getting the desired output, so any help or guidance would be greatly appreciated. The 10 bases do not have to be contiguous: they are simply the next 10 available bases on each side of the motif, as long as they fall within the exons_minus_motifs regions. The flanks may also overlap each other.
The flanks should:
- Not overlap the motif or any other motif.
- Not extend outside the exon.
- Be the next available 10 bp in exons_minus_motifs.bed, in the correct direction.
- Be allowed to overlap flanks from other motifs.
- Be strand-specific (+ or -); on the - strand, upstream is the higher-coordinate side.
- Be labeled motifN_up and motifN_down, where N is the motif's position in motifs.bed.
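The rules above can be sketched in plain Python. bedtools flank adds a fixed-width interval on each side of a feature and does not skip over other motifs, so this sketch instead walks the motif-free regions outward from each motif until 10 bases are collected. The helper names `containing_exon`, `take_bases`, and `flanks` are hypothetical. It assumes that motifs are numbered in the order they appear in motifs.bed, that each motif lies entirely inside one exon on the same strand (others are skipped), and that a flank is simply shorter than 10 bp if the exon runs out of free bases. The input files are hard-coded as tuples to keep it short.

```python
from typing import Iterator, List, Optional, Tuple

# (chrom, start, end, strand) with 0-based, half-open BED coordinates
Interval = Tuple[str, int, int, str]


def containing_exon(motif: Interval, exons: List[Interval]) -> Optional[Interval]:
    """Return the exon on the same chrom/strand that fully contains the motif, if any."""
    chrom, start, end, strand = motif
    for exon in exons:
        if exon[0] == chrom and exon[3] == strand and exon[1] <= start and end <= exon[2]:
            return exon
    return None


def take_bases(regions: List[Tuple[int, int]], size: int, leftward: bool) -> List[Tuple[int, int]]:
    """Collect up to `size` bases from regions already ordered outward from the motif."""
    pieces = []
    need = size
    for r_start, r_end in regions:
        if need == 0:
            break
        take = min(need, r_end - r_start)
        if take > 0:
            # Leftward walks keep the bases nearest the motif, i.e. the region's right end
            pieces.append((r_end - take, r_end) if leftward else (r_start, r_start + take))
            need -= take
    return pieces


def flanks(exons: List[Interval], free: List[Interval], motifs: List[Interval],
           size: int = 10) -> Iterator[Tuple[str, int, int, str, str, str]]:
    """Yield BED6 rows (chrom, start, end, name, score, strand) for each flank piece."""
    for i, motif in enumerate(motifs, start=1):
        chrom, m_start, m_end, strand = motif
        exon = containing_exon(motif, exons)
        if exon is None:
            continue  # motif not inside a single exon: skipped in this sketch
        # Motif-free regions restricted to this motif's exon and strand
        regions = [r for r in free if r[0] == chrom and r[3] == strand
                   and exon[1] <= r[1] and r[2] <= exon[2]]
        # Free regions never overlap a motif, so each lies wholly left or right of it
        left = sorted(((r[1], r[2]) for r in regions if r[2] <= m_start),
                      key=lambda r: r[1], reverse=True)
        right = sorted((r[1], r[2]) for r in regions if r[1] >= m_end)
        rows = [(s, e, "left") for s, e in take_bases(left, size, leftward=True)]
        rows += [(s, e, "right") for s, e in take_bases(right, size, leftward=False)]
        for p_start, p_end, side in sorted(rows):
            # On + the left side is upstream; on - the left side is downstream
            up = (side == "left") == (strand == "+")
            yield (chrom, p_start, p_end, f"motif{i}_{'up' if up else 'down'}", ".", strand)


exons = [("chr1", 100, 200, "+"), ("chr1", 200, 250, "-")]
free = [("chr1", 100, 110, "+"), ("chr1", 128, 130, "+"), ("chr1", 150, 200, "+"),
        ("chr1", 200, 210, "-"), ("chr1", 220, 250, "-")]
motifs = [("chr1", 110, 120, "+"), ("chr1", 118, 128, "+"), ("chr1", 130, 140, "+"),
          ("chr1", 140, 150, "+"), ("chr1", 210, 220, "-")]
for row in flanks(exons, free, motifs):
    print(*row, sep="\t")
```

On the example files this walk should reproduce the expected flanks.bed rows, including the split flanks such as 128-130 plus 150-158 for motif1_down.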