I think Pierre's answer might be correct for your test input, but it doesn't account for the start position of the reference element and so might not be correct for all test or real-world inputs.
It may also be necessary to think about the case where both an upstream and downstream SNP are within your distance threshold for a given reference element. I don't think bedtools knows which one to pick?
Here is a BEDOPS-based answer that I think covers the case where the SNP and its distance is thresholded as measured from the start position, and the closest SNP is retrieved in the case where there are two hits.
But, first, this is not a BED file:
$ cat B.bed
chr1 15715
chr1 1412732
chr1 1612000
chr1 16524041
It is probably best to convert it to one, in order to work with it as a zero-based, half-indexed BED file:
$ awk -vFS="\t" -vOFS="\t" '{ print $1, ($2 - 1), $2 }' B.txt | sort-bed - > B.bed
This is not being pedantic. Formats and sort order are important when using BEDOPS or other BED set operation tools. What you are trying to do with these tools will not work if you start with bad inputs.
Once you do this step, you can use BEDOPS closest-features with A.bed and B.bed.
To start, here's what typical output looks like:
$ closest-features --dist A.bed B.bed
chr1 65564 70008|chr1 15714 15715|-49850|chr1 1412731 1412732|1342724
chr1 944693 959240|chr1 15714 15715|-928979|chr1 1412731 1412732|453492
chr1 4245331 4266330|chr1 1611999 1612000|-2633332|chr1 16524040 16524041|12257711
chr1 17146107 17167757|chr1 16524040 16524041|-622067|NA|NA
This tool reports the reference element (elements in A.bed) and each of the nearest SNPs from B.bed upstream and downstream of the bounds of the reference element (not the start position, but the start and end position), along with their respective signed distances from bound edges.
So there's enough information here to answer your question as written, but some additional awk work is necessary to process the start position test correctly, and deal with the case of multiple hits:
$ closest-features --dist A.bed B.bed \
| awk -vFS="|" -vOFS="\t" -vTHRESHOLD=250000 \
'{ \
ref = $1; \
split(ref, refarr, "\t"); \
reflen = refarr[3] - refarr[2]; \
lmap = $2; \
ldist = -$3; \
rmap = $4; \
rdist = (rmap == "NA") ? 0 : $5 - reflen; \
if (lmap == "NA") { \
mindist = rdist; \
minelem = rmap; \
} \
else if (rmap == "NA") { \
mindist = ldist; \
minelem = lmap; \
} \
else { \
mindist = (ldist < rdist) ? ldist : rdist; \
minelem = (ldist < rdist) ? lmap : rmap; \
} \
if (mindist <= THRESHOLD) { \
print ref, minelem, mindist; \
} \
}' > answer.bed
Here's what my answer looks like:
$ cat answer.bed
chr1 65564 70008 chr1 15714 15715 49850
This can be munged into a text file with your format (not BED, though):
$ awk -vFS="\t" -vOFS="\t" '{ print $1, $6, $2, $3 }' answer.bed > answer.reformatted.txt
$ cat answer.reformatted.txt
chr1 15715 65564 70008
Note again that when two SNPs are candidates for a given region, this awk approach picks the closer of the two. You might have instead wanted the SNP further away (though still within the threshold), or you might have instead wanted to pick a SNP at random, in case of statistical analyses downstream. Something to think about before you use these results, perhaps.
Show us what you tried
Thank you for replying Pierre.
I first added a third column to the SNP bed file, so that I would get a range of 1 for each SNP
Next, I tried using bedtools window to find overlapping features between A.bed and B2.bed, but couldn't figure out how to restrict this to +/- from the start of A.bed regions only, and not both start and end (which is the default behaviour).
but in your original post you said
write the full command lines, unfortunately we don't read in minds.
The command lines I wrote out for you are the ones I actually used. My original reference to bedtools intersect was because I assumed the bedtools window command was a component of bedtools intersect. It was simply a mistake; I'm new to this. I don't expect anyone to read my mind, hence my effort to make my question as detailed as possible. Not sure why I'm being met with such a snarky remark...