Question: How To Find The Closest Distance From Bed Files Between Genes And Repeats That Are Upstream
roll wrote (4.6 years ago):

How can I use closestBed from bedtools to find the closest features between two BED files? The important point is that I want the repeats to be upstream of the genes and in the correct orientation.

When I use the -s option, it does not report anything (everything is -1).

Then I tried the -D a option. It returns some results, but I am not sure they are correct.

The other thing to mention is that my genes BED file (let's call it gene.bed) is organized as

chr1 123 234 +
chr1 456 789 -

rather than using an end position smaller than the start position to indicate the negative strand.

Whereas my repeats.bed file is organized as

chr1 239 456
chr3 456 987

Does bedtools get confused with this?

Which options should I use if I want to find the distance to the nearest repeat that is upstream and in the correct orientation?

Tags: bedtools, bed, distance

Devon Ryan wrote (4.6 years ago):

> cat foo.bed
chr1    123    234    .    .    +
chr1    456    789    .    .    -

> cat bar.bed
chr1    239    456
chr1    800    900
chr3    456    987

> closestBed -D a -id -a foo.bed -b bar.bed 
chr1    456    789    .    .    -    chr1    800    900    -12

It helps if you properly format the BED file. You may also consider the -io option. You can trivially convert your "BED" file into a real BED file with awk '{printf("%s\t%s\t%s\t.\t.\t%s\n", $1, $2, $3, $4)}' incorrectly_formatted.bed > correctly_formatted.bed.
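
As a quick check of that conversion, here is a minimal sketch that runs the same awk command on the two gene lines from the question. The function name to_bed6 is made up for this example, and the "." values are just placeholders for the name and score columns that sit before the strand in a six-column BED file.

```shell
# Hypothetical helper: read four-column lines (chrom, start, end, strand) on
# stdin and write six-column BED with placeholder name and score columns.
to_bed6() {
    awk '{printf("%s\t%s\t%s\t.\t.\t%s\n", $1, $2, $3, $4)}'
}

# The two gene lines from the question, tab-separated.
printf 'chr1\t123\t234\t+\nchr1\t456\t789\t-\n' | to_bed6
```

After the conversion the strand is in column 6, which is where strand-aware options such as -D a look for it.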


roll replied (4.6 years ago):

Thanks. But what do -s and -S do? Should I ignore them?

Devon Ryan replied (4.6 years ago):

-s and -S are about overlaps. If you want something upstream, then it shouldn't be overlapping (unless you want to count an overlap that extends upstream as a hit, which is also possible). To use -s or -S, though, you would need strand information for everything in bar.bed (this, of course, wouldn't make any sense for repeats, which have no strand).
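
To make "upstream and not overlapping" concrete, here is a rough awk sketch that applies that rule to the foo.bed and bar.bed lines from the answer above. It is not how bedtools is implemented. It assumes that upstream means lower coordinates for a + gene and higher coordinates for a - gene, and it uses gap + 1 as the distance only because that reproduces the -12 shown above (gene end 789, repeat start 800). The function name nearest_upstream is made up for this example.

```shell
# nearest_upstream GENES REPEATS   (hypothetical helper, not a bedtools command)
# GENES: six-column BED with strand in column 6. REPEATS: three-column BED.
# Prints each gene that has an upstream repeat, followed by the nearest such
# repeat and a negative distance. Assumes REPEATS is not empty.
nearest_upstream() {
    awk 'BEGIN { FS = OFS = "\t" }
    NR == FNR {                       # first file: remember every repeat
        n++; rc[n] = $1; rs[n] = $2 + 0; re[n] = $3 + 0
        next
    }
    {                                 # second file: one gene per line
        best = -1
        for (i = 1; i <= n; i++) {
            if (rc[i] != $1) continue
            if ($6 == "+" && re[i] <= $2 + 0)      d = $2 - re[i] + 1
            else if ($6 == "-" && rs[i] >= $3 + 0) d = rs[i] - $3 + 1
            else continue             # overlapping, downstream, or unstranded
            if (best < 0 || d < best) { best = d; hit = i }
        }
        if (best >= 0) print $0, rc[hit], rs[hit], re[hit], -best
    }' "$2" "$1"
}

# Rebuild the two example files from the answer above in a temporary directory.
tmp=$(mktemp -d)
printf 'chr1\t123\t234\t.\t.\t+\nchr1\t456\t789\t.\t.\t-\n' > "$tmp/foo.bed"
printf 'chr1\t239\t456\nchr1\t800\t900\nchr3\t456\t987\n' > "$tmp/bar.bed"

nearest_upstream "$tmp/foo.bed" "$tmp/bar.bed"
```

The + gene prints nothing because its only chr1 neighbours lie at higher coordinates, which is downstream for a forward-strand gene. The loop compares every gene with every repeat, so this is only meant as a cross-check on small files.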

Alex Reynolds wrote (4.6 years ago):

If you're amenable to alternatives, you could use BEDOPS closest-features on strand-separated genes.

For instance, sort the genes and repeats, if not sorted:

$ sort-bed genes.bed > sorted-genes.bed
$ sort-bed repeats.bed > sorted-repeats.bed

Then separate your genes by strand with awk (or your tool of choice):

$ awk '($4=="+")' sorted-genes.bed > sorted-genes-forward.bed
$ awk '($4=="-")' sorted-genes.bed > sorted-genes-reverse.bed
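
As a small illustration of that split, the sketch below runs the same two awk filters on the gene lines from the question, reading from a pipe instead of files. It assumes the strand is in column 4, as in the question's gene.bed; for a six-column BED file the test would be on $6 instead. The function names are only for this example.

```shell
# An awk pattern with no action prints every line that matches it.
forward_genes() { awk '($4 == "+")'; }
reverse_genes() { awk '($4 == "-")'; }

# The two gene lines from the question, tab-separated.
sample_genes() { printf 'chr1\t123\t234\t+\nchr1\t456\t789\t-\n'; }

sample_genes | forward_genes   # keeps the + gene
sample_genes | reverse_genes   # keeps the - gene
```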

Then run closest-features --dist on each gene set, against the repeats. The --dist operand reports the distances between the gene and the nearest upstream and downstream repeats.

With --dist, each output line has five fields separated by the pipe character (|): the gene, the nearest repeat to its left (lower coordinates), the signed distance to that repeat, the nearest repeat to its right, and the signed distance to that repeat.

In the forward-strand case, we use awk to print the first, second and third fields, separated by the pipe character. The first field is the gene. The second field is the repeat element upstream of the forward-strand gene, and the third field is its distance:

$ closest-features --dist sorted-genes-forward.bed sorted-repeats.bed \
    | awk 'BEGIN {FS="|"; OFS="|"} {print $1, $2, $3}' - \
    > genes-repeats-forward-with-distances.bed

In the reverse-strand case, we print the first, fourth and fifth fields. The first field is the gene. The fourth field is the repeat element upstream of the reverse-strand gene (to its right on the reference), and the fifth field is its distance:

$ closest-features --dist sorted-genes-reverse.bed sorted-repeats.bed \
    | awk 'BEGIN {FS="|"; OFS="|"} {print $1, $4, $5}' - \
    > genes-repeats-reverse-with-distances.bed

If you want to put the results back together at the end, use BEDOPS bedops with the --everything option to do a multiset union:

$ bedops --everything genes-repeats-*-with-distances.bed > final-answer.bed

As this example shows, one nice thing about BEDOPS bedops is that it can operate on many input files at once, not just two BED files.
