Question

using bedtools to find the intersect of bedpe and bed

5

9.4 years ago

izzy.yichao.cai ▴ 180

Hi, I have a BEDPE file (describing loops across the genome; the distance between the two anchors may be quite long) and a BED file.

The data look like this:

BEDPE (the first 6 columns describe the loop: chromosome1, start1, end1, chromosome2, start2, end2; the remaining columns are attributes that are useful to keep):

chr1    1050000 1060000 chr1    1180000 1190000 0,255,255       241     107.673 11     8.802 143.514 120.144 1.09607073802e-16       9.5834568345e-17        2.5647576134     8e-07       1.6487531336e-16        2       1060000 1180000 7071.06781187

BED:

chr1 10000 10271 CTCF 1000 . 10000 10271 10,190,254

I want to find the overlaps between the anchor regions in the BEDPE file and the features in the BED file. How can I use bedtools to do this?

BTW, is there a way to properly sort the BEDPE file? I tried the command "sort -k1,1 -k2,2n infile" that is recommended by bedtools. Is it suitable for a BEDPE file? Or should I use "sort -k1,1 -k2,2 -k3,3 -k4,4 -k5,5 -k6,6 infile"?

genome • 8.7k views

ADD COMMENT • link updated 9.2 years ago by PT ▴ 20 • written 9.4 years ago by izzy.yichao.cai ▴ 180

score 4 · Answer 1 · 2016-06-06

4

9.4 years ago

QVINTVS_FABIVS_MAXIMVS ★ 2.6k

Using your example:

$ intersectBed -a bedpe -b hg19_cytoband.bed -wao
chr1	1050000	1060000	chr1	1180000	1190000	0,255,255	241	107.673	11	8.802	143.514	120.144	1.09607073802e-16	9.5834568345e-17	2.5647576134	8e-07	1.6487531336e-16	2	1060000	1180000	7071.06781187	chr1	0	2300000	p36.33	gneg	10000

Here chr1 0 2300000 p36.33 gneg is the feature from the -b file.

The last column (10000) is the number of base pairs of overlap between the interval in -a and that feature.
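To see where the 10000 comes from, here is a small sketch of the arithmetic, assuming the usual BED half-open coordinates (overlap = smaller end minus larger start). The variable names are only for illustration.

```shell
# First anchor of the loop (columns 1-3 of the -a record).
a_start=1050000; a_end=1060000
# Cytoband feature from the -b file.
b_start=0; b_end=2300000

# Overlap is the smaller end minus the larger start, floored at zero.
start=$(( a_start > b_start ? a_start : b_start ))
end=$(( a_end < b_end ? a_end : b_end ))
overlap=$(( end > start ? end - start : 0 ))

echo "$overlap"
```

Note that only the first anchor takes part here; the second anchor (columns 4-6) is just carried along as extra fields.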

http://bedtools.readthedocs.io/en/latest/content/tools/intersect.html

To sort a BED file use the sortBed command in bedtools.

$ sortBed -i bedpe
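On the plain `sort` commands from the question: the second one drops the `n` flag, so the start coordinates would be compared as text rather than as numbers. A minimal sketch of the difference, using a made-up three-line file (`loops.bedpe` and the output names are hypothetical), and adding start2 as a tie-breaker, which is my own assumption about what a sensible BEDPE order looks like:

```shell
# Three toy BEDPE records (first six columns only), deliberately out of order.
printf 'chr1\t1050000\t1060000\tchr1\t1180000\t1190000\n'      >  loops.bedpe
printf 'chr1\t900000\t910000\tchr1\t1000000\t1010000\n'        >> loops.bedpe
printf 'chr1\t10000000\t10010000\tchr1\t10200000\t10210000\n'  >> loops.bedpe

# Numeric sort on start1, then chromosome2 and start2 as tie-breakers.
sort -k1,1 -k2,2n -k4,4 -k5,5n loops.bedpe > loops.sorted.bedpe

# Same keys without "n": the starts are compared character by character.
LC_ALL=C sort -k1,1 -k2,2 loops.bedpe > loops.lexical.bedpe

cut -f1-3 loops.sorted.bedpe
cut -f1-3 loops.lexical.bedpe
```

In the second output 10000000 sorts before 900000, which is not the order that tools expecting position-sorted input will want.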

ADD COMMENT • link 9.4 years ago by QVINTVS_FABIVS_MAXIMVS ★ 2.6k

0

Yep, I got similar results. I used this command:

bedtools intersect -wa -wb -a bedpe -b bed -sorted

But it seems that the overlap is computed only between the first 3 columns of the -a file and the -b file. Is there a way to also find the overlap between columns 4-6 of the -a file and the first 3 columns of the -b file at the same time?

ADD REPLY • link 9.4 years ago by izzy.yichao.cai ▴ 180

1

bedtools always computes overlaps on the first 3 columns of a tab-delimited BED file: chromosome, start and end coordinates. Everything else you see in the output is just the remaining fields of the corresponding input records, which you choose to display with options like -wa, -wb and -wao. If you want to work on other columns, you have to construct a new BED file from the desired columns and use that for your downstream operations.

ADD REPLY • link 9.4 years ago by ivivek_ngs ★ 5.2k

1

Break up your BEDPE file using cut. I would also annotate each line of the BEDPE so you can match the two positions up again afterwards. If your BEDPE file is chr1 100 200 chr1 500 600 ..., I would break it up into chr1 100 200 POS1 in one file and chr1 500 600 POS1 in the other.

Then run intersectBed on each BED file you generated.
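A rough sketch of that split, using awk instead of cut so the ID can be added in the same pass. The file names (`loops.bedpe`, `anchor1.bed`, `anchor2.bed`, `features.bed`) and the POS<line number> scheme are placeholders, not anything bedtools requires:

```shell
# Toy BEDPE: two loops, first six columns only (real files carry more columns).
printf 'chr1\t100\t200\tchr1\t500\t600\n'                  >  loops.bedpe
printf 'chr1\t1050000\t1060000\tchr1\t1180000\t1190000\n'  >> loops.bedpe

# Anchor 1 is columns 1-3, anchor 2 is columns 4-6. Both get the same ID
# (POS + line number) so the two ends can be paired up again later.
awk -F'\t' -v OFS='\t' '{print $1, $2, $3, "POS" NR}' loops.bedpe > anchor1.bed
awk -F'\t' -v OFS='\t' '{print $4, $5, $6, "POS" NR}' loops.bedpe > anchor2.bed

# Intersect each anchor file with the feature BED.
# Skipped when bedtools or the (hypothetical) features.bed is not available.
if command -v bedtools >/dev/null 2>&1 && [ -f features.bed ]; then
    bedtools intersect -wa -wb -a anchor1.bed -b features.bed > anchor1.hits.txt
    bedtools intersect -wa -wb -a anchor2.bed -b features.bed > anchor2.hits.txt
fi
```

The ID in column 4 of each hit then tells you which loop the anchor came from.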

ADD REPLY • link 9.4 years ago by QVINTVS_FABIVS_MAXIMVS ★ 2.6k

score 2 · Answer 2 · 2016-08-19

2

9.2 years ago

PT ▴ 20

Use pairToBed: bedtools pairtobed -a file.bedpe -b file.bed
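For example, a small sketch with toy inputs (the `toy.*` file names and the records in them are made up; the BEDPE line is padded to the standard 10 columns with a name, score and two strands). The `-type` option controls which ends have to overlap, and `either` is the default:

```shell
# One loop (10-column BEDPE) and one feature sitting inside its first anchor.
printf 'chr1\t1050000\t1060000\tchr1\t1180000\t1190000\tloop1\t241\t.\t.\n' > toy.bedpe
printf 'chr1\t1055000\t1055271\tCTCF\n' > toy.bed

if command -v bedtools >/dev/null 2>&1; then
    # Report the loop if either anchor overlaps a feature.
    bedtools pairtobed -a toy.bedpe -b toy.bed -type either
    # Report the loop only if both anchors overlap a feature.
    bedtools pairtobed -a toy.bedpe -b toy.bed -type both
fi
```

With this toy data I would expect only the first call to report the loop, since nothing in toy.bed touches the second anchor.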

ADD COMMENT • link 9.2 years ago by PT ▴ 20