Bedtools Intersectbed
10.5 years ago
Ian 5.8k

Apologies if this is blatantly obvious!

I would like to compare coordinates in setA with those of setB. The output should have the same number of coordinates as setA and tell me how many nucleotides of each setA coordinate are overlapped by any coordinate in setB.

For example, a large coordinate in setA may be overlapped by two setB coordinates, but I want to know how many nucleotides of the setA coordinate are covered by both setB coordinates in total.

I know how to do this in Galaxy, as there is the handy 'Coverage' tool in 'Operate on Genomic Intervals'. However, I want to do this on the command line. I have been trying to get BEDTools to do this using 'intersectBed', but I can only seem to get just the overlapping setA coords (using -u), or the nucleotide overlap for multiple setB coordinates on separate lines (using -wao), or a count of how many setB coordinates overlap each setA coordinate (using -c).

SetB coordinates are non-overlapping themselves, so I guess I could tally up the overlaps of those setB coordinates that overlap the same setA coordinate.

Can BEDTools do what I want, or is there another command-line way of doing it?

Thank you!

PS I have also sent this to the BEDTools discussion list, so apologies for any double postings!

bedtools intersect coverage coverage • 5.7k views
10.5 years ago
brentp 24k

I believe you can do what you are requesting with coverageBed in bedtools:

$ cat a.bed
chr1    155 200 feature6    0   -
chr1    185 200 feature7    0   -
chr1    800 901 feature8    0   +
$ cat b.bed
chr1    1   100 feature1    0   +
chr1    100 200 feature2    0   +
chr1    150 500 feature3    0   -
chr1    900 950 feature4    0   +
$ coverageBed -a a.bed -b b.bed
chr1    1   100 feature1    0   +   0   0   99  0.0000000
chr1    100 200 feature2    0   +   2   45  100 0.4500000
chr1    150 500 feature3    0   -   2   45  350 0.1285714
chr1    900 950 feature4    0   +   1   1   50  0.0200000


where the final 4 columns are:

1) The number of features in A that overlapped the B interval.
2) The number of bases in B that had non-zero coverage.
3) The length of the entry in B.
4) The fraction of bases in B that had non-zero coverage.


Note that this is the coverage of the features in -b by the elements in -a.
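
To make the four columns concrete, here is a minimal Python sketch that recomputes them for the example files. It is not how bedtools does it internally; the function name `covered_bases` and the in-memory tuples are made up for illustration, and it assumes BED-style zero-based, half-open coordinates.

```python
def covered_bases(target, others):
    """Coverage of one interval by a list of other intervals.

    Returns (n_overlapping, bases_covered, length, fraction).
    Intervals are (chrom, start, end), zero-based and half-open as in BED.
    """
    chrom, start, end = target
    # Clip every overlapping interval to the boundaries of the target.
    hits = []
    for o_chrom, o_start, o_end in others:
        if o_chrom == chrom and o_start < end and o_end > start:
            hits.append((max(start, o_start), min(end, o_end)))
    # Merge the clipped pieces so bases hit more than once are counted once.
    hits.sort()
    covered = 0
    cur_start = cur_end = None
    for s, e in hits:
        if cur_end is None or s > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    length = end - start
    fraction = covered / length if length else 0.0
    return len(hits), covered, length, fraction


# The coordinates from a.bed and b.bed above.
a_bed = [("chr1", 155, 200), ("chr1", 185, 200), ("chr1", 800, 901)]
b_bed = [("chr1", 1, 100), ("chr1", 100, 200),
         ("chr1", 150, 500), ("chr1", 900, 950)]

# Coverage of each b.bed interval by a.bed, as in the coverageBed run above.
for interval in b_bed:
    n, bases, length, frac = covered_bases(interval, a_bed)
    print(*interval, n, bases, length, f"{frac:.7f}", sep="\t")
```

Calling `covered_bases(a_interval, b_bed)` instead gives the per-setA numbers the question asks for, which on the command line corresponds to swapping the two files. It is worth checking `coverageBed -h` for your installed version: as far as I know, bedtools 2.24.0 and later report coverage for the features in -a rather than -b.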


Great! Seems to do the job.