Question: Bedtools Intersectbed
Ian (University of Manchester, UK) wrote, 7.3 years ago:

Apologies if this is blatantly obvious!

I would like to compare coordinates in setA with those of setB. The output should have the same number of coordinates as setA and tell me how many nucleotides of each setA coordinate are overlapped by any coordinate in setB.

For example, a large coordinate in setA may be overlapped by two setB coordinates, and I want to know how many nucleotides of that setA coordinate are covered by the two setB coordinates in total.

I know how to do this on Galaxy, as there is the handy 'Coverage' tool in 'Operate on Genomic Intervals'. However, I want to do this on the command line. I have been trying to get BEDTools to do this using 'intersectBed', but I can only seem to get the overlapping setA coords alone (using -u), the nucleotide overlap with each setB coordinate on a separate line (using -wao), or a count of how many setB coordinates overlap each setA coordinate (using -c).

SetB coordinates do not overlap one another, so I guess I could sum the overlaps of all setB coordinates that hit the same setA coordinate.
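The tallying idea can be sketched as a small awk filter over `intersectBed -wao` output. This is a minimal sketch, not a BEDTools feature: the function name `sum_wao_overlap` is hypothetical, and it assumes setA has a fixed number of columns (3 by default) and relies on the documented `-wao` layout, where each row is the setA fields, then the setB fields, then the number of overlapping bases (0 with a null setB entry when there is no hit). Summing is only correct because setB intervals do not overlap each other.

```shell
#!/bin/sh
# Hypothetical helper: sum the last column (overlap in bp) of
# `intersectBed -wao` output for each setA interval.
# Assumption: setA has NA columns (default 3: chrom, start, end).
# Only valid when setB intervals do not overlap one another.
sum_wao_overlap() {
  na="${1:-3}"
  awk -v na="$na" 'BEGIN { FS = OFS = "\t" }
  {
    # Rebuild the setA part of the row to use as a grouping key.
    key = $1
    for (i = 2; i <= na + 0; i++) key = key OFS $i
    # Remember first-seen order so output follows the input order.
    if (!(key in total)) order[++n] = key
    total[key] += $NF
  }
  END { for (j = 1; j <= n; j++) print order[j], total[order[j]] }'
}
```

Usage would look like `intersectBed -a setA.bed -b setB.bed -wao | sum_wao_overlap 3`, giving one line per setA interval with the total covered bases appended. If setB could overlap itself, it would need to be collapsed with `mergeBed` first, otherwise shared bases are counted twice. Identical duplicate setA rows are also merged into one output line by this sketch.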

Can BEDTools do what I want, or is there another command-line way of doing it?

Thank you!

PS I have also sent this to the BEDTools discussion group, so apologies for any double posting!

brentp (Salt Lake City, UT) answered, 7.3 years ago:

I believe you can do what you are requesting with coverageBed in bedtools:

$ cat a.bed
chr1    155 200 feature6    0   -
chr1    185 200 feature7    0   -
chr1    800 901 feature8    0   +
$ cat b.bed
chr1    1   100 feature1    0   +
chr1    100 200 feature2    0   +
chr1    150 500 feature3    0   -
chr1    900 950 feature4    0   +
$ coverageBed -a a.bed -b b.bed
chr1    1   100 feature1    0   +   0   0   99  0.0000000
chr1    100 200 feature2    0   +   2   45  100 0.4500000
chr1    150 500 feature3    0   -   2   45  350 0.1285714
chr1    900 950 feature4    0   +   1   1   50  0.0200000

where the final 4 columns are:

   1) The number of features in A that overlapped the B interval.
   2) The number of bases in B that had non-zero coverage.
   3) The length of the entry in B.
   4) The fraction of bases in B that had non-zero coverage.

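Note that this is the coverage of the features in -b by the elements in -a, so to get one line per setA interval you would pass setA as -b and setB as -a. (In bedtools 2.24.0 and later this convention was reversed, and coverage is reported for the -a features instead.)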
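To make the four columns concrete, here is a brute-force awk sketch that should reproduce the example output from a.bed and b.bed under the older convention, where each interval of the second file is reported. It only illustrates what the columns mean; it is not how bedtools computes coverage. The function name `coverage_sketch` is hypothetical, and because it marks covered bases one at a time it is only suitable for toy inputs.

```shell
#!/bin/sh
# Hypothetical helper: brute-force illustration of the four columns that the
# pre-2.24 coverageBed appends to each interval of the SECOND file.
# Covered bases are marked one by one, so use it only on tiny inputs.
coverage_sketch() {
  awk '
  # First file (the -a features): remember chrom, start and end.
  NR == FNR { n++; achrom[n] = $1; astart[n] = $2 + 0; aend[n] = $3 + 0; next }
  # Second file (the -b intervals): one output line per interval.
  {
    bs = $2 + 0; be = $3 + 0
    hits = 0; covered = 0
    split("", mark)                 # portable way to empty the array
    for (i = 1; i <= n; i++) {
      # Half-open BED overlap test on the same chromosome.
      if (achrom[i] == $1 && astart[i] < be && aend[i] > bs) {
        hits++
        lo = (astart[i] > bs) ? astart[i] : bs
        hi = (aend[i] < be) ? aend[i] : be
        # Count each base once, even when -a features overlap each other.
        for (p = lo; p < hi; p++)
          if (!(p in mark)) { mark[p] = 1; covered++ }
      }
    }
    len = be - bs
    frac = (len > 0) ? covered / len : 0
    printf "%s\t%d\t%d\t%d\t%.7f\n", $0, hits, covered, len, frac
  }' "$1" "$2"
}
```

Running `coverage_sketch a.bed b.bed` on the example files gives the same four extra columns as the coverageBed output (tab-separated). Note how feature2 gets 45 covered bases, not 60: feature6 and feature7 overlap each other, so bases 185 to 199 are only counted once.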

Ian replied, 7.3 years ago:

Great! Seems to do the job.