Question: Bedtools coverage adding feature scores
0
gravatar for jaime.alvarez.benayas
2.5 years ago by
Spain
jaime.alvarez.benayas10 wrote:

I am trying to calculate the coverage of features in a gff file and a bed file containing the windows.

The bed file (windows_test.bed), note 0 based:

chr1    0       50      1
chr1    50      100000  2

The gff file (test.gff), note 1 based:

chr1    Program     HS4     51      100353571       5       +       0       .

The score for this feature is 5

When I run bedtools coverage I get (note, using version v2.22.1 "Returns the depth and breadth of coverage of features from A on the intervals in B":

bedtools coverage -a test.gff -b windows_test.bed

chr1    50      100000  2       1       99950   99950   1.0000000
chr1    0       50      1       0       0       50      0.0000000

I get the total number of features in the chr1 50 100000 range (1) but not the total scores (5), how can I do to get this?

Thanks!

coverage score bedtools • 1.2k views
ADD COMMENTlink modified 10 months ago by Alex Reynolds26k • written 2.5 years ago by jaime.alvarez.benayas10
0
gravatar for abirmingham
10 months ago by
abirmingham0 wrote:

I think that you will want to use bedtools map rather than (or in addition to) bedtools coverage.

The documentation for map is at http://bedtools.readthedocs.io/en/latest/content/tools/map.html ; it states "bedtools map allows one to map overlapping features in a B file onto features in an A file and apply statistics and/or summary operations on those features." Sum is one of the operations available, and in fact is the default statistic calculated: "By default, map computes the sum of the 5th column (the score field for BED format) for all intervals in B that overlap each interval in A." There's an example of using it for average (which should be easily modifiable to get the sum instead) in another biostars post at How To Extract Scores From Bedgraph File Using Bed Tools .

ADD COMMENTlink modified 10 months ago • written 10 months ago by abirmingham0
0
gravatar for Alex Reynolds
10 months ago by
Alex Reynolds26k
Seattle, WA USA
Alex Reynolds26k wrote:

You can duplicate this and add more operations with bedmap, i.e. start with:

$ bedmap --echo --count --bases --echo-ref-size --delim '\t' windows.bed <(gff2bed test.gff) | awk '{ print $0"\t"($5/$6); }' > answer.bed

The --echo option reports the elements in windows.bed.

The --count option reports the number of features in the map file (the GFF file) that overlap the reference file (the windows.bed file) by one or more bases.

The --bases option reports the number of bases that overlap between window and GFF annotations.

The --echo-ref-size reports the length of the target element (for each target element).

Piping to awk does the same calculations as coverageBed.

To get to the point of your question, you can then start adding other statistical operands available in bedmap, such as --sum to get the sum of scores over the overlapping elements.

You could add this and other operands after --echo-ref-size — or even before it, with adjustments to the awk command downstream. Operand results are all separated by the tab delimiter (--delim '\t'), so whatever order you put the operands in will be the order in which answers come out.

See the following for a full review of score operations: http://bedops.readthedocs.io/en/latest/content/reference/statistics/bedmap.html#score-operations

ADD COMMENTlink written 10 months ago by Alex Reynolds26k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1507 users visited in the last hour