Question

Counting Features In A Bed File

1

11.5 years ago

k.nirmalraman ★ 1.1k

I have a file in the following BED format:

Chr1 1022071 1022105 +
Chr1 1022071 1022105 +
Chr1 1022072 1022106 -
Chr1 1022072 1022106 -
Chr1 1022072 1022106 -
Chr1 1022072 1022106 -

I am trying to get the count of each feature represented in this file using:

mergeBed -i R5_chr.bed -n -s -d 0 > Output/R5_chr_counts.bed

I am only interested in how many times each feature occurs; I do not want features merged across any number of base pairs. The output should look like this:

Chr1 1022071 1022105 2 +
Chr1 1022072 1022106 4 -

Any suggestions on how to achieve this with bedtools, bash, or awk? Thanks in advance!

bedtools bash awk • 6.2k views


score 5 · Answer 1 · 2012-11-22

11.5 years ago

Dave Richardson ▴ 370

Based on the example you've given, this should work:

sort R5_chr.bed | uniq -c | awk '{ print $2,$3,$4,$1,$5}' > Output/R5_chr_counts.bed

Giving this output:

Chr1 1022071 1022105 2 +
Chr1 1022072 1022106 4 -
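If the file is large or unsorted, the counting can also be done in a single awk pass with an associative array, so no sort | uniq step is needed. This is a minimal sketch, assuming whitespace-separated columns in the layout from the question (chrom, start, end, strand); the function name count_features is hypothetical. Memory use grows with the number of distinct features, and the final sort is only there because awk visits array keys in no particular order.

```shell
# count_features (hypothetical helper): reads "chrom start end strand" lines on
# stdin and prints "chrom start end count strand" once per distinct feature.
count_features() {
  awk '
    {
      k = $1 OFS $2 OFS $3 OFS $4   # whole feature, strand included
      n[k]++                        # count occurrences of this exact feature
    }
    END {
      for (k in n) {
        split(k, f, " ")            # recover chrom, start, end, strand
        print f[1], f[2], f[3], n[k], f[4]
      }
    }
  ' | sort -k1,1 -k2,2n -k3,3n      # for-in order is arbitrary; sort by coordinates
}

# Sample data from the question
printf '%s\n' \
  'Chr1 1022071 1022105 +' \
  'Chr1 1022071 1022105 +' \
  'Chr1 1022072 1022106 -' \
  'Chr1 1022072 1022106 -' \
  'Chr1 1022072 1022106 -' \
  'Chr1 1022072 1022106 -' | count_features
```

On the file from the question this would be used as count_features < R5_chr.bed > Output/R5_chr_counts.bed.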

uniq -c only collapses identical lines that are adjacent, so the initial sort is needed unless the BED file is already sorted. If it is, you can omit the sort command:

uniq -c R5_chr.bed | awk '{ print $2,$3,$4,$1,$5}' > Output/R5_chr_counts.bed
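To see why the sort matters, here is a small sketch comparing uniq -c on unsorted and sorted input. The three-line sample and the helper name sample are made up for illustration, in the same layout as the question's file.

```shell
# sample (hypothetical helper): a deliberately unsorted three-line input
sample() {
  printf '%s\n' \
    'Chr1 1022071 1022105 +' \
    'Chr1 1022072 1022106 -' \
    'Chr1 1022071 1022105 +'
}

echo "without sort:"
sample | uniq -c          # the + feature is reported twice, each with count 1

echo "with sort:"
sample | sort | uniq -c   # the two + lines are now adjacent and counted together
```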


Thank you very much!! This worked perfectly for my needs :)

Reply by k.nirmalraman ★ 1.1k, 11.5 years ago

score 2 · Answer 2 · 2012-11-22

11.5 years ago

zx8754 11k

sort <file> | uniq --count

See also the Stack Overflow question "Find duplicate lines in a file and count how many time each line was duplicated": http://stackoverflow.com/questions/6712437/find-duplicate-lines-in-a-file-and-count-how-many-time-each-line-was-duplicated
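One caveat with sort | uniq --count: uniq compares whole lines, so the same feature written with trailing spaces, or with tabs instead of spaces, is treated as a different line. A minimal sketch that normalizes whitespace with awk before counting, assuming whitespace-separated columns; count_normalized is a hypothetical name. Reassigning $1 makes awk rebuild each record from its fields joined by single spaces, which drops leading/trailing blanks and collapses tabs and runs of spaces.

```shell
# count_normalized (hypothetical helper): normalize whitespace, then count
# identical lines. Reads the files given as arguments, or stdin if none.
count_normalized() {
  awk '{ $1 = $1; print }' "$@" | sort | uniq -c   # -c is the short form of --count
}

# Three spellings of the same feature: plain, trailing spaces, tab-separated
printf 'Chr1 1022071 1022105 +\nChr1 1022071 1022105 +   \nChr1\t1022071\t1022105\t+\n' \
  | count_normalized
```

For the file from the question this would be count_normalized R5_chr.bed; note that the normalized lines are space-separated even if the input used tabs.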
