Question: How to merge bedGraph records which are next to each other and have the same score?
James Ashmore wrote (22 months ago):

Is anyone aware of software to merge bedGraph records if the score is the same? I have a bedGraph file calculated at single base pair resolution and I would like to decrease the file size by merging records next to each other which have the same score.


awk?

Comment by Devon Ryan, 22 months ago
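
As a rough sketch of the awk approach suggested above, a single pass over the file can extend the current run while the chromosome and score stay the same and the previous end equals the next start. The function name merge_bedgraph is illustrative and not part of any toolkit, and the sketch assumes a four-column, whitespace-delimited bedGraph that is already sorted by chromosome and start position.

```shell
# Hypothetical helper: merge directly adjacent bedGraph records with equal scores.
# Assumes input sorted by chromosome, then start (e.g. sort -k1,1 -k2,2n).
merge_bedgraph() {
    awk 'BEGIN { OFS = "\t" }
    {
        if (NR > 1 && $1 == chrom && $2 == run_end && $4 == run_score) {
            run_end = $3        # same chromosome, same score, no gap: extend the run
        } else {
            if (NR > 1) print chrom, run_start, run_end, run_score
            chrom = $1; run_start = $2; run_end = $3; run_score = $4
        }
    }
    END { if (NR > 0) print chrom, run_start, run_end, run_score }' "$@"
}
```

It could be called as merge_bedgraph in.bedGraph > merged.bedGraph. Because only the current run is held in memory, memory use should not depend on file size, and records separated by a gap are left unmerged even when their scores match.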
Answer by geek_y (22 months ago):

cat test.bdg

chrY    1   2   10
chrY    2   3   10
chrY    3   4   11
chrY    4   5   12
chrY    5   6   12
chrY    6   7   13
chrY    7   8   14
chrY    8   9   14
chrY    9   10  12

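Group consecutive records by chromosome and score, taking the minimum start and maximum end of each group: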

cat test.bdg  | groupBy -g 1,4 -c 2,3 -o min,max | awk -v OFS="\t" '{ print $1,$3,$4,$2}'

output:

chrY    1   3   10
chrY    3   4   11
chrY    4   6   12
chrY    6   7   13
chrY    7   9   14
chrY    9   10  12

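This uses bedtools groupBy, which collapses consecutive lines that share the same values in the -g columns. It does not look at coordinates, so consecutive records with the same score that are separated by a gap would also be collapsed into one interval.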
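
One way to check that a merge has not altered the signal is to compare the score-weighted total, the sum of (end - start) * score, per chromosome before and after merging. The sketch below is only an illustration: the function name bedgraph_signal is hypothetical, and it assumes a four-column, whitespace-delimited bedGraph.

```shell
# Hypothetical helper: print the score-weighted total per chromosome,
# i.e. the sum of (end - start) * score over all records.
bedgraph_signal() {
    awk '{ total[$1] += ($3 - $2) * $4 }
    END { for (c in total) printf "%s\t%.4f\n", c, total[c] }' "$@" | sort -k1,1
}
```

Running it on the original and on the merged file should give identical lines if merging only joined adjacent records. A merge that bridged a gap would show up as a larger total. With non-integer scores, small differences from floating-point rounding are possible.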


Thank you for the reply. However, I think this will only work on small bedGraph files. I got the following error when I tried it on my base-pair resolution bedGraph file:

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
/home/s1437643/conda/bin/groupBy: line 2: 12111 Aborted                 ${0%/*}/bedtools groupby "$@"
Reply by James Ashmore, 22 months ago

Did it work for a small file and fail for a large file?

Reply by geek_y, 22 months ago

Actually, it turned out to be a problem with my bedtools installation, specifically the groupBy command. I've compiled it from the latest source code and it works perfectly. Thank you!

Reply by James Ashmore, 22 months ago

Answer by Alex Reynolds (22 months ago):

Via BEDOPS, bash and GNU core utilities:

$ SORTED_BEDGRAPH=in.bedGraph
$ while read -r score; do awk -v s="$score" '$4==s' "${SORTED_BEDGRAPH}" | bedops --merge - | awk -v s="$score" '{print $0"\t"s}'; done < <(cut -f4 "${SORTED_BEDGRAPH}" | sort -u) | sort-bed - > answer.bed

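This should perform decently and scale to large inputs. Note that the file is scanned once per distinct score value, so run time grows with the number of unique scores.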

Use sort-bed if you first need to sort the bedGraph file, so that merging works correctly.

For others working with BED instead of bedGraph, the score is usually in the fifth column, so the awk and cut -f4 statements would need adjusting accordingly.
