Question: how to split/ expand a bedGraph file into genomic coordinates of equal bin size ?
2
gravatar for Zee_S
3.1 years ago by
Zee_S60
Zee_S60 wrote:

Hello Everyone, Can you please share with me your insights on how to split a bedGraph file into genomic coordinates of equal bin size? I have average log2(fold enrichment) values calculated for a chIP over input as follows: [columns: 1) chr.name; 2) start; 3)end; 4) log2 value]:

chr1 0 450 0

chr1 450 500 1.4033

chr1 500 650 1.79393

chr1 650 700 0.865939

chr1 700 950 0

chr1 950 1000 0.865939

Now, I want to expand this file in such a way that the values are reported for defined 50bp windows, instead of windows of non-uniform length. As you can see, wherever the log value is same, the windows are combined to make one big window (for example, I want to change the 0-450 into 9x(50)).

I want to do this so that, I can then use two such log2ratio files (corresponding to two chIPs) to make a correlation plot. I am new to NGS data analysis so any and all help is appreciated!

Guidance on how to do this using a python script is highly appreciated.

Thank you.

chip-seq alignment next-gen • 2.8k views
ADD COMMENTlink modified 3.1 years ago by Anuragthakur6530 • written 3.1 years ago by Zee_S60
5
gravatar for Alex Reynolds
3.1 years ago by
Alex Reynolds31k
Seattle, WA USA
Alex Reynolds31k wrote:

For an arbitrary genome, you can use BEDOPS directly:

$ binSize=50
$ sort-bed signal.bedGraph | awk -vOFS="\t" '{ print $1, $2, $3, ".", $4 }' - > signal.bed
$ bedops --merge signal.bed | bedops --chop ${binSize} - > bins.bed 
$ bedmap --echo --echo-map-score bins.bed signal.bed > answer.bed

Replace the binSize as desired.

For a specific genome where you are doing comparisons between files with different signal windows, you can use BEDOPS with UCSC Kent utilities:

$ binSize=50
$ sort-bed signal.bedGraph | awk -vOFS="\t" '{ print $1, $2, $3, ".", $4 }' - > signal.bed
$ fetchChromSizes hg38 | awk -vOFS="\t" '($1!~/_/){ print $1, "0", $2 }' | sort-bed - > hg38.bed
$ bedops --chop ${binSize} hg38.bed | bedmap --echo --max --prec 3 - signal.bed > answer.bed

We use --max --prec 3 in place of --echo-map-score so that the score reported is the maximum score over the bin, in the generic case where a bin might happen to straddle two signals. You might pick a bin size larger than one of your signal's windows, for instance, or a bin size that does not divide signal windows evenly.

Other signal/score operators are available in bedmap, like --mean, --sum, etc. See bedmap --help or the documentation for more detail.

You can then repeat this for multiple signal.bedGraph files, and get score values over the same bins, regardless of what windows are in any particular signal.bedGraph.

ADD COMMENTlink modified 3.1 years ago • written 3.1 years ago by Alex Reynolds31k
2
gravatar for venu
3.1 years ago by
venu6.8k
Germany
venu6.8k wrote:

I would follow this approach (if not python).

  1. Generate 50bp windows using bedtools makewindows function
  2. Convert bedGraph to bigwig using bedGraphToBigwig
  3. Use bed file from step-1 and bigwig from step-2 with multiBigwigSummary from deeptools or bigWigAverageOverBed
ADD COMMENTlink written 3.1 years ago by venu6.8k

Hi Venu,

Thanks a lot for your reply. Just to clarify: in step 1) do you mean to make 50bp windows from the bedGraph file that I already have? if yes, then do I get rid of the fourth column?

Thank you a

ADD REPLYlink written 3.1 years ago by Zee_S60

Not from bedGraph file. Check bedtools makewindows function. It would be from chr sizes file.

ADD REPLYlink written 3.1 years ago by venu6.8k

Thank you very much! It worked!

ADD REPLYlink modified 3.1 years ago • written 3.1 years ago by Zee_S60
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 958 users visited in the last hour
_