how to split/ expand a bedGraph file into genomic coordinates of equal bin size ?
2
2
Entering edit mode
6.2 years ago
Zee_S ▴ 60

Hello Everyone, Can you please share with me your insights on how to split a bedGraph file into genomic coordinates of equal bin size? I have average log2(fold enrichment) values calculated for a chIP over input as follows: [columns: 1) chr.name; 2) start; 3)end; 4) log2 value]:

chr1 0 450 0

chr1 450 500 1.4033

chr1 500 650 1.79393

chr1 650 700 0.865939

chr1 700 950 0

chr1 950 1000 0.865939

Now, I want to expand this file in such a way that the values are reported for defined 50bp windows, instead of windows of non-uniform length. As you can see, wherever the log value is same, the windows are combined to make one big window (for example, I want to change the 0-450 into 9x(50)).

I want to do this so that, I can then use two such log2ratio files (corresponding to two chIPs) to make a correlation plot. I am new to NGS data analysis so any and all help is appreciated!

Guidance on how to do this using a python script is highly appreciated.

Thank you.

next-gen ChIP-Seq alignment • 5.4k views
ADD COMMENT
6
Entering edit mode
6.2 years ago

For an arbitrary genome, you can use BEDOPS directly:

$ binSize=50
$ sort-bed signal.bedGraph | awk -vOFS="\t" '{ print $1, $2, $3, ".", $4 }' - > signal.bed
$ bedops --merge signal.bed | bedops --chop ${binSize} - > bins.bed 
$ bedmap --echo --echo-map-score bins.bed signal.bed > answer.bed

Replace the binSize as desired.

For a specific genome where you are doing comparisons between files with different signal windows, you can use BEDOPS with UCSC Kent utilities:

$ binSize=50
$ sort-bed signal.bedGraph | awk -vOFS="\t" '{ print $1, $2, $3, ".", $4 }' - > signal.bed
$ fetchChromSizes hg38 | awk -vOFS="\t" '($1!~/_/){ print $1, "0", $2 }' | sort-bed - > hg38.bed
$ bedops --chop ${binSize} hg38.bed | bedmap --echo --max --prec 3 - signal.bed > answer.bed

We use --max --prec 3 in place of --echo-map-score so that the score reported is the maximum score over the bin, in the generic case where a bin might happen to straddle two signals. You might pick a bin size larger than one of your signal's windows, for instance, or a bin size that does not divide signal windows evenly.

Other signal/score operators are available in bedmap, like --mean, --sum, etc. See bedmap --help or the documentation for more detail.

You can then repeat this for multiple signal.bedGraph files, and get score values over the same bins, regardless of what windows are in any particular signal.bedGraph.

ADD COMMENT
2
Entering edit mode
6.2 years ago
venu 7.1k

I would follow this approach (if not python).

  1. Generate 50bp windows using bedtools makewindows function
  2. Convert bedGraph to bigwig using bedGraphToBigwig
  3. Use bed file from step-1 and bigwig from step-2 with multiBigwigSummary from deeptools or bigWigAverageOverBed
ADD COMMENT
0
Entering edit mode

Hi Venu,

Thanks a lot for your reply. Just to clarify: in step 1) do you mean to make 50bp windows from the bedGraph file that I already have? if yes, then do I get rid of the fourth column?

Thank you a

ADD REPLY
0
Entering edit mode

Not from bedGraph file. Check bedtools makewindows function. It would be from chr sizes file.

ADD REPLY
0
Entering edit mode

Thank you very much! It worked!

ADD REPLY

Login before adding your answer.

Traffic: 1709 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6