bigWig: peak distance from specific genomic regions
6.8 years ago

Hi,

I have a bigWig representing ChIP-seq peaks and a BED file containing a bunch of small genomic regions (~100 bp). How can I intersect the bigWig and the BED file to get, for each entry in the BED file, the closest peak from the bigWig (and also its distance from the region in the BED file)? I was thinking of converting the bigWig to a BED file and then using closestBed. What do you think?

thanks
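Once the peaks are in BED form, the nearest-peak lookup itself is simple enough to sketch in plain awk. This is a minimal sketch, not the behaviour of closestBed or closest-features: it assumes both files are ordinary 0-based, half-open BED, reports the single nearest peak per region with the gap in bp (0 for overlapping or book-ended intervals), prints NA when a chromosome has no peaks, and uses a hypothetical function name, nearest_peak. It scans every peak on the region's chromosome, so it is only meant for modest inputs.

```bash
# Hypothetical helper: nearest peak for each region, plus the gap in bp.
# Assumes 0-based half-open BED; overlap or book-ended -> distance 0.
# Output: region chrom/start/end, peak chrom/start/end, distance (NA if none).
nearest_peak() {
  local regions=$1 peaks=$2
  awk 'BEGIN { OFS = "\t" }
    # First file (peaks): store every peak, grouped by chromosome.
    FILENAME == ARGV[1] { n[$1]++; ps[$1, n[$1]] = $2 + 0; pe[$1, n[$1]] = $3 + 0; next }
    {
      rs = $2 + 0; re = $3 + 0
      best = -1; bs = "NA"; be = "NA"
      for (i = 1; i <= n[$1]; i++) {
        s = ps[$1, i]; e = pe[$1, i]
        if (e <= rs)      d = rs - e   # peak lies left of the region
        else if (s >= re) d = s - re   # peak lies right of the region
        else              d = 0        # peak overlaps the region
        if (best < 0 || d < best) { best = d; bs = s; be = e }
      }
      print $1, $2, $3, (best < 0 ? "NA" : $1), bs, be, (best < 0 ? "NA" : best)
    }' "$peaks" "$regions"
}
```

Usage: nearest_peak regions.bed peaks.bed > nearest.tsv. For large files, the sorted-input tools suggested in the answers below (closest-features, closestBed) scale much better than this all-pairs scan.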


The bigWig format typically stores continuous signal data rather than intervals like a BED file does. Are you somehow calling peaks from the bigWig file?
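If the bigWig holds signal rather than called peaks, one crude way to get intervals is to dump it with the UCSC bigWigToBedGraph utility and keep the stretches above a cutoff. A minimal sketch, assuming the bedGraph has no header or track lines and is grouped by chromosome and sorted by start; the function name signal_to_peaks and any threshold you pass are hypothetical and would need tuning for your data. This is simple thresholding, not statistical peak calling.

```bash
# Hypothetical helper: bedGraph (chrom start end value) on stdin -> BED3 on stdout.
# Keeps bins with value >= threshold and merges bins that touch or overlap
# on the same chromosome. Assumes input grouped by chrom and sorted by start.
signal_to_peaks() {
  local threshold=$1
  awk -v t="$threshold" 'BEGIN { OFS = "\t" }
    $4 + 0 < t + 0 { next }                  # drop bins below the threshold
    $1 == chrom && $2 + 0 <= end {           # bin touches/overlaps the open interval
      if ($3 + 0 > end) end = $3 + 0; next
    }
    {
      if (chrom != "") print chrom, start, end   # flush the previous interval
      chrom = $1; start = $2 + 0; end = $3 + 0
    }
    END { if (chrom != "") print chrom, start, end }'
}
```

Usage (file names and the cutoff of 5 are placeholders): run bigWigToBedGraph signal.bw signal.bedGraph, then signal_to_peaks 5 < signal.bedGraph > peaks.bed.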

6.8 years ago

You could convert the ChIP-seq peaks to a sorted BED file and use BEDOPS closest-features to report the nearest upstream and downstream peaks for each of your sorted regions, along with their distances; just add the --dist option:

closest-features --dist regions.bed peaks.bed > answer.bed

If you want to save a lot of time, you can easily parallelize the work by adding the --chrom <chromosome> option, using bedextract to quickly list the chromosomes, and using GNU Parallel to farm out the work:

bedextract --list-chr regions.bed \
 | parallel "closest-features --dist --chrom {} regions.bed peaks.bed > p_{}.bed"

Then zip all the results together with a multiset union:

bedops --everything p_*.bed > answer.bed

If formatting is an issue, add the --delim <delimiter> option to closest-features to replace the default delimiter (|) with one of your choice, e.g. a tab (\t). This can make processing with awk or other downstream scripts a little quicker.


Yes, it's pretty much the same idea as closestBed. I'll give both (bedtools and BEDOPS) a shot to see which output is easier for me to process.

6.8 years ago

EDIT: I quickly wrote the following tool https://github.com/lindenb/jvarkit/wiki/Biostar105754 . It should fulfill your needs:

$  echo -e "1\t1000\t20000\n3\t100\t200\nUn\t10\t11"  |\
  java -jar dist/biostar105754.jar -B path/to/All_hg19_RS_noprefix.b


#no data found for  Un  10  11
1   1000    1001    0.0 1   1000    20000
3   100 101 0.0 3   100 200

Not the "closest", but http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/ contains a tool named bigWigAverageOverBed:

 

bigWigAverageOverBed v2 - Compute average score of big wig over each bed, which may have introns.
usage:
   bigWigAverageOverBed in.bw in.bed out.tab
The output columns are:
   name - name field from bed, which should be unique
   size - size of bed (sum of exon sizes
   covered - # bases within exons covered by bigWig
   sum - sum of values over all bases covered
   mean0 - average over bases with non-covered bases counting as zeroes
   mean - average over just covered bases
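Given the column layout above, the out.tab file is easy to post-process. A small sketch, assuming whitespace-separated output with exactly those six columns (name, size, covered, sum, mean0, mean) and no header line; the function name summarise_average and the mean0 cutoff are hypothetical. It prints each name, the fraction of its bases covered by the bigWig (covered/size), and mean0, keeping only rows whose mean0 reaches the cutoff.

```bash
# Hypothetical helper: filter/summarise bigWigAverageOverBed output on stdin.
# Columns assumed: 1 name, 2 size, 3 covered, 4 sum, 5 mean0, 6 mean.
summarise_average() {
  local threshold=$1
  awk -v t="$threshold" 'BEGIN { OFS = "\t" }
    $5 + 0 >= t + 0 {
      frac = ($2 > 0) ? $3 / $2 : 0          # covered bases / region size
      print $1, sprintf("%.2f", frac), $5    # name, covered fraction, mean0
    }'
}
```

Usage (placeholder names): bigWigAverageOverBed in.bw in.bed out.tab, then summarise_average 1 < out.tab.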

Thanks Pierre, I already knew this tool, but it's not exactly what I want. It's important for me to know the distance to the closest peak (and, in a perfect world, the distances to the closest peaks upstream and downstream of the region of interest).
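For the upstream/downstream variant, a simple scan can track the best peak on each side separately. A minimal sketch, assuming "upstream" and "downstream" mean lower and higher genomic coordinates (strand is ignored), 0-based half-open BED, and gaps measured in bp; peaks that overlap the region are skipped here (their distance would be 0), NA marks a side with no peak, and the function name flanking_peaks is hypothetical. closest-features --dist, suggested above, reports a similar left/right pair, though its output format and handling of overlapping peaks may differ.

```bash
# Hypothetical helper: closest non-overlapping peak on each side of a region.
# Output: region chrom/start/end, left peak (chrom:start-end), left gap,
#         right peak, right gap; NA where no peak exists on that side.
flanking_peaks() {
  local regions=$1 peaks=$2
  awk 'BEGIN { OFS = "\t" }
    # First file (peaks): store every peak, grouped by chromosome.
    FILENAME == ARGV[1] { n[$1]++; ps[$1, n[$1]] = $2 + 0; pe[$1, n[$1]] = $3 + 0; next }
    {
      rs = $2 + 0; re = $3 + 0
      left = "NA"; ld = "NA"; right = "NA"; rd = "NA"
      for (i = 1; i <= n[$1]; i++) {
        s = ps[$1, i]; e = pe[$1, i]
        if (e <= rs) {                     # peak is left of the region
          if (ld == "NA" || rs - e < ld) { ld = rs - e; left = $1 ":" s "-" e }
        } else if (s >= re) {              # peak is right of the region
          if (rd == "NA" || s - re < rd) { rd = s - re; right = $1 ":" s "-" e }
        }                                  # otherwise the peak overlaps: skipped
      }
      print $1, $2, $3, left, ld, right, rd
    }' "$peaks" "$regions"
}
```

Usage: flanking_peaks regions.bed peaks.bed > flanking.tsv.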
