bigwig : peak distance from specific genomic region
6.8 years ago

Hi,

I have a bigWig file representing ChIP-seq peaks and a BED file containing a set of small genomic regions (~100 bp). How can I intersect the bigWig and the BED file to get, for each entry in the BED file, the closest peak from the bigWig, along with its distance from that region? I was thinking of converting the bigWig to a BED file and then using closestBed. What do you think?

thanks

ChIP-Seq bigwig distance closest bed • 7.9k views

The bigWig format typically stores continuous signal data rather than intervals like a BED file does. Are you somehow calling peaks from the bigWig file?
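If the bigWig really does hold continuous signal, one rough way to get intervals out of it is to dump it to bedGraph (the UCSC utility bigWigToBedGraph does this: bigWigToBedGraph in.bigWig out.bedGraph), keep the bins above some cutoff, and merge touching bins. The sketch below does the last two steps with sort and awk on made-up toy data; the THRESHOLD value and all coordinates are assumptions for illustration, and simple thresholding is not a substitute for a real peak caller.

```shell
#!/bin/sh
set -eu

# Toy bedGraph rows (chrom, start, end, score). In practice these would come
# from something like: bigWigToBedGraph signal.bw signal.bedGraph
signal=$(printf 'chr1\t0\t100\t0.5\nchr1\t100\t200\t8.0\nchr1\t200\t300\t9.5\nchr1\t300\t400\t1.0\nchr1\t400\t500\t7.2\nchr2\t50\t150\t6.1\n')

# THRESHOLD is an arbitrary, hypothetical cutoff; pick one suited to your data.
THRESHOLD=5

peaks=$(printf '%s\n' "$signal" | sort -k1,1 -k2,2n | awk -v t="$THRESHOLD" 'BEGIN { OFS = "\t" }
  ($4 + 0) >= (t + 0) {
    s = $2 + 0; e = $3 + 0
    # Extend the current interval when this bin touches it on the same chromosome.
    if (have && $1 == cur_chr && s <= cur_e) { if (e > cur_e) cur_e = e; next }
    # Otherwise flush the current interval and start a new one.
    if (have) print cur_chr, cur_s, cur_e
    cur_chr = $1; cur_s = s; cur_e = e; have = 1
  }
  END { if (have) print cur_chr, cur_s, cur_e }')

# Three-column BED of candidate intervals, sorted by chromosome and start.
printf '%s\n' "$peaks"
```

The resulting three-column output is sorted, so it could be written to a file and handed to closestBed or a similar tool as the peak set.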

6.8 years ago

You could convert the ChIP-seq peaks to sorted BED and use closest-features to report the nearest upstream and downstream peaks to each of your sorted regions, along with their distances; just add the --dist operand:

closest-features --dist regions.bed peaks.bed > answer.bed

If you want to save a lot of time, you can quickly parallelize the work by adding the --chrom <chromosome> option and using bedextract to get a fast list of chromosomes, using GNU Parallel to farm out the work:

bedextract --list-chr regions.bed \
| parallel "closest-features --dist --chrom {} regions.bed peaks.bed > p_{}.bed"


Then zip all the results together with a multiset union:

bedops --everything p_*.bed > answer.bed

If formatting is an issue, add the --delim <delimiter> operand to closest-features to replace the default delimiter with one of your choice, e.g. \t or similar. This can make processing with awk or other downstream scripts a little quicker.


Yes, it's pretty much the same idea as closestBed. I'll give both (bedtools and BEDOPS) a shot to see which output is easier for me to process.

6.8 years ago

EDIT: I quickly wrote the following tool: https://github.com/lindenb/jvarkit/wiki/Biostar105754 . It should meet your needs:

$ echo -e "1\t1000\t20000\n3\t100\t200\nUn\t10\t11" |\
java -jar dist/biostar105754.jar -B path/to/All_hg19_RS_noprefix.b

#no data found for  Un  10  11
1   1000    1001    0.0 1   1000    20000
3   100 101 0.0 3   100 200


bigWigAverageOverBed v2 - Compute average score of big wig over each bed, which may have introns.
usage:
bigWigAverageOverBed in.bw in.bed out.tab
The output columns are:
name - name field from bed, which should be unique
size - size of bed (sum of exon sizes
covered - # bases within exons covered by bigWig
sum - sum of values over all bases covered
mean0 - average over bases with non-covered bases counting as zeroes
mean - average over just covered bases

Thanks Pierre, I already knew this tool, but it's not exactly what I want. It's important for me to know the distance to the closest peak (and, in a perfect world, the distance to the closest peak both upstream and downstream of the region of interest).
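To make the upstream/downstream requirement concrete, here is a minimal brute-force sketch in awk on toy data. It is not how bedtools or BEDOPS compute distances, and the distance convention is an assumption: the gap in bases between half-open BED intervals, 0 for an overlapping peak, and NA when there is no peak on that side. All coordinates and file names are hypothetical.

```shell
#!/bin/sh
set -eu

# Scratch directory for the toy inputs; removed when the script exits.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Toy inputs (hypothetical coordinates), BED-style: chrom, start, end.
printf 'chr1\t100\t300\nchr1\t400\t500\nchr1\t900\t950\nchr2\t50\t150\n' > "$tmp/peaks.bed"
printf 'chr1\t320\t380\nchr1\t450\t460\nchr2\t10\t20\n' > "$tmp/regions.bed"

# First file (peaks) is loaded into arrays; every region in the second file
# is then compared against every peak on the same chromosome.
result=$(awk 'BEGIN { FS = OFS = "\t" }
  NR == FNR { pc[NR] = $1; ps[NR] = $2 + 0; pe[NR] = $3 + 0; n = NR; next }
  {
    rs = $2 + 0; re = $3 + 0
    up = "NA"; down = "NA"
    for (i = 1; i <= n; i++) {
      if (pc[i] != $1) continue
      if (pe[i] <= rs) {            # peak lies entirely upstream (left)
        d = rs - pe[i]; if (up == "NA" || d < up) up = d
      } else if (ps[i] >= re) {     # peak lies entirely downstream (right)
        d = ps[i] - re; if (down == "NA" || d < down) down = d
      } else {                      # peak overlaps the region
        up = 0; down = 0
      }
    }
    # Columns: chrom, start, end, upstream distance, downstream distance.
    print $1, rs, re, up, down
  }' "$tmp/peaks.bed" "$tmp/regions.bed")

printf '%s\n' "$result"
```

The scan is quadratic in the number of intervals, so it only serves to pin down the definition; for real data, a sorted-input tool such as closest-features or closestBed is the practical choice.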