Question: bigWig: peak distance from a specific genomic region
Nicolas Rosewick (Belgium, Brussels) wrote 5.3 years ago:

Hi,

I have a bigWig file representing ChIP-seq peaks and a BED file containing a bunch of small genomic regions (~100 bp). How can I intersect the bigWig and the BED file to get, for each entry in the BED file, the closest peak from the bigWig (and also its distance from the region in the BED file)? I was thinking of converting the bigWig to a BED file and then using closestBed. What do you think?

thanks


The bigWig format typically stores continuous signal data rather than intervals like a BED file does. Are you somehow calling peaks from the bigWig file?
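
If the bigWig really does hold continuous signal, one crude way to get intervals is to dump it to bedGraph (UCSC's bigWigToBedGraph in.bigWig out.bedGraph does this) and keep the stretches above a cutoff. Below is a minimal sketch of that thresholding step only. It is not a real peak caller. The function name call_peaks, the cutoff of 5 and the sample rows are made up for illustration, and the input is assumed to be a sorted four-column bedGraph.

```shell
# Hypothetical helper: read a sorted bedGraph (chrom, start, end, score) on stdin,
# keep intervals whose score is >= the cutoff given as $1, and merge kept
# intervals that touch or overlap on the same chromosome.
call_peaks() {
  awk -v min="$1" 'BEGIN { OFS = "\t" }
    $4 >= min {
      if (have && $1 == chr && $2 <= end) {
        # Extends the interval currently being built.
        if ($3 > end) end = $3
      } else {
        # Starts a new interval; flush the previous one first.
        if (have) print chr, start, end
        chr = $1; start = $2; end = $3; have = 1
      }
    }
    END { if (have) print chr, start, end }'
}

# Made-up sample signal, only to show the call.
printf 'chr1\t0\t10\t1\nchr1\t10\t20\t5\nchr1\t20\t30\t6\nchr1\t30\t40\t0\nchr1\t50\t60\t9\n' \
  | call_peaks 5
```

The result is a three-column BED that can be fed to closestBed or closest-features. A proper peak caller run on the original alignments is usually a better source of peaks than a fixed cutoff on the signal track.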

written 5.1 years ago by Ryan Dale
Alex Reynolds (Seattle, WA USA) wrote 5.3 years ago:

You could convert the ChIP-seq peaks to sorted BED and use closest-features to report the nearest upstream and downstream peaks for each of your sorted regions, along with their distances; just add the --dist option:

closest-features --dist regions.bed peaks.bed > answer.bed

If you want to save a lot of time, you can quickly parallelize the work by adding the --chrom <chromosome> option and using bedextract to get a fast list of chromosomes, using GNU Parallel to farm out the work:

bedextract --list-chr regions.bed \
 | parallel "closest-features --dist --chrom {} regions.bed peaks.bed > p_{}.bed"

Then zip all the results together with a multiset union:

bedops --everything p_*.bed > answer.bed

If formatting is an issue, add the --delim <delimiter> option to closest-features to replace the default delimiter with one of your choice, e.g. '\t' or similar. This can make processing with awk or other downstream scripts a little quicker.
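
As a rough sketch of that downstream step, the awk snippet below keeps only the nearer of the two reported peaks for each region. It assumes the default '|' delimiter and a five-field layout (region, upstream peak, upstream distance, downstream peak, downstream distance), with NA where no neighbour exists. The function name nearest_peak and the sample line are hypothetical, so check the layout against your own output first.

```shell
# Hypothetical helper: from closest-features --dist style lines on stdin,
# print the region, the nearer peak and its signed distance (tab-separated).
nearest_peak() {
  awk -F'|' 'BEGIN { OFS = "\t" }
    {
      # Absolute distances; -1 marks a missing (NA) neighbour.
      l = ($3 == "NA") ? -1 : ($3 < 0 ? -$3 : $3)
      r = ($5 == "NA") ? -1 : ($5 < 0 ? -$5 : $5)
      if (l < 0 && r < 0)                      print $1, "NA", "NA"
      else if (r < 0 || (l >= 0 && l <= r))    print $1, $2, $3
      else                                     print $1, $4, $5
    }'
}

# Made-up input line, only to show the call.
printf 'chr1\t100\t200|chr1\t10\t50|-51|chr1\t300\t400|101\n' | nearest_peak
```

Ties go to the upstream peak here. Swap the `l <= r` test for `l < r` if you would rather prefer the downstream one.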


Yes, it's pretty much the same idea as closestBed. I'll give both (bedtools and bedops) a shot to see which output is easier for me to process.

written 5.3 years ago by Nicolas Rosewick
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote 5.3 years ago:

EDIT: I quickly wrote the following tool https://github.com/lindenb/jvarkit/wiki/Biostar105754 . It should fulfill your needs:

$  echo -e "1\t1000\t20000\n3\t100\t200\nUn\t10\t11"  |\
  java -jar dist/biostar105754.jar -B path/to/All_hg19_RS_noprefix.b


#no data found for  Un  10  11
1   1000    1001    0.0 1   1000    20000
3   100 101 0.0 3   100 200

This does not report the "closest" peak, but http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/ contains a tool named bigWigAverageOverBed:

bigWigAverageOverBed v2 - Compute average score of big wig over each bed, which may have introns.
usage:
   bigWigAverageOverBed in.bw in.bed out.tab
The output columns are:
   name - name field from bed, which should be unique
   size - size of bed (sum of exon sizes
   covered - # bases within exons covered by bigWig
   sum - sum of values over all bases covered
   mean0 - average over bases with non-covered bases counting as zeroes
   mean - average over just covered bases

Thanks Pierre, I already knew this tool, but it's not exactly what I want. It's important for me to know the distance to the closest peak (and, in a perfect world, the distance to the closest peak both upstream and downstream of the region of interest).

written 5.3 years ago by Nicolas Rosewick