Whole-genome percentile values in bigWig files
Entering edit mode
22 months ago
rachel ▴ 30

Hi all,

I'm looking to compute the k-th percentile value of some bigWig files (ChIP-seq) across the whole genome. For example, I'd like to know the 99.99th percentile value of a bigWig file across the whole genome.

I'm aware of wiggletools histogram (unfortunately does not provide exact percentile values, which I need) and the BEDOPS answer suggested in this question. However, the BEDOPS answer only computes percentile values on a per-chromosome basis, whereas I need whole-genome percentile values. I have tried to modify it to compute percentile values on a whole-genome basis to no avail.

If anyone has any ideas, I would appreciate them!



ChIP-Seq bigwig wiggle statistics • 594 views
Entering edit mode
22 months ago

Unless I'm missing something about the question, here are a couple possibilities:

Perhaps use a pseudochromosome name for all records:

$ bigWigToBedGraph signal.bw signal.bg
$ awk -vFS="\t" -vOFS="\t" '{ $5 = $4; $4 = $1; $1 = "chrPseudo"; print $0; }' signal.bg | sort-bed - > signal.pseudoChr.bed
$ echo -e "chrPseudo\t0\t248956422" > chrPseudo.extents.bed
$ bedmap --echo --kth 0.9999 chrPseudo.extents.bed signal.pseudoChr.bed > answer.bed

I'm assuming that the ChIP-seq input comes from hg38, which is why I create an "extents" file that extends over the longest chromosome in the real hg38 assembly, i.e., chr1. Whatever intervals come out of that awk statement, they will never extend past the longest, real chromosome. If you're not using hg38, you could change 248956422 to match the longest chromosome in your assembly.

That's a pretty convoluted approach.

Another option might be to just strip all the signal from signal.bg into its own text file, and run that file through R.

$ bigWigToBedGraph signal.bw signal.bg
$ cut -f4 signal.bg > signal.txt
$ R
> d <- read.table("signal.txt")
> quantile(d$V1, c(0.9999))
...some answer...

Running ?quantile in R should give more detail about that command.

Even though I'd love for everyone to use BEDOPS, using R or even Python numpy.quantile would probably be faster than rejigging a bigWig into a resorted BED file with a fake chromosome name.

Hope this helps!

Entering edit mode

I just wanted to say that your first option worked beautifully! (I already had everything set up to use bedops so it was easy to adapt to). Thank you so much for responding to my question :)


Login before adding your answer.

Traffic: 1766 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6