Use UCSC GoldenPath to download wig files, converting them to BED with BEDOPS wig2bed.
Here's an example of how to get phyloP 46-way scores for hg19:
$ cd work_directory
$ rsync -avz --progress rsync://hgdownload.cse.ucsc.edu/goldenPath/hg19/phyloP46way/vertebrate .
...
$ for fn in vertebrate/*.gz; \
do out=${fn##*/}; \
gunzip -c "$fn" \
| wig2bed - \
> "${out%.*.gz}.bed"; \
done
You can modify this for other conservation score collections and reference genomes, as needed.
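If you want to see what the conversion step is doing to the coordinates, here is a rough sketch in awk. It is not wig2bed itself: `fixedstep_to_bed` is a made-up helper that only handles fixedStep declarations, and the `id-N` fourth column is my imitation of the placeholder IDs I recall wig2bed writing. The input values are toy numbers.

```shell
# Hypothetical helper: approximate wig2bed for fixedStep input only.
# Reads WIG on stdin, writes BED (0-based, half-open) on stdout.
fixedstep_to_bed() {
  awk 'BEGIN { OFS = "\t" }
    /^(track|browser|#)/ { next }        # skip track, browser and comment lines
    /^fixedStep/ {
      step = 1; span = 1                 # reset defaults for each declaration
      for (i = 2; i <= NF; i++) {
        split($i, kv, "=")
        if (kv[1] == "chrom") chrom = kv[2]
        if (kv[1] == "start") pos = kv[2] - 1   # WIG is 1-based, BED is 0-based
        if (kv[1] == "step")  step = kv[2]
        if (kv[1] == "span")  span = kv[2]
      }
      next
    }
    # every remaining one-field line is a score at the current position
    NF == 1 { n++; print chrom, pos, pos + span, "id-" n, $1; pos += step }
  '
}

# Toy input (made-up values, for illustration only)
printf 'fixedStep chrom=chrX start=101 step=1\n0.5\n-1.25\n' | fixedstep_to_bed
```

The point to notice is the shift from the 1-based `start=101` in the wig declaration to a BED interval beginning at 100. For real data, stick with wig2bed, which also covers variableStep input and sorts its output.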
Once you have BED files containing conservation signal, you can use BEDOPS bedmap to restrict the score set to your exon coordinates.
First, sort your exon coordinates:
$ sort-bed exons.unsorted.bed > exons.bed
Then map exons to conservation scores, e.g. for chrX:
$ bedmap --echo --echo-map-score --delim '\t' exons.bed chrX.phyloP46way.bed > answer.chrX.bed
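Each line of the result should be the exon record followed by a tab and a semicolon-separated list of the scores that overlap it. If you want one number per exon instead, here is a small awk sketch that appends the mean of that list. `mean_of_mapped_scores` is a hypothetical helper name and the input line is made up. bedmap also has a `--mean` operation, which is probably the more direct route.

```shell
# Hypothetical helper: append the mean of the semicolon-separated scores
# found in the last tab-delimited column of each line.
mean_of_mapped_scores() {
  awk 'BEGIN { FS = OFS = "\t" }
    {
      if ($NF == "") { print $0, "NA"; next }   # exon with no overlapping scores
      n = split($NF, s, ";")
      sum = 0
      for (i = 1; i <= n; i++) sum += s[i]
      print $0, sum / n
    }'
}

# Toy line in the shape bedmap is expected to emit (values are made up)
printf 'chrX\t100\t103\texon1\t0.5;1.5;2.5\n' | mean_of_mapped_scores
```

On a real result you would run something like `mean_of_mapped_scores < answer.chrX.bed`. Exons with no overlapping scores get `NA` rather than a misleading zero.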
To do this map step over all chromosomes:
$ for chr in `seq 1 22` X Y; do bedmap --echo --echo-map-score --delim '\t' exons.bed chr${chr}.phyloP46way.bed > answer.chr${chr}.bed; done
This loop could be parallelized with GNU Parallel or a compute cluster. In any case, doing all of this on the command line gets around browser limitations.
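One way to set up the parallel version, as a sketch: print one bedmap command per chromosome, then hand that list to whatever runs jobs for you. `bedmap_jobs` is a made-up helper name, and the file names are the ones used above.

```shell
# Hypothetical helper: print one bedmap command per chromosome.
# Nothing is executed here; the commands are only written to stdout.
bedmap_jobs() {
  cmd="bedmap --echo --echo-map-score --delim '\t'"
  for chr in $(seq 1 22) X Y; do
    printf '%s exons.bed chr%s.phyloP46way.bed > answer.chr%s.bed\n' "$cmd" "$chr" "$chr"
  done
}

# Preview the first two jobs
bedmap_jobs | head -n 2

# To actually run them, assuming GNU Parallel is installed:
#   bedmap_jobs | parallel -j 4
# or, with plain xargs:
#   bedmap_jobs | xargs -P 4 -I{} sh -c '{}'
```

Keeping the job list separate from the runner makes it easy to check the commands first, and the same list can be split into cluster submissions if you go that way. The `-j 4` and `-P 4` job counts are arbitrary; pick what your machine and disk can handle.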