9.0 years ago by
University of Manchester, UK
Here is a somewhat low-tech and convoluted way of doing what you want using Phastcons scores, as i have no idea what the paper you mentioned did.
1) Get PhastCons scores from UCSC:
There are various phastCons score files, depending on the genomes in the pileup. E.g.
http://hgdownload.cse.ucsc.edu/goldenPath/mm9/phastCons30way/
You can get them by ftp:
ftp hgdownload.cse.ucsc.edu
user name: anonymous
password: <your email address>
cd goldenPath/mm9/phastCons30way/vertebrate
mget -a
2) Convert wig to bigWig:
Each chromosome file of scores (wig files) can be converted into bigWig files. 64-bit binaries come from:
http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/
The fetchChromSizes
script is required to get the chromosome length for the 'wigToBigWig' script:
fetchChromSizes mm9 > mm9.chrom.sizes
Run wigToBigWig on each chromosome file, e.g:
wigToBigWig -clip chr1.data mm9.chrom.sizes chr1.bw
3) Calculate mean phastCons score for the coordinates of interest:
Another script called bigWigSummary
can be obtained from UCSC, using the link above. It takes in a set of genome coordinate and will spit out the max or mean of values contained in the bigWig file, in our case phastCons scores.
Below is a Perl script i wrote (below) that retrieves max and mean scores. Paths to files and scripts will need editing.
This is at least a backup if you don't get anything else!
#!/usr/bin/perl -w
use strict;
#####
# Get highest and mean phastCons score for each binding region in BED file
# 10 Sept 2010. Ian Donaldson.
# 20 Oct 2010 modified to read BED format
#####
# Command line usage
if(@ARGV!=2) {
die ("USAGE: $0 | Regions BED file | Output\n");
}
open(BED, "<$ARGV[0]") or die("Could not open input BED file!\n");
open(OUTPUT, ">$ARGV[1]") or die("Could not open output file!\n");
# Work thru each line of BED file
while(defined(my $line = <BED>)) {
chomp($line);
my ($chr, $start, $end) = '';
# Skip line if line does not start with 'chr'
unless($line =~ /^chr/) { next }
# Remove EOL from each line
chomp($line);
# Split each line by TAB
my @line_bits = split(/\t/, $line);
$chr = $line_bits[0];
$start = $line_bits[1];
$end = $line_bits[2];
# Run bigWigSummary
system("/home/idonaldson/bin/UCSC/bigWigSummary -type=max /home/idonaldson/genomes/mm9_vertebrate_phastCons/bw/$chr.bw $chr $start $end 1 > $$.max_tmp 2>&1");
system("/home/idonaldson/bin/UCSC/bigWigSummary -type=mean /home/idonaldson/genomes/mm9_vertebrate_phastCons/bw/$chr.bw $chr $start $end 1 > $$.mean_tmp 2>&1");
# open temp file
open(MAX_TEMP, "<$$.max_tmp") or die("Cannot open max temp file!\n");
open(MEAN_TEMP, "<$$.mean_tmp") or die("Cannot open mean temp file!\n");
# MAX phastCons
my $max_temp_output = '';
while(<MAX_TEMP>) {
$max_temp_output .= $_;
chomp($max_temp_output);
}
# close max temp file
close(MAX_TEMP);
# remove blank lines s/^$//g
if($max_temp_output =~ /^no/) {
$max_temp_output = '';
}
# MEAN phastCons
my $mean_temp_output = '';
while(<MEAN_TEMP>) {
$mean_temp_output .= $_;
chomp($mean_temp_output);
}
# close max temp file
close(MEAN_TEMP);
# remove blank lines s/^$//g
if($mean_temp_output =~ /^no/) {
$mean_temp_output = '';
}
# save temp file variable to $ARGV[1]
print OUTPUT "$line_bits[0]\t$line_bits[1]\t$line_bits[2]\t$max_temp_output\t$mean_temp_output\n";
}
# remove temporary file to final output
system("rm $$.max_tmp");
system("rm $$.mean_tmp");
# close files
close(BED);
close(OUTPUT);
exit;
•
link
written
9.0 years ago by
Ian ♦ 5.7k