Question: Average PhastCon score for a bed file
3
gravatar for Aishwarya Kulkarni
5.6 years ago by
United States
Aishwarya Kulkarni80 wrote:

 

I have a bed file and wanted to calculated average PhastCon score over the regions, is there a tool which could do so? I tried using cmotifs but it submits jobs and at times does not produce timely results. 

 

phastcon • 3.5k views
ADD COMMENTlink modified 5.6 years ago by Alex Reynolds30k • written 5.6 years ago by Aishwarya Kulkarni80
9
gravatar for Alex Reynolds
5.6 years ago by
Alex Reynolds30k
Seattle, WA USA
Alex Reynolds30k wrote:

You could use bedmap --mean here, to map PhastCon signal onto your sorted regions.

First, grab PhastCon data from UCSC and convert WIG to a compressed form of BED called Starch, using a tool called wig2starch:

$ rootPath="http://hgdownload.cse.ucsc.edu/goldenpath/hg19/phastCons100way/hg19.100way.phastCons"
$ for i in `seq 1 22` X Y; \

    do \
        echo "getting PhastCon data for chromosome chr${i}..."; \
        wigFn="chr${i}.phastCons100way.wigFix"; \
        url="${rootPath}/${wigFn}.gz"; \
        wget -qO- ${url} | gunzip -c - | wig2starch - > ${wigFn}.starch; \
    done

We could have used wig2bed instead of wig2starch, but note that the final set of PhastCons files could take between 10-20 GB compressed, and uncompressed BED files could require about 10x more disk space. It takes a little longer to make Starch files, but it will take less disk space. I'd recommend some kind of compression, if you plan to do repeated map calculations.

Second, map PhastCon signal files to your sorted regions file by chromosome:

$ for i in `seq 1 22` X Y; \
    do \
        echo "mapping signal for chromosome chr${i}..."; \
        phastConFn="chr${i}.phastCons100way.wigFix.starch"; \
        bedmap --echo --mean --chrom chr${i} regions.bed ${phastConFn} > regions_with_avg_phastcons.${i}.bed;  \
    done

Third (optionally), union the per-chromosome results to a single file, using bedops --everything:

$ bedops --everything regions_with_avg_phastcons.*.bed > regions_with_avg_phastcons.bed

Doing the bedmap step with --chrom allows parallelization with grid job schedulers or GNU Parallel, splitting the tasks by chromosome. This could reduce calculation time down to that required for the largest chromosome of regions.

One thing I would note is that the average value calculated is over scores for mapped elements. If you have gaps of scores along your mapped region, those gaps would not contribute to the calculation of the mean signal.

I do not think you can interpolate PhastCon conservation signal between a gap, but it has been a while since I have worked with this particular dataset - its authors could probably confirm or negate this.

It might generally be useful to use bedmap with the --skip-unmapped option to filter out results for unmapped regions.

ADD COMMENTlink modified 12 months ago by RamRS30k • written 5.6 years ago by Alex Reynolds30k

@AlexI

I am getting the following error:

Error: Error: stat() failed on: regions.bed
mapping signal for chromosome chr18...
May use bedmap --help for more help.

For all the chromosomes.

ADD REPLYlink modified 12 months ago by RamRS30k • written 5.0 years ago by Aishwarya Kulkarni80

Errors of the type "stat() failed" usually mean that the file does not exist or is otherwise not accessible. I'd double-check your input files are available and accessible.

ADD REPLYlink written 5.0 years ago by Alex Reynolds30k

Hi Alex the file is present there and I also checked the input format. 

ADD REPLYlink written 5.0 years ago by Aishwarya Kulkarni80

I'm not really sure: there's not much I can troubleshoot as described. Double-check to be sure that the specific inputs exist and are accessible, when the loop fails (chr18?). Make sure you have sufficient disk space and file permissions. Perhaps break down the loop to individual chromosomes, until you find the problem file related to chr18.

ADD REPLYlink written 5.0 years ago by Alex Reynolds30k

Hi Alex to troubleshoot it I just took a single chromosome file, and executed the following command:

bedmap --echo --mean --chrom chr1 regions.bed chr1.phastCons100way.wigFix.starch

May use bedmap --help for more help.

Error: Error: stat() failed on: regions.bed

And so to see if bedmap is working at all on the file I did :

bedmap --echo regions.bed

Let me try some other combinations

Thanks, a very useful code though.

ADD REPLYlink modified 12 months ago by RamRS30k • written 5.0 years ago by Aishwarya Kulkarni80

Is regions.bed sorted per sort-bed? I can't immediately think of any other issues, but that's not to say this isn't a bug. If it helps, if you can post regions.bed somewhere I can download it (Dropbox, Google Drive, etc.), I can offer to try to replicate the error on my end, as the Phastcon data are available from UCSC.

ADD REPLYlink modified 5.0 years ago • written 5.0 years ago by Alex Reynolds30k

Sure that would be great, where to can I send a link to download the file?

ADD REPLYlink written 5.0 years ago by Aishwarya Kulkarni80

See my profile page for email address.

ADD REPLYlink written 5.0 years ago by Alex Reynolds30k
1

Or email: reynolda - at - uw.edu

ADD REPLYlink written 5.0 years ago by Alex Reynolds30k
1

I did some spot checks against the regions you sent by email, and I could not reproduce your issue.

For example:

$ bedmap --echo --mean --chrom chr10 regions.bed chr10.phastCons100way.wigFix.starch 
chr10   75255759        75255782        -       ENSG00000107758:E1.1|0.999435
$ bedmap --echo --mean --chrom chrY regions.bed chrY.phastCons100way.wigFix.starch
$

I suggest again double-checking your files, permissions, and so forth. From the error message you reported earlier, it looks like something is wrong specifically at chromosome chr18, so I suggest that you maybe start your investigations there. I'm sorry you're having problems, and I hope you find a solution!

ADD REPLYlink modified 12 months ago by RamRS30k • written 5.0 years ago by Alex Reynolds30k

Thanks for you help Alex.

ADD REPLYlink written 5.0 years ago by Aishwarya Kulkarni80
0
gravatar for mehran.karimzade
5.6 years ago by
Canada
mehran.karimzade180 wrote:

If the files is small and this is a one time thing, you can extract it from ucsc table browser easily.

Go to ucsc, upload the track, go to table browser, specify the phastcons track you want, click on intersection and select your file, there it is.

If you will be doing this frequently, I would suggest downloading the wig file and writing a script for this.

ADD COMMENTlink modified 12 months ago by RamRS30k • written 5.6 years ago by mehran.karimzade180
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1245 users visited in the last hour