Question: Average PhastCon score for a bed file
2
gravatar for Aishwarya Kulkarni
4.8 years ago by
United States
Aishwarya Kulkarni60 wrote:

 

I have a bed file and wanted to calculated average PhastCon score over the regions, is there a tool which could do so? I tried using cmotifs but it submits jobs and at times does not produce timely results. 

 

phastcon • 3.1k views
ADD COMMENTlink modified 4.8 years ago by Alex Reynolds29k • written 4.8 years ago by Aishwarya Kulkarni60
8
gravatar for Alex Reynolds
4.8 years ago by
Alex Reynolds29k
Seattle, WA USA
Alex Reynolds29k wrote:

You could use bedmap --mean here, to map PhastCon signal onto your sorted regions.

First, grab PhastCon data from UCSC and convert WIG to a compressed form of BED called Starch, using a tool called wig2starch:

$ rootPath="http://hgdownload.cse.ucsc.edu/goldenpath/hg19/phastCons100way/hg19.100way.phastCons"
$ for i in `seq 1 22` X Y; \

    do \
        echo "getting PhastCon data for chromosome chr${i}..."; \
        wigFn="chr${i}.phastCons100way.wigFix"; \
        url="${rootPath}/${wigFn}.gz"; \
        wget -qO- ${url} | gunzip -c - | wig2starch - > ${wigFn}.starch; \
    done

We could have used wig2bed instead of wig2starch, but note that the final set of PhastCons files could take between 10-20 GB compressed, and uncompressed BED files could require about 10x more disk space. It takes a little longer to make Starch files, but it will take less disk space. I'd recommend some kind of compression, if you plan to do repeated map calculations.

Second, map PhastCon signal files to your sorted regions file by chromosome:

$ for i in `seq 1 22` X Y; \
    do \
        echo "mapping signal for chromosome chr${i}..."; \
        phastConFn="chr${i}.phastCons100way.wigFix.starch"; \
        bedmap --echo --mean --chrom chr${i} regions.bed ${phastConFn} > regions_with_avg_phastcons.${i}.bed;  \
    done

Third (optionally), union the per-chromosome results to a single file, using bedops --everything:

$ bedops --everything regions_with_avg_phastcons.*.bed > regions_with_avg_phastcons.bed

Doing the bedmap step with --chrom allows parallelization with grid job schedulers or GNU Parallel, splitting the tasks by chromosome. This could reduce calculation time down to that required for the largest chromosome of regions.

One thing I would note is that the average value calculated is over scores for mapped elements. If you have gaps of scores along your mapped region, those gaps would not contribute to the calculation of the mean signal.

I do not think you can interpolate PhastCon conservation signal between a gap, but it has been a while since I have worked with this particular dataset - its authors could probably confirm or negate this.

It might generally be useful to use bedmap with the --skip-unmapped option to filter out results for unmapped regions.

ADD COMMENTlink modified 9 weeks ago by RamRS24k • written 4.8 years ago by Alex Reynolds29k

@AlexI

I am getting the following error:

Error: Error: stat() failed on: regions.bed
mapping signal for chromosome chr18...
May use bedmap --help for more help.

For all the chromosomes.

ADD REPLYlink modified 9 weeks ago by RamRS24k • written 4.2 years ago by Aishwarya Kulkarni60

Errors of the type "stat() failed" usually mean that the file does not exist or is otherwise not accessible. I'd double-check your input files are available and accessible.

ADD REPLYlink written 4.2 years ago by Alex Reynolds29k

Hi Alex the file is present there and I also checked the input format. 

ADD REPLYlink written 4.2 years ago by Aishwarya Kulkarni60

I'm not really sure: there's not much I can troubleshoot as described. Double-check to be sure that the specific inputs exist and are accessible, when the loop fails (chr18?). Make sure you have sufficient disk space and file permissions. Perhaps break down the loop to individual chromosomes, until you find the problem file related to chr18.

ADD REPLYlink written 4.2 years ago by Alex Reynolds29k

Hi Alex to troubleshoot it I just took a single chromosome file, and executed the following command:

bedmap --echo --mean --chrom chr1 regions.bed chr1.phastCons100way.wigFix.starch

May use bedmap --help for more help.

Error: Error: stat() failed on: regions.bed

And so to see if bedmap is working at all on the file I did :

bedmap --echo regions.bed

Let me try some other combinations

Thanks, a very useful code though.

ADD REPLYlink modified 9 weeks ago by RamRS24k • written 4.2 years ago by Aishwarya Kulkarni60

Is regions.bed sorted per sort-bed? I can't immediately think of any other issues, but that's not to say this isn't a bug. If it helps, if you can post regions.bed somewhere I can download it (Dropbox, Google Drive, etc.), I can offer to try to replicate the error on my end, as the Phastcon data are available from UCSC.

ADD REPLYlink modified 4.2 years ago • written 4.2 years ago by Alex Reynolds29k

Sure that would be great, where to can I send a link to download the file?

ADD REPLYlink written 4.2 years ago by Aishwarya Kulkarni60

See my profile page for email address.

ADD REPLYlink written 4.2 years ago by Alex Reynolds29k
1

Or email: reynolda - at - uw.edu

ADD REPLYlink written 4.2 years ago by Alex Reynolds29k
1

I did some spot checks against the regions you sent by email, and I could not reproduce your issue.

For example:

$ bedmap --echo --mean --chrom chr10 regions.bed chr10.phastCons100way.wigFix.starch 
chr10   75255759        75255782        -       ENSG00000107758:E1.1|0.999435
$ bedmap --echo --mean --chrom chrY regions.bed chrY.phastCons100way.wigFix.starch
$

I suggest again double-checking your files, permissions, and so forth. From the error message you reported earlier, it looks like something is wrong specifically at chromosome chr18, so I suggest that you maybe start your investigations there. I'm sorry you're having problems, and I hope you find a solution!

ADD REPLYlink modified 9 weeks ago by RamRS24k • written 4.2 years ago by Alex Reynolds29k

Thanks for you help Alex.

ADD REPLYlink written 4.2 years ago by Aishwarya Kulkarni60
0
gravatar for mehran.karimzade
4.8 years ago by
Canada
mehran.karimzade160 wrote:

If the files is small and this is a one time thing, you can extract it from ucsc table browser easily.

Go to ucsc, upload the track, go to table browser, specify the phastcons track you want, click on intersection and select your file, there it is.

If you will be doing this frequently, I would suggest downloading the wig file and writing a script for this.

ADD COMMENTlink modified 9 weeks ago by RamRS24k • written 4.8 years ago by mehran.karimzade160
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1166 users visited in the last hour