Question: How to get conservation information for bed file using 100wayPhastCons
1
gravatar for chrys
3.7 years ago by
chrys40
Germany
chrys40 wrote:

Hi,

I have several bed files for which I would like to get conservation information. I have downloaded the 100wayPhastCons.bw file from UCSC and my plan is to now get the conservation scores for my bed intervals from this file.

Is there a way to achieve this ?

Thanks,

sequence conservation bed genome • 1.7k views
ADD COMMENTlink modified 3.6 years ago by Alex Reynolds31k • written 3.7 years ago by chrys40
1

Not sure if one of these help (you may have to modify solutions to fit your needs): Working With Phastcons Files
Obtaining phastCons conservation score for every gene in the Human genome
Batch conservation analysis

ADD REPLYlink modified 3.7 years ago • written 3.7 years ago by genomax91k

Thanks, those are the ones I am going by right now. I was just hoping that somebody knew a tutorial or a site where a pipeline is described to get this kind of conservation information.

Quick Edit: I also was hoping to get conservation score per base. Most ways described are only returning a summary.

ADD REPLYlink modified 3.7 years ago • written 3.7 years ago by chrys40
4
gravatar for Alex Reynolds
3.6 years ago by
Alex Reynolds31k
Seattle, WA USA
Alex Reynolds31k wrote:

Here's a pipeline you can use:

  1. Convert the conservation bigWig (.bw) to Wiggle (.wig) with bigWigToWig, using /dev/fd/1 as a filename for standard output. This should allow you to pipe the result to BEDOPS wig2bed and get a sorted five-column BED file:

    $ bigWigToWig conservation.bw /dev/fd/1 | wig2bed - > conservation.bed5
    
  2. Split your intervals up into per-base elements with bedops --chop 1.

    $ bedops --chop 1 intervals.bed > perbase.bed3
    
  3. Run bedmap to map your per-base elements to the BED-formatted conservation signal. Normally, unmapped elements report no value. You can use the following command to replace unmapped bases with a score of 0:

    $ bedmap --count --echo --echo-map-score --delim '\t' perbase.bed3 conservation.bed5 | awk '{ if ($1=="0") { print $0"0.0"; } else { print $0; } }' | cut -f2- > answer.bed
    

    This will only make sense if a missing conservation score should be zero. You could alternatively replace 0 with NaN or a similar IEEE-ready placeholder for a non-number.

Finally, if you want to put everything into a one-liner, use file substitutions:

$ bedmap --count --echo --echo-map-score --delim '\t' <(bedops --chop 1 intervals.bed) <(bigWigToWig conservation.bw /dev/fd/1 | wig2bed - ) | awk '{ if ($1=="0") { print $0"0.0"; } else { print $0; } }' | cut -f2- > answer.bed

If you're going to do repeated queries with different sets of intervals, it probably makes sense to do the bigWig-to-BED5 conversion step only once, and drop that in as conservation.bed5 where you need it in mapping.

ADD COMMENTlink modified 3.6 years ago • written 3.6 years ago by Alex Reynolds31k
2
gravatar for genecats.ucsc
3.6 years ago by
genecats.ucsc580
genecats.ucsc580 wrote:

Hi chrys,

You will probably be interested in the following page, which details how to extract wiggle data: http://genomewiki.ucsc.edu/index.php/Using_hgWiggle_without_a_database

If you have any follow-up questions, it would be helpful if you could post them to our Google Groups forum: https://groups.google.com/a/soe.ucsc.edu/forum/#!forum/genome, that way our whole team can see the question and help with an answer.

Thanks,

ChrisL from the UCSC Genome Browser

ADD COMMENTlink written 3.6 years ago by genecats.ucsc580
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 829 users visited in the last hour