Question: Discrepancy between UCSC table browser PhyloP score and Bedmap mapped PhyloP score
6 months ago, xinrantian wrote:

Hey all,

I was doing conservation analysis for some regions of interest and wanted to map PhyloP scores onto the selected regions (in a .bed file). I got different results using different methods.

For example, for chr10:91853138-91853143, if I just use the UCSC Table Browser, I get PhyloP scores as described here:

track type=wiggle_0 name="100 Vert. Cons" description="100 vertebrates Basewise Conservation by PhyloP"
91853138: -0.62474
91853139: -0.0926929
91853140: 0.27974
91853141: -0.145898
91853142: 0.332945
91853143: 0.758583

If I use the method described here, I would get

chr10 91853138 91853139 000048-000000|-0.051000
chr10 91853139 91853140 000048-000001|0.332000
chr10 91853140 91853141 000048-000002|-0.107000
chr10 91853141 91853142 000048-000003|0.357000

Notice that the second method only gives me the scores for 4 bases and they are different from the values acquired from the Table Browser.

I also tried the bigWigAverageOverBed utility and it produced the same results as the second method. Could anyone explain why there is a discrepancy? Did I overlook anything?

Thank you so much!

modified 5 months ago by Alex Reynolds • written 6 months ago by xinrantian

Which human genome build is the data from? The second method used hg19 - is this the same genome build as the first method?

Which conservation scores did you use? The first method uses

name="100 Vert. Cons" description="100 vertebrates Basewise Conservation by PhyloP"

The second method uses 46 vertebrate basewise conservation

'phyloP46way'

Why did you get 6 values for the first method and 4 for the second? Wiggle files use 1-based counting, whereas bed files use 0-based counting. For more info on this, look at the way UCSC browser and bed files count nucleotides.
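To make the off-by-one concrete, here is a minimal Python sketch of how a 1-based wiggle position maps to a 0-based, half-open BED interval. The helper name `wig_position_to_bed` is hypothetical (it is not part of any UCSC or BEDOPS tool), and the scores are simply the Table Browser values quoted in the question.

```python
def wig_position_to_bed(chrom, pos_1based):
    """Map a 1-based wiggle position to a 0-based, half-open BED interval.

    Hypothetical helper for illustration only.
    """
    if pos_1based < 1:
        raise ValueError("wiggle positions start at 1")
    # BED start is 0-based (inclusive); BED end is exclusive.
    return (chrom, pos_1based - 1, pos_1based)


# Table Browser values quoted in the question (1-based positions).
wig_scores = {
    91853138: -0.62474,
    91853139: -0.0926929,
    91853140: 0.27974,
    91853141: -0.145898,
    91853142: 0.332945,
    91853143: 0.758583,
}

# Build one BED-style row per base and print it tab-separated.
bed_rows = []
for pos, score in sorted(wig_scores.items()):
    chrom, start, end = wig_position_to_bed("chr10", pos)
    bed_rows.append((chrom, start, end, score))
    print(f"{chrom}\t{start}\t{end}\t{score}")
```

Under this convention, the base reported at wiggle position 91853138 is the BED interval with start 91853137 and end 91853138, so rows from the two outputs should be compared only after shifting one of them.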

modified 5 months ago • written 5 months ago by rbagnall

Thank you for the answer! I used hg19 and the 100-vertebrate PhyloP track for both methods and still got different results. I got a response from the UCSC Genome Browser team (quoted below):

The discrepancy you are noticing is due to the different coordinate systems that we use on the genome browser and file types. The following blog post has more information on the subject, http://genome.ucsc.edu/blog/the-ucsc-genome-browser-coordinate-counting-systems/. Both variableStep and fixedStep wiggles use 1-based coordinates, whereas bigWigs use 0-based coordinates.

which is similar to what you said. Thank you!
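For anyone checking their own ranges, here is a small Python sketch of the region arithmetic. Both function names are hypothetical, and the sketch assumes the usual conventions: browser-style ranges are 1-based and closed, BED intervals are 0-based and half-open.

```python
def browser_range_to_bed(chrom, start_1based, end_1based):
    """1-based closed range (chr:start-end) -> 0-based half-open BED interval."""
    return (chrom, start_1based - 1, end_1based)


def bed_to_browser_range(chrom, start_0based, end):
    """0-based half-open BED interval -> 1-based closed range."""
    return (chrom, start_0based + 1, end)


# The range from the question, chr10:91853138-91853143, as a BED interval.
query_bed = browser_range_to_bed("chr10", 91853138, 91853143)
query_len = query_bed[2] - query_bed[1]  # number of bases covered

# The four bedmap rows in the question together span this BED interval.
mapped_bed = ("chr10", 91853138, 91853142)
mapped_range = bed_to_browser_range(*mapped_bed)
mapped_len = mapped_bed[2] - mapped_bed[1]

print(query_bed, query_len)
print(mapped_range, mapped_len)
```

Read this way, the four bedmap rows cover 1-based positions 91853139 through 91853142, which is a subset of the six positions the Table Browser reported. This only accounts for the coordinates; it says nothing about whether the two outputs come from the same track.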

written 5 months ago by xinrantian

What track(s) are you using?

My previous answer uses conservation signal for an older assembly.

If your UCSC Table Browser session is displaying a newer track for the current assembly (hg38), then you will almost certainly get different signal over the same genomic range than you would by copying and pasting my older answer (which uses hg19 data).

It also looks like these are different tracks (100-way vs 46-way comparison).

If you're looking to verify a procedure by using different methods, you'll need to start with the same input.

modified 5 months ago • written 5 months ago by Alex Reynolds