Discrepancy between UCSC table browser PhyloP score and Bedmap mapped PhyloP score
18 months ago
xinrantian ▴ 10

Hey all,

I was doing a conservation analysis for some regions of interest and wanted to map phyloP scores onto the selected regions (in a .bed file). I got different results from two different methods.

For example, for chr10:91853138-91853143, if I just use the UCSC Table Browser, I get the following phyloP scores:

track type=wiggle_0 name="100 Vert. Cons" description="100 vertebrates Basewise Conservation by PhyloP"
91853138: -0.62474
91853139: -0.0926929
91853140: 0.27974
91853141: -0.145898
91853142: 0.332945
91853143: 0.758583

If I use the method described here, I would get

chr10 91853138 91853139 000048-000000|-0.051000
chr10 91853139 91853140 000048-000001|0.332000
chr10 91853140 91853141 000048-000002|-0.107000
chr10 91853141 91853142 000048-000003|0.357000

Notice that the second method only gives me the scores for 4 bases and they are different from the values acquired from the Table Browser.

I also tried the bigWigAverageOverBed utility, and it produced the same results as the second method. Could anyone explain why there is a discrepancy? Did I overlook anything?

Thank you so much!


Which human genome build is the data from? The second method used hg19 - is this the same genome build as the first method?

Which conservation scores did you use? The first method uses

name="100 Vert. Cons" description="100 vertebrates Basewise Conservation by PhyloP"

The second method uses 46 vertebrate basewise conservation

'phyloP46way'

Why did you get 6 values for the first method and 4 for the second? Wiggle files use 1-based counting, whereas BED files use 0-based, half-open counting (the start is 0-based and the end is exclusive). For more info on this, look at the way the UCSC browser and BED files count nucleotides.
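To make the off-by-one concrete, here is a minimal sketch of the conversion between the two conventions. The function names are made up for illustration; the only assumption is the usual one that browser-style ranges are 1-based and fully closed, while BED intervals are 0-based and half-open.

```python
def one_based_to_bed(chrom, start, end):
    """Convert a 1-based, fully closed range to a 0-based, half-open BED interval."""
    return (chrom, start - 1, end)


def bed_to_one_based(chrom, start, end):
    """Convert a 0-based, half-open BED interval to a 1-based, fully closed range."""
    return (chrom, start + 1, end)


def per_base_bed(chrom, start, end):
    """Split a 1-based, fully closed range into one BED record per base."""
    return [(chrom, pos - 1, pos) for pos in range(start, end + 1)]


# The region from the question, written the way the browser shows it.
print(one_based_to_bed("chr10", 91853138, 91853143))
# ('chr10', 91853137, 91853143)

# Six bases need six single-base BED records.
print(len(per_base_bed("chr10", 91853138, 91853143)))
# 6
```

If a .bed file is built by pasting browser coordinates in unchanged, every interval ends up shifted by one base relative to the wiggle positions.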


Thank you for the answer! I used hg19 and the 100 vertebrate phyloP track for both methods and still got different results. I got a response from the UCSC Genome Browser team (quoted below):

The discrepancy you are noticing is due to the different coordinate systems that we use on the genome browser and file types. The following blog post has more information on the subject, http://genome.ucsc.edu/blog/the-ucsc-genome-browser-coordinate-counting-systems/. Both variableStep and fixedStep wiggles use 1-based coordinates, whereas bigWigs use 0-based coordinates.

which is similar to what you said. Thank you!
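For anyone comparing the two file types by hand, this is a rough sketch of turning variableStep wiggle text into 0-based, half-open records (bedGraph-style). The function name is hypothetical, and the `variableStep chrom=chr10 span=1` declaration line in the example is an assumption, since the Table Browser output pasted above does not show it. fixedStep input is not handled.

```python
def variable_step_to_bedgraph(lines):
    """Parse variableStep wiggle lines into (chrom, start, end, value) tuples.

    Wiggle positions are 1-based; the returned intervals are 0-based, half-open.
    """
    chrom, span = None, 1
    records = []
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, comments, and track/browser header lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        if line.startswith("variableStep"):
            fields = dict(tok.split("=", 1) for tok in line.split()[1:])
            chrom = fields["chrom"]
            span = int(fields.get("span", 1))
            continue
        pos, value = line.split()
        start = int(pos) - 1  # shift the 1-based position to a 0-based start
        records.append((chrom, start, start + span, float(value)))
    return records


example = [
    "variableStep chrom=chr10 span=1",  # assumed declaration line
    "91853138 -0.62474",
    "91853139 -0.0926929",
]
for record in variable_step_to_bedgraph(example):
    print(*record, sep="\t")
```

With this conversion, wiggle position 91853138 becomes the interval 91853137-91853138, which is the interval a BED-based tool would need in order to report the same base.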


What track(s) are you using?

My previous answer uses conservation signal for an older assembly.

If your UCSC Table Browser session is displaying a newer track for the current assembly (hg38), then you will almost certainly get a different signal over the same genomic range than you would get by copying and pasting my older answer (which uses hg19 data).

It also looks like these are different tracks (100-way vs 46-way comparison).

If you're looking to verify a procedure by using different methods, you'll need to start with the same input.
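One way to check this is to line up the two outputs base by base after accounting for the coordinate shift. The sketch below uses the numbers posted in the question; the function name is made up, and it assumes the BED intervals are 0-based and half-open, so BED start N corresponds to wiggle position N + 1.

```python
def align_scores(wiggle_scores, bed_records):
    """Pair 1-based wiggle scores with scores from 0-based, half-open BED records."""
    rows = []
    for chrom, start, end, score in bed_records:
        # A BED interval [start, end) covers 1-based positions start+1 .. end.
        for pos in range(start + 1, end + 1):
            if pos in wiggle_scores:
                rows.append((pos, wiggle_scores[pos], score))
    return rows


# Values copied from the question (Table Browser output, 1-based positions).
table_browser = {
    91853138: -0.62474,
    91853139: -0.0926929,
    91853140: 0.27974,
    91853141: -0.145898,
    91853142: 0.332945,
    91853143: 0.758583,
}

# Values copied from the question (second method, BED-style intervals).
bedmap_output = [
    ("chr10", 91853138, 91853139, -0.051),
    ("chr10", 91853139, 91853140, 0.332),
    ("chr10", 91853140, 91853141, -0.107),
    ("chr10", 91853141, 91853142, 0.357),
]

for pos, wig, bed in align_scores(table_browser, bedmap_output):
    print(pos, wig, bed)
```

If the paired values still disagree after this alignment, the coordinate offset alone does not explain the gap, and it is worth checking that both methods read the same track and assembly.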