Question

Correlation Between Genome Conservation Scores (PhastCons vs. phyloP)?

score 23

13.9 years ago

Adrian ▴ 700

I've been looking into using conservation scores from the UCSC Genome Browser database to prioritize some SNPs in non-coding regions that we're studying.

The PhastCons score is the probability that each nucleotide belongs to a conserved element, whereas abs(phyloP) is the -log10(p-value) under a null hypothesis of neutral evolution; a negative sign indicates faster-than-expected evolution, while positive values imply conservation.
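To make the two scales concrete, here is a minimal sketch assuming phyloP values are signed -log10 p-values (the convention used by PHAST's phyloP) and PhastCons values are probabilities in [0, 1]. The helper names `phylop_to_pvalue`, `phylop_direction` and `check_phastcons` are hypothetical and not part of any UCSC or PHAST tool.

```python
def phylop_to_pvalue(phylop: float) -> float:
    """Recover the p-value from a signed phyloP score.

    Assumes |phyloP| = -log10(p); the sign only encodes direction.
    """
    return 10.0 ** (-abs(phylop))


def phylop_direction(phylop: float) -> str:
    """Interpret the sign of a phyloP score."""
    if phylop > 0:
        return "conserved (slower than neutral)"
    if phylop < 0:
        return "accelerated (faster than neutral)"
    return "neutral"


def check_phastcons(score: float) -> float:
    """PhastCons scores are probabilities, so they must lie in [0, 1]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"PhastCons score out of range: {score}")
    return score


# A phyloP of 3 or -3 both correspond to p = 0.001; only the direction differs.
for s in (3.0, -2.0, 0.0):
    print(s, phylop_to_pvalue(s), phylop_direction(s))
```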

Eyeballing the data, I was a little surprised to find only a weak correlation (0.397) between the phyloP and PhastCons values at each site. (I compared PhastCons with 1 - exp(-phyloP), using only sites with positive phyloP.) While I realize they're different statistics measuring slightly different things, I would still have expected them to be highly correlated.
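One thing worth checking is how much the result depends on the transform. If phyloP is a -log10 p-value, then 1 - p is 1 - 10**(-phyloP) rather than 1 - exp(-phyloP). A Pearson correlation changes with such a nonlinear transform, whereas a Spearman (rank) correlation is unchanged by any monotone transform. The sketch below illustrates this with made-up toy values, not real UCSC scores; `pearson`, `ranks` and `spearman` are plain illustrative helpers.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy values for illustration only (not real UCSC scores).
phastcons = [0.02, 0.10, 0.55, 0.80, 0.97, 0.99]
phylop = [0.3, 1.2, 0.8, 2.5, 4.1, 1.9]

# Restrict to positive phyloP, as in the question.
pairs = [(c, p) for c, p in zip(phastcons, phylop) if p > 0]
cons = [c for c, _ in pairs]
as_exp = [1 - math.exp(-p) for _, p in pairs]   # transform used in the question
as_log10 = [1 - 10 ** (-p) for _, p in pairs]   # equals 1 - p if phyloP = -log10(p)

print("Pearson  (exp):  ", round(pearson(cons, as_exp), 3))
print("Pearson  (log10):", round(pearson(cons, as_log10), 3))
print("Spearman (either):", round(spearman(cons, as_exp), 3))
```

Because both transforms are increasing functions of phyloP, the two Spearman values are identical; only the Pearson values move.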

Any experiences or thoughts about using these scores?

conservation comparative • 35k views

updated 13.9 years ago by Ning-Yi Shao ▴ 390 • written 13.9 years ago by Adrian ▴ 700


score 7 · Answer 1 · 2010-05-28

I'm not sure about phyloP, but I have used PhastCons scores. A PhastCons score ranges from 0 to 1 and indicates the level of conservation. However, the scores are far from evenly distributed: most of the genome scores 0 or has no score at all, and only some regions score highly, usually annotated genes (here I am talking about the placental-mammal scores). Here are the figures I drew around 2007 from the 17-way PhastCons scores (figure 1, figure 2; figure 2 uses a log-transformed y axis. Score 0 is on the left and score 1 on the right, and the many gaps without scores are probably due to gaps in the genome alignments).
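A histogram like the ones described can be summarized with a few lines of code. This is a sketch under assumptions: scores have already been parsed into a plain list (reading the actual UCSC wiggle/bigWig files is not shown), `None` marks bases with no score, and `summarize_phastcons` is a hypothetical helper. The toy data are invented to mimic the skew described above.

```python
import math


def summarize_phastcons(scores, n_bins=10):
    """Summarize per-base PhastCons scores.

    `scores` holds floats in [0, 1], with None marking bases that have no
    score (e.g. gaps in the alignment). Returns (fraction_missing, counts),
    where counts[i] counts scores in [i/n_bins, (i+1)/n_bins) and a score
    of exactly 1.0 is placed in the last bin.
    """
    present = [s for s in scores if s is not None]
    fraction_missing = 1 - len(present) / len(scores) if scores else 0.0
    counts = [0] * n_bins
    for s in present:
        idx = min(int(s * n_bins), n_bins - 1)  # put s == 1.0 in the last bin
        counts[idx] += 1
    return fraction_missing, counts


# Toy data for illustration only: mostly zero, some missing, a few high.
toy = [0.0] * 70 + [None] * 15 + [0.05] * 8 + [0.5] * 2 + [0.99] * 5
missing, counts = summarize_phastcons(toy)
print(f"missing: {missing:.2f}")
for i, c in enumerate(counts):
    # log10(count + 1) mimics a log-scaled y axis and tolerates empty bins
    print(f"[{i / 10:.1f}, {(i + 1) / 10:.1f}): {c:3d}  log10={math.log10(c + 1):.2f}")
```

With data this skewed, a genome-wide correlation is dominated by the large mass of near-zero sites.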

I don't think you will simply find a high correlation between the two scoring systems. However, the most highly conserved regions identified by the two systems should be highly correlated. You might also check that you have chosen the right PhastCons score: try the PhastCons tracks based on different species spans. But I am not optimistic that you will find a high correlation.
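One way to test the claim that the most conserved regions should agree, without letting the low-scoring bulk dominate, is to compare which sites fall in the top fraction of each score. The sketch below uses a Jaccard overlap of the top-scoring sites; the 30% cutoff, the toy values and the helper names are illustrative assumptions, not a standard procedure.

```python
def top_fraction_indices(scores, fraction):
    """Indices of the highest-scoring `fraction` of sites (at least one)."""
    k = max(1, int(round(len(scores) * fraction)))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def top_overlap(score_a, score_b, fraction=0.1):
    """Jaccard overlap between the top-`fraction` sites of two score tracks."""
    a = top_fraction_indices(score_a, fraction)
    b = top_fraction_indices(score_b, fraction)
    return len(a & b) / len(a | b)


# Toy per-site values for illustration only (same positions in both lists).
phastcons = [0.01, 0.02, 0.90, 0.95, 0.03, 0.99, 0.10, 0.05, 0.85, 0.02]
phylop = [0.5, -1.0, 3.2, 1.1, 0.2, 4.0, 0.8, -0.3, 2.9, 0.1]

print(top_overlap(phastcons, phylop, fraction=0.3))  # → 0.5
```

Here the top three sites share two positions out of four distinct ones, giving 0.5. Varying `fraction` shows how agreement changes as you move from the most conserved sites toward the bulk.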

score 5 · Answer 2 · 2010-05-28

I believe that per-base PhastCons scores are the result of a windowed calculation (and the windowing may be tuned depending on the alignment and the genomes involved), whereas per-base phyloP scores are obtained from a per-base calculation.

Therefore, assuming the same genome alignment is used for score generation, it may be difficult to calculate an informative correlation score without first transforming the phyloP scores in a similar fashion, so as to make a fair comparison.
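A rough way to try this is to smooth phyloP along the sequence before correlating. PhastCons is based on a phylo-HMM, so each base's score borrows information from neighbouring columns; the centered moving average below is only a crude, assumed stand-in for that behaviour, and the window size is an arbitrary illustrative parameter. The toy values are invented: a conserved block with noisy per-base phyloP and smooth PhastCons.

```python
import math


def moving_average(values, window=5):
    """Centered moving average; the window shrinks at the sequence ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        chunk = values[lo:hi]
        out.append(sum(chunk) / len(chunk))
    return out


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy data for illustration only: a conserved block in the middle.
phastcons = [0.0, 0.0, 0.1, 0.6, 0.9, 0.95, 0.9, 0.6, 0.1, 0.0, 0.0]
phylop = [-0.5, 0.4, -0.2, 2.0, 0.3, 3.5, 0.8, 2.2, -0.4, 0.6, -0.3]

raw_r = pearson(phastcons, phylop)
smoothed_r = pearson(phastcons, moving_average(phylop, window=3))
print(f"raw phyloP:      r = {raw_r:.3f}")
print(f"smoothed phyloP: r = {smoothed_r:.3f}")
```

On real data the gain, if any, will depend on the window size and on how closely it matches the effective smoothing of the PhastCons model, so it is worth trying several windows rather than trusting a single value.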