Question: How To Identify Conserved Elements Via The UCSC Genome Browser
6.3 years ago by
United Kingdom
vasilislenis110 wrote:

Hello everybody,

My name is Vasilis and I am a PhD student. I am a computer scientist and, unfortunately, my knowledge of biology is limited. I want to study the conserved elements among mammals, and more specifically among ruminants.

To do that, I must first find a way to identify them. I used the phastCons files from the UCSC Genome Browser for the 46-species alignment (with the human genome as reference; I am using these files because they have been tested and are trustworthy). From these files I extracted the runs of consecutive nucleotides with a conservation probability above 99%, trying several minimum lengths (more than 100, 150, 200, 250, 300, or 350 bp). After that, I extract these subsequences from the reference (human) genome and BLAST them back against each genome.
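The run-extraction step described above can be sketched roughly as follows. This is only an illustration, not my actual script: it assumes the phastCons scores are in UCSC's fixedStep wiggle format with `step=1`, and the function name `conserved_runs` and its parameters are hypothetical. It reports BED-style intervals (0-based, half-open) for runs of bases scoring above the threshold.

```python
def conserved_runs(wig_lines, threshold=0.99, min_len=100):
    """Return (chrom, start, end) intervals, 0-based and half-open, for runs
    of consecutive bases whose score is > threshold and whose length is
    >= min_len. Assumes fixedStep wiggle input with step=1."""
    runs = []
    chrom = None
    pos = 0            # 0-based position of the next score line
    run_start = None   # 0-based start of the open run, or None

    def flush(end):
        # Close the open run (if any) and keep it when it is long enough.
        nonlocal run_start
        if run_start is not None and end - run_start >= min_len:
            runs.append((chrom, run_start, end))
        run_start = None

    for line in wig_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("fixedStep"):
            flush(pos)  # a new block ends any run from the previous block
            fields = dict(f.split("=") for f in line.split()[1:])
            chrom = fields["chrom"]
            pos = int(fields["start"]) - 1  # wiggle starts are 1-based
            continue
        if float(line) > threshold:
            if run_start is None:
                run_start = pos
        else:
            flush(pos)
        pos += 1
    flush(pos)
    return runs
```

The returned intervals can then be written out as a BED file and used to pull the corresponding sequences from the human genome.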

I find around 49,000 conserved elements longer than 100 bp across the human genome, but when I BLAST them against another genome, for example the mouse genome, only 445 of them are recovered. (I keep only the hits with 100% similarity and the same length as the query.)
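For reference, the hit filter I describe amounts to something like the sketch below. It assumes BLAST was run with the default tabular output (`-outfmt 6`, where column 3 is percent identity and columns 7 and 8 are the query start and end); the function name `filter_hits` and the `query_lengths` dictionary are hypothetical helpers. The default thresholds reproduce the strict 100% identity, full-length criterion, and both can be lowered.

```python
def filter_hits(blast_lines, query_lengths, min_identity=100.0, min_coverage=1.0):
    """Return the set of query IDs that have at least one hit meeting both
    the percent-identity and the query-coverage threshold.
    blast_lines: lines of BLAST tabular output (-outfmt 6, default columns).
    query_lengths: dict mapping query ID -> query length in bp."""
    kept = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        qseqid = cols[0]
        pident = float(cols[2])
        qstart, qend = int(cols[6]), int(cols[7])
        # Fraction of the query covered by this alignment.
        coverage = (qend - qstart + 1) / query_lengths[qseqid]
        if pident >= min_identity and coverage >= min_coverage:
            kept.add(qseqid)
    return kept
```

Calling it with, say, `min_identity=90.0` and `min_coverage=0.9` would show how strongly the count of recovered elements depends on the strictness of the filter.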

I would expect all of these conserved elements to be found in the other species as well.

Could you tell me whether the methodology I am following is appropriate? I am trying to work out whether there is a problem with my methodology or whether I am miscalculating something.

Thank you very much in advance, Vasilis.

conservation • 3.1k views
modified 6.3 years ago by Istvan Albert ♦♦ 82k • written 6.3 years ago by vasilislenis110

I think requiring 100% similarity along the entire length is likely to be too stringent.

written 6.3 years ago by Sean Davis25k

But when I find regions longer than 300 bp with phastCons, for example, at 99% probability, shouldn't these regions be identical in the other species? In that case BLAST should give an identical alignment for these regions. I believe I'm missing something...

written 6.3 years ago by vasilislenis110

For sequences to be conserved, they do not need to be identical. You may want to read the phastCons paper to see the distinction.
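As a small illustration of that distinction, the sketch below computes the percent identity between two already-aligned sequences. The function name `percent_identity` and the two toy sequences are made up for the example and are not real genomic data; the point is only that a single substitution already makes an exact-match filter reject the pair.

```python
def percent_identity(a, b):
    """Percent of alignment columns where both sequences carry the same base.
    a and b must be aligned (equal length); '-' marks a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b) if x != "-" and y != "-")
    return 100.0 * matches / len(a)


# Toy aligned sequences (hypothetical) differing at one position.
human = "ACGTACGTAC"
mouse = "ACGTACGTAT"
print(percent_identity(human, mouse))
```

A filter that keeps only 100% identity would discard this pair even though nine of the ten positions agree.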

written 6.3 years ago by Sean Davis25k

I am sorry for the inconvenience, but I'm trying to find the identical conserved elements (UCE). With phastCons I'm identifying the highly conserved elements (HCE), but these are not the same as the UCE. Can I find the UCE with phastCons? Thank you once more.

written 6.3 years ago by vasilislenis110
