Question

How to Identify Conserved Elements via the UCSC Genome Browser

10.5 years ago

vasilislenis ▴ 150

Hello everybody,

My name is Vasilis and I am a PhD student. I am a computer scientist, and my knowledge of biology is unfortunately limited. I want to study conserved elements among mammals and, more specifically, among ruminants.

To do that, I first need a way to identify them. I used the phastCons files from the UCSC Genome Browser for 46 species, with the human genome as the reference (I chose these files because they are well tested and trustworthy). From these files I extracted runs of consecutive nucleotides with a conservation probability above 99%, using several minimum lengths (more than 100, 150, 200, 250, 300, and 350 bp). I then extract the corresponding subsequences from the reference (human) genome and BLAST them back against each genome.
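For illustration, the run-finding step could look roughly like the sketch below. It assumes the phastCons scores are in UCSC fixedStep wiggle format with step=1; the function name `conserved_runs` and its parameters are hypothetical, and the thresholds are simply the ones mentioned in this post.

```python
def conserved_runs(wig_lines, threshold=0.99, min_len=100):
    """Yield (chrom, start, end) for runs of bases scoring above `threshold`.

    Coordinates are 1-based and inclusive, as in fixedStep wiggle headers.
    Assumption: every block is declared with step=1.
    """
    chrom = None
    pos = 0            # position of the next score to be read
    run_start = None   # start of the run currently open, if any

    for line in wig_lines:
        line = line.strip()
        if not line or line.startswith(("track", "#")):
            continue
        if line.startswith("fixedStep"):
            # A new block means a gap in the data, so close any open run.
            if run_start is not None and pos - run_start >= min_len:
                yield (chrom, run_start, pos - 1)
            run_start = None
            fields = dict(f.split("=", 1) for f in line.split()[1:])
            if int(fields.get("step", 1)) != 1:
                raise ValueError("this sketch only handles step=1")
            chrom = fields["chrom"]
            pos = int(fields["start"])
            continue
        if float(line) > threshold:
            if run_start is None:
                run_start = pos
        else:
            if run_start is not None and pos - run_start >= min_len:
                yield (chrom, run_start, pos - 1)
            run_start = None
        pos += 1

    # Close a run that extends to the end of the input.
    if run_start is not None and pos - run_start >= min_len:
        yield (chrom, run_start, pos - 1)
```

Calling `list(conserved_runs(open_file, min_len=200))` on an open wiggle file would give the candidate intervals for the 200 bp cutoff; the intervals still have to be converted to sequences from the reference genome before running BLAST.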

I find around 49,000 conserved elements longer than 100 bp across the human genome, but when I BLAST them against another genome, for example the mouse genome, only 445 are recovered (I keep only hits with 100% similarity and the same length as the query).

I would expect all of these conserved elements to be found in the other species as well.

Could you tell me whether the methodology I am following is appropriate? I am trying to work out whether there is a problem with my methodology or whether I am miscalculating something.

Thank you very much in advance, Vasilis.

conservation • 4.5k views

ADD COMMENT • link updated 10.5 years ago by Istvan Albert 100k • written 10.5 years ago by vasilislenis ▴ 150

I think requiring 100% similarity along the entire length is likely to be too stringent.
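As a rough sketch of what a less stringent filter might look like, the function below keeps BLAST hits above a chosen percent identity and query coverage instead of requiring a perfect full-length match. It assumes the default 12-column tabular output (`-outfmt 6`); the name `relaxed_hits`, the `query_lengths` dictionary, and the 90% cutoffs are illustrative assumptions, not recommended values.

```python
def relaxed_hits(blast_lines, query_lengths, min_identity=90.0, min_coverage=0.9):
    """Return the best-scoring hit per query that passes relaxed thresholds.

    blast_lines   : lines of BLAST tabular output (default -outfmt 6 columns)
    query_lengths : dict mapping query id -> query length in bp
    Coverage is computed from a single HSP, so hits split across several
    HSPs are not merged in this sketch.
    """
    kept = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        qseqid, sseqid = cols[0], cols[1]
        pident = float(cols[2])                  # percent identity
        qstart, qend = int(cols[6]), int(cols[7])
        bitscore = float(cols[11])
        coverage = (abs(qend - qstart) + 1) / query_lengths[qseqid]
        if pident >= min_identity and coverage >= min_coverage:
            best = kept.get(qseqid)
            if best is None or bitscore > best[3]:
                kept[qseqid] = (sseqid, pident, coverage, bitscore)
    return kept
```

Counting `len(relaxed_hits(...))` at a few different cutoffs shows how quickly the number of recovered elements changes once exact identity is no longer required.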

ADD REPLY • link 10.5 years ago by Sean Davis 26k

But when I find regions longer than, for example, 300 bp with phastCons at 99%, those regions are supposed to be identical in the other species, right? So BLAST should give identical alignments for these regions. I believe I'm missing something...

ADD REPLY • link 10.5 years ago by vasilislenis ▴ 150

For sequences to be conserved, they do not need to be identical. You may want to read the phastCons paper to see the distinction.

ADD REPLY • link 10.5 years ago by Sean Davis 26k

I am sorry for the inconvenience, but I'm trying to find the identical conserved elements (UCEs). With phastCons I'm identifying the highly conserved elements (HCEs), but these are not the same as the UCEs. Can I find the UCEs with phastCons? Thank you once more.
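To make the distinction concrete, identical stretches can be looked for directly in aligned sequences rather than through phastCons scores. The sketch below scans rows of an existing alignment for gap-free columns where every species has the same base. The function name `identical_runs` is hypothetical, the input is assumed to be equal-length aligned strings (for example, rows taken from one alignment block), and the default of 200 bp follows the commonly used UCE length cutoff.

```python
def identical_runs(aligned_seqs, min_len=200):
    """Return (start, end) column ranges where all aligned rows are identical.

    Ranges are 0-based, half-open, in alignment columns. Columns containing
    a gap ('-') or an 'N' in any row break a run.
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")

    rows = [s.upper() for s in aligned_seqs]
    runs = []
    run_start = None
    for i in range(length):
        column = {row[i] for row in rows}
        identical = len(column) == 1 and column.isdisjoint({"-", "N"})
        if identical:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_len:
                runs.append((run_start, i))
            run_start = None
    # Close a run that reaches the last column.
    if run_start is not None and length - run_start >= min_len:
        runs.append((run_start, length))
    return runs
```

A region can score above 0.99 in phastCons and still return no run here, because a single substitution in any one species is enough to break an identical stretch.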

ADD REPLY • link 10.5 years ago by vasilislenis ▴ 150