My name is Vasilis and I am a PhD student. I am a computer scientist, and unfortunately my knowledge of biology is limited. I want to study the conserved elements among mammals, and more specifically among ruminants.
To do that, I first need a way to identify them. I used the PhastCons files from the UCSC Genome Browser for 46 species, with the human genome as the reference (I am using these files because they are well tested and trustworthy). From these files I extracted the runs of consecutive nucleotides whose probability of conservation is more than 99%, using several minimum lengths (more than 100, 150, 200, 250, 300, and 350 bp). After that, I extract these subsequences from the reference genome (human) and BLAST them back against each genome.
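To make the extraction step concrete, here is a minimal Python sketch of how such runs can be found. It assumes the per-base PhastCons scores for one chromosome have already been loaded into a list; the function name `conserved_runs` and its parameters are hypothetical, and treating the length cutoff as "at least `min_len`" is an assumption rather than my exact setting.

```python
def conserved_runs(scores, threshold=0.99, min_len=100):
    """Return half-open (start, end) index pairs for every run of
    consecutive positions whose score is strictly above `threshold`
    and whose length is at least `min_len`."""
    runs = []
    start = None  # index where the current run began, or None if not in a run
    for i, score in enumerate(scores):
        if score > threshold:
            if start is None:
                start = i
        else:
            # The run (if any) ended at position i - 1.
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the score list.
    if start is not None and len(scores) - start >= min_len:
        runs.append((start, len(scores)))
    return runs


# Toy example with a small min_len so the behaviour is easy to follow.
toy_scores = [0.5, 1.0, 1.0, 1.0, 0.2, 0.995, 0.995]
print(conserved_runs(toy_scores, threshold=0.99, min_len=3))
```

The indices are relative to the start of the score list, so they still have to be shifted by the chromosome offset given in the PhastCons file before the sequences are cut from the human genome.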
I find around 49,000 conserved elements longer than 100 bp across the whole human genome, but when I BLAST them against another genome, for example the mouse genome, only 445 of them are found. (I keep only the hits with 100% similarity and the same length as the query.)
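The hit filter described above can be sketched as follows. This is only an illustration under assumptions: it expects BLAST tabular output with the default 12 columns (`-outfmt 6`), the query lengths are supplied in a separate dictionary, and the function name and the `min_identity` / `min_coverage` parameters are hypothetical. The defaults reproduce the strict rule (100% identity, alignment as long as the query).

```python
def filter_hits(blast_lines, query_lengths, min_identity=100.0, min_coverage=1.0):
    """Return the set of query IDs that have at least one hit passing the filter.

    blast_lines   : iterable of tab-separated lines in the default
                    12-column BLAST tabular layout (assumed)
    query_lengths : dict mapping query ID -> query length in bp
    min_identity  : minimum percent identity of the alignment
    min_coverage  : minimum alignment length / query length
    """
    kept = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query_id = fields[0]
        identity = float(fields[2])   # column 3: percent identity
        aln_length = int(fields[3])   # column 4: alignment length (gaps included)
        coverage = aln_length / query_lengths[query_id]
        if identity >= min_identity and coverage >= min_coverage:
            kept.add(query_id)
    return kept


# Made-up hits, for illustration only.
hits = [
    "e1\tchr1\t100.00\t120\t0\t0\t1\t120\t500\t619\t1e-60\t222",
    "e2\tchr2\t95.00\t130\t6\t0\t1\t130\t10\t139\t1e-50\t200",
    "e3\tchr3\t100.00\t90\t0\t0\t1\t90\t10\t99\t1e-40\t150",
]
lengths = {"e1": 120, "e2": 130, "e3": 110}

print(sorted(filter_hits(hits, lengths)))                # strict filter
print(sorted(filter_hits(hits, lengths, 90.0, 0.8)))     # relaxed filter
```

Running the same BLAST output through the strict and a relaxed setting shows how many elements are dropped only because of the 100% identity and full-length requirements.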
I would expect all of these conserved elements to be found in the other species as well.
Could you tell me whether the methodology I am following is appropriate? I am trying to find out whether there is a problem with my methodology or whether I am miscalculating something.
Thank you very much in advance, Vasilis.