Question: Help For Identifying Conserved Sequences (phastCons)
5.7 years ago by
United Kingdom
I am trying to identify conserved regions among human, mouse and rat. I am new to this field and would appreciate your help. I want to start with human chromosome 22 as the reference. My plan is to run the pairwise alignments (human-mouse and human-rat), build the chains and nets, convert the nets to MAF files, and finally use MULTIZ to produce the multiple alignment. I then intend to run phastCons on the final MAF file, but phastCons reports, for each nucleotide, the probability that it lies in a conserved element. My idea is to find the coordinates of the nucleotides whose probability exceeds a threshold (e.g. 0.97) and extract the corresponding sequences from the reference genome (human). As a final step, I would BLAST these sequences against the whole mouse and rat genomes. Does this procedure sound reasonable? Is there a way to make phastCons output the conserved sequences themselves rather than only the probabilities? Also, and more importantly, which parameters should I use for phastCons?
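To make the thresholding step concrete, here is a minimal Python sketch of what I have in mind. It assumes the phastCons scores are written as a fixedStep wiggle file with step=1 (one score per base), and it merges consecutive positions above the threshold into BED-style intervals. The function name `conserved_intervals` and the example values are only illustrative, not part of any tool. As far as I can tell from the phastCons help, there is also a `--most-conserved` option that writes the predicted conserved elements directly, so this may only be needed if a custom cutoff is wanted.

```python
from typing import Iterable, Iterator, Tuple


def conserved_intervals(
    wig_lines: Iterable[str], threshold: float = 0.97
) -> Iterator[Tuple[str, int, int]]:
    """Yield (chrom, start, end) intervals (0-based, half-open, as in BED)
    covering runs of consecutive bases whose score is above `threshold`.

    Assumes fixedStep wiggle input with step=1 (hypothetical helper).
    """
    chrom = None
    pos = 0            # 1-based coordinate of the next score line
    run_start = None   # 1-based start of the current run, if any
    run_end = None     # 1-based last base of the current run

    for raw in wig_lines:
        line = raw.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        if line.startswith("fixedStep"):
            # A new block starts: close any open run first.
            if run_start is not None:
                yield (chrom, run_start - 1, run_end)
                run_start = None
            fields = dict(f.split("=", 1) for f in line.split()[1:])
            chrom = fields["chrom"]
            pos = int(fields["start"])
            continue
        score = float(line)
        if score > threshold:
            if run_start is None:
                run_start = pos
            run_end = pos
        elif run_start is not None:
            yield (chrom, run_start - 1, run_end)
            run_start = None
        pos += 1

    if run_start is not None:
        yield (chrom, run_start - 1, run_end)


# Made-up example scores, only to show the input and output shapes.
example = [
    "fixedStep chrom=chr22 start=101 step=1",
    "0.99",
    "0.98",
    "0.50",
    "0.99",
]
for chrom, start, end in conserved_intervals(example):
    print(f"{chrom}\t{start}\t{end}")
```

The resulting intervals could then be passed to something like `bedtools getfasta` to pull the human sequences out of the reference before the BLAST step.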

Thank you very much in advance.

5.7 years ago by
Cambridge, UK
I believe those particular alignments have already been computed by other people and should be available through Ensembl Compara. Is there a particular reason you want to do them yourself?

Thank you for your answer. I know, but I am a first-year Ph.D. student and I need to do this myself, because I must find a way to identify the conserved sequences in order to use them as anchors to speed up the alignment process.

