I'm looking for a way to search for the most conserved genes in two or more species. For example, if I input 'human' and 'pig', the output would be the top n most conserved genes between humans and pigs (either by nucleic acid or amino acid similarity).
Is there any tool that can do this? At the moment I'm reading UCSC's genome browser documentation to see if this could be done using it, but I'm not sure it could.
The biggest problem is that you have to have a clear definition of 'conserved genes' before proceeding.
First, you can align either protein or DNA sequences, so you already have two definitions of conservation: two genes having similar protein sequences, or similar DNA sequences. You can also define 'conservation' as having a high rate of synonymous changes compared to non-synonymous ones, and you could go further and include splicing, gene regulation, expression, etc.
One approach is to use a statistic called omega, which is the ratio dN/dS (the rate of non-synonymous changes over the rate of synonymous changes) between two orthologous coding sequences. You can go to Ensembl/BioMart, get the omega value for all genes with their orthologues, and then sort: the lowest values correspond to the most conserved genes.
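To make the sorting step concrete, here is a minimal sketch in Python. It assumes you already have dN and dS per orthologue pair in some table; the gene names, the numbers and the `dn`/`ds` keys are made up for illustration and are not real BioMart output. Pairs where dS is zero or missing are skipped, since the ratio is undefined there.

```python
from typing import Optional


def omega(dn: Optional[float], ds: Optional[float]) -> Optional[float]:
    """Return dN/dS, or None when it is undefined (missing value or dS == 0)."""
    if dn is None or ds is None or ds == 0:
        return None
    return dn / ds


def rank_by_omega(rows, n=10):
    """Return the n (gene, omega) pairs with the lowest omega."""
    scored = []
    for row in rows:
        w = omega(row["dn"], row["ds"])
        if w is not None:
            scored.append((row["gene"], w))
    # Lowest omega first: fewest non-synonymous changes per synonymous change.
    scored.sort(key=lambda pair: pair[1])
    return scored[:n]


# Hypothetical values, for illustration only.
example_rows = [
    {"gene": "geneA", "dn": 0.02, "ds": 0.50},
    {"gene": "geneB", "dn": 0.30, "ds": 0.60},
    {"gene": "geneC", "dn": 0.01, "ds": 0.0},   # dS == 0, omega undefined
    {"gene": "geneD", "dn": 0.001, "ds": 0.40},
]

top = rank_by_omega(example_rows, n=2)
print(top)
```

With these toy numbers geneD ranks first and geneA second; geneC is dropped because its omega cannot be computed.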
I guess for starters, I'll just be looking for the most similar protein sequences. So for example, I would want to know among all the orthologs between humans and zebrafish, which one has the most similar protein sequences.
I've never used biomart before, but thanks for giving me another tool to play with :).
Note that you can also get the % identity from BioMart. Go to BioMart, select "Ensembl Genes->Homo sapiens Genes" as the dataset, then under 'Attributes' click on "homologues" and pick a species. You can select the values from there. Also note that you can't get the omega value directly; you have to calculate dN/dS yourself.
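Once you have exported the results as a tab-separated file, ranking by % identity takes only a few lines. This is a rough sketch, not the exact BioMart format: the column names `gene_id`, `ortholog_id` and `perc_id` are placeholders I made up, so rename them to whatever headers your export actually contains. Rows with an empty identity column (genes with no orthologue in the other species) are skipped.

```python
import csv
import io


def top_identity(tsv_text, n=10, id_col="perc_id"):
    """Return the n (gene, orthologue, % identity) rows with the highest identity."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for row in reader:
        value = (row.get(id_col) or "").strip()
        if not value:
            continue  # no orthologue reported for this gene
        rows.append((row["gene_id"], row["ortholog_id"], float(value)))
    # Highest % identity first.
    rows.sort(key=lambda r: r[2], reverse=True)
    return rows[:n]


# Hypothetical export, for illustration only.
example_tsv = (
    "gene_id\tortholog_id\tperc_id\n"
    "humanGene1\tfishGene1\t98.5\n"
    "humanGene2\t\t\n"
    "humanGene3\tfishGene3\t61.2\n"
    "humanGene4\tfishGene4\t99.1\n"
)

best = top_identity(example_tsv, n=2)
print(best)
```

For a real export you would read the file contents and pass them in, e.g. `top_identity(open("export.tsv").read(), n=20)`, where `export.tsv` is whatever you named the download.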
Thanks! I tried using filters -> 'orthologous x genes' only and it works, too.