This is an unusual and interesting question. I think it might help to generalize it:
- Given a list of species, how can I find those that have DNA sequence data available?
- Or, how can I find a specific DNA sequence for each species in a list?
To start with the second question: which sequence(s) do you want to use to build the tree? A specialized database may already exist; for example, the SILVA rRNA database might be useful if you wanted to use 18S rDNA sequences.
For the first question, my starting point would be NCBI. There are several ways into the problem. You might start with the Taxonomy resource, searching by species name. On the Taxonomy page for zebrafish, for example, the box on the right contains links to nucleotide sequences for that organism. Or you might search the Gene or Nucleotide databases using organism + gene name as a query, for example:
"Homo sapiens"[ORGN] AND GUCA2B[GENE]
Once you've identified a good search strategy, you will want to automate search and retrieval. For that you'll need the NCBI E-utilities (EUtils) and some programming ability (e.g. in Perl, Ruby or Python).
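To give an idea of what that automation might look like, here is a minimal Python sketch using only the standard library. It builds the same organism + gene query shown above, sends it to ESearch to get a list of record IDs, and then asks EFetch for those records as FASTA. The function names (build_term, esearch_url, efetch_url, search_ids, fetch_fasta, fetch_for_species) and the default of 20 results are my own choices for illustration, not part of any NCBI library, and you would want to add error handling before relying on it.

```python
import json
import time
import urllib.parse
import urllib.request

# Base URL of the NCBI E-utilities web service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_term(organism, gene):
    """Build an Entrez query such as '"Homo sapiens"[ORGN] AND GUCA2B[GENE]'."""
    return f'"{organism}"[ORGN] AND {gene}[GENE]'


def esearch_url(term, db="nucleotide", retmax=20):
    """URL for an ESearch request that returns matching record IDs as JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urllib.parse.urlencode(params)}"


def efetch_url(ids, db="nucleotide"):
    """URL for an EFetch request that returns the given records as FASTA."""
    params = {"db": db, "id": ",".join(ids), "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS_BASE}/efetch.fcgi?{urllib.parse.urlencode(params)}"


def search_ids(term, db="nucleotide", retmax=20):
    """Run the search over the network and return a list of record IDs."""
    with urllib.request.urlopen(esearch_url(term, db, retmax)) as resp:
        data = json.load(resp)
    return data["esearchresult"]["idlist"]


def fetch_fasta(ids, db="nucleotide"):
    """Download FASTA text for the given IDs (empty string if there are none)."""
    if not ids:
        return ""
    with urllib.request.urlopen(efetch_url(ids, db)) as resp:
        return resp.read().decode("utf-8")


def fetch_for_species(species_list, gene, delay=0.5):
    """Map each species name to the FASTA text found for the given gene.

    Species with no hits map to an empty string, which also answers the
    question of which species have no sequence for that gene.
    """
    results = {}
    for organism in species_list:
        ids = search_ids(build_term(organism, gene))
        results[organism] = fetch_fasta(ids)
        # NCBI asks for no more than 3 requests per second without an API key.
        time.sleep(delay)
    return results
```

Calling fetch_for_species(["Homo sapiens", "Danio rerio"], "GUCA2B") would then return a dictionary keyed by species name. If you prefer not to build the URLs yourself, Biopython's Bio.Entrez module wraps the same service.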