I have around 10,000 fish EST sequences in a FASTA file and want an Entrez Gene ID for as many of them as possible. I want Entrez Gene IDs to facilitate Gene Ontology searches and analyses.
My usual approach is to BLAST the sequences against the Swiss-Prot and nr databases, retrieve the identifiers of the hits, and convert them into Entrez Gene IDs. However, whichever conversion tool I use (DAVID, UniProt conversion...), I typically obtain Entrez Gene IDs for only a small percentage of the sequences.
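To make the conversion step concrete, here is a minimal sketch of what I mean, not my exact pipeline. It assumes BLAST was run with tabular output (`-outfmt 6`, default columns) and that a lookup table from protein accession to Entrez Gene ID is already available as a dictionary. The function names, the E-value cutoff, and the toy accessions and gene IDs are placeholders invented for illustration.

```python
def best_hits(blast_lines, max_evalue=1e-5):
    """Keep the highest-bitscore hit per query from BLAST tabular output.

    Assumes the default -outfmt 6 column order:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for line in blast_lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        if evalue > max_evalue:
            continue  # hit too weak to trust (cutoff is an arbitrary assumption)
        # Swiss-Prot subject IDs often look like "sp|ACCESSION|NAME_SPECIES";
        # otherwise the subject ID is used as-is.
        parts = subject.split("|")
        accession = parts[1] if len(parts) >= 3 else subject
        if query not in best or bitscore > best[query][1]:
            best[query] = (accession, bitscore)
    return {query: accession for query, (accession, _) in best.items()}


def to_gene_ids(hits, accession_to_gene):
    """Split queries into those with a Gene ID and those without one."""
    mapped = {q: accession_to_gene[a] for q, a in hits.items() if a in accession_to_gene}
    unmapped = sorted(q for q in hits if q not in mapped)
    return mapped, unmapped


if __name__ == "__main__":
    # Toy data: the accessions and gene IDs below are made up.
    rows = [
        "est1\tsp|P00001|AAA_DANRE\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-50\t200",
        "est2\tsp|P00002|BBB_DANRE\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-40\t180",
    ]
    lookup = {"P00001": "1001"}  # hypothetical accession -> Entrez Gene ID table
    mapped, unmapped = to_gene_ids(best_hits(rows), lookup)
    print("mapped:", mapped)
    print("unmapped:", unmapped)
```

The second function is where I lose most sequences: many hits have an accession for which the lookup returns nothing.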
How can I go efficiently from the EST sequences to Entrez Gene IDs?
My goal is to automate the process and get as many gene IDs as possible for my Gene Ontology analyses. Alternatively, if you know of a way to obtain another, equally useful gene identifier that integrates well with Gene Ontology tools, I am also interested.