Question: How to generate gene-predictions from a published genome?
2.3 years ago by
Switzerland/Basel/Department of Environmental Sciences
Hi Biostars people,

Some information:

I am in the middle of a proteomics experiment. I have my .raw LC-MS/MS files and my software set up, and everything is ready to go. The genome of the organism I am working with has not been sequenced yet, but the genome of a closely related species has.

My idea is to search the MS/MS data (e.g. with Mascot) against a database of gene predictions from the published genome. Redundant entries with 90% or more similarity would be removed and a common contaminants database would be added. I could then compare the protein sequences against the NCBI nr database using BLAST (the NCBI Basic Local Alignment Search Tool), e.g. via the R package bio3d.
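To make the idea concrete, here is a minimal Python sketch of how I imagine assembling the search database, assuming the predicted proteins and the contaminants are already available as FASTA text. All function names and the "CONTAM_" prefix are made up for illustration. The similarity measure is `difflib.SequenceMatcher.ratio()`, which is only a rough stand-in for alignment-based sequence identity, and the greedy all-against-all comparison will only be practical for small sets; for a real proteome a dedicated clustering tool would presumably be needed.

```python
from difflib import SequenceMatcher


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def filter_redundant(records, threshold=0.9):
    """Greedy filter, longest sequences first: keep a sequence only if its
    difflib ratio to every already kept sequence is below `threshold`.
    Note: this ratio is NOT a true alignment identity (assumption/stand-in)."""
    kept = []
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        redundant = any(
            SequenceMatcher(None, seq, rep, autojunk=False).ratio() >= threshold
            for _, rep in kept
        )
        if not redundant:
            kept.append((header, seq))
    return kept


def build_search_db(predicted, contaminants, threshold=0.9):
    """Filter the predicted proteins, then append the contaminant entries.
    Contaminants are tagged with a hypothetical 'CONTAM_' prefix so that
    their hits can be recognised in the search results."""
    db = filter_redundant(predicted, threshold)
    db += [("CONTAM_" + header, seq) for header, seq in contaminants]
    return db


def to_fasta(records, width=60):
    """Serialise (header, sequence) tuples back to FASTA text."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

The text returned by `to_fasta(build_search_db(predicted, contaminants))` would then be the file handed to the search engine.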

My question (although very broad and hopefully very obvious to answer): How do I generate a gene-prediction database from publicly available genomic data on NCBI?

Have mercy on me; I am new to most of this, but I have a rough grasp of the jargon and the concepts.

Thanks for your time

First step is to predict the genes:

After that you would probably post a new question about the output. I used AUGUSTUS once a while ago and it was easy to use. There is also an online version.
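For the step from prediction output to a protein FASTA, something like the following might help. As far as I remember, AUGUSTUS writes the predicted protein of each gene into comment lines of its GFF output ("# start gene g1", then "# protein sequence = [MK...", continued on further "#" lines until the closing "]"). The sketch below assumes exactly that layout and one transcript per gene, so check it against your own output first. The function name is made up for illustration.

```python
import re


def augustus_proteins(gff_text):
    """Collect {gene_id: protein_sequence} from AUGUSTUS-style GFF comments.

    Assumed layout (verify against your own output):
        # start gene g1
        ...feature lines...
        # protein sequence = [MKTAY...
        # ...continued...]
        # end gene g1
    """
    proteins = {}
    gene = None
    chunks = None  # list of sequence pieces while inside a protein block
    for raw in gff_text.splitlines():
        line = raw.strip()
        start = re.match(r"#\s*start gene\s+(\S+)", line)
        if start:
            gene, chunks = start.group(1), None
            continue
        if line.startswith("# protein sequence = ["):
            # First line of a protein block: keep what follows the bracket.
            chunks = []
            piece = line.split("[", 1)[1]
        elif chunks is not None and line.startswith("#"):
            # Continuation line of an open protein block.
            piece = line.lstrip("#").strip()
        else:
            continue
        if piece.endswith("]"):
            # Closing bracket: the protein is complete, store it.
            chunks.append(piece[:-1])
            if gene is not None:
                proteins[gene] = "".join(chunks)
            chunks = None
        else:
            chunks.append(piece)
    return proteins
```

Writing each entry out as ">gene_id" followed by the sequence gives a FASTA file that can go into the redundancy filtering and the database search.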

Which genome? In general, NCBI genomes have already been annotated with protein predictions.

This is the genome I am talking about:

There is an old annotation here:

The NCBI genome, more recent and with more data, has not been annotated, though.

