Hi Biostars people,
I am in the middle of a proteomics experiment. I have my .raw LC-MS/MS files, my software is set up, and everything is ready to go. The genome of the organism I am working with has not been sequenced yet, but the genome of a closely related species has.
My idea is to search the MS/MS data (e.g. with Mascot) against a protein database built from gene predictions for the published genome of the related species. I would remove redundant entries at 90% sequence similarity and append a common contaminants database (https://maxquant.org/contaminants.zip). I could then compare the resulting protein sequences against the NCBI nr database using BLAST (the NCBI Basic Local Alignment Search Tool), e.g. via the R package Bio3d.
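To make the merging step concrete, here is a rough sketch of what I have in mind, not a finished pipeline. It assumes the gene predictions and the contaminants are already available as protein FASTA records. The function names and the `CON__` header prefix are my own choices for illustration. It only removes exact duplicate sequences; the 90% similarity filtering would still need a dedicated clustering tool, which is not shown here.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from the lines of a FASTA file."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_databases(
    predicted: Iterable[Record],
    contaminants: Iterable[Record],
    prefix: str = "CON__",  # assumed tag so contaminant hits are easy to spot
) -> List[Record]:
    """Combine both sets and drop exact duplicate sequences.

    Contaminants go first, so a sequence present in both sets keeps
    its contaminant tag. Near-duplicates (e.g. 90% similar) are NOT
    removed here.
    """
    tagged = [(prefix + header, seq) for header, seq in contaminants]
    seen = set()
    merged: List[Record] = []
    for header, seq in tagged + list(predicted):
        key = seq.upper().rstrip("*")  # ignore case and trailing stop symbol
        if not key or key in seen:
            continue
        seen.add(key)
        merged.append((header, key))
    return merged


def write_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Format records as FASTA text with wrapped sequence lines."""
    out: List[str] = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

In practice I would call `parse_fasta` on two open file handles (the predicted proteins and the unzipped contaminants file) and write the output of `write_fasta` to a new file for the search engine.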
My question (very broad, and hopefully easy to answer): How do I generate a gene-prediction database from publicly available genomic data on NCBI?
Have mercy on me; I am new to most of this, but I have a rough grasp of the jargon and the concepts.
Thanks for your time