Hi all,
I have an EST DNA sequence database that I want to search for protein identification using tandem mass spectrometry data. However, I have reason to believe that the protein is not encoded by the reverse strand, so in a six-frame translation I can safely ignore the three reading frames from the reverse strand.
My question is: how should I use the remaining three reading frames from the positive strand as a database? Common software generally translates all six frames, but in this case I need only three. Should I keep only the translated longest open reading frame (ORF) from the three frames? Or should I keep ALL ORFs generated from the three frames? Or should I keep the whole translated reading frame that contains the longest ORF? In the last case, however, there will be a lot of translated stop codons (marked as *) inside the sequence.
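
To make the options concrete, here is a rough sketch of what I mean, in plain Python with no external libraries. It assumes the standard genetic code, and it treats an "ORF" as any stop-to-stop stretch (no start codon required, since ESTs are often partial). The function names and the `min_len` cutoff are just my own placeholders, not taken from any existing tool.

```python
# Standard genetic code (NCBI translation table 1), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO)
)


def translate_frame(dna, frame):
    """Translate one forward frame (0, 1 or 2); unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(frame, len(dna) - 2, 3)
    )


def forward_frames(dna):
    """Full translations of the three forward frames, stops kept as '*'."""
    return [translate_frame(dna, frame) for frame in range(3)]


def forward_orfs(dna, min_len=1):
    """All stop-to-stop segments from the three forward frames.

    Returns (frame, peptide) pairs; segments shorter than min_len are dropped.
    """
    orfs = []
    for frame, protein in enumerate(forward_frames(dna)):
        for segment in protein.split("*"):
            if len(segment) >= min_len:
                orfs.append((frame, segment))
    return orfs


def longest_orf(dna):
    """The single longest stop-to-stop segment across the forward frames."""
    orfs = forward_orfs(dna)
    return max(orfs, key=lambda item: len(item[1])) if orfs else None


if __name__ == "__main__":
    est = "ATGGCCTAAATGAAA"  # made-up toy sequence, not real data
    print(forward_frames(est))           # option 3: whole frames, with '*'
    print(forward_orfs(est, min_len=2))  # option 2: all ORFs above a cutoff
    print(longest_orf(est))              # option 1: longest ORF only
```

In this sketch, option 2 amounts to splitting each frame at `*` and writing every segment above the cutoff as its own database entry, so no stop codons end up inside the searched sequences.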