Let's say I have a large database of cDNA sequences in FASTA format, and I would like to identify a motif in the corresponding amino acid sequences. For example, I need to find something like:
CxxCxxxxxxxxxxxxHxxx$ where x is any residue and $ is either H or C
I imagine one would start by parsing the FASTA files, finding the sites where these sub-sequences would be, and translating the corresponding coding DNA sequence, so that I end up with an amino acid sequence that contains a stretch of this form. If I had a specific amino acid sequence in mind, I could easily find it with the .find() method in Biopython. However, I'm not sure how to search for a pattern like the one above, which matches a whole set of sequences rather than a single one.
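
To make the question concrete, here is a minimal sketch of the kind of search I have in mind, using a regular expression on an already translated protein string. It assumes that x means any single residue and that $ means H or C; the names MOTIF, motif_to_regex and find_motif are made up for illustration and are not part of Biopython.

```python
import re

# Shorthand from the question: x = any residue, $ = H or C (assumed meaning).
MOTIF = "CxxCxxxxxxxxxxxxHxxx$"


def motif_to_regex(motif):
    """Convert the shorthand motif into a compiled regular expression."""
    parts = []
    for ch in motif:
        if ch == "x":
            parts.append(".")        # any single character
        elif ch == "$":
            parts.append("[HC]")     # either H or C
        else:
            parts.append(re.escape(ch))  # literal residue
    # Wrap in a lookahead so overlapping occurrences are also reported.
    return re.compile("(?=(" + "".join(parts) + "))")


def find_motif(protein, pattern):
    """Return (0-based start, matched text) for every occurrence."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(protein)]


if __name__ == "__main__":
    pattern = motif_to_regex(MOTIF)
    # Hypothetical protein: one motif instance padded with other residues.
    example = "MK" + MOTIF.replace("x", "A").replace("$", "H") + "GG"
    print(find_motif(example, pattern))
```

If Biopython is available, the protein string could presumably come from str(record.seq.translate()) for each record yielded by SeqIO.parse(path, "fasta"), assuming the cDNA is in the correct reading frame. Note that "." would also match a stop symbol such as "*" in a translated sequence.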