I have a protein motif (site) that I would like to identify in DNA sequences (a multi-FASTA file). The motif is N-X-S/T (X != P), which means Asn, followed by any amino acid except Pro, followed by Ser or Thr. X should also not be a STOP codon. So I would like to find all the 3-codon combinations for this site in DNA (9 nucleotides).
I first thought of writing the motif in DNA using IUPAC ambiguity codes, but that did not seem possible. Writing out all the possibilities by hand seems too laborious, so I thought there might be a tool that can do this. Any suggestions?
Doesn't BLAST(P) already support certain redundant characters?
I'm not sure you'll be able to define all of those exactly, since typically X means any amino acid (I think), without any restriction. You may not be able to find an alphabet that supports all of what you need.
You could maybe blast NXS and NXT, and then filter the results with a regex to make sure that the next codon is != *
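As an alternative to BLAST, the translate-and-filter idea can be sketched directly in Python. This is only a rough illustration, not a tested tool: it assumes the standard genetic code, scans the three forward reading frames only (the reverse strand would need reverse-complementing first), and the names `find_sequons` and `all_motif_9mers` are made up for this example. Reading the multi-FASTA file is left out; each record's sequence would be passed in as a plain string.

```python
import re
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO)
}

# N, then anything except Pro or stop, then Ser or Thr.
# The lookahead lets overlapping sites (e.g. N-N-S-T) both be reported.
SEQUON = re.compile(r"(?=(N[^P*][ST]))")


def translate(dna):
    """Translate in frame 0; codons with non-ACGT letters become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def find_sequons(dna):
    """Return (frame, nucleotide start, 9-mer, peptide) for each forward-frame hit."""
    hits = []
    for frame in range(3):
        protein = translate(dna[frame:])
        for match in SEQUON.finditer(protein):
            start = frame + 3 * match.start()  # 0-based position in the DNA
            hits.append(
                (frame, start, dna[start:start + 9].upper(), match.group(1))
            )
    return hits


def all_motif_9mers():
    """Enumerate every 9-nucleotide string that encodes the motif."""
    n_codons = [c for c, aa in CODON_TABLE.items() if aa == "N"]
    x_codons = [c for c, aa in CODON_TABLE.items() if aa not in "P*"]
    st_codons = [c for c, aa in CODON_TABLE.items() if aa in "ST"]
    return [a + b + c for a in n_codons for b in x_codons for c in st_codons]


if __name__ == "__main__":
    print(find_sequons("AATGCTTCT"))  # Asn-Ala-Ser in frame 0
    print(len(all_motif_9mers()))     # 2 Asn codons x 57 X codons x 10 Ser/Thr codons
```

Writing out every combination is feasible for a program (2 x 57 x 10 = 1140 nine-mers), but translating each frame and matching at the protein level is simpler and maps straight back to DNA coordinates. Note that a codon containing an ambiguous base is translated to 'X' here and would therefore be accepted in the middle position; whether that is desirable depends on the data.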