9 months ago
jet
Given a FASTA file containing only DNA-binding protein sequences, I would like to find out which motifs/subsequences occur in a certain fraction of the sequences (for example, in at least 30% of them). The motifs do not need to match exactly. After finding these motifs, I would like to build an HMM for them.
I have already used HMMER + Pfam to identify domains of interest, but I suspect that some patterns may not be covered by Pfam. Is there a tool that can help me achieve this?
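To make the "at least 30% of sequences" criterion concrete, here is a rough sketch of what I mean. It only counts exact k-mers, so it is not a solution to the inexact-motif problem, just an illustration of the coverage threshold. The function names (`parse_fasta`, `shared_kmers`), the value of `k`, and the toy sequences are all made up for this example.

```python
from collections import Counter


def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {sequence id: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the sequence id.
            header = line[1:].split()[0]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def shared_kmers(sequences, k=4, min_fraction=0.3):
    """Return {kmer: fraction of sequences containing it} for every exact
    k-mer present in at least min_fraction of the sequences."""
    counts = Counter()
    for seq in sequences:
        # A set ensures each k-mer is counted at most once per sequence.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    n = len(sequences)
    return {kmer: c / n for kmer, c in counts.items() if c / n >= min_fraction}


# Toy input (invented sequences, not real DNA-binding proteins).
example = """>seq1
MKRKRGRPRK
>seq2
AAGRPRKTTT
>seq3
MSTNPKPQRK
>seq4
LLLLLLLLLL
"""

seqs = list(parse_fasta(example).values())
hits = shared_kmers(seqs, k=4, min_fraction=0.5)
for kmer, fraction in sorted(hits.items()):
    print(kmer, fraction)
```

What I am really after is the same idea, but with tolerance for substitutions and gaps, and with the result in a form that I can turn into a profile HMM.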