My lab has sequenced the genome of a bee, and my goal is to identify all the transcription factors (TFs) encoded in it. We do not have ChIP-seq data, so I have no peak information, and I am wondering how to go about this. Sorry if this is a silly question, but every tutorial I come across for finding TF motifs starts from ChIP-seq peaks, which we do not have.
I have the assembled genome and a separate annotation file of gene IDs. I think I need to scan the genome for DNA-binding domains (DBDs) upstream from genes, but I am really unclear on how to do this. I did come across this paper that says to find DBDs first and then use InterProScan, but their links are broken. Since I am working with a bee, I would likely need to use DBDs that have already been characterized in other invertebrates. Is this right? Can anyone please point me to a working database that has DBDs for invertebrates?
Is my thinking correct? Can I use InterProScan on an assembled genome to obtain a list of candidate TFs?
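
To make the question concrete, here is a rough sketch of the filtering step as I currently imagine it. It is not tested on real data. It assumes InterProScan has already been run with TSV output on the protein sequences predicted from the annotation, and that column 1 is the protein ID, column 4 the analysis name, and column 5 the signature accession. The Pfam accessions below are only a small starter set of well-known DBD families, not a complete list, and the function name `find_tf_candidates` is my own invention.

```python
import csv
from collections import defaultdict

# Assumption: a small, incomplete starter set of Pfam DNA-binding domain
# families. A real analysis would need a curated, much longer list.
DBD_PFAM = {
    "PF00046": "Homeodomain",
    "PF00010": "bHLH",
    "PF00096": "C2H2 zinc finger",
    "PF00105": "Nuclear receptor zinc finger (C4)",
    "PF00170": "bZIP",
    "PF00250": "Forkhead",
    "PF00505": "HMG box",
    "PF00178": "Ets",
}


def find_tf_candidates(tsv_lines, dbd_pfam=DBD_PFAM):
    """Return {protein_id: set of DBD family names} from InterProScan TSV rows.

    tsv_lines is any iterable of tab-separated lines (an open file works).
    Only rows from the Pfam analysis whose accession is in dbd_pfam are kept.
    """
    candidates = defaultdict(set)
    reader = csv.reader(tsv_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 5:
            continue  # skip blank or truncated lines
        protein_id, analysis, accession = row[0], row[3], row[4]
        if analysis == "Pfam" and accession in dbd_pfam:
            candidates[protein_id].add(dbd_pfam[accession])
    return dict(candidates)
```

I would then call it as `find_tf_candidates(open("bee_proteins.tsv"))`, where `bee_proteins.tsv` is a placeholder file name, and map the protein IDs back to gene IDs using the annotation file.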