I want to identify TF-target gene relationships for a set of 134 genes. These 134 genes belong to a multigene family that evolved by gene duplication. I have transcriptome data for five different conditions (three biological replicates each) from the tissues in which the genes are expressed. Overall, the transcriptome assembly shows that 17203 genes are expressed, using FPKM > 1 as the cutoff.
I used the transcriptome dynamics to find genes that show a correlation of 0.8 or above with the 134 target genes. This filtering gave me 2766 genes out of the total 17203. Hierarchical clustering based on correlation separates these 2766 genes into 28 clusters, and I treat each of the 28 clusters as a set of strongly co-expressed genes.
Next, I took the TF-target gene relationships for the species I am working on from the CIS-BP database and scanned the promoter regions of my target genes for motifs, using the known TF-TFBS information from CIS-BP. For every motif found in a target gene, I checked whether that motif is also enriched in the cluster to which the target gene belongs. This gives me, for each target gene, a subset of enriched motifs out of all the motifs found. Finally, I check whether the target gene and the TF corresponding to each enriched motif show a correlation greater than 0.8 or less than -0.8, in order to predict the regulators (activators or repressors) of my target gene.
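A minimal sketch of how the cluster-level enrichment check and the TF correlation filter described above could be written in Python. It assumes a one-sided hypergeometric test with the 2766 co-expressed genes as background. The choice of test, the background set, the function names and the toy numbers are all illustrative assumptions, not part of the pipeline above and not taken from CIS-BP.

```python
from math import comb, sqrt


def hypergeom_enrichment_p(n_background, n_background_hits, n_cluster, n_cluster_hits):
    """One-sided hypergeometric p-value, P(X >= n_cluster_hits).

    n_background      : genes in the background set (assumed here: all 2766)
    n_background_hits : background genes whose promoter carries the motif
    n_cluster         : genes in the cluster being tested
    n_cluster_hits    : cluster genes whose promoter carries the motif
    """
    total = comb(n_background, n_cluster)
    upper = min(n_cluster, n_background_hits)
    tail = sum(
        comb(n_background_hits, k)
        * comb(n_background - n_background_hits, n_cluster - k)
        for k in range(n_cluster_hits, upper + 1)
    )
    return tail / total


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def classify_regulator(r, threshold=0.8):
    """Label a TF by the sign of its correlation with the target gene."""
    if r > threshold:
        return "activator"
    if r < -threshold:
        return "repressor"
    return None  # correlation too weak to call


# Toy numbers (hypothetical): a motif present in 300 of 2766 background
# promoters and in 25 of the 100 genes of one cluster.
p_value = hypergeom_enrichment_p(2766, 300, 100, 25)

# Toy expression profiles over five conditions (hypothetical values).
tf_profile = [1.0, 2.5, 3.0, 4.2, 5.1]
target_profile = [2.0, 4.8, 6.1, 8.0, 10.3]
call = classify_regulator(pearson(tf_profile, target_profile))
```

Because each target gene is tested against many motifs across 28 clusters, the resulting p-values would normally need a multiple-testing correction (for example Benjamini-Hochberg) before a motif is called enriched.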
My question is whether testing for motif enrichment at the cluster level is an acceptable approach, or whether I should use a more stringent approach.