I have a list of position weight matrices (PWMs) for transcription factors (TFs) from TRANSFAC Pro. Where possible, I have mapped the matrix names to their corresponding genes via Ensembl gene ID. In total, 1082 genes are assigned to the PWMs. Some PWMs represent heterodimers, so one matrix can be assigned to two or more genes. My goal is to build a network among these transcription factors based on their binding matrices.
My workflow is:
I extract the promoter regions of all 1082 genes. I define the promoter as the 500 nucleotides upstream of the first nucleotide of the first exon.
I use FIMO from the MEME Suite to predict binding sites, with the p-value threshold set to 1e-4.
In the end, I want to know which TFs regulate which other TFs, so the final result would be a network of TFs that regulate each other.
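For reference, the promoter definition in the first step can be sketched as below. This is only a minimal illustration: it assumes 1-based inclusive coordinates (as in Ensembl GTF files) and treats the first nucleotide of the first exon as the TSS; the function name `promoter_window` is hypothetical.

```python
def promoter_window(tss, strand, upstream=500):
    """Return (start, end), 1-based inclusive, of the `upstream` nt before the TSS.

    `tss` is the first nucleotide of the first exon (an assumption of this sketch).
    """
    if strand == "+":
        # Upstream means smaller coordinates on the forward strand.
        start, end = tss - upstream, tss - 1
    elif strand == "-":
        # Upstream means larger coordinates on the reverse strand.
        start, end = tss + 1, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clip at the chromosome start; the chromosome end is not checked here.
    return max(start, 1), end
```

The reverse-strand case is the part that is easy to get wrong: the window lies at higher coordinates than the TSS, and the extracted sequence still has to be reverse-complemented before scanning.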
I have finished the FIMO run, and after mapping the matrices to gene names I have around 200,000 TF-target pairs.
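To make the mapping step concrete, here is a minimal sketch of how motif hits are expanded into TF-target pairs and how the per-TF and per-gene counts are obtained. The matrix IDs, gene names and the `motif_to_genes` dictionary are made-up placeholders, not my real data; the point is that a heterodimer matrix contributes one pair per assigned gene.

```python
from collections import Counter

# Hypothetical mapping: one matrix can map to several TF genes (heterodimers).
motif_to_genes = {
    "M_A": ["TF1"],
    "M_B": ["TF2", "TF3"],  # heterodimer matrix
}

# Hypothetical FIMO hits, reduced to (matrix ID, target gene) tuples.
hits = [("M_A", "TF2"), ("M_A", "TF2"), ("M_B", "TF1"), ("M_B", "TF3")]

def hits_to_pairs(hits, motif_to_genes):
    """Expand motif hits into a set of unique (TF gene, target gene) pairs."""
    pairs = set()
    for motif_id, target in hits:
        # Matrices without a gene assignment are silently skipped.
        for tf in motif_to_genes.get(motif_id, []):
            pairs.add((tf, target))
    return pairs

pairs = hits_to_pairs(hits, motif_to_genes)
targets_per_tf = Counter(tf for tf, _ in pairs)          # out-degree of each TF
tfs_per_target = Counter(target for _, target in pairs)  # in-degree of each gene
```

Using a set means that several hits of the same matrix in the same promoter count as a single pair.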
When I checked the number of TFs per gene and the number of targets per TF, the numbers did not look right: there are far too many.
For example, one TF can have almost 900 target genes, and conversely one gene can have more than 900 TFs. I understand that FIMO only tests whether the PWM matches some pattern in the DNA sequence of the binding region.
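A rough back-of-the-envelope estimate of how many matches to expect by chance alone, as a sketch. It assumes that the p-value threshold applies to each scanned position, that both strands are scanned, and a hypothetical motif width of 10 nt; positions are treated as independent, which is a simplification.

```python
def expected_chance_hits(seq_len, motif_width, p_thresh, both_strands=True):
    """Expected number of matches passing `p_thresh` in one random sequence."""
    # Number of start positions at which a motif of this width fits.
    positions = max(seq_len - motif_width + 1, 0)
    strands = 2 if both_strands else 1
    return positions * strands * p_thresh

# Values from my setup: 500 nt promoters, p < 1e-4, 1082 promoters.
# The motif width of 10 is an assumption made only for this example.
per_promoter = expected_chance_hits(500, 10, 1e-4)  # roughly 0.098
per_motif_all_promoters = per_promoter * 1082       # roughly 106
```

Under these assumptions a single matrix is expected to hit about 1 in 10 promoters by chance, that is, roughly 100 of the 1082 promoters, before any real binding is considered.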
My question is: is there any way to filter the TF-target pairs using not only the p-value but also other factors, such as TF combinations? It seems implausible for 900 TFs to bind a region that is only 500 nucleotides long.
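As an illustration of the kind of filter I mean, here is a sketch that keeps only the best-scoring hit per matrix and promoter and applies a q-value cutoff. It assumes FIMO's tab-separated output with the columns `motif_id`, `sequence_name`, `score` and `q-value`; the cutoff of 0.05 and the function name `filter_hits` are hypothetical choices, and the example rows are made up.

```python
import csv

def filter_hits(tsv_text, q_cutoff=0.05):
    """Return {(motif_id, sequence_name): best score} for hits with q <= q_cutoff."""
    # Drop blank lines and '#' comment lines before parsing the table.
    lines = [ln for ln in tsv_text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    best = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        if float(row["q-value"]) > q_cutoff:
            continue
        key = (row["motif_id"], row["sequence_name"])
        score = float(row["score"])
        # Keep only the highest-scoring hit per matrix and promoter.
        if key not in best or score > best[key]:
            best[key] = score
    return best

# Made-up rows with only the columns this sketch needs.
example = "\n".join([
    "motif_id\tsequence_name\tscore\tq-value",
    "M_A\tGENE1\t12.5\t0.01",
    "M_A\tGENE1\t9.0\t0.03",
    "M_A\tGENE2\t8.0\t0.40",
    "M_B\tGENE1\t11.0\t0.02",
])
kept = filter_hits(example)
```

This only addresses multiple testing and redundant hits; it does not use TF combinations, which is the part I am unsure how to approach.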
Thank you for any opinions and suggestions.