Like apparently everybody here, I am a beginner in bioinformatics, but I have been given the task of performing an enrichment analysis. I am having trouble getting consistent data across the different databases.
The main thing: we have many gene lists, and we want to know which genes are regulated by the transcription factor MEF2A or MEF2C. We would also like to know the percentage enrichment compared to the genome (e.g. if MEF2 regulates 25% of the entire genome and 50% of the genes in our list, the enrichment is 100%).
So my approach has been: find which genes in our list are regulated by MEF2A and MEF2C, and find which genes are regulated by them in the entire genome.
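To make the arithmetic concrete, here is a minimal sketch of that calculation in Python. The function name and the toy gene sets are made up for illustration; a real run would use our gene list, the full set of annotated genes, and whatever set of MEF2 targets a database provides.

```python
def percent_enrichment(gene_list, genome_genes, target_genes):
    """Percent enrichment of TF targets in a gene list relative to the genome.

    (fraction of targets in the list / fraction of targets in the genome - 1) * 100
    """
    gene_list = set(gene_list)
    genome_genes = set(genome_genes)
    target_genes = set(target_genes)
    if not gene_list or not genome_genes:
        raise ValueError("gene_list and genome_genes must not be empty")

    list_fraction = len(gene_list & target_genes) / len(gene_list)
    genome_fraction = len(genome_genes & target_genes) / len(genome_genes)
    if genome_fraction == 0:
        raise ValueError("no target genes found in the genome set")
    return (list_fraction / genome_fraction - 1) * 100


# Toy data (hypothetical gene names): 2 of 8 genome genes are targets (25%),
# and 2 of the 4 genes in our list are targets (50%).
genome = {"g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"}
targets = {"g1", "g2"}
our_list = {"g1", "g2", "g3", "g4"}

print(percent_enrichment(our_list, genome, targets))  # 100.0
```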
I have been doing a bit of testing and searching through databases and software, and my problem is that the data differ considerably between databases. In the UCSC browser I can see that LDLR actually has MEF2 binding sites (but I cannot query every gene in the list manually in the UCSC browser), yet it is very hard to find in the other databases that LDLR has the TFBS.
A list of different databases for TFBS prediction, with a short explanation of each: https://abc.med.cornell.edu/education/introtobio/t-promoter.html
I searched ResearchGate and got similar results: https://www.researchgate.net/post/How_to_find_the_binding_sites_of_transcription_factors10
I checked this one: http://www.sabiosciences.com/chipqpcrsearch.php?gene=&species_id=0&factor=MEF-2A&ninfo=n&ngene=y&nfactor=n
And on Biostars: Determine Transcription Factors For Genes
And many other sources (Enrichr, Harmonizome...), and software such as iRegulon (in Cytoscape).
What can I do?
The ENCODE resource seems a good starting point (http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeRegTfbsClustered/; try wgEncodeRegTfbsClusteredV3.bed.gz). This file contains TFBS in BED format. You can grep the MEF2 sites and get the nearest genes with a short script. The file lists several MEF2 sites next to the LDLR TSS.
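The short script could look roughly like the sketch below. It assumes the fourth BED column holds the factor name (e.g. MEF2A), which is how I understand the clustered file to be laid out, and that you have a separate table of TSS positions (chromosome, position, gene symbol) from a gene annotation of your choice. That table, the function names and the toy coordinates are all made up for illustration, so check them against your own files.

```python
import bisect
import gzip
from collections import defaultdict


def read_tf_sites(lines, prefix="MEF2"):
    """Yield (chrom, start, end, factor) for BED rows whose name starts with prefix."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Assumption: column 4 (index 3) is the transcription factor name.
        if len(fields) < 4 or not fields[3].startswith(prefix):
            continue
        yield fields[0], int(fields[1]), int(fields[2]), fields[3]


def build_tss_index(tss_records):
    """Group (chrom, position, gene) records by chromosome, sorted by position."""
    index = defaultdict(list)
    for chrom, pos, gene in tss_records:
        index[chrom].append((pos, gene))
    for entries in index.values():
        entries.sort()
    return index


def nearest_gene(index, chrom, start, end):
    """Return (gene, distance) for the TSS closest to the site midpoint, or None."""
    entries = index.get(chrom)
    if not entries:
        return None
    mid = (start + end) // 2
    i = bisect.bisect_left(entries, (mid,))
    # Only the TSS just before and just after the midpoint can be the nearest.
    candidates = entries[max(i - 1, 0):i + 1]
    pos, gene = min(candidates, key=lambda e: abs(e[0] - mid))
    return gene, abs(pos - mid)


def mef2_targets_from_file(path, tss_records, max_distance=5000):
    """Genes whose TSS lies within max_distance bp of a MEF2 site in a gzipped BED file."""
    index = build_tss_index(tss_records)
    genes = set()
    with gzip.open(path, "rt") as handle:
        for chrom, start, end, _factor in read_tf_sites(handle):
            hit = nearest_gene(index, chrom, start, end)
            if hit is not None and hit[1] <= max_distance:
                genes.add(hit[0])
    return genes


# Toy example with invented coordinates (NOT real annotation).
toy_bed = [
    "chr19\t11199900\t11200100\tMEF2A\t500\t1\t0\t500\n",
    "chr19\t11290000\t11290200\tCTCF\t800\t1\t0\t800\n",
    "chr19\t11299000\t11299200\tMEF2C\t300\t1\t0\t300\n",
]
toy_tss = [("chr19", 11200000, "LDLR"), ("chr19", 11300000, "GENE_B")]

toy_index = build_tss_index(toy_tss)
for chrom, start, end, factor in read_tf_sites(toy_bed):
    print(factor, nearest_gene(toy_index, chrom, start, end))
```

For the real file you would call something like mef2_targets_from_file("wgEncodeRegTfbsClusteredV3.bed.gz", tss_records). The 5000 bp cutoff is an arbitrary choice, not something the file prescribes, and "nearest TSS" is only a rough proxy for "regulated by".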