Question

Find transcription factors that may regulate list of genes

2

5.9 years ago

chris86 ▴ 400

Hi

I have a series of modules, each a list of gene symbols. I want to find transcription factors that may regulate, i.e. control the expression of, the genes in each module. So far I have found sources of target gene lists per transcription factor, such as the ENCODE data, but these lists tend to be very long, covering 50% of the genome or more. Both the GTRD database and the ENCODE list linked below have this problem. I was planning to run a series of hypergeometric tests to assess the overlap between each TF's target list and each of my modules. Is this a sensible approach, or can anyone suggest a better one? Are the results likely to be reliable? I am working with expression data from diseased human tissue.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5210645/

http://amp.pharm.mssm.edu/Harmonizome/dataset/ENCODE+Transcription+Factor+Targets
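
For concreteness, here is a minimal sketch of the overlap test I have in mind, in plain Python with no external packages. The function name `hypergeom_enrichment_p` and the toy gene IDs are made up for illustration; the background is assumed to be the set of genes that could have ended up in a module (e.g. all expressed genes), which is my own assumption rather than anything the databases prescribe.

```python
from math import comb

def hypergeom_enrichment_p(module, targets, background):
    """One-sided hypergeometric test for module/target overlap.

    Returns (k, p) where k is the observed overlap and p is the
    probability of an overlap of at least k when len(module) genes
    are drawn at random from the background.
    """
    background = set(background)
    # Restrict both lists to the background so the counts are consistent.
    module = set(module) & background
    targets = set(targets) & background

    N = len(background)        # genes in the background
    K = len(targets)           # TF targets in the background
    n = len(module)            # genes in the module
    k = len(module & targets)  # observed overlap

    # Upper tail: sum of P(X = i) for i = k .. min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return k, tail / comb(N, n)

# Toy data (hypothetical gene IDs): 20 background genes, a 5-gene module.
background = [f"g{i}" for i in range(20)]
module = background[:5]
narrow_tf = background[:10]   # targets half of the background
broad_tf = background[:18]    # targets 90% of the background

k_narrow, p_narrow = hypergeom_enrichment_p(module, narrow_tf, background)
k_broad, p_broad = hypergeom_enrichment_p(module, broad_tf, background)
print(k_narrow, p_narrow)
print(k_broad, p_broad)
```

Both toy TFs cover the whole module, but the broad one gets a much larger p-value, which is exactly my worry with target lists that span half the genome. With many TFs and modules, the resulting p-values would also need a multiple-testing correction.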

Thanks,

Chris

gene • 1.7k views

ADD COMMENT • link updated 5.9 years ago by Biojl ★ 1.7k • written 5.9 years ago by chris86 ▴ 400

score 0 · Answer 1 · 2018-05-28

0

5.9 years ago

lieven.sterck 15k

Perhaps not directly related to your specific question, but you might have a look at these two software tools (given you also have expression data): ENIGMA and Lemone

ADD COMMENT • link 5.9 years ago by lieven.sterck 15k

0

Enigma was last updated in 2008. I would not suggest using it.

ADD REPLY • link 5.1 years ago by Assa Yeroslaviz ★ 1.8k

0

True, they are not actively maintained anymore.

That does not mean they can't be useful, though ;) but I acknowledge they might indeed under-perform.

ADD REPLY • link 5.1 years ago by lieven.sterck 15k

score 0 · Answer 2 · 2018-05-28

What about calculating a log odds ratio (LOR) using a pre-defined window around your genes of interest? Count the number of transcription factor binding sites (TFBS) in those regions and compare it against the whole genome and/or against genes not related to the modules you want to test, to correct for the effect of having genes around your sequences.
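
A rough sketch of how that could look in Python, under some assumptions of mine: each gene contributes one window, a window counts as a "hit" if it contains at least one binding site for the TF, and a pseudocount of 0.5 (the Haldane-Anscombe correction) is added to avoid division by zero. The function name `log_odds_ratio` and the counts are hypothetical; you could just as well compare site densities per base instead.

```python
from math import log, sqrt

def log_odds_ratio(hit_module, n_module, hit_background, n_background, pseudo=0.5):
    """Log odds ratio of TFBS presence in module vs. background windows.

    hit_*  : number of windows containing at least one TFBS
    n_*    : total number of windows in that group
    pseudo : pseudocount added to every cell of the 2x2 table
    Returns (lor, se), where se is the approximate standard error.
    """
    a = hit_module + pseudo                      # module windows with a site
    b = n_module - hit_module + pseudo           # module windows without
    c = hit_background + pseudo                  # background windows with a site
    d = n_background - hit_background + pseudo   # background windows without

    lor = log((a * d) / (b * c))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return lor, se

# Hypothetical counts: 30 of 50 module windows vs. 200 of 1000 background windows.
lor, se = log_odds_ratio(30, 50, 200, 1000)
print(f"LOR = {lor:.3f}, approx. 95% CI = [{lor - 1.96 * se:.3f}, {lor + 1.96 * se:.3f}]")
```

A LOR clearly above zero, with an interval that excludes zero, would suggest the TF's sites are over-represented near the module genes relative to the background you chose.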