My goal is to identify the target genes of several transcription factors.
I'd like to work with a dataset derived from ENCODE ChIP-seq analysis (it can be found here: http://ilab.jhsph.edu/database/dataset/HumanRank.tar.gz), in which the peaks have already been called and mapped to genes.
For each transcription factor, there's a file with all the targets reported in the ChIP-seq experiment. These targets have been ranked according to a score (ChIPXpressScore). This is the head of one of these files (targets of EP300):
Rank GeneNames EntrezID ChIPXpressScore
1 FBXO33 254170 10.9
2 TCP11L2 255394 34.8
3 UPF2 26019 38.7
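To make the "pick the top N" idea concrete, here is a minimal sketch of how I would read one of these files and keep the first N targets by rank. It assumes the columns are whitespace-delimited and that gene names contain no spaces, which I have only checked on the head shown above. The function names `read_targets` and `top_targets` and the cutoff `n` are my own, purely for illustration.

```python
import io

# The head of the EP300 file shown above, inlined so the sketch is self-contained.
SAMPLE = """Rank GeneNames EntrezID ChIPXpressScore
1 FBXO33 254170 10.9
2 TCP11L2 255394 34.8
3 UPF2 26019 38.7
"""

def read_targets(handle):
    """Parse a ranked target file into a list of dicts (assumes whitespace-delimited columns)."""
    header = handle.readline().split()
    rows = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        row = dict(zip(header, fields))
        row["Rank"] = int(row["Rank"])
        row["ChIPXpressScore"] = float(row["ChIPXpressScore"])
        rows.append(row)
    return rows

def top_targets(rows, n):
    """Return the n best-ranked targets (rank 1 first); n is an arbitrary cutoff."""
    return sorted(rows, key=lambda r: r["Rank"])[:n]

targets = read_targets(io.StringIO(SAMPLE))
print([r["GeneNames"] for r in top_targets(targets, 2)])
```

For a real file, `io.StringIO(SAMPLE)` would be replaced by an open file handle. What I don't know is how to choose `n`.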
The problem is that between 1,000 and 10,000 targets are identified for each transcription factor (1% IDR). I've heard that this is normal in ChIP experiments. What do people do in this situation? Do they pick only the top genes on the list? If so, how many?
Thanks in advance!