I'm having trouble tracking down a chromatin database. I'm trying to get a list of gene names so I can check the efficiency of a wet-lab chromatin enrichment experiment designed by a post-doc in our laboratory.
ChromDB appears to have been abandoned, as the last time it was updated was 2015. Furthermore, the link to the senior programmer is defunct, the PI is emeritus and I get a 404 error trying to download a FASTA file.
3CDB, the Chromosome Conformation Capture Database, run by the Beijing Institute of Genomics (BIG), was updated in January 2016, but I get an error when I try to download the chromosome FASTA. I also get a bounced e-mail when I try to contact them.
cisRED, a chromatin motif database, is also giving me a 503 error.
Surely there has to be a more comprehensive database somewhere; I just don't know where to find it.
Your help would be greatly appreciated.
EDIT TO INCLUDE MORE INFORMATION/BACKGROUND
A post-doc, along with a collaborator, has developed a wet-lab enrichment for chromatin that is analysed by LC-MS. I'm trying to compare these chromatome pull-downs to whole-proteome experiments, to find out what proportion of all detected proteins are nuclear proteins and so see how well these assays have worked.
First, I had the idea of exploiting microscopy, specifically immunohistochemistry, to annotate all proteins with their known subcellular localization. I used data from The Human Protein Atlas to annotate both the chromatome samples and the whole-cell samples, to see how efficient the assay was at capturing nuclear proteins and also to check whether there was any evidence that the chromatin preparation contains other fractions (mitochondria, ER, lysosomes, etc.). For each sample and each cell compartment, I counted the proteins with spectral counts >0 (i.e. the number of proteins present) and divided by the total number of proteins in that sample to get relative proportions. I used relative proportions because the chromatome assays were 1D-shotgun with 1 fraction and the whole-cell assays were 2D-shotgun with 50 fractions (higher depth).
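For concreteness, the presence/absence metric described above could be sketched roughly as follows. This is only an illustration: the function name, the dictionary layout and the toy values are my own assumptions, not the actual Human Protein Atlas export format or our real data.

```python
from collections import defaultdict


def presence_proportions(spectral_counts, localization):
    """Fraction of detected proteins annotated to each compartment.

    spectral_counts: dict mapping gene symbol -> spectral count in one sample
    localization:    dict mapping gene symbol -> set of compartment names
                     (hypothetical layout for the annotation table)
    """
    # A protein counts as "present" if it has at least one spectrum.
    detected = [gene for gene, count in spectral_counts.items() if count > 0]
    if not detected:
        return {}

    per_compartment = defaultdict(int)
    for gene in detected:
        # Proteins without annotation contribute to no compartment.
        for compartment in localization.get(gene, ()):
            per_compartment[compartment] += 1

    # Normalise by the number of detected proteins so that samples of
    # different depth (1 fraction vs. 50 fractions) can be compared.
    return {comp: n / len(detected) for comp, n in per_compartment.items()}


# Toy values, invented purely for illustration.
sample = {"H2AX": 12, "RCC1": 3, "TUBB": 0, "ATP5F1A": 5}
annotation = {
    "H2AX": {"nucleus"},
    "RCC1": {"nucleus"},
    "TUBB": {"cytoskeleton"},
    "ATP5F1A": {"mitochondria"},
}
proportions = presence_proportions(sample, annotation)
```

Note that this treats a protein with 1 spectrum the same as one with 100, which is the limitation I come back to below.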
Here is a figure of the relative proportions: http://tinypic.com/r/2gtaqsy/9. The whole-cell experiments are labelled .core (e.g. HAP1_P5242.core, to be compared with all the other HAP1 chromatin pull-downs). Based on the figure, there appears to be at best a minimal enrichment of nuclear proteins in the chromatome assays compared to the whole-cell proteome. This is not what we saw for selected proteins by western blot. Either all the chromatome sample preps for mass spec failed (unlikely), or the metric I used was not ideal. I thought that perhaps this is because I was simply counting the presence of proteins and not taking the actual abundance (spectral counts) into account.
Therefore, I tried to come up with a better scoring system; I have attached a "back-of-the-envelope" calculation for the proposed scheme: http://tinypic.com/r/2vnkpvt/9. Briefly, to arrive at a pseudocount for each protein, I 1) weighted spectral counts by the quality of the annotation and 2) penalized proteins with evidence of being present in multiple cell compartments (i.e. gave a higher weight to those found only in the nucleus). I then took the sum of these pseudocounts for each subcellular location and divided it by the total sum of the pseudocounts of every protein in the sample, and got a rather disappointing result: http://tinypic.com/r/105r7s2/9. (The proportions add up to more than 100% because some proteins are found in multiple subcellular compartments.)
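To make the pseudocount idea above easier to discuss, here is a minimal sketch of that kind of score. The specific numbers are assumptions for illustration only: the weights attached to the annotation reliability categories and the 1/n penalty for a protein seen in n compartments are placeholders, not the exact values from my back-of-the-envelope calculation.

```python
from collections import defaultdict

# Assumed weights per annotation reliability category (placeholder values).
QUALITY_WEIGHT = {"Enhanced": 1.0, "Supported": 0.75, "Approved": 0.5, "Uncertain": 0.25}


def pseudocount(count, reliability, n_compartments):
    """Spectral count down-weighted by annotation quality and multi-localization."""
    if n_compartments == 0:
        return 0.0
    # Assumed penalty: divide by the number of annotated compartments.
    return count * QUALITY_WEIGHT.get(reliability, 0.0) / n_compartments


def weighted_proportions(spectral_counts, annotation):
    """annotation: dict gene -> (reliability, set of compartments); hypothetical layout."""
    per_protein = []
    for gene, count in spectral_counts.items():
        reliability, compartments = annotation.get(gene, ("", set()))
        per_protein.append((pseudocount(count, reliability, len(compartments)), compartments))

    total = sum(pc for pc, _ in per_protein)
    if total == 0:
        return {}

    scores = defaultdict(float)
    for pc, compartments in per_protein:
        # A multi-compartment protein adds its pseudocount to every compartment
        # but only once to the denominator, so proportions can sum to > 1.
        for compartment in compartments:
            scores[compartment] += pc
    return {comp: score / total for comp, score in scores.items()}


# Toy values, invented purely for illustration.
counts = {"H2AX": 12, "RCC1": 4, "ATP5F1A": 5}
annot = {
    "H2AX": ("Enhanced", {"nucleus"}),
    "RCC1": ("Supported", {"nucleus", "cytosol"}),
    "ATP5F1A": ("Approved", {"mitochondria"}),
}
weighted = weighted_proportions(counts, annot)
```

With these toy values the nuclear share is 13.5/16 and the three shares add up to slightly more than 1, which is the same "greater than 100%" effect mentioned above.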
The post-doc did a nuclear versus cytosolic fractionation of various cell lines (using RCC1 as the nuclear control and tubulin as the cytoplasmic control) and gave me a list of gene names from the nuclear fraction. I calculated how many of these were found in each sample, relative to the total number of proteins in that sample, to get this figure: http://tinypic.com/r/v5hki0/9. The x-axis shows the percentage. It looks a bit better (at least for the K562 and HAP1 samples), but I would still have expected the chromatome pull-downs to do better.
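The comparison against the nuclear gene list boils down to a set overlap, roughly as sketched below. The function name and the toy gene lists are assumptions for illustration; the same function would apply unchanged to a list of chromatin-associated gene symbols if I can find one.

```python
def fraction_in_reference(spectral_counts, reference_genes):
    """Share of detected proteins whose gene symbol is in a reference list."""
    # Compare symbols case-insensitively; a protein is detected if count > 0.
    detected = {gene.upper() for gene, count in spectral_counts.items() if count > 0}
    if not detected:
        return 0.0
    reference = {gene.upper() for gene in reference_genes}
    return len(detected & reference) / len(detected)


# Toy values, invented purely for illustration.
chromatome = {"RCC1": 3, "H2AX": 10, "TUBB": 1, "GAPDH": 0}
nuclear_list = ["rcc1", "H2AX", "SMARCA4"]
nuclear_fraction = fraction_in_reference(chromatome, nuclear_list)
```

Running this once per sample and plotting the result gives the kind of percentage shown in the figure.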
RCC1 is not always bound to chromatin; rather, it stays in the nucleus "floating" and associates with chromatin only when needed (during mitosis, I believe). Perhaps I'm not seeing a clear enrichment of nuclear proteins in the chromatome pull-downs because it is not a nuclear extraction but a chromatin extraction (the insoluble part). So ideally what I would like is a database of common chromatin-associated proteins (readers, writers, erasers) to re-run this analysis. I would just need a list of official gene symbols (like the ones required in DAVID, step 2 "select identifier"). These could include predictive chromatin features such as histones (both canonical and variant), regulators such as the SWI/SNF chromatin remodeler complex and associated proteins, TFs, etc. When you mention "some sort of chromatin feature (which one)", I suppose it would be useful to group chromatin features into different categories according to their known function in transcriptional regulation. We are interested in finding de novo features in the nucleus, in addition to known chromatin features, so I would like to cast a wide net.
When you say finding gene names is easy I guess I just don't really know where to look; there is a wide range of ENCODE or ROADMAP datasets measured by different sequencing techniques (ChIP-seq, CAGE, RNA-PET, RNA-Seq, ATAC-Seq, ChIA-PET, Hi-C, etc.) and it's a bit overwhelming where to find exactly what I need.
My previous approaches have been somewhat less than elegant, and perhaps someone can suggest a more sophisticated technique/analysis for assessing the efficacy of these chromatome pull-downs and, ultimately, for developing a cutoff that will distinguish real from background binding.