Question

Data Mining for ChIP-seq analysis

0

19 days ago

tabasumughal • 0

I have planned a project to call super-enhancers at different stages of prostate cancer (normal, primary and metastatic) using cell line data. The problem is that when I search NCBI GEO, Ensembl or Cistrome, I cannot find suitable ChIP-seq data: some datasets have only one replicate, some have no input control for the sample, etc. Could anyone suggest the best way to find the data?

Cancer Chip_seq Prostate • 535 views

updated 14 hours ago by Kevin Blighe 89k • written 19 days ago by tabasumughal • 0

0

Well, genomics projects are not an online warehouse with overnight delivery where you can just order exactly what you need. Either you generate the data yourself with the appropriate setup, or you live with what is available. The usual route is to browse the literature, since published datasets are, as the name suggests, attached to publications, and then download the data from there. Keep in mind that blindly collecting data from different sources has limited value, since batch effects hinder direct comparisons. So just calling your SEs, even from the same cell line across different studies, will probably not tell you much. Discuss with your supervisor what the biological question is and how it could best be tackled.

19 days ago by ATpoint 90k

0

Alright, I understand what you said... thank you very much for your kind reply.

19 days ago by tabasumughal • 0

score 0 · Answer 1 · 2025-11-07

Hi Tabasum,

I agree with the previous comment—genomics data rarely comes gift-wrapped, and batch effects can scupper direct comparisons across studies. That said, for H3K27ac ChIP-seq in prostate cell lines (key for super-enhancer calling via ROSE or similar), try these targeted steps to unearth usable datasets:

First, scour PubMed for reviews like "prostate cancer epigenome" or "super-enhancers in prostate cancer progression"—they often list GEO accessions with decent quality (e.g., GSE78251 for LNCaP/PC-3 lines, which has inputs and replicates). Cross-check in Cistrome DB or ReMap for aggregated tracks, filtering by cell line and mark.
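Once you have a sample sheet for a candidate series, the replicate and input check can be scripted. Here is a minimal sketch, assuming you have already collected the metadata into a list of dicts; the keys `series`, `cell_line` and `target`, the function name `triage_series`, and the example records are all hypothetical and do not correspond to real GEO entries.

```python
from collections import defaultdict


def triage_series(samples, min_replicates=2):
    """Count H3K27ac replicates and input controls per (series, cell line).

    `samples` is a list of dicts with hypothetical keys:
    'series', 'cell_line' and 'target' ('H3K27ac' or 'input').
    """
    counts = defaultdict(lambda: {"chip": 0, "input": 0})
    for s in samples:
        key = (s["series"], s["cell_line"])
        target = s["target"].lower()
        if target == "input":
            counts[key]["input"] += 1
        elif target == "h3k27ac":
            counts[key]["chip"] += 1

    report = {}
    for key, n in counts.items():
        report[key] = {
            "replicates": n["chip"],
            "has_input": n["input"] > 0,
            # usable = enough ChIP replicates AND at least one input control
            "usable": n["chip"] >= min_replicates and n["input"] > 0,
        }
    return report


# Made-up records for illustration only; not real GEO metadata.
example = [
    {"series": "SERIES_A", "cell_line": "LNCaP", "target": "H3K27ac"},
    {"series": "SERIES_A", "cell_line": "LNCaP", "target": "H3K27ac"},
    {"series": "SERIES_A", "cell_line": "LNCaP", "target": "input"},
    {"series": "SERIES_B", "cell_line": "PC-3", "target": "H3K27ac"},
]
report = triage_series(example)
```

Grouping by series as well as cell line keeps samples from different studies apart, which matters given the batch-effect caveat raised earlier in the thread.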

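Specific lines to hunt: RWPE-1 (non-malignant prostate epithelium), LNCaP/VCaP (androgen-sensitive; note that both were derived from metastatic lesions, not primary tumours), PC-3/DU145 (metastatic, androgen-independent). In GEO, query: "H3K27ac" AND prostate AND (ChIP-seq OR "histone acetylation") AND replicate[All Fields]. This biases the results towards multi-replicate sets. For single-replicate sets without input, blacklist filtering can remove some artefact regions, and spike-in normalisation helps only if the original experiment included a spike-in; either way, flag the limitations in your analysis.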
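If you want to run the GEO search programmatically, the query can be assembled and turned into an NCBI E-utilities `esearch` URL against the GEO DataSets database (`db=gds`). A rough sketch follows; the helper names `build_geo_query` and `build_esearch_url` are my own, and the code only builds the URL without sending any request, so check NCBI's usage guidelines before fetching it.

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_query(mark="H3K27ac", tissue="prostate", cell_lines=(),
                    require_replicate_word=True):
    """Assemble the GEO search term described above (hypothetical helper)."""
    parts = [f'"{mark}"', tissue, '(ChIP-seq OR "histone acetylation")']
    if cell_lines:
        # Optionally restrict to named cell lines, e.g. ("LNCaP", "VCaP")
        parts.append("(" + " OR ".join(f'"{c}"' for c in cell_lines) + ")")
    if require_replicate_word:
        parts.append("replicate[All Fields]")
    return " AND ".join(parts)


def build_esearch_url(term, retmax=50):
    """Return an esearch URL for GEO DataSets; no request is made here."""
    params = {"db": "gds", "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_ESEARCH + "?" + urlencode(params)


query = build_geo_query()
url = build_esearch_url(query)
```

The word "replicate" only matches records whose text mentions it, so treat the hits as candidates and still inspect each sample list by hand.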

If you are still short, email the corresponding authors (many will share extra data) or pivot to the precomputed super-enhancer calls in dbSUPER. Discuss with your supervisor whether generating your own data beats patching together imperfect public datasets.

Kevin