Data Mining for ChIP-seq analysis
19 days ago

I have planned a project to call super-enhancers at different stages of prostate cancer (normal, primary and metastatic) using cell line data. The problem is that when I search NCBI GEO, Ensembl or Cistrome, I cannot find suitable ChIP-seq data: some datasets have only one replicate, some have no input control for the sample, and so on. Could anyone suggest the best way to find the data?

Cancer Chip_seq Prostate • 535 views

Well, genomics projects are not an online warehouse with overnight delivery where you can just order exactly what you need. Either you generate the data yourself with the appropriate setup or you live with what is available. Browse the literature, since published datasets, as the name suggests, are described in publications, and then download the data. Keep in mind that blindly collecting data from different sources has limited value, since batch effects hinder direct comparisons. So just calling your SEs across different studies, even from the same cell line, will probably not tell you much. Discuss with your supervisor what the biological question is and how it could best be tackled.

Alright, I understand what you said... thank you very much for your kind reply.

14 hours ago

Hi Tabasum,

I agree with the previous comment—genomics data rarely comes gift-wrapped, and batch effects can scupper direct comparisons across studies. That said, for H3K27ac ChIP-seq in prostate cell lines (key for super-enhancer calling via ROSE or similar), try these targeted steps to unearth usable datasets:

First, scour PubMed for reviews using terms like "prostate cancer epigenome" or "super-enhancers in prostate cancer progression". They often list GEO accessions with decent quality (e.g., GSE78251 for LNCaP/PC-3 lines, which has inputs and replicates). Cross-check in Cistrome DB for aggregated tracks, filtering by cell line and mark; ReMap is also worth a look, though it focuses on transcriptional regulators rather than histone marks.

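Specific lines to hunt: RWPE-1 (normal), LNCaP/VCaP (androgen-dependent; note that both were derived from metastatic sites, so they are only a proxy for primary disease), PC-3/DU145 (metastatic/castration-resistant). In GEO, query: "H3K27ac" AND prostate AND (ChIP-seq OR "histone acetylation") AND replicate[All Fields]. This biases towards multi-replicate sets. For single-replicate samples without input, filtering against the ENCODE blacklist removes some artefactual regions, but it does not replace a control, and spike-in normalisation only applies if the original experiment included spike-in chromatin. Flag these limitations in your analysis.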
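
If you would rather script the GEO search than type it into the web form, here is a minimal sketch that builds the same query string and an NCBI E-utilities ESearch URL for the GEO DataSets database (db=gds). The function names build_geo_term and build_esearch_url are my own, purely for illustration, and the script only constructs the URL; it does not send the request or parse results.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_term(mark, tissue, require_replicates=True):
    """Assemble the Entrez query string used in the post above.

    Hypothetical helper: the name and arguments are illustrative only.
    """
    parts = [f'"{mark}"', tissue, '(ChIP-seq OR "histone acetylation")']
    if require_replicates:
        # Free-text match on "replicate"; this biases the hits towards
        # multi-replicate series but does not guarantee them.
        parts.append("replicate[All Fields]")
    return " AND ".join(parts)


def build_esearch_url(term, retmax=50):
    """Return an ESearch URL for GEO DataSets (db=gds) with JSON output."""
    params = {"db": "gds", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    term = build_geo_term("H3K27ac", "prostate")
    print(term)
    print(build_esearch_url(term))
```

Paste the printed URL into a browser (or fetch it with your HTTP client of choice) to get the matching GEO record IDs, then check each series page by hand for input controls and replicate counts, since the query cannot verify those.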

If you are still short, email the corresponding authors (many share extras) or fall back on the precomputed super-enhancer calls in dbSUPER, checking first which prostate cell lines it covers. Discuss with your supervisor whether generating your own data beats patching together imperfect datasets.

Kevin
