Question: Choosing a ChIP-seq control for the ENCODE pipeline?
gewa wrote:

Hi, I have a FASTQ file from a TF-targeted ChIP-seq run, obtained from GEO here: https://www.ncbi.nlm.nih.gov/sra?term=SRX011616

I need to process this data and call peaks for downstream analysis. Because I am using it alongside other data that the ENCODE project has already processed, I would like to use their pipeline for quality control, alignment and peak calling. I saw that the pipeline is available through a GUI on DNAnexus and was planning on using that; however, all of their ChIP-seq pipelines need a control FASTQ file, which my experiment doesn't seem to have. There are several control ChIP-seq assays available on ENCODE for my cell line (human H1). Could I use one of these, and if so, how would I go about choosing one? Also, there are several different ChIP-seq pipelines for transcription factors on ENCODE. I think "Unary control, unreplicated" is the one I want, as I only have one run, but I wanted to double check.

Thanks so much for any help!

chip-seq encode • 409 views
written 8 months ago by gewa
Alex Reynolds wrote:

When you browse the matrix view, you should have the option to select the output format (FASTQ), along with the assay (ChIP-seq), the target (transcription factors), and genome assembly (e.g., hg38):

[Image: ENCODE matrix view]

You can further drill down by selecting subsets of tissues or cell lines of interest, etc.

Click on the "View results as list" button (it is the leftmost button underneath the "1858 results" string).

Click on the Download button and follow the instructions. The files.txt file contains a link to a metadata table (tab-delimited) and URLs for results of interest.
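As a rough sketch of how files.txt might be handled once downloaded: the Python below separates the metadata link from the data file URLs. It assumes the metadata link is the first non-empty line and that entries may be wrapped in double quotes; check both against your own files.txt. The function name `split_files_txt` is made up for illustration.

```python
def split_files_txt(text):
    """Split the contents of files.txt into (metadata_url, file_urls).

    Assumption: the first non-empty line is the link to the metadata
    table and every following non-empty line is a data file URL.
    Surrounding double quotes, if present, are stripped.
    """
    entries = [line.strip().strip('"') for line in text.splitlines() if line.strip()]
    if not entries:
        raise ValueError("files.txt appears to be empty")
    return entries[0], entries[1:]


if __name__ == "__main__":
    # Read the batch download list saved from the ENCODE portal.
    with open("files.txt") as handle:
        metadata_url, file_urls = split_files_txt(handle.read())
    print("metadata table:", metadata_url)
    print("data files listed:", len(file_urls))
```

Keeping the metadata link separate makes it easy to fetch only the table first and decide what to download afterwards.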

You may want to first download the metadata table and parse it for files of interest. Namely, you might filter for the TF targets you are interested in, and filter out any records with audit or QC information that suggests you should skip that dataset.

Once you have filtered that metadata table, you can use the list of accession IDs in what's left to go back to files.txt and start downloading stuff.

Mainly, you'll probably want to dig into the metadata to determine what accession records you want to download.
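The filtering described above could be sketched in Python roughly as follows. The column names used here ("File accession", "File format", "Experiment target", "Audit ERROR", "Audit NOT_COMPLIANT") are assumptions about the metadata table header, so check them against the file you actually download; the function names are hypothetical.

```python
import csv
import io


def filter_metadata(tsv_text, targets=None, file_format="fastq",
                    audit_columns=("Audit ERROR", "Audit NOT_COMPLIANT")):
    """Return accession IDs of rows that pass the filters.

    Column names are assumptions about the metadata header, not guaranteed.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        # Keep only the requested file format (e.g. raw reads).
        if (row.get("File format") or "") != file_format:
            continue
        # Optionally keep only the targets of interest.
        if targets is not None and (row.get("Experiment target") or "") not in targets:
            continue
        # Skip records that carry any text in the chosen audit columns.
        if any((row.get(col) or "").strip() for col in audit_columns):
            continue
        kept.append(row["File accession"])
    return kept


def select_urls(urls, accessions):
    """Keep only the URLs from files.txt that mention a retained accession ID."""
    wanted = set(accessions)
    return [url for url in urls if any(acc in url for acc in wanted)]
```

Matching URLs by accession substring is a simplification; which audit categories should disqualify a dataset is a judgment call, so adjust `audit_columns` to taste.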

written 8 months ago by Alex Reynolds

Thanks for your response! I have already obtained the target run itself from outside ENCODE (see the link in my original post). It is a ChIP-seq run on human H1 with SOX2 as the target (TF). What I'm trying to find is a control to use with it, and I was wondering if I could get one from ENCODE. Should my assay target still be "Transcription Factor", or should it be "control"? When I narrow my search down to human H1 ChIP-seq with "control" as the target, there are several studies available, and I was wondering how I could choose between them. I'm also not sure whether it's even advisable to use a control run that's not from the exact same experiment, but I can't seem to find the control used in the experiment I'm working with (linked in my OP). I can't find an experiment on ENCODE that targets the TF I need (SOX2) in the cell line I need (H1, hg38), which is why I'm in this situation. Thanks again for any help!

written 8 months ago by gewa

I don't see a SOX2 target in there, either, so I think you'll need to look elsewhere for a control.

written 8 months ago by Alex Reynolds