Question: ChIP-seq analysis guidance for a beginner
DM95 (0) wrote 1 day ago:


I probably have a very simple question, but I need some help exploring my ChIP-seq data. I have three different samples: 1. untreated 2. treatment with stimulant 3. treatment with inhibitor

The first thing I want to know is where binding is increased in 2 compared to 1. From this I want to save, in a new file, a list of genes where binding of the protein of interest is increased. Then I want to use that file to find out at which of those genes binding is decreased in 3, i.e. after treatment with the inhibitor.

The idea of the experiment is that we treat the cells with a stimulant that induces protein binding, and then follow the stimulant with an inhibitor to see where protein binding is decreased by the inhibitor. The end result should be a list of genes where protein binding is disturbed by the inhibitor and a list of genes where it is not.

Does anyone have some guidance on how to approach this? I am a complete novice when it comes to bioinformatics and I could use some pointers.

Thank you!

modified 1 day ago by jared.andrews07 (6.1k) • written 1 day ago by DM95 (0)

Have you already obtained the sequences? I mean the places where your protein binds.

written 1 day ago by Antonio R. Franco (4.5k)

I am sorry, I should have been clearer. I have been given bigWig files for each of the samples.

written 1 day ago by DM95 (0)

Do you have experimental replicates?

written 1 day ago by ATpoint (36k)

Yes, two replicates for each sample.

written 1 day ago by DM95 (0)
jared.andrews07 (6.1k), St. Louis, MO, wrote 1 day ago:

How many replicates for each condition do you have?

Generally, the ChIP-seq pipeline goes:

  • Sequencing QC (FastQC)
  • Alignment (many options here)
  • Peak calling & peak annotation (MACS2 and ZINBA are two of the many peak callers out there. ChIPseeker is popular for peak annotation, though it takes a simple approach. Some differential binding approaches, like csaw, don't require peaks at all.)
  • IP QC (ChIPQC)
  • Differential binding analysis (csaw or DiffBind)
  • Other downstream analyses & visualizations - motif analyses, enrichment analyses, etc etc.
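As a rough illustration of the first few steps in the list above, here is a minimal Python sketch that only builds the command lines for one single-end sample; it does not run anything. The tool choices (bowtie2 for alignment, samtools for sorting), the file names, the presence of an input/control BAM, and the human genome size setting (`-g hs`) are all assumptions for the example, not details from this thread. Adjust them to your own data.

```python
from pathlib import Path


def chipseq_commands(sample, fastq, index, control_bam, outdir="results"):
    """Build (but do not run) the commands for one single-end ChIP sample.

    All arguments are hypothetical placeholders: `index` is a bowtie2 index
    prefix and `control_bam` is an aligned input/control sample.
    """
    out = Path(outdir)
    sam = out / f"{sample}.sam"
    bam = out / f"{sample}.sorted.bam"
    return [
        # 1. Sequencing QC on the raw reads
        ["fastqc", str(fastq), "-o", str(out)],
        # 2. Alignment (bowtie2 is just one of many possible aligners)
        ["bowtie2", "-x", str(index), "-U", str(fastq), "-S", str(sam)],
        # 3. Coordinate-sort and index the alignments
        ["samtools", "sort", "-o", str(bam), str(sam)],
        ["samtools", "index", str(bam)],
        # 4. Peak calling against the control; "-g hs" assumes a human genome
        ["macs2", "callpeak", "-t", str(bam), "-c", str(control_bam),
         "-f", "BAM", "-g", "hs", "-n", sample, "--outdir", str(out)],
    ]


if __name__ == "__main__":
    for cmd in chipseq_commands("stim_rep1", "stim_rep1.fastq.gz",
                                "genome_index", "input.bam"):
        print(" ".join(cmd))
```

Printing the commands first makes it easy to check them by eye before passing each list to `subprocess.run`. IP QC and differential binding (ChIPQC, csaw, DiffBind) are R/Bioconductor packages and are not covered by this sketch.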

Here is a workflow with helpful context for each step that uses R packages from Bioconductor to perform an end-to-end analysis. It is likely worth your time even if you don't use all of those packages, as it will explain various steps and what you should expect to see during QC.

modified 1 day ago • written 1 day ago by jared.andrews07 (6.1k)

There are 2 replicates for each condition. All the files I have are in bigWig format.

written 1 day ago by DM95 (0)

Okay, replicates are good.

Then your first step is to get the original data in FASTQ (or at least BAM) format. bigWig files have already been processed and are meant for visualization purposes. They are typically scaled, but are not directly comparable to one another in most cases. This is especially true when you don't know how they've been generated. You will not be able to perform the analyses you want with only those files.
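To make the scaling point concrete, here is a small arithmetic sketch in Python. The read counts and library sizes are made up, and counts-per-million (CPM) is used only as one example of a scaling a bigWig might carry; how any given bigWig was actually scaled is something you would need to confirm with whoever generated it.

```python
def counts_per_million(raw_count, library_size):
    """Scale a raw read count by sequencing depth (reads per million)."""
    return raw_count * 1_000_000 / library_size


# Hypothetical region covered by two samples sequenced to different depths.
raw_a, depth_a = 50, 10_000_000    # sample A: 50 reads out of 10M
raw_b, depth_b = 100, 20_000_000   # sample B: 100 reads out of 20M

cpm_a = counts_per_million(raw_a, depth_a)
cpm_b = counts_per_million(raw_b, depth_b)

# Raw coverage differs two-fold, yet the depth-scaled signal is identical.
print(raw_a, raw_b)   # 50 100
print(cpm_a, cpm_b)   # 5.0 5.0
```

If one track held raw coverage and the other CPM, or the two used different scaling methods, the numbers in the files would differ for reasons unrelated to binding, which is why tracks of unknown origin should not be compared quantitatively.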

written 1 day ago by jared.andrews07 (6.1k)

OK, thank you! I will work on that. Once I have obtained those files, how would I proceed with the analysis I want? Also, can anything be inferred from bigWig files, like statistics?

written 1 day ago by DM95 (0)

Depends. If they're FASTQ files, you'd want to do some QC to ensure the sequencing worked properly, then move on to alignment and the additional steps listed above. If they're BAM files, you can also run them through FastQC, but you can skip the alignment step, assuming whoever did the alignment used an aligner and parameters that make sense. You should try to get that information from whoever dealt with the data initially if that's the case. I've edited my answer to include a link to an end-to-end workflow (with code) that should help you get started. It is Bioconductor-centric, but still contains lots of useful info even if you don't use those packages. It also goes through alignment and some QC at the end of the article.
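Once differential binding results exist for both comparisons, the two-step question in the original post comes down to simple set logic. The Python sketch below assumes each comparison has already been summarized per gene as a (log2 fold change, FDR) pair; that input layout, the gene names, the function name, and the 0.05 cutoff are all hypothetical choices for illustration, not output from csaw or DiffBind.

```python
def classify_genes(stim_vs_untreated, inhib_vs_stim, fdr_cutoff=0.05):
    """Split stimulant-induced genes by their response to the inhibitor.

    Each argument maps gene name -> (log2_fold_change, fdr) for one contrast.
    """
    # Step 1: genes with significantly increased binding after the stimulant.
    induced = {
        gene for gene, (lfc, fdr) in stim_vs_untreated.items()
        if lfc > 0 and fdr < fdr_cutoff
    }
    # Step 2: of those, genes with significantly decreased binding
    # after the inhibitor.
    sensitive = set()
    for gene in induced:
        # Genes missing from the second contrast count as "no evidence".
        lfc, fdr = inhib_vs_stim.get(gene, (0.0, 1.0))
        if lfc < 0 and fdr < fdr_cutoff:
            sensitive.add(gene)
    return {
        "induced": sorted(induced),
        "inhibitor_sensitive": sorted(sensitive),
        "not_significantly_reduced": sorted(induced - sensitive),
    }


if __name__ == "__main__":
    # Made-up example values.
    stim = {"GENE_A": (2.1, 0.001), "GENE_B": (1.5, 0.01), "GENE_C": (0.2, 0.6)}
    inhib = {"GENE_A": (-1.8, 0.002), "GENE_B": (-0.1, 0.8), "GENE_C": (-2.0, 0.001)}
    print(classify_genes(stim, inhib))
```

Note that "not significantly reduced" is not the same as "unaffected": with only two replicates per condition, a real but modest decrease may simply fail to reach significance.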

Stats on the bigWigs are a lost cause for the most part. You can look at them in a genome browser such as IGV just to spot check that your ChIP actually worked. Pick a gene you know should have binding and ensure that you can visually see read pileups in each of your samples.

written 1 day ago by jared.andrews07 (6.1k)