Normalize ChIP-seq data
7.1 years ago
Picoskia • 0

Hi everyone,

I have to analyze 3 ChIP-seq datasets. I'm fine with the analysis procedure itself but have a question about normalization. At which step(s) should I normalize my data?

  • Before the alignment by sequencing depth?
  • At the peak calling step with MACS?
  • At the conversion step from BAM to BigWig?

How do you usually normalize your ChIP-seq data?

Thanks a lot for your help.

ChIP-Seq • 3.8k views

Are your 3 ChIP-seq datasets for different factors/histone marks, or for the same factor/histone mark in different conditions?

The three ChIP-seq datasets are for the same factor: one WT and two with mutations.

MACS includes some basic normalization in case you provide a control file. In that case, the larger file is proportionally scaled towards the smaller one. If your goal is [and this is what you should state right away when posting a question like this] to simply call peaks, this is typically sufficient. If you aim to perform differential analysis, have a look at the established tools, like MAnorm, csaw, DiffBind and many more.
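
To make the "larger file is scaled towards the smaller one" idea concrete, here is a minimal Python sketch of that kind of linear depth scaling. The function name and the read counts are made up for illustration, and this is not MACS's actual code, only the arithmetic as described above.

```python
def depth_scale_factors(treatment_reads: int, control_reads: int) -> tuple[float, float]:
    """Return (treatment_scale, control_scale) for linear depth scaling.

    The larger library is scaled down to the depth of the smaller one;
    the smaller library is left untouched (factor 1.0).
    """
    if treatment_reads <= 0 or control_reads <= 0:
        raise ValueError("read counts must be positive")
    smaller = min(treatment_reads, control_reads)
    return smaller / treatment_reads, smaller / control_reads


# Hypothetical example: 20M ChIP reads vs 50M input reads.
chip_scale, input_scale = depth_scale_factors(20_000_000, 50_000_000)
print(chip_scale, input_scale)  # 1.0 0.4
```

With these numbers the input pileup would be multiplied by 0.4 before being compared with the ChIP pileup.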

7.1 years ago

MACS normalizes when calling peaks fairly well, though their normalized bedgraphs frankly look terrible on a browser. I use deepTools to create read-depth normalized bigWig files that look much more appropriate in UCSC. deepTools has a few different ways it can normalize, including subtracting input reads from samples, though I typically just use the rpkm option.
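
For reference, RPKM normalization of a binned coverage track boils down to dividing each bin's read count by the bin length in kb and by the number of mapped reads in millions. A rough Python sketch of that arithmetic follows; the function name, bin size and counts are hypothetical, and this is not deepTools code.

```python
def rpkm_per_bin(bin_counts, bin_size_bp, total_mapped_reads):
    """Convert raw per-bin read counts to RPKM.

    RPKM = reads in bin / (bin length in kb * mapped reads in millions)
    """
    if bin_size_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("bin size and mapped read count must be positive")
    bin_kb = bin_size_bp / 1_000
    reads_millions = total_mapped_reads / 1_000_000
    return [count / (bin_kb * reads_millions) for count in bin_counts]


# Hypothetical track: four 50 bp bins from a library with 20M mapped reads.
raw_counts = [10, 50, 0, 200]
print(rpkm_per_bin(raw_counts, bin_size_bp=50, total_mapped_reads=20_000_000))
```

Because every bin is divided by the same library-size term, two samples sequenced to different depths end up on a comparable scale in the browser.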

If you want to quantitatively compare signal at ChIP-seq peaks, my two favorite tools are DiffBind (R package) if you have biological replicates or MAnorm (Bash/R scripts) if you're trying to compare a single sample to another. They both take care of normalization and do a pretty good job of identifying unique peaks for a given condition/sample.

Can you provide some links where I can read about both ChIP-seq and ATAC-seq data analysis? I did use HOMER, but I'm not yet confident about it.

I typically treat ATAC-seq much the same as ChIP-seq, but use a smaller extension size during peak calling for ATAC-seq, as our fragments are usually smaller. HOMER is also a perfectly good tool (with great documentation), though it can't quantitatively compare signal at peaks last I checked. I found this paper very helpful when trying to identify which tool is best for the job depending on your data type (sharp vs broad signal), if you have replicates, etc.

There are tons of other blogs/githubs/websites that go more deeply into analysis, including the BioStars handbook. This github also has links and some comments about pretty much every tool ever developed for ChIP-seq analysis along with tons of links to other resources, key papers, etc. It's a great resource.

ENCODE has pipelines and documentation for this:

https://github.com/mforde84/ATACseq-analysis-pipeline

7.1 years ago
rahul • 0

As suggested, MACS/MACS2 will normalize according to the total number of reads. Some of the bigWig creation packages also have the ability to scale by a specified normalization factor, which you will have to do to get a "normalized" bigWig file.
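
As a small illustration of such a normalization factor, the sketch below computes a reads-per-million factor for each sample. The sample names and mapped-read counts are invented for the example; the resulting number is the kind of value you would hand to a tool's scaling option (deepTools' bamCoverage, for instance, has a --scaleFactor argument).

```python
def per_million_scale_factors(mapped_reads):
    """Map each sample name to the factor that rescales its coverage
    to reads per million mapped reads."""
    factors = {}
    for sample, n_reads in mapped_reads.items():
        if n_reads <= 0:
            raise ValueError(f"{sample}: mapped read count must be positive")
        factors[sample] = 1_000_000 / n_reads
    return factors


# Hypothetical mapped-read counts for one WT and two mutant samples.
mapped = {"WT": 25_000_000, "mut1": 20_000_000, "mut2": 40_000_000}
for sample, factor in per_million_scale_factors(mapped).items():
    print(f"{sample}\t{factor:.4f}")
```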

One last thing: if you are looking at a global increase or reduction of whatever you are ChIPping, total read normalization will not work. Something to keep in mind...
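
A toy calculation shows why. In the sketch below (all numbers invented), the mutant has exactly half the WT signal at every region, but both libraries are sequenced to the same depth. After total read normalization the two samples look identical, so the global loss is invisible.

```python
def sequence(true_signal, depth):
    """Idealized sequencing: reads are distributed across regions in
    proportion to the true signal, up to a fixed total depth."""
    total = sum(true_signal)
    return [depth * s / total for s in true_signal]


def reads_per_million(counts):
    """Total read normalization: rescale counts to a library of 1M reads."""
    total = sum(counts)
    return [c * 1_000_000 / total for c in counts]


# Hypothetical true occupancy at four regions.
wt_true = [100, 200, 300, 400]
mut_true = [s / 2 for s in wt_true]  # global 2-fold reduction

wt_rpm = reads_per_million(sequence(wt_true, depth=10_000_000))
mut_rpm = reads_per_million(sequence(mut_true, depth=10_000_000))

for region, (wt, mut) in enumerate(zip(wt_rpm, mut_rpm), start=1):
    print(f"region {region}: WT={wt:.1f} mut={mut:.1f}")
```

Both columns come out the same, because normalizing by total reads can only recover relative differences between regions within a sample, not an absolute change shared by all regions.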

Thank you for your answers. So, if I understand correctly, the normalization step is done after the alignment, when calling peaks with MACS.

"if you are looking at a global increase or reduction of whatever you are ChIPping, total read normalization will not work."

This depends on the nature of the ChIP. Transcription factor ChIP-seq experiments often have relatively few (<20K) enriched regions, so a change in binding should not influence global scaling approaches too much. Broad histone marks covering large swathes of the genome (e.g. H3K27me3) can be a different story, though.
