Question

ChIP-Seq experiment, MACS2 bdgpeakcall 0 fold-change, pvalue and qvalue

0

Entering edit mode

6.5 years ago

Bastien Hervé 6.4k

Hello, I'm currently following this paper Myc Regulates Chromatin Decompaction and Nuclear Architecture during B Cell Activation.

It is a ChIP-Seq to generate genome-wide maps of 34 chromatin modifications (17 acetylation marks, and 17 methylation marks) and the histone variant H2AZ. B cells were either naïve (G0) or activated for 24h in the presence of LPS and IL-4.

I'm trying to isolate peaks specific to naïve or activated B cells in each chromatin modifications.

In the publication I found these informations about the data :

BAM files:

50bps reads were sequenced on HiSeq2000/2500 Illumina platform. After quality control, ChIP-Seq reads were aligned agaisnt mm9 mouse reference genome by using alignment software Bowtie (with options <=> report reads that align uniquely to the best stratum and allowing up to 2 mismatches).

BIGWIG files :

Density tracks were generated using custom software based on the samtools library to count the number of reads in 100 bp windows normalized to window size and library size to obtain densities in units of reads per kb region per million mapped reads (rpkm) across the genome.

So, I assume they reduced the noise background using input files to create density tracks which are bigwig files

Bigwig are available under GEO : GSE82144

I will focus on H3K9me3 chromatin modification for this example :

aB_wt_H3K9me3 : GSM2184251
rB_wt_H3K9me3 : GSM2184287

Recovering data :

axel -q ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM2184nnn/GSM2184251/suppl/GSM2184251_aB_wt_H3K9me3.bw
axel -q ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM2184nnn/GSM2184287/suppl/GSM2184287_rB_wt_H3K9me3.bw

Visualizing my bigwig under IGV, set data range from 0 to 10 for both files

Seems like, i could grab some peaks by eyes.

As I only got bigiwig files, I throught that MACS2 would be my best option, using bedgraph files

Transform bigwig to bedgraph : http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/

./bigWigToBedGraph GSM2184251_aB_wt_H3K9me3.bw GSM2184251_aB_wt_H3K9me3.bedgraph
./bigWigToBedGraph GSM2184287_rB_wt_H3K9me3.bw GSM2184287_rB_wt_H3K9me3.bedgraph

Install MACS2 :

conda create -n yourenvname python=2.7 anaconda
source activate python2
conda install -c bioconda macs2

Run MACS2 :

macs2 bdgpeakcall -i GSM2184251_aB_wt_H3K9me3.bedgraph -o GSM2184251_aB_wt_H3K9me3_narrowPeaks

INFO @ Fri, 21 Dec 2018 12:09:25: Read and build bedGraph...

INFO @ Fri, 21 Dec 2018 12:09:36: Call peaks from bedGraph...

INFO @ Fri, 21 Dec 2018 12:09:36: Write peaks...

INFO @ Fri, 21 Dec 2018 12:09:36: Done

macs2 bdgpeakcall -i GSM2184287_rB_wt_H3K9me3.bedgraph -o GSM2184287_rB_wt_H3K9me3_narrowPeaks

INFO @ Fri, 21 Dec 2018 12:10:54: Read and build bedGraph...

INFO @ Fri, 21 Dec 2018 12:11:06: Call peaks from bedGraph...

INFO @ Fri, 21 Dec 2018 12:11:07: Write peaks...

INFO @ Fri, 21 Dec 2018 12:11:07: Done

If I open these two files :

more GSM2184251_aB_wt_H3K9me3_narrowPeaks

track type=narrowPeak name="GSM2184251_aB_wt_H3K9me3_narrowPeaks" description="GSM2184251_aB_wt_H3K9me3_narrowPeaks" nextItemButton=on
chr1    3027700 3027900 GSM2184251_aB_wt_H3K9me3_narrowPeaks_narrowPeak1    53  .   0.00000 0.00000 0.00000 100
chr1    3434900 3435200 GSM2184251_aB_wt_H3K9me3_narrowPeaks_narrowPeak2    56  .   0.00000 0.00000 0.00000 150
chr1    3660600 3661100 GSM2184251_aB_wt_H3K9me3_narrowPeaks_narrowPeak3    123 .   0.00000 0.00000 0.00000 50
chr1    3661300 3661500 GSM2184251_aB_wt_H3K9me3_narrowPeaks_narrowPeak4    80  .   0.00000 0.00000 0.00000 150

more GSM2184287_rB_wt_H3K9me3_narrowPeaks

track type=narrowPeak name="GSM2184287_rB_wt_H3K9me3_narrowPeaks" description="GSM2184287_rB_wt_H3K9me3_narrowPeaks" nextItemButton=on
chr1    3010900 3011100 GSM2184287_rB_wt_H3K9me3_narrowPeaks_narrowPeak1    54  .   0.00000 0.00000 0.00000 100
chr1    3084200 3084400 GSM2184287_rB_wt_H3K9me3_narrowPeaks_narrowPeak2    76  .   0.00000 0.00000 0.00000 150
chr1    3175300 3175500 GSM2184287_rB_wt_H3K9me3_narrowPeaks_narrowPeak3    73  .   0.00000 0.00000 0.00000 50
chr1    3530700 3531000 GSM2184287_rB_wt_H3K9me3_narrowPeaks_narrowPeak4    50  .   0.00000 0.00000 0.00000 150

According to the documentation

NAME_peaks.narrowPeak is BED6+4 format file which contains the peak locations together with peak summit, pvalue and qvalue. You can load it to UCSC genome browser. Definition of some specific columns are:

5th: integer score for display calculated as int(-10*log10qvalue). Please note that currently this value might be out of the [0-1000] range defined in UCSC Encode narrowPeak format

7th: fold-change

8th: -log10pvalue

9th: -log10qvalue

10th: relative summit position to peak start

Why are my fold-change value, pvalue and qvalue set to 0.0 on every record ?

After that I want to filter the specific peaks to G0 and activated B cells

The macs2 bdgdiff should works but specifications required a MACS control lambda bedGraph file which is only available running macs2 callpeak option --bdg. This last does not accept bedgraph file so I am stuck...

macs2 callpeak accept bed files, so I tried :

macs2 callpeak -t GSM2184251_aB_wt_H3K9me3.bedgraph -c GSM2184287_rB_wt_H3K9me3.bedgraph -f BED --bdg --outdir callpeak
INFO  @ Fri, 21 Dec 2018 12:32:49: 
# Command line: callpeak -t GSM2184251_aB_wt_H3K9me3.bedgraph -c GSM2184287_rB_wt_H3K9me3.bedgraph -f BED --bdg --outdir callpeak
# ARGUMENTS LIST:
# name = NA
# format = BED
# ChIP-seq file = ['GSM2184251_aB_wt_H3K9me3.bedgraph']
# control file = ['GSM2184287_rB_wt_H3K9me3.bedgraph']
# effective genome size = 2.70e+09
# band width = 300
# model fold = [5, 50]
# qvalue cutoff = 5.00e-02
# The maximum gap between significant sites is assigned as the read length/tag size.
# The minimum length of peaks is assigned as the predicted fragment length "d".
# Larger dataset will be scaled towards smaller dataset.
# Range for calculating regional lambda is: 1000 bps and 10000 bps
# Broad region calling is off
# Paired-End mode is off

INFO  @ Fri, 21 Dec 2018 12:32:49: #1 read tag files... 
INFO  @ Fri, 21 Dec 2018 12:32:49: #1 read treatment tags... 
INFO  @ Fri, 21 Dec 2018 12:32:50:  1000000 
INFO  @ Fri, 21 Dec 2018 12:32:52:  2000000 
INFO  @ Fri, 21 Dec 2018 12:32:53:  3000000 
INFO  @ Fri, 21 Dec 2018 12:32:55:  4000000 
INFO  @ Fri, 21 Dec 2018 12:32:56:  5000000 
INFO  @ Fri, 21 Dec 2018 12:32:58:  6000000 
INFO  @ Fri, 21 Dec 2018 12:32:59:  7000000 
INFO  @ Fri, 21 Dec 2018 12:33:01:  8000000 
INFO  @ Fri, 21 Dec 2018 12:33:02: #1.2 read input tags... 
INFO  @ Fri, 21 Dec 2018 12:33:04:  1000000 
INFO  @ Fri, 21 Dec 2018 12:33:05:  2000000 
INFO  @ Fri, 21 Dec 2018 12:33:07:  3000000 
INFO  @ Fri, 21 Dec 2018 12:33:08:  4000000 
INFO  @ Fri, 21 Dec 2018 12:33:10:  5000000 
INFO  @ Fri, 21 Dec 2018 12:33:11:  6000000 
INFO  @ Fri, 21 Dec 2018 12:33:13:  7000000 
INFO  @ Fri, 21 Dec 2018 12:33:14:  8000000 
INFO  @ Fri, 21 Dec 2018 12:33:15: #1 tag size is determined as 120 bps 
INFO  @ Fri, 21 Dec 2018 12:33:15: #1 tag size = 120.0 
INFO  @ Fri, 21 Dec 2018 12:33:15: #1  total tags in treatment: 8936534 
INFO  @ Fri, 21 Dec 2018 12:33:15: #1 user defined the maximum tags... 
INFO  @ Fri, 21 Dec 2018 12:33:15: #1 filter out redundant tags at the same location and the same strand by allowing at most 1 tag(s) 
INFO  @ Fri, 21 Dec 2018 12:33:16: #1  tags after filtering in treatment: 8936534 
INFO  @ Fri, 21 Dec 2018 12:33:16: #1  Redundant rate of treatment: 0.00 
INFO  @ Fri, 21 Dec 2018 12:33:16: #1  total tags in control: 8836601 
INFO  @ Fri, 21 Dec 2018 12:33:16: #1 user defined the maximum tags... 
INFO  @ Fri, 21 Dec 2018 12:33:16: #1 filter out redundant tags at the same location and the same strand by allowing at most 1 tag(s) 
INFO  @ Fri, 21 Dec 2018 12:33:16: #1  tags after filtering in control: 8836601 
INFO  @ Fri, 21 Dec 2018 12:33:16: #1  Redundant rate of control: 0.00 
INFO  @ Fri, 21 Dec 2018 12:33:16: #1 finished! 
INFO  @ Fri, 21 Dec 2018 12:33:16: #2 Build Peak Model... 
INFO  @ Fri, 21 Dec 2018 12:33:16: #2 looking for paired plus/minus strand peaks... 
INFO  @ Fri, 21 Dec 2018 12:33:19: #2 number of paired peaks: 0 
WARNING @ Fri, 21 Dec 2018 12:33:19: Too few paired peaks (0) so I can not build the model! Broader your MFOLD range parameter may erase this error. If it still can't build the model, we suggest to use --nomodel and --extsize 147 or other fixed number instead. 
WARNING @ Fri, 21 Dec 2018 12:33:19: Process for pairing-model is terminated!

Too few paired peaks (0)

I also tried --nomodel and --extsize 147 and I got the same result

I'm running in circles...

Thanks for your help

ChIP-Seq MACS2 • 6.1k views

ADD COMMENT • link updated 2.2 years ago by Benedetta • 0 • written 6.5 years ago by Bastien Hervé 6.4k

0

Entering edit mode

Hi! It's been a while but I'm stuck at the exact same point; did you find a solution after this?

ADD REPLY • link 2.2 years ago by Benedetta • 0

score 2 · Accepted Answer · 2018-12-21

2

Entering edit mode

6.5 years ago

ATpoint 88k

First of all, I recommend against using any uploaded bigwig/bedGraph files. Even though the method section states how they were generated, you can never be 100% sure what they contain and what filters were applied. Start from raw data, so fastq files, align to the reference (ause mm10, not mm9 which is deprecated) and then read the resulting BAM files (I suggest filtering for primary chromosomes, removing non-primary alignments and duplicates) into MACS2.

Furthermore, H3K9me3 is a diffuse or broad histone modification, see e.g. here, so using the default parameters is not a good choice. Better use the --broad when using callpeak or bdgbroadcall if you want to start from non-normalized bedGraphs.

Note that bedGraphs are technically BED files, but the BED files that callpeak accepts are files that contain sequencing reads. A bedGraph is a summary of the number of reads across the genome, so t does not make sense to read a bedGraph into callpeak. Use BAM files for this. Also, option --bdg only makes macs save the reads read from a BED/BAM to disk in bedGraoh format but has no effect on peak calling.

ADD COMMENT • link 6.5 years ago by ATpoint 88k

0

Entering edit mode

Thanks ATpoint

First section : If I had the choice between bigwig and raw reads I would have go for raw reads of course. Still, I don't have this information, so I followed theirs specs (mm9...)

Second section : Sorry I missed this info in my thread, I wanting to create a simple example with a sharp histone mark but i tangled up with my histone marks.

bedgraphs should be normalize

Density tracks were generated using custom software based on the samtools library to count the number of reads in 100 bp windows normalized to window size and library size to obtain densities in units of reads per kb region per million mapped reads (rpkm) across the genome. Ofc broad options for broad histon marks

Third section :

Use BAM files for this.

Well if I could...

I contacted the first author to get more information about the bioinformatic workflow

ADD REPLY • link 6.5 years ago by Bastien Hervé 6.4k

1

Entering edit mode

Why don't you download them from the ENA? They are all there directly in fastq format. Here is a tutorial for fast batch downloads: Fast download of FASTQ files and metadata from the European Nucleotide Archive (ENA) I think even though this will take a bit of time to align, it gives you much better results because I do not think that RPKM-normalized values in 100bp bins meet the assumptions of MACS, especially in terms of accurate summit detection.

ADD REPLY • link 6.5 years ago by ATpoint 88k

0

Entering edit mode

I think they did not upload the corresponding input files on ENA.

For example H3K9me3, I only got :

SRS1478384 : rB_wt_H3K9me3
SRS1478348 : aB_wt_H3K9me3

ADD REPLY • link 6.5 years ago by Bastien Hervé 6.4k

score 1 · Accepted Answer · 2018-12-21

To close this thread, I contacted the authors, here is the reply :

No, I didn’t do noise background reduction in bw file generation.

I used input file for peak call, but I didn’t use input files for generating bw files.

Unfortunately, I didn’t upload input files to GEO.

If you want, I can send input files for rB and aB.

So, if I can't redo the analysis because input files are not uploaded and neither I can't process to peak calling because bigwig files are not input reduced. I don't know what I could do to use the data in this paper...

Anyway, thanks ATpoint