Question: Doubts related to MACS tool
3.9 years ago, pinky_pinkpinky (India) wrote:

Dear Users,

I have a few doubts about using MACS2.

Peak calling
1. macs2 callpeak -t chip.bedgraph -c input.bedgraph --outdir Input_test -B --nomodel --SPMR

This step generates a control lambda.bdg file and a treated pileup bdg file.
 
As per my understanding, --SPMR normalizes the dataset to 1M reads. So, if it is normalized, can we convert this data to bigWig files and then visualize the control and treated samples? Does that make sense?
2. macs2 bdgcmp -t treated_pileup.bdg -c control_lambda.bdg -m logLR --outdir Ba-HW-bdgcmp -o out.bdg

This command is used to remove background noise between the control and treated sets. If that is the case, what does step (1) do? Is the data not already normalized in step 1? If so, what is the use of --SPMR?

To add to my confusion, bdgcmp has --scaling-factor as an option. To my understanding, this is also used for normalization. Also, if I want to use logLR with bdgcmp, I have to provide the -p parameter, but the help says this parameter is applied after normalization of sequencing depth. How should I proceed? I am unable to understand this. Please guide me.

Questions

1. Which step does the normalization, --SPMR or the scaling factor? If it is the scaling factor, how do I estimate its value?
2. Which files should I take for comparing the ChIP and input data? callpeak generates a .bdg file each for control and treated, and after that bdgcmp generates one more .bdg file. Which files should be considered for visualization in IGV?
3. If I want to use -m logLR with bdgcmp, I have to provide -p as well. In order to use -p, does the data have to be normalized to sequencing depth?

I am sorry, I am quite new to MACS2, so I have a lot of confusion.

Your inputs will be highly appreciated.

Thank You,

Pinky

chip-seq • 3.3k views
Modified 3.8 years ago • written 3.9 years ago by pinky_pinkpinky
3.9 years ago, Ian (University of Manchester, UK) wrote:

For point 1, --SPMR is only used in conjunction with --bdg (-B). The counts in the bedGraph file are then normalised by the number of millions of reads/fragments in the ChIP sample, after deduplication etc. E.g. a pileup of 12 fragments / 20 (million fragments in ChIP) = 0.6. You can then convert the bedGraph to bigWig using the UCSC tool bedGraphToBigWig.
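
The arithmetic in that example can be sketched in a few lines of Python. This is only an illustration of per-million scaling applied to the 4th column of a bedGraph line; the function names are made up here and it is not MACS2's implementation.

```python
def signal_per_million(pileup: float, total_fragments: int) -> float:
    """Scale a raw pileup value to signal per million fragments."""
    if total_fragments <= 0:
        raise ValueError("total_fragments must be positive")
    return pileup / (total_fragments / 1_000_000)


def normalize_bedgraph_line(line: str, total_fragments: int) -> str:
    """Rescale the 4th column (the signal) of one bedGraph line."""
    chrom, start, end, value = line.split()[:4]
    scaled = signal_per_million(float(value), total_fragments)
    return f"{chrom}\t{start}\t{end}\t{scaled:.5f}"


# The example above: a pileup of 12 in a ChIP library of 20 million fragments.
print(signal_per_million(12, 20_000_000))
print(normalize_bedgraph_line("chr1\t100\t200\t12", 20_000_000))
```

The coordinates are left untouched; only the signal column changes, which is why the normalized file can still go straight into bedGraphToBigWig.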

I don't have experience of bdgcmp.

 

3.9 years ago, pinky_pinkpinky (India) wrote:

Thanks, Ian. I am not very confident with MACS2 yet, and I have a few fundamental doubts.

It would be great if you could clarify them.

1. In the MACS2 output file NA_peaks.xls, what does pileup mean? Does NA_peaks.xls give only the enriched regions in the treated sample?

The manual says it is the pileup height at the peak summit. Does that mean the number of reads aligned to that peak region?

2. What is the use of the NA_peaks.narrowPeak file? It contains the same number of peaks as NA_peaks.xls. So what is its purpose?

3. In my results, no 'NAME_negative_peaks.xls' file was generated. Below are the parameters I used for peak calling.

macs2 callpeak -t treated.bed -c control.bed --outdir output -B --nomodel --SPMR -q 0.01

Your inputs are highly appreciated.

Thanks

Pinky

 


Sorry I missed your reply, but Pierre seems to have answered.

Reply written 3.8 years ago by Ian
3.9 years ago, Alternative wrote:

There are two different things here: "sample normalization" and "noise deduction".

1) You can normalize any library you sequenced by reads per million (RPM). This is what --SPMR is for. When this option is specified, the normalization is done on both your "treatment" and your "control". You can then convert both to bigWig and visualize them.

2) In addition to that, for visualization purposes and some metadata analysis, you can "subtract", "divide" or take "log2(Treatment/Ctr)" of your treatment over your control. This is to deduct noise, and it is what bdgcmp allows you to do. You can do that directly on your bam files, without calling peaks, and this is why you have the option that allows you to do so. For instance, you can calculate the RPM fraction yourself and give it to bdgcmp. To show signal on your treatment sample, you can decide to show "normalized signal with input subtracted", which means that both treatment and control are normalized to RPM and the input is subtracted from the treatment.

3) "-P" is given because there are different ways of normalization. You can normalize by RPM, RPGC ...

4) NA_peaks.narrowPeak is the same as the xls file but in narrowPeak format. This is what bioinformaticians operate on. More on the narrowPeak format is at https://genome.ucsc.edu/FAQ/FAQformat.html#format12
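
To make point 2 concrete, here is a minimal Python sketch of the two operations on a single pair of already depth-normalized values. The function names are made up for illustration and this is not how bdgcmp is implemented; in particular, MACS2's own logLR method is, as far as I can tell from its help text, a log10 likelihood ratio rather than a plain log2 ratio. The same help text describes -p/--pseudocount as a value added after depth normalization, which is what the `pseudocount` argument mimics here: without it, a position where the control is 0 has no defined ratio.

```python
import math


def subtract_control(treat: float, ctrl: float) -> float:
    """Input-subtracted signal: treatment minus control."""
    return treat - ctrl


def log2_ratio(treat: float, ctrl: float, pseudocount: float = 1.0) -> float:
    """log2((treat + p) / (ctrl + p)); p keeps the ratio defined at zeros."""
    numerator = treat + pseudocount
    denominator = ctrl + pseudocount
    if numerator <= 0 or denominator <= 0:
        raise ValueError("ratio undefined: use a positive pseudocount")
    return math.log2(numerator / denominator)


# Hypothetical RPM values at one genomic position.
treat_rpm, ctrl_rpm = 3.0, 1.0
print(subtract_control(treat_rpm, ctrl_rpm))
print(log2_ratio(treat_rpm, ctrl_rpm))
# Both tracks empty: the pseudocount turns an undefined 0/0 into a ratio of 1.
print(log2_ratio(0.0, 0.0))
```

Calling `log2_ratio(1.0, 0.0, pseudocount=0.0)` raises a ValueError, which is the situation a pseudocount is meant to avoid.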

Modified 3.9 years ago • written 3.9 years ago by Alternative

Dear Pierre,

Thank You for your explanation.

I would like to go one step back and get a few more inputs.

1. You discussed sample normalization using the --SPMR option. Is it supposed to be done separately for input and IP, or does the command below take care of it?

macs2 callpeak -t treated.bed -c control.bed --outdir output -B --nomodel --SPMR -q 0.01

What does the 4th column of *_control_lambda.bdg and *_treat_pileup.bdg mean in the MACS2 output?

Ans) Is it the fold enrichment? If so, how is it calculated?

2. My control library has ~16M reads and the treated one has 6M reads. How does this affect the MACS2 pipeline?

To my understanding, the data is scaled to the smaller library.

3. How does control_lambda.bw (the bigWig file) differ from the BAM file?

Ans) Is it that the .bw file gives only the portion of the region that is enriched, whereas the BAM file gives the complete alignment coverage across the genome?

4. Which files should be considered for visualization, the sorted BAM files or the bigWig files?

Your answers will be highly appreciated.

Thanks

Reply written 3.8 years ago by pinky_pinkpinky

Sorry for the late reply. Here are the answers regarding the different points:

1) Yes, it is supposed to be done separately. When you call peaks with the --SPMR option, MACS will generate two bedGraph files (which you can convert to the more convenient bigWig format), one for your treatment and one for your input. Both files will be "SPMR" normalized. The 4th column of the ".bdg" files represents the signal, after normalization in your case. Read about it here: https://genome.ucsc.edu/goldenpath/help/bedgraph.html

2) Yes, it is scaled. The MACS documentation explains that.

3) bigWig files are signal files. BAM files are alignment files. When loaded into a genome browser (e.g. IGV), both are supposed to show a similar trend. The bigWig file, though, will be smaller (compressed) and may contain the normalized score, depending on the normalization that you applied. ".bw" files give a view of the whole genome, unless you generated them on a portion of the genome (some programs allow that, like deepTools, ...).

4) For visualization, the bigWig files are better since they are normalized (unless you did not apply any normalization) and are faster to load and to exchange. Sometimes, though, we do look at the BAM files too. It depends on what you want to look at.
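
As a rough illustration of points 1 and 2, the sketch below reads the 4th column of a bedGraph line and works out the factor that would bring the larger library down to the smaller one, using the 16M/6M read counts from the question. The function names are hypothetical and this only mirrors the idea of linear scaling; it is not MACS2's code.

```python
from typing import Tuple


def bedgraph_signal(line: str) -> float:
    """Return the 4th column (the signal value) of one bedGraph line."""
    fields = line.split()
    return float(fields[3])


def scale_to_smaller(treat_reads: int, ctrl_reads: int) -> Tuple[float, float]:
    """Factors (treatment, control) that scale the larger library to the smaller."""
    smaller = min(treat_reads, ctrl_reads)
    return smaller / treat_reads, smaller / ctrl_reads


# Read counts from the question: 6M treated, ~16M control.
treat_factor, ctrl_factor = scale_to_smaller(6_000_000, 16_000_000)
print(treat_factor, ctrl_factor)
print(bedgraph_signal("chr1\t1000\t1050\t0.6"))
```

With these numbers the treatment is left untouched (factor 1.0) and the control is multiplied by 0.375.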

Best and hope this will help,

Pierre

Reply written 3.7 years ago by Alternative

Hi Pierre,

Thanks very much for your detailed explanation. For point 3 ("-P" is given because there are different ways of normalization. You can normalize by RPM, RPGC ...), what should be set for "-P" if I want to normalize by RPGC?

Kylie

Reply modified 22 months ago • written 22 months ago by chiefcat130