Dear Users,
I have a few doubts about using MACS2.
Peak calling
macs2 callpeak -t chip.bedgraph -c input.bedgraph --outdir Input_test -B --nomodel --SPMR
This step generates a control lambda.bdg file and a treated pileup bdg file.
As per my understanding, --SPMR normalizes the dataset to 1M reads. So, if it is normalized, can we convert this data to a bigWig file and then visualize the control and treated samples? Does that make sense?
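To make the "per million reads" idea concrete, here is a minimal sketch of the arithmetic only. The bedGraph values and the library size are made up for illustration, and this is not MACS2's internal code; it only shows what dividing a pileup track by the sequencing depth in millions looks like.

```shell
# Toy bedGraph (chrom, start, end, raw pileup); values are hypothetical.
printf 'chr1\t0\t100\t3\nchr1\t100\t200\t12\nchr1\t200\t300\t6\n' > toy_treat_pileup.bdg

# Hypothetical library size: 6,000,000 reads after filtering.
total_reads=6000000

# Signal per million reads = raw value * 1e6 / total reads.
awk -v n="$total_reads" 'BEGIN { OFS = "\t" } {
    $4 = sprintf("%.2f", $4 * 1e6 / n)
    print
}' toy_treat_pileup.bdg > toy_treat_pileup.spmr.bdg

cat toy_treat_pileup.spmr.bdg
```

With a track scaled this way, treated and control values are on the same per-million scale, which is what makes a side-by-side view in a genome browser comparable.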
macs2 bdgcmp -t treated_pileup.bdg -c control_lambda.bdg -m logLR --outdir Ba-HW-bdgcmp -o out.bdg
This command is used to remove background noise between the control and treated sets. If this is the case, what does step (1) do? Is the data not already normalized in step 1? If so, what is the use of --SPMR?
To add to my confusion, bdgcmp has --scaling-factor as an option. To my understanding, this is also used for normalization. Also, if I want to use logLR for bdgcmp I have to provide the -p parameter, but the help text says this parameter is applied after normalization of sequencing depth. How should I proceed? I am unable to understand this. Please guide me.
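As a rough illustration of where a pseudocount enters, the sketch below computes a fold enrichment and its log10 on two toy tracks that are assumed to be already depth-normalized. It deliberately uses the simpler FE/logFE ratio rather than logLR, it assumes both tracks share identical intervals (real bdgcmp does not require that), and all file names and values are hypothetical.

```shell
# Toy, already depth-normalized tracks on identical intervals (hypothetical values).
printf 'chr1\t0\t100\t2.0\nchr1\t100\t200\t8.0\n' > toy_treat.bdg
printf 'chr1\t0\t100\t1.0\nchr1\t100\t200\t2.0\n' > toy_lambda.bdg

# Pseudocount added to both tracks before taking the ratio.
p=1.0

# Columns 1-4 come from the treatment track, columns 5-8 from the control track.
paste toy_treat.bdg toy_lambda.bdg |
awk -v p="$p" '{
    fe = ($4 + p) / ($8 + p)                 # fold enrichment with pseudocount
    printf "%s\t%s\t%s\t%.3f\t%.3f\n", $1, $2, $3, fe, log(fe) / log(10)
}' > toy_fe.bdg

cat toy_fe.bdg
```

The point of the sketch is the ordering: the pseudocount is added to values that are already on a common depth scale, so its effect depends on whether the input tracks are raw counts or per-million values.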
Questions
- Which step does the normalization, --SPMR or the scaling factor? If the scaling factor, how do I estimate the value?
- Which files should I take for comparing the ChIP and input data? callpeak generates .bdg files for control and treated, and after that bdgcmp generates one more .bdg file. Which files should be considered for visualization in IGV?
- If I want to provide -m logLR during bdgcmp, I have to provide -p also. In order to use -p, does the data have to be normalized to sequencing depth?
I am sorry, I am quite new to macs2, so I have a lot of confusion.
Your inputs will be highly appreciated.
Thank You,
Pinky
Thanks, Ian. I am not so confident about the macs2 version. I have a few fundamental doubts.
It would be great if you could clarify them.
In the macs2 output file NA_peaks.xlsx, what does pileup mean? Does NA_peaks.xlsx give only the enriched regions in the treated sample? The manual says it is the pileup height at the peak summit. Does that mean the number of reads aligned to that peak region?
What is the use of the NA_peaks.narrowPeak file? This file contains the same number of peaks as NA_peaks.xlsx, so what is its purpose?
In my results I could not generate the NAME_negative_peaks.xls file. Below are the parameters which I used for peak calling.
Your inputs are highly appreciated.
Thanks
Pinky
Sorry, I missed your reply, but Pierre seems to have answered.
Thank you for your explanation.
I would like to go one step back and get a few more inputs.
You discussed sample normalization using the SPMR option. Is it supposed to be done separately for input and IP, or does the script mentioned below take care of it?
What does the 4th column of *_control_lambda.bdg and *_treat_pileup.bdg mean in MACS2 output?
Ans) Is it the fold enrichment? If so, how is it calculated?
My control library has ~16M reads and the treated library has 6M reads. How does this affect the MACS2 pipeline?
To my understanding, the data is scaled to the smaller library.
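For the library sizes mentioned above, the linear scaling can be worked out as below. This is a sketch under the assumption that the larger library is scaled down toward the smaller one (the default behaviour described in the callpeak help, unless --to-large is set); the real factor is computed from the read counts that remain after duplicate filtering, which may differ from these round numbers.

```shell
# Approximate library sizes taken from the question; actual post-filtering counts may differ.
treat_reads=6000000
control_reads=16000000

# Scale the larger library down toward the smaller one.
awk -v t="$treat_reads" -v c="$control_reads" 'BEGIN {
    if (t < c) printf "scale control by %.3f\n", t / c
    else       printf "scale treatment by %.3f\n", c / t
}' > scale.txt

cat scale.txt
```

Here the control would be multiplied by 6/16, so a control signal of 16 at some position would be treated as 6 when compared against the treated sample.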
How is the control_lambda.bw (bigWig) file different from the BAM file?
Ans) Is it that the .bw file gives only the portion of the region that is enriched, whereas the BAM file gives the complete alignment coverage across the genome?
Which files should be considered for visualization: the sorted BAM files or the bigWig files?
Your answers will be highly appreciated.
Thanks