Criteria for choosing proper parameters for MACS2 -- ChIP-seq?
7.8 years ago
jimmy_zeng ▴ 90

Hi, All:

I'm now trying to use MACS2 to call peaks on ChIP-seq data, but most of the information available for this software is based on MACS 1.4 or lower.

So I just ran a small test; it runs very fast, and I have enough time and storage to experiment. The question is: are there any criteria for choosing proper parameters for MACS2?

ARGUMENTS LIST:

name = Xu_MUT_rep1

format = SAM

ChIP-seq file = ['SRR1042593.sam']

control file = ['SRR1042594.sam']

effective genome size = 2.70e+09

band width = 300

model fold = [5, 50]

qvalue cutoff = 5.00e-02

Larger dataset will be scaled towards smaller dataset.

Range for calculating regional lambda is: 1000 bps and 10000 bps

Broad region calling is off

Paired-End mode is off

With the commands below, I got fewer than 100 peaks, and I don't know what's wrong with the parameters.

nohup time ~/.local/bin/macs2 callpeak -c SRR1042594.sorted.bam -t SRR1042593.sorted.bam -f BAM -B -g hs -n Xu_MUT_rep1 2>Xu_MUT_rep1.masc2.log &
nohup time ~/.local/bin/macs2 callpeak -c SRR1042596.sorted.bam -t SRR1042595.sorted.bam -f BAM -B -g hs -n Xu_MUT_rep2 2>Xu_MUT_rep2.masc2.log &
nohup time ~/.local/bin/macs2 callpeak -c SRR1042598.sorted.bam -t SRR1042597.sorted.bam -f BAM -B -g hs -n Xu_WT_rep1  2>Xu_WT_rep1.masc2.log &
nohup time ~/.local/bin/macs2 callpeak -c SRR1042600.sorted.bam -t SRR1042599.sorted.bam -f BAM -B -g hs -n Xu_WT_rep2  2>Xu_WT_rep2.masc2.log &
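
For example, would it make sense to relax --mfold so the shifting model can be built, or to skip the model with an assumed fragment size? Just a sketch of what I have in mind (the --extsize 200 below is only a guessed fragment length, not measured from this data):

# 1) relax --mfold so MACS2 can find enough paired peaks to build its model
~/.local/bin/macs2 callpeak -c SRR1042594.sorted.bam -t SRR1042593.sorted.bam -f BAM -B -g hs --mfold 2 50 -n Xu_MUT_rep1_mfold 2>Xu_MUT_rep1_mfold.macs2.log
# 2) bypass the model entirely and extend reads to a fixed fragment length instead
~/.local/bin/macs2 callpeak -c SRR1042594.sorted.bam -t SRR1042593.sorted.bam -f BAM -B -g hs --nomodel --extsize 200 -n Xu_MUT_rep1_nomodel 2>Xu_MUT_rep1_nomodel.macs2.log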

Should I do some filtering on the alignment files?

## cat > run_bowtie2.sh
ls *.fastq | while read id ;
do
    echo $id
    # align single-end reads to hg19 with bowtie2
    ~/biosoft/bowtie/bowtie2-2.2.9/bowtie2 -p 8 -x ~/biosoft/bowtie/hg19_index/hg19 -U $id -S ${id%%.*}.sam 2>${id%%.*}.align.log
    # convert SAM to BAM, keeping only alignments with mapping quality >= 30
    samtools view -bhS -q 30 ${id%%.*}.sam > ${id%%.*}.bam
    # sort by coordinate; the second argument is the output prefix,
    # so it must match the file name used by index (and by macs2) below
    samtools sort ${id%%.*}.bam ${id%%.*}.sorted
    samtools index ${id%%.*}.sorted.bam
done

I don't know whether I should filter out alignments with mapping quality below 30; I just did it.

And I don't know whether I should remove PCR-duplicated reads or not!
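
For example, I could remove duplicates with samtools rmdup before calling peaks, or just rely on MACS2 itself, since callpeak keeps at most one read per position by default (--keep-dup 1). A rough sketch of what I mean, reusing the rep1 files:

# remove PCR duplicates from the sorted BAM (-s because the reads are single-end)
samtools rmdup -s SRR1042593.sorted.bam SRR1042593.dedup.bam
samtools index SRR1042593.dedup.bam
~/.local/bin/macs2 callpeak -c SRR1042594.sorted.bam -t SRR1042593.dedup.bam -f BAM -g hs -n Xu_MUT_rep1_dedup 2>Xu_MUT_rep1_dedup.macs2.log
# (alternatively, leave the BAM as is: callpeak already discounts duplicates via its default --keep-dup 1)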

ChIP-Seq MACS

Then I changed the p-value criterion like this:

nohup time ~/.local/bin/macs2 callpeak -c SRR1042594.sam -t SRR1042593.sam -f SAM -p 0.01  -g hs -n Xu_MUT_rep1 2>Xu_MUT_rep1.masc2.log &
nohup time ~/.local/bin/macs2 callpeak -c SRR1042596.sam -t SRR1042595.sam -f SAM -p 0.01  -g hs -n Xu_MUT_rep2 2>Xu_MUT_rep2.masc2.log &
nohup time ~/.local/bin/macs2 callpeak -c SRR1042598.sam -t SRR1042597.sam -f SAM -p 0.01  -g hs -n Xu_WT_rep1  2>Xu_WT_rep1.masc2.log &
nohup time ~/.local/bin/macs2 callpeak -c SRR1042600.sam -t SRR1042599.sam -f SAM -p 0.01  -g hs -n Xu_WT_rep2  2>Xu_WT_rep2.masc2.log &

Then I get more than 10,000 peaks. I know the cutoff is too lenient, but I just don't know how to deal with it.
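
Maybe one way to deal with it is to keep the loose -p 0.01 call and then filter the resulting narrowPeak file myself? A sketch, assuming the standard 10-column narrowPeak layout where column 8 is -log10(p-value):

# keep only peaks with -log10(p) > 5, i.e. p < 1e-5 (column 8 of the narrowPeak file)
awk '$8 > 5' Xu_MUT_rep1_peaks.narrowPeak > Xu_MUT_rep1_peaks.filtered.narrowPeak
# compare how many peaks survive
wc -l Xu_MUT_rep1_peaks.narrowPeak Xu_MUT_rep1_peaks.filtered.narrowPeak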

Hi Tangming, I wonder whether you could have a quick look at the question I asked: Visualization of ChIP-seq data (mapped reads). Basically, the original paper had unusually sharp peaks, and I could not recapitulate the results, which made me wonder whether they used some special tricks or I had something wrong. Thanks a lot!

I guess my previous question was somewhat relevant; you might find some useful information there (enter link description here). My suggestion is "try and see": see "Finding Enriched Regions of Variable Length" (enter link description here). The instructions there are based on HOMER, but I think the general idea is the same for other peak callers: basically, you play with different combinations of parameters and see which one works best for you.

Yeah, I didn't choose HOMER just because it's a little complicated to install; I will try it as soon as possible.

We have the same question about how to deal with the alignment files, right? Filter out low-quality alignments? Remove PCR-duplicated reads?

The peak caller does not have to be HOMER; what I am trying to say is: try different parameters and check the peaks yourself to see whether the peak calling is proper or not, by loading the BED file of called peaks into a genome browser such as IGV. Start with the default parameters.
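
For example, from your runs above with -n Xu_MUT_rep1 and -B, the files I would load into the browser are roughly these (just a sketch; bedGraphToBigWig is the separate UCSC tool and hg19.chrom.sizes is a file you would have to provide yourself):

# Xu_MUT_rep1_peaks.narrowPeak   - called peaks (BED-like, loads directly in IGV)
# Xu_MUT_rep1_summits.bed        - 1-bp summit of each peak
# Xu_MUT_rep1_treat_pileup.bdg   - ChIP signal pileup (bedGraph, produced because of -B)
# Xu_MUT_rep1_control_lambda.bdg - local background estimated from the input
# optional: convert the pileup to bigWig so the browser loads it faster
sort -k1,1 -k2,2n Xu_MUT_rep1_treat_pileup.bdg > Xu_MUT_rep1_treat_pileup.sorted.bdg
bedGraphToBigWig Xu_MUT_rep1_treat_pileup.sorted.bdg hg19.chrom.sizes Xu_MUT_rep1_treat_pileup.bw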

For filtering, here (enter link description here) people recommend using -q 5 or 10. As for removing PCR-duplicated reads, sorry, I have no idea.

I would remove the PCR duplicates. For ChIP-seq, a lot of these will map to repetitive elements in the genome.

@Jotan 1982, can we set a parameter to remove the PCR duplicates? If they map to repetitive elements, does that mean they map to multiple loci (not uniquely mapped)? In "How to extract unique mapped results from Bowtie2 bam results?", people seem to suggest that by setting -q 5 or 10 in "samtools view -bhS -q", reads that do not map uniquely will be eliminated.

I've used the command samtools view -bhS -q 30 ${id%%.*}.sam > ${id%%.*}.bam in my code,

but I don't know why I should use it.

If you type the command "samtools view" you will get something like:

Usage: samtools view [options] <in.bam>|<in.sam>|<in.cram> [region ...]
Options: -b       output BAM
         -q INT   only include reads with mapping quality >= INT [0]

So "-q INT only include reads with mapping quality >= INT [0]" means that with "samtools view -bhS -q 30" you filter out reads with mapping quality lower than 30. Some people suggest that a cutoff of 5 or 10 is good enough.

That's exactly what confuses me.

Some people from the ENCODE project suggested that I only keep reads with mapping quality of at least 30, while others suggest that 5 or 10 is good enough.

Maybe it doesn't matter; I will try both.
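
For example, a quick check before deciding, just counting how many alignments survive each cutoff (assuming the unfiltered SAM files from the bowtie2 step are still around):

# count alignments passing each MAPQ cutoff (0 = no filter); -S because the input is SAM
for q in 0 5 10 30; do
    echo "MAPQ >= $q : $(samtools view -c -S -q $q SRR1042593.sam) records"
done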
