Macs: How Low Can One Go For Mfold Parameter; And What Does Uneven Treatment & Control Tags Mean?
10.9 years ago
Jordan ★ 1.3k

Hi,

I'm working on ChIP-seq data and I have a couple of questions about MACS.

  1. How low can you go with the mfold parameter? The default is 10,30. I tried 5,30 and the model could still only call 392 peaks. Is it all right to go below 5?
  2. I get the warning "Treatment and control tags are uneven! FDR may be wrong." How can I fix this, or is it safe to ignore? I'm wondering whether it affects my analysis.

Thanks in advance!

macs chip-seq • 11k views

One way to deal with the "Treatment and control tags are uneven! FDR may be wrong" warning that worked well for me:

1. Make the read length the same for the control and the sample.
2. Downsample whichever of the two has more reads to the size of the one with fewer reads.

I know that MACS scales the samples when calling peaks, but I arrived at this solution by trial and error.

Hope it helps!
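
The downsampling step could be sketched roughly as below. This is only an illustration in plain Python on toy read names; the function name `downsample`, the fixed seed, and the read counts are assumptions made for the example, and a real BAM file would normally be subsampled with a dedicated tool rather than loaded into a list.

```python
import random

def downsample(reads, target, seed=0):
    """Return `target` reads drawn at random without replacement,
    keeping the original order. Returns all reads if there are
    already `target` or fewer."""
    if target >= len(reads):
        return list(reads)
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    keep = sorted(rng.sample(range(len(reads)), target))
    return [reads[i] for i in keep]

# Toy stand-ins for aligned reads (hypothetical sizes).
control = [f"ctrl_read_{i}" for i in range(5000)]
treatment = [f"chip_read_{i}" for i in range(1000)]

# Bring both samples down to the size of the smaller one.
target = min(len(control), len(treatment))
control_ds = downsample(control, target)
treatment_ds = downsample(treatment, target)

print(len(control_ds), len(treatment_ds))  # 1000 1000
```

Sampling without replacement avoids introducing duplicate reads, which matters because peak callers often treat duplicates specially.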

10.9 years ago
KCC ★ 4.1k

How many tags do you have for treatment and how many for control? It sounds like there is just a big difference in the amount of data you have for each.

I think it's okay to use mfold with a lower value, for instance 3,30, although it depends. Let me explain. The mfold parameter is used to build the shift model. The shift model matters because it determines how far you have to shift your tags on the forward strand and on the reverse strand. The theory is that when a transcription factor is bound at a particular spot, the tags are offset in opposite directions on the two strands: fragments are sequenced from their 5' ends, so forward-strand tags pile up on one side of the binding site and reverse-strand tags on the other. Once MACS figures out how much to shift, it shifts tags forward on the forward strand and backward on the reverse strand.
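
To make the shifting idea concrete, here is a minimal sketch. It is not MACS's code; the function name `shift_tags`, the tag representation, and the toy coordinates are all assumptions for illustration. It takes an estimated fragment size `d` and moves each tag by half of it toward the presumed binding site.

```python
def shift_tags(tags, d):
    """Shift each tag by d // 2 toward the middle of its fragment.

    tags: list of (position, strand) pairs, where position is the
          5' end of the read and strand is "+" or "-".
    d:    estimated fragment size in bp (hypothetical input).
    """
    half = d // 2
    shifted = []
    for pos, strand in tags:
        if strand == "+":
            shifted.append(pos + half)  # forward strand: move forward
        else:
            shifted.append(pos - half)  # reverse strand: move backward
    return shifted

# Toy data: a binding site near position 1000, fragments about 200 bp long.
tags = [(900, "+"), (905, "+"), (1100, "-"), (1095, "-")]
print(shift_tags(tags, 200))  # [1000, 1005, 1000, 995]
```

After shifting, the two strand-specific pileups land on top of each other around the binding site instead of flanking it.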

To build this shift model, you want 'real' peaks, so that the model is accurate. The mfold parameter defines what counts as a peak for this purpose. So -m 10,30 means that peaks that are about 10-fold to 30-fold enriched are used as real peaks. The default values are just a guess (probably based on trial and error), and the right values for your data could be different.
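
As a rough illustration of what an mfold range does, the sketch below keeps only windows whose fold enrichment over a background level falls inside the range. This is a simplification, not how MACS actually implements it; the function `pick_model_peaks`, the window counts, the background value, and the inclusive bounds are all assumptions.

```python
def pick_model_peaks(windows, background, lower=10, upper=30):
    """Keep windows whose fold enrichment lies in [lower, upper].

    windows:    list of (name, tag_count) pairs (toy data).
    background: expected tag count per window (hypothetical value).
    """
    selected = []
    for name, count in windows:
        fold = count / background
        if lower <= fold <= upper:
            selected.append((name, fold))
    return selected

windows = [("w1", 12), ("w2", 45), ("w3", 130), ("w4", 700), ("w5", 280)]
background = 10.0

# With the 10,30 range only two windows qualify for model building.
print(pick_model_peaks(windows, background))
# [('w3', 13.0), ('w5', 28.0)]

# Lowering the bound to 3 lets a weaker window in as well.
print(pick_model_peaks(windows, background, lower=3))
# [('w2', 4.5), ('w3', 13.0), ('w5', 28.0)]
```

The upper bound excludes extremely enriched windows such as `w4`, which are more likely to be artifacts than clean binding events.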

Anyway, in theory you could figure out the shift in your data yourself; there are ways to tackle this problem. (You can judge how well the current shift model fits by running the R script that MACS produces.) Once you know the right shift size for your data, the question of which mfold to use becomes irrelevant: just supply your shift size yourself and use the '--nomodel' option. I don't think MACS does a great job of figuring out the shift size anyway.
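
One common way to estimate the shift yourself is strand cross-correlation: find the offset at which forward-strand tag positions best line up with reverse-strand tag positions. The sketch below is a toy version of that idea, not the method MACS uses internally; the function name `estimate_fragment_size`, the `max_lag` limit, and the coordinates are assumptions.

```python
from collections import Counter

def estimate_fragment_size(plus, minus, max_lag=300):
    """Return the lag (in bp) that best aligns + strand tags with
    - strand tags, scored by the number of coinciding positions."""
    plus_counts = Counter(plus)
    minus_counts = Counter(minus)
    best_lag, best_score = 0, -1
    for lag in range(max_lag + 1):
        # Count tag pairs that coincide when + tags are moved by `lag`.
        score = sum(n * minus_counts.get(pos + lag, 0)
                    for pos, n in plus_counts.items())
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Toy data: two binding sites, reverse-strand tags 200 bp downstream
# of the matching forward-strand tags.
plus = [900, 902, 905, 4900, 4903]
minus = [1100, 1102, 1105, 5100, 5103]

d = estimate_fragment_size(plus, minus)
print(d, d // 2)  # 200 100
```

Here the best lag approximates the fragment size, and half of it is the per-strand shift. Real data are far noisier, so in practice one would compare coverage profiles rather than exact positions.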


I think I have around 5 million tags in control and 1 million in treatment (by tags, you mean the number of reads in the BAM file, right?). Is it bad to have such a large difference?

And I'm not quite clear on the shift model. Why are we shifting the tags on the forward and reverse strands? Is it just moving the window upstream and downstream of a tag position?

You said that when a transcription factor is bound at a spot, it causes a lag in opposite directions on both strands. Can you explain this a bit more? I don't understand it. Sorry, I'm still a novice in ChIP-seq!


I think the reason we shift is best explained visually. Please look at the left half of Figure 1A in the paper "Evaluation of Algorithm Performance in ChIP-Seq Peak Detection".
