Question: macs2 "Effective genome size" for repetitive genomes
2.9 years ago by
European Union
Menachem Sklarz10 wrote:

Hi everyone

I'm working on a ChIP-seq experiment in wheat, which has a very large and repetitive genome.

I'm a bit baffled by the "effective genome size" parameter in macs2. I understand it is related to the repetitiveness of the genome but I'm not sure how to calculate it. I've tried GEM but it gave me an error, so in parallel to trying to solve the GEM problem, maybe someone has an alternative?

Secondly, if I'm looking for peaks in repetitive as well as non-repetitive regions of the genome, I thought maybe I should use the full length rather than the mappable length. Am I correct?

Finally - if I have a control sample (no antibody), can that be used to estimate the mappability of the genome?


2.9 years ago by
geek_y8.7k wrote:

The effective genome size is what remains after removing the repetitive elements of the genome, so you need to find the uniquely mappable regions.

Though not directly related, a few options are given here.

Of those, the GEM-Mappability Calculator would be useful for you.

You may not be able to use the control sample for calculating the mappability as it would not cover the entire genome.
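
To make the idea of "uniquely mappable" concrete, here is a minimal Python sketch that approximates it by counting read-length k-mers that occur only once in a set of sequences. This is only an illustration of the concept, not what MACS or GEM do internally: the function name `effective_genome_size` and the `max_copies` parameter are hypothetical, the reverse strand and mismatches are ignored, and an in-memory counter like this will not scale to a genome the size of wheat.

```python
from collections import Counter


def effective_genome_size(sequences, k, max_copies=1):
    """Toy estimate: number of k-mer start positions whose k-mer
    occurs at most `max_copies` times across all sequences.

    Assumptions (not from MACS or GEM): forward strand only, exact
    matches only, and k-mers containing N are treated as unmappable.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    # Each k-mer seen n times accounts for n start positions.
    return sum(n for n in counts.values() if n <= max_copies)


if __name__ == "__main__":
    toy = ["ACGTACGTTTGA"]  # 9 4-mers in total, ACGT occurs twice
    print(effective_genome_size(toy, k=4))                # 7
    print(effective_genome_size(toy, k=4, max_copies=2))  # 9
```

In practice k would be set to the read length, and the resulting number is the kind of value that goes into the macs2 `-g` option.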

Thanks for the input and for the links. 

Do you think the "effective genome size" should be calculated in the same way that I do the mapping? For example, if I retain only uniquely mapped reads, then I should calculate the mappability from uniquely mappable regions; but if I also retain reads that mapped two or three times, then maybe I should define the mappability as the regions that are mappable two or three times?

Is there a site you know of that explains the statistics behind the effective size?


There are no complicated statistics behind "effective genome size". This thread, Why Does Macs Use A Genome Size Of 2.7 Billion Instead Of 3 Billion For Human?, might be useful, and maybe this paper.

Thanks a lot for the links! 

