Question: How much does effective genome size affect the macs2 output?

urjaswita

This seems to be a simple question, but I couldn't find an answer anywhere. How much does the accuracy of the effective genome size affect macs2 output for ChIP-seq data? For example, if I use the total genome size as the effective genome size, or total bases minus Ns, will it really affect the output a lot? Or just trivially?

Thanks for your insights.

-U

I would say you would overestimate the significance of your peaks. The effective genome size for human is 2.7 Gb, and the total is, I think, 3.3 Gb. MACS uses a Poisson distribution to model the peaks, so play around with the formula in the paper: plug in an imaginary summit read count, compute lambda with varying genome sizes, and you should get a feeling for how much the p-values are affected. Why is this important, btw?
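To make that suggestion concrete, here is a minimal sketch of the experiment. It assumes a simplified genome-wide background, lambda = total reads × fragment length / genome size, and a plain Poisson upper-tail p-value. The read count, fragment length, and summit pileup are made-up numbers, and the function names are only illustrative, not part of MACS. MACS also considers local lambdas around each candidate peak, so the real effect on called peaks may be smaller than this toy calculation suggests.

```python
import math

def background_lambda(total_reads, fragment_length, genome_size):
    """Simplified genome-wide background: expected pileup at any position."""
    return total_reads * fragment_length / genome_size

def poisson_upper_tail(k, lam, max_terms=1000):
    """P(X >= k) for X ~ Poisson(lam), summed directly over the upper tail
    so that very small p-values are not lost to rounding."""
    # First term of the tail, P(X = k), computed in log space.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    for _ in range(max_terms):
        total += term
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1)
        if term < total * 1e-17:  # remaining terms are negligible
            break
    return total

# Hypothetical experiment: 20 million reads, 200 bp fragments, pileup of 20.
total_reads = 20_000_000
fragment_length = 200
summit_pileup = 20

for genome_size in (2.7e9, 3.3e9):
    lam = background_lambda(total_reads, fragment_length, genome_size)
    p = poisson_upper_tail(summit_pileup, lam)
    print(f"genome size {genome_size:.1e}: lambda = {lam:.3f}, p = {p:.3e}")
```

A larger genome size gives a smaller lambda, so the same pileup gets a smaller p-value; that is the overestimated significance mentioned above.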

Thanks a lot. Actually, I am working on H3K4me3 and H3K27me3 data from some newly sequenced animal genomes. I tried to get the effective genome size using GEM but couldn't get it to run (I'm still new to this kind of analysis), and the GEM documentation is not great. I found in "How do I compute the effective genome size?" that genome size minus Ns could be an option.

Could you please advise on an easy-to-run tool to calculate this statistic? Thanks again for your answer!

Have you read all of the answers to the linked question? They present several good options. Have you tried or checked any of them? If your species of interest has its total haploid DNA content determined here, then you can convert that directly to bp to get the effective genome size. If not, you could try to determine that amount experimentally yourself.

Have you tried just removing all Ns from the genome and counting the remaining bases and using that value?
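Counting the non-N bases needs nothing more than a short script. The following is a minimal sketch, assuming an uncompressed FASTA file; the function name and the `genome.fa` path in the usage comment are hypothetical. Note that this count ignores mappability, so it is an upper bound on the uniquely mappable genome rather than an exact effective size.

```python
def count_bases(lines):
    """Return (total_bases, non_n_bases) for an iterable of FASTA lines."""
    total = 0
    non_n = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and sequence headers
        total += len(line)
        # Count both upper- and lower-case (soft-masked) N as unknown bases.
        non_n += len(line) - line.count("N") - line.count("n")
    return total, non_n

# Usage (hypothetical file name):
# with open("genome.fa") as handle:
#     total, non_n = count_bases(handle)
# print(f"total = {total}, without Ns = {non_n}")
```

The second number can then be passed to `macs2 callpeak` through its `-g` option.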

