MACS, effective genome size
8.5 years ago

Hello,

Could you please advise which effective genome size to pass to MACS (v1.4.2) for peak calling on Drosophila melanogaster reads: dm2 (1.20e+08) or dm3 (1.52e+08)? Sorry if my question seems strange, I'm a newbie in bioinformatics.

8.5 years ago

The answer is whichever of dm2 or dm3 you used for alignment (likely dm3, since dm2 is very old).

Thanks a lot! I didn't do the alignment myself; I just received BAM files for the input and treatment samples (the alignment was done with Bowtie2). Is it possible to recover this information? Or does Bowtie2 perhaps only use dm3?

Just compare the chromosome sizes in the header of the BAM files. Info for dm3 is here, and dm2 is here. It looks like the chromosome names may differ slightly as well.
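
The comparison can also be scripted. Below is a minimal sketch, assuming the header has been dumped to text first (for example with `samtools view -H file.bam`). The function names `parse_sq_lengths` and `matching_assemblies` are made up for this illustration, and the chromosome names and lengths are toy values, not real dm2/dm3 sizes; substitute the sizes from the two assemblies' chromosome-size listings.

```python
def parse_sq_lengths(header_text):
    """Collect {sequence name: length} from the @SQ lines of a SAM/BAM header."""
    lengths = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs, e.g. SN:chr2L and LN:12345
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if "SN" in fields and "LN" in fields:
            lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def matching_assemblies(bam_lengths, candidates):
    """Return candidate assemblies whose shared chromosomes all have identical lengths."""
    matches = []
    for name, ref_lengths in candidates.items():
        shared = set(bam_lengths) & set(ref_lengths)
        if shared and all(bam_lengths[c] == ref_lengths[c] for c in shared):
            matches.append(name)
    return matches


# Toy header and toy assembly sizes (hypothetical values, for illustration only).
EXAMPLE_HEADER = (
    "@HD\tVN:1.0\tSO:coordinate\n"
    "@SQ\tSN:chrA\tLN:1000\n"
    "@SQ\tSN:chrB\tLN:2500\n"
    "@PG\tID:bowtie2\tPN:bowtie2\n"
)
CANDIDATES = {
    "assembly_old": {"chrA": 900, "chrB": 2500},
    "assembly_new": {"chrA": 1000, "chrB": 2500},
}

bam_lengths = parse_sq_lengths(EXAMPLE_HEADER)
print(matching_assemblies(bam_lengths, CANDIDATES))
```

If no candidate matches, check whether the chromosome names differ between the header and the reference listing (for instance, with or without a "chr" prefix), since only names present in both are compared.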

Alternatively, just ask whoever performed the alignments.

I compared the chromosome sizes, and it is dm3, as you said. Thank you for your help!

A remark for anyone who stumbles over this thread: the effective genome size here refers to the part of the genome that is uniquely mappable at standard sequencing read lengths. Repetitive regions cannot be mapped uniquely with short reads. They would only accumulate multimappers, which are typically excluded from analysis because their placement is unreliable (low or zero MAPQ score). These regions are therefore excluded from the genome size that MACS2 uses for its Poisson statistics. Please check the original paper for details.
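
To see why the choice matters, here is a rough sketch of how the genome size enters a Poisson background model. It is a simplification for illustration, not MACS's actual implementation: it assumes the expected read count in a window is total reads × window length / effective genome size, and it tests an observed count against that expectation. The read total, window length, and observed count are hypothetical, and the function names are made up for this example.

```python
import math


def background_lambda(total_reads, window_bp, effective_genome_size):
    """Expected reads per window if reads were spread evenly over the mappable genome."""
    return total_reads * window_bp / effective_genome_size


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 minus the sum of P(X = i), i < k."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


# Hypothetical experiment: 10 million reads, 200 bp windows, 40 reads observed in one window.
TOTAL_READS = 10_000_000
WINDOW_BP = 200
OBSERVED = 40

# The two genome sizes mentioned in the question.
lam_small = background_lambda(TOTAL_READS, WINDOW_BP, 1.20e8)
lam_large = background_lambda(TOTAL_READS, WINDOW_BP, 1.52e8)

p_small = poisson_upper_tail(OBSERVED, lam_small)
p_large = poisson_upper_tail(OBSERVED, lam_large)

print(f"lambda with 1.20e8: {lam_small:.2f}, p-value: {p_small:.3g}")
print(f"lambda with 1.52e8: {lam_large:.2f}, p-value: {p_large:.3g}")
```

Under this simplified model, a smaller genome size raises the expected background, so the same window looks less significant; a genome size that is too large has the opposite effect.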