The rat genome is rarely mentioned among MACS2 users, so I had to look around and do some of the work myself. I want to share the results and also ask some related questions.
I used the rat genome rn6 downloaded from the UCSC bigzip file and kept all contigs that do not have standard chromosome names. The calculation used gem-indexer for base-space data and gem-mappability with k-mer sizes of 45, 50, 75, 100, and 150 bases. The effective genome size values are the number of '!' characters in the gem-mappability output for each k-mer size:
rn6.softmask.all_45: 2105347242
rn6.softmask.all_50: 2081721273
rn6.softmask.all_75: 2197995070
rn6.softmask.all_100: 2247394146
rn6.softmask.all_150: 2285452802
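For anyone who wants to reproduce the counting step, here is a minimal Python sketch of how the '!' characters could be tallied. It is not the exact command I ran. The function name `count_unique_positions` is made up for illustration, and the file layout it assumes (lines starting with '~~' open a metadata section, lines starting with a single '~' open a sequence record, and the lines after that hold one character per base) is my understanding of the mappability output, so please check it against your own files.

```python
import io


def count_unique_positions(lines, unique_char="!"):
    """Count unique_char on sequence lines of a mappability-style text stream.

    Assumed layout (an assumption, not a verified specification):
      - a line starting with '~~' opens a metadata section (skipped),
      - a line starting with a single '~' opens a sequence record,
      - lines after a sequence header hold one character per base.
    """
    total = 0
    in_sequence = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("~~"):
            in_sequence = False   # metadata section, e.g. the encoding table
        elif line.startswith("~"):
            in_sequence = True    # sequence header, e.g. "~chr1"
        elif in_sequence:
            total += line.count(unique_char)
    return total


# Tiny synthetic example; real files are read the same way, line by line.
example = io.StringIO(
    "~~ENCODING\n"
    "'!'~[1-1]\n"
    "~chr1\n"
    "!!!\"\"!\n"
    "~chr2\n"
    "!! !\n"
)
print(count_unique_positions(example))  # prints 7
```

For a real file, something like `with open(path) as fh: count_unique_positions(fh)` should work, since the function only iterates over lines and never loads the whole file into memory. Skipping the metadata section matters because the encoding table itself may contain a '!' character that should not be counted.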
I also ran some tests with a color-space index of chromosome 1 only. The numbers look greater than those from the base-space index of chromosome 1. So here are some questions.
Which number should I use: the one from the color-space index or the one from the base-space index? My BAM files were aligned with a color-space aligner, and the reads are color-space reads.
What are the consequences of underestimating or overestimating the effective genome size for the MACS2 output?
The MACS2 documentation does not seem to pay much attention to k-mer size. What k-mer size was used to calculate the values for the genomes 'supported' by MACS2?