Question: 'Effective genome size': how precise does it have to be?
3.4 years ago by
biocyberman770 wrote:

The rat genome is rarely discussed among macs2 users, so I had to look around and do some of the work myself. I want to share the results and also ask a few related questions.

I used the rat genome rn6, downloaded from the UCSC bigZip file, and kept all contigs that do not have standard chromosome names. I built a base-space index with gem-indexer and ran gem-mappability with k-mer sizes of 45, 50, 75, 100, and 150 bases. The effective genome size for each k-mer size is the number of '!' characters in the corresponding .mappability file.

rn6.softmask.all_45: 2105347242
rn6.softmask.all_50: 2081721273
rn6.softmask.all_75: 2197995070
rn6.softmask.all_100: 2247394146
rn6.softmask.all_150: 2285452802
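The counting step described above can be sketched in Python. This is only a minimal sketch under assumptions about the file layout: it assumes that header sections in the .mappability file start with `~~`, that each sequence record starts with a `~name` line, and that '!' marks a position whose k-mer occurs exactly once. The function name `count_unique_positions` is hypothetical, and the layout should be checked against your own files before trusting the result.

```python
def count_unique_positions(lines, unique_char="!"):
    """Count unique-k-mer positions in the sequence records of a mappability file.

    Assumed layout (verify against your own files):
      - header sections begin with a line starting with '~~'
      - each sequence record begins with a line starting with a single '~'
      - the lines after a '~name' line hold one encoded character per base
    Header lines are skipped so that the '!' in the encoding table is not counted.
    A data line that itself begins with '~' would be misread as a record name.
    """
    total = 0
    in_sequence = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("~~"):
            # Header section such as the k-mer length or the encoding table.
            in_sequence = False
        elif line.startswith("~"):
            # Start of a new sequence record, e.g. "~chr1".
            in_sequence = True
        elif in_sequence:
            total += line.count(unique_char)
    return total


def count_unique_positions_in_file(path):
    """Convenience wrapper that streams a file from disk line by line."""
    with open(path, "r") as handle:
        return count_unique_positions(handle)
```

Compared with counting every '!' in the file, this skips the header, which matters only if the header itself contains that character.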

I also ran some tests with a color-space index of chromosome 1 only. The numbers look greater than those from the base-space index of chromosome 1. So here are my questions.

  1. Which number should I use: the one from the color-space index or the one from the base-space index? My BAM files were produced by a color-space aligner, and the reads are color-space reads.

  2. What are the consequences for macs2 output of underestimating or overestimating the effective genome size?

  3. The macs2 documentation does not seem to pay much attention to k-mer size. What k-mer size was used to calculate the values for the genomes that macs2 'supports'?


macs2 • 1.8k views
3.4 years ago by
Istvan Albert ♦♦ 80k wrote:

I think the overall inaccuracy and errors of MACS2 are much (much) larger than any uncertainty that the mappability estimate might introduce. IMHO, as long as your mappability is correct in the first digit (order of magnitude), you are fine.

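To put the five values from the question in perspective, here is a small arithmetic sketch of how far apart they are. The numbers are copied from the question; the variable and function names are only illustrative.

```python
# Effective genome sizes reported in the question, keyed by k-mer size.
EFFECTIVE_SIZES = {
    45: 2105347242,
    50: 2081721273,
    75: 2197995070,
    100: 2247394146,
    150: 2285452802,
}


def relative_spread(sizes):
    """Return (max - min) / min over the reported effective genome sizes."""
    values = list(sizes.values())
    return (max(values) - min(values)) / min(values)


if __name__ == "__main__":
    print(f"relative spread: {relative_spread(EFFECTIVE_SIZES):.1%}")
```

The largest and smallest estimates differ by roughly 10%, and all five share the same leading digit.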

Agreed - you have to change the number a lot to see any great change in the results. The best way to reassure yourself is to try changing it yourself. I usually start from a figure of 75% of the total genome and move it up or down depending on, for example, the known repetitive sequence content.

written 3.4 years ago by Ian5.5k
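One way to try the "change it and see" check suggested in this thread is to generate one `macs2 callpeak` command per candidate genome size and compare the resulting peak lists. The sketch below only builds the command lines and does not run them. The file names `treatment.bam` and `control.bam`, the output directory, and the run-name prefix are hypothetical placeholders; the sizes are the values from the question.

```python
import shlex

# Candidate effective genome sizes from the question, keyed by k-mer size.
CANDIDATE_SIZES = {
    45: 2105347242,
    50: 2081721273,
    75: 2197995070,
    100: 2247394146,
    150: 2285452802,
}


def build_macs2_commands(treatment, control, sizes, outdir="macs2_gsize_test"):
    """Build one 'macs2 callpeak' argument list per candidate genome size.

    Only the -g value and the run name differ between commands, so any
    difference in the called peaks can be attributed to the genome size.
    """
    commands = []
    for kmer, gsize in sorted(sizes.items()):
        commands.append([
            "macs2", "callpeak",
            "-t", treatment,          # treatment alignments (placeholder name)
            "-c", control,            # control alignments (placeholder name)
            "-g", str(gsize),         # effective genome size under test
            "-n", f"rn6_k{kmer}",     # run name, used as output file prefix
            "--outdir", outdir,
        ])
    return commands


if __name__ == "__main__":
    for cmd in build_macs2_commands("treatment.bam", "control.bam", CANDIDATE_SIZES):
        print(shlex.join(cmd))
```

Comparing the number of peaks and their overlap across runs should show directly how sensitive the results are to the chosen value.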