Question: 'Effective genome size': how precise does it have to be?
biocyberman (Denmark) wrote, 21 months ago:

The rat genome is rarely mentioned among MACS2 users, so I had to look around and do some of the work myself. I want to share the output and also ask some related questions.

I used the rat genome rn6, downloaded from the UCSC bigZips directory, and kept all contigs, including those that do not have standard chromosome names. I built a base-space index with gem-indexer and ran gem-mappability with k-mer sizes of 45, 50, 75, 100, and 150 bases. The effective genome size values are the number of '!' characters in each .mappability file:

rn6.softmask.all_45: 2105347242
rn6.softmask.all_50: 2081721273
rn6.softmask.all_75: 2197995070
rn6.softmask.all_100: 2247394146
rn6.softmask.all_150: 2285452802
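The counting step described above can be sketched in Python. This is a minimal sketch, not the poster's actual script, and the function name `count_unique_positions` is hypothetical. It assumes a GEM .mappability layout in which header lines (k-mer length, encoding table, and so on) start with `~~`, each sequence starts with a `~name` line, and the encoded positions follow on the lines in between; it also assumes, as the poster's method implies, that `!` marks positions whose k-mer occurs only once. Check both assumptions against your own files before trusting the totals.

```python
import sys
from collections import Counter
from typing import Iterable


def count_unique_positions(lines: Iterable[str]) -> Counter:
    """Count '!' characters per sequence in a GEM .mappability file.

    ASSUMED layout: '~~' header lines, then one '~name' line per sequence
    followed by lines holding one encoded character per genome position.
    """
    counts: Counter = Counter()
    current = None  # None while we are still inside the header
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("~~"):
            current = None  # header section (k-mer length, encoding table, ...)
        elif line.startswith("~"):
            current = line[1:].strip()  # a new sequence begins
            counts.setdefault(current, 0)
        elif current is not None:
            counts[current] += line.count("!")  # '!' assumed to mean "unique k-mer"
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_unique.py some_file.mappability
    with open(sys.argv[1]) as fh:
        per_sequence = count_unique_positions(fh)
    print("effective genome size:", sum(per_sequence.values()))
```

Skipping everything before the first `~name` line keeps the `'!'~[...]` entries of the encoding table out of the total, which a plain character count over the whole file would include.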

I also ran a test with a color-space index of chromosome 1 only. The numbers look greater than those from the base-space index of chromosome 1, which raises some questions:

  1. Which number should I use: the one from the color-space index or the one from the base-space index? My BAM files were aligned with a color-space aligner, and the reads are color-space reads.

  2. What is the consequence of under- or overestimating the effective genome size on MACS2 output?

  3. The MACS2 documentation does not draw much attention to k-mer size. What k-mer size was used to calculate the values for the genomes that MACS2 'supports'?

Thanks.

Istvan Albert (University Park, USA) wrote, 21 months ago:

I think the overall inaccuracies and errors of MACS2 are much (much) larger than anything the uncertainty in mappability might introduce. IMHO, as long as your effective genome size is correct in the first digit (that is, the right order of magnitude), you are fine.
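Applying that rule of thumb to the five rn6 values posted in the question: the short calculation below uses only those numbers and checks whether they agree in leading digit and order of magnitude, and how far apart the smallest and largest are. For these values the relative spread comes out just under 9% of the largest estimate.

```python
# Effective genome sizes posted in the question (rn6, base-space GEM index), keyed by k-mer length
sizes = {
    45: 2105347242,
    50: 2081721273,
    75: 2197995070,
    100: 2247394146,
    150: 2285452802,
}

smallest = min(sizes.values())
largest = max(sizes.values())
rel_spread = (largest - smallest) / largest  # spread as a fraction of the largest value

# Do all estimates share the same leading digit and the same number of digits?
leading_digits = {str(v)[0] for v in sizes.values()}
digit_counts = {len(str(v)) for v in sizes.values()}

print(f"range: {smallest:,} to {largest:,}")
print(f"relative spread: {rel_spread:.1%}")
print(f"leading digits: {leading_digits}, digit counts: {digit_counts}")
```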


Agreed. You have to change the number a lot to see any great change in the results, and the best way to reassure yourself is to try changing it yourself. I usually go for a figure of 75% of the total genome size, adjusting up or down depending on, for example, the known repetitive sequence content.

(Reply by Ian, 21 months ago)
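One reason the effective genome size (MACS2's `-g`/`--gsize`) often matters less than expected: it sets the genome-wide background rate, roughly reads × fragment length / gsize, and the MACS paper describes using, for each candidate peak, the maximum of that genome-wide rate and rates estimated locally from the control. The sketch below is a simplified model of that "dynamic lambda" idea, not MACS2's code; the function names, read count, fragment length, local lambdas, and peak read count are all hypothetical. Once a local lambda exceeds the genome-wide one, gsize drops out of the calculation entirely.

```python
import math
from typing import Sequence


def background_lambda(total_reads: int, frag_len: int, genome_size: float) -> float:
    """Genome-wide expected coverage at a position: reads * fragment length / gsize."""
    return total_reads * frag_len / genome_size


def dynamic_lambda(lambda_bg: float, local_lambdas: Sequence[float]) -> float:
    """MACS-style local lambda: the largest of the genome-wide and local estimates."""
    return max([lambda_bg, *local_lambdas])


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), lam > 0, by summing the upper tail directly."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))  # P(X = k)
    total = 0.0
    i = k
    while term > 0 and term > total * 1e-16:
        total += term
        i += 1
        term *= lam / i  # P(X = i) derived from P(X = i - 1)
    return total


if __name__ == "__main__":
    # Hypothetical experiment: every number here is illustrative, not from the thread
    total_reads = 20_000_000
    frag_len = 200
    local_lambdas = [2.2, 2.5]  # stand-ins for control-derived local estimates
    reads_in_peak = 15

    for gsize in (1.5e9, 2.1e9, 2.7e9):
        lam_bg = background_lambda(total_reads, frag_len, gsize)
        lam = dynamic_lambda(lam_bg, local_lambdas)
        p = poisson_sf(reads_in_peak, lam)
        print(f"gsize={gsize:.1e}  lambda_bg={lam_bg:.2f}  lambda_used={lam:.2f}  p={p:.2e}")
```

With these made-up numbers, only the smallest gsize (1.5e9) pushes the genome-wide lambda above the local estimates; 2.1e9 and 2.7e9 give identical p-values, which matches the advice above to vary the number widely and see whether your peaks actually change.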
Powered by Biostar version 2.3.0