Question

macs2 "Effective genome size" for repetitive genomes

9.6 years ago

Menachem Sklarz ▴ 10

Hi everyone

I'm working on a ChIP-seq experiment in wheat, which has a very large and repetitive genome.

I'm a bit baffled by the "effective genome size" parameter in macs2. I understand it is related to the repetitiveness of the genome, but I'm not sure how to calculate it. I tried GEM, but it gave me an error. While I try to solve the GEM problem, does anyone know of an alternative?

Secondly, since I'm looking for peaks in repetitive as well as non-repetitive regions of the genome, I thought maybe I should use the full genome length rather than the mappable length. Am I correct?

Finally - if I have a control sample (no antibody), can that be used to estimate the mappability of the genome?

Thanks!

ChIP-Seq • 5.0k views

Ram · Answer 1 · 2015-11-17

9.6 years ago

GouthamAtla 12k

The effective genome size is the part of the genome that remains after excluding repetitive elements, i.e. the portion that is uniquely mappable. So you need to determine the uniquely mappable regions.

Though not directly related to macs2, a few options are given here: https://github.com/fidelram/deepTools/wiki/General-deepTools-FAQs#effGenomeSize

Of these, the GEM mappability calculator would be the most useful for you.

You may not be able to use the control sample to calculate the mappability, as it would not cover the entire genome.
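To make "uniquely mappable regions" concrete, here is a minimal Python sketch, assuming that a position counts as uniquely mappable when the k-mer starting there (with k set to the read length) occurs exactly once in the genome on either strand. The function names are hypothetical, and this is not how GEM or macs2 compute anything. It holds every k-mer in memory, so it only suits toy sequences; a genome the size of wheat needs a dedicated mappability tool.

```python
from collections import Counter

# Translation table used to build reverse complements.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def count_unique_kmer_positions(sequences, k):
    """Count start positions whose k-mer occurs exactly once in the genome.

    `sequences` is an iterable of DNA strings (one per chromosome or contig).
    A k-mer and its reverse complement are treated as the same sequence, and
    k-mers containing anything other than A, C, G, T (e.g. N) are skipped.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) - set("ACGT"):
                continue  # ambiguous base: not mappable
            # Canonical form: the lexicographically smaller strand.
            counts[min(kmer, reverse_complement(kmer))] += 1
    # Each k-mer seen exactly once corresponds to one unique start position.
    return sum(1 for n in counts.values() if n == 1)


# Toy genome: a repeat (eight A's) followed by unique sequence.
print(count_unique_kmer_positions(["AAAAAAAACGT"], 4))  # 3
```

In the toy genome, the five overlapping AAAA k-mers are discarded as repetitive and only the three k-mers AAAC, AACG and ACGT are counted. The resulting number is the kind of value that would be passed to macs2 as the genome size.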

Thanks for the input and for the links.

Do you think the "effective genome size" should be calculated in a way that matches how I do the mapping? For example, if I retain only uniquely mapped reads, then I should calculate the mappability from uniquely mappable regions; but if I also retain reads that mapped two or three times, then maybe I should define the mappable regions as those that can be mapped up to two or three times?
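To illustrate what I mean, here is a rough Python sketch of that idea. It assumes a position is "mappable with up to n hits" when the k-mer starting there (k = read length) occurs at most n times in the genome, counting both strands. The function name and the threshold scheme are my own assumptions, not anything macs2 or GEM defines, and it only works on toy sequences because every k-mer is kept in memory.

```python
from collections import Counter

# Translation table used to build reverse complements.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mappable_size_by_max_hits(sequences, k, thresholds):
    """Return {max_hits: number of start positions whose k-mer occurs
    at most max_hits times in the genome}, for each value in thresholds.

    A k-mer and its reverse complement are counted together, and k-mers
    containing non-ACGT characters (e.g. N) are ignored.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) - set("ACGT"):
                continue
            rc = kmer.translate(_COMPLEMENT)[::-1]
            counts[min(kmer, rc)] += 1
    # A k-mer seen n times contributes n start positions once n <= max_hits.
    return {
        max_hits: sum(n for n in counts.values() if n <= max_hits)
        for max_hits in thresholds
    }


# Toy genome: AAAA occurs five times, three other 4-mers occur once each.
print(mappable_size_by_max_hits(["AAAAAAAACGT"], 4, [1, 3, 5]))
# {1: 3, 3: 3, 5: 8}
```

With a threshold of 1 this reduces to the uniquely mappable size; raising the threshold lets repeats with few copies count toward the total.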

Is there a site you know of that explains the statistics behind the effective size?

Thanks

Reply written 9.6 years ago by Menachem Sklarz (updated 5.6 years ago by Ram)

There are no complicated statistics behind "effective genome size". The thread "Why Does Macs Use A Genome Size Of 2.7 Billion Instead Of 3 Billion For Human?" might be useful, and maybe this paper: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0030377

Reply written 9.6 years ago by GouthamAtla (updated 5.6 years ago by Ram)

Thanks a lot for the links!

Reply written 9.6 years ago by Menachem Sklarz (updated 5.6 years ago by Ram)