A perennial question on bioinformatics sites is how to compute the effective genome size for a genome. epic now includes a script called epic-effective to do just this. It can use multiple cores. So the next time someone asks about the effective genome size, you know where to point them.

```
endrebak@havpryd ~/c/epic> epic-effective -h
epic-effective
Compute the effective genome size from a fasta file.
(Visit github.com/endrebak/epic for examples and help.)
Usage:
epic-effective --read-length=LEN [--nb-cpu=CPU] FILE
epic-effective --help
Arguments:
FILE Fasta genome
-r LEN --read-length LEN length of reads
Options:
-h --help show this help message
-n CPU --nb-cpu CPU number of cores to use [default: 1]
endrebak@havpryd ~/c/epic> time epic-effective -r 35 -n 30 ~/genomes/hg19.fa
File analyzed: /local/home/endrebak/genomes/hg19.fa
Genome length: 3095693983
Number unique 35-mers: 2529802735
Effective genome size: 0.8172005207531522
3250.78user 32.13system 4:46.33elapsed 1146%CPU (0avgtext+0avgdata 100815072maxresident)k
6186643inputs+0outputs (0major+162990minor)pagefaults 0swaps
```

To install the epic-package, just use `pip install bioepic`

.

It is uses jellyfish under the hood so you need to install that too. If you use conda `conda install jellyfish`

works on linux64.

I have included how I compute the effective genome size below. If the formula is too simplistic, please tell me.

This is how I compute the EGS:

The effective genome size for a genome G and a read-length L is the number of unique L-mers in G divided by the length of G.

So for reads of length 2 the effective genome size of the geome CCCGNN is the following:

len(["CG"]) / len("CCCGNN")

or 1/6.

**Edit: sorry about the bump. My link was wrong!**