Question about MACS2 genome size (other organism)
6 weeks ago
chansik

Hi,

I'm using MACS3 to call peaks on ChIP-seq data from Chinese hamster ovary (CHO) cells.

I run callpeak with the following command:

macs3 callpeak -t sample.bam -c Input.bam -n output -f BAM --outdir ../5.BEDfile/ -g hs -B -q 0.01


What should I pass to the -g option? hs stands for Homo sapiens. The precompiled genome sizes in MACS are hs: 2.7e9, mm: 1.87e9, ce: 9e7, dm: 1.2e8.

Is it a problem to use one of these for CHO cell data?

Thank you,

6 weeks ago
ATpoint

See the related threads "How does genome size affect macs2" and "How much does effective genome size affect the macs2 output?".

Unless you find a published value that people commonly use for CHO, you can approximate it yourself in a simple, naive way, which for this purpose may still be good enough (see "How do I compute the effective genome size?"). That answer suggests counting the Ns in the reference FASTA file ("Counting N'S Within Fasta") and subtracting that from the total length of the FASTA ("How to count the length of fasta sequences?").
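A minimal Python sketch of that naive calculation, assuming an uncompressed FASTA file; the function names are hypothetical and not part of MACS. It subtracts only the Ns, so it does not remove repetitive sequence and will overestimate the truly mappable size.

```python
from typing import Iterable, Tuple


def count_fasta_bases(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (total_bases, n_bases) over all sequences in FASTA lines.

    Header lines (starting with '>') are skipped. Both 'N' and 'n' are
    counted as gaps, since soft-masked assemblies may use lowercase letters.
    """
    total = 0
    n_count = 0
    for line in lines:
        if line.startswith(">"):
            continue  # sequence name, not sequence
        seq = line.strip()
        total += len(seq)
        n_count += seq.count("N") + seq.count("n")
    return total, n_count


def naive_effective_genome_size(path: str) -> int:
    """Total FASTA length minus Ns: an upper bound on the mappable genome."""
    with open(path) as handle:
        total, n_count = count_fasta_bases(handle)
    return total - n_count
```

For example, naive_effective_genome_size("genome.fa"), with "genome.fa" standing in for a hypothetical path to the CHO reference, returns a plain integer. Since -g accepts a number as well as the hs/mm/ce/dm shortcuts, that value can be passed to callpeak as -g <number>.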

The effective genome size is the number of bases in the genome that are mappable, i.e. non-N and non-repetitive, so that short reads can be aligned to them. MACS uses it to build the Poisson background model for its p-value calculation: it determines how many reads one expects at a given location of the genome given the total read count. The smaller the genome, the more reads one expects at any spot by chance.
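To make the Poisson argument concrete, here is a hedged Python sketch of the genome-wide background rate, assuming the form lambda_bg = reads * fragment size / genome size and a hypothetical library of 20 million reads with 200 bp fragments. MACS also derives local lambdas from the control, so this only illustrates the genome-wide baseline, not the full model.

```python
import math


def genome_background_lambda(total_reads: int, fragment_size: int, gsize: float) -> float:
    """Expected pileup at one base if all fragments were spread uniformly over gsize bases."""
    return total_reads * fragment_size / gsize


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 minus the sum of P(X = i) for i < k."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


# Hypothetical library: 20 million reads, 200 bp fragments.
reads, frag = 20_000_000, 200
for name, gsize in [("hs", 2.7e9), ("mm", 1.87e9)]:
    lam = genome_background_lambda(reads, frag, gsize)
    print(f"{name}: lambda_bg = {lam:.3f}, P(X >= 10) = {poisson_upper_tail(10, lam):.2e}")
```

With the same reads, the mm size gives a higher lambda_bg than hs, so the same 10-read pileup gets a larger (less significant) p-value. Conversely, a -g value larger than the true effective size underestimates the background and makes peaks look more significant than they are.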