Effective Genome Size in MACS [hg19]
8.9 years ago
Dataminer ★ 2.7k

Hi!

I have been using the MACS algorithm to call peaks from BAM and BED files. By default it uses the effective genome size of hg18 (UCSC), which is 2.7 billion (2.7e9), but if my data is mapped to hg19 then this effective genome size needs to be modified.

Could anyone please tell me what the effective genome size of hg19 is for MACS2 and/or MACS14?

Thank you

macs genomics • 8.3k views
1
8.9 years ago

The answer to your question is on the MACS group page: Effective genome size of hg19 assembly. I was also looking for the Effective Genome Size Of Mm10 For Macs14 and settled on an answer from @Matted:

Any reason you don't want to use the mm9 value? I can't imagine it's that different, and if their method is that sensitive to this parameter, you probably won't be too happy anyways...

The same is said in the post above on the MACS group page.

0
Hi Sukhdeep!

No offence, I saw the Google group post. The effective genome size of hg18 is 2.7e9, while if you take the figures given by abk, multiply them by 2 and take the sum (as written), you get 1.7e11. Is this correct?

On the other hand, if I use the fetchChromSizes utility from UCSC for hg18 and hg19, I get the following results:

./fetchchrom.sh hg18 2>/dev/null | perl -lanE 'BEGIN { say my $sum = 0; }$sum += int $F[1]; END { say "Total:$sum" }'
0
Total: 3107677273

./fetchchrom.sh hg19 2>/dev/null | perl -lanE 'BEGIN { say my $sum = 0; }$sum += int $F[1]; END { say "Total:$sum" }'
0
Total: 3137161264
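
For anyone who prefers Python to the Perl one-liner, here is a minimal sketch of the same sum. It assumes the input is the usual two-column "chromosome, size" text that fetchChromSizes prints; the function name and the toy three-chromosome input are made up for illustration and are not real hg18/hg19 data.

```python
import io


def total_genome_size(lines):
    """Sum the second column of 'chrom<whitespace>size' lines."""
    total = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        total += int(fields[1])
    return total


# Hypothetical toy input, NOT real chromosome sizes.
example = io.StringIO("chrA\t1000\nchrB\t2500\n\nchrC\t500\n")
print("Total:", total_genome_size(example))
```

To run it on real data you would pass an open file handle (or sys.stdin) instead of the toy input. Note that this gives the complete assembly size, including N bases and repeats, just like the one-liner.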

1
No, it depends on the read length of the raw data you mapped. Also, we don't count both strands. I can't give a reason right now, but you should just work with a single strand, and this has nothing to do with single-end or paired-end data. So pick one value according to your read length.

As for the fetching, the numbers you have are the complete genome size, which is different from the numbers in the MACS post, which are the mappable genome size. See this:

It's the mappable genome size or effective genome size which is defined as the genome size which can be sequenced. Because of the repetitive features on the chromosomes, the actual mappable genome size will be smaller than the original size, about 90% or 70% of the genome size.

0
OK, so I can use the value 8.6e10, right?

1
No man, where did you see this value? See this for male hg19; the numbers according to him are:

ReadLength      #unique_mappable_bases_+_strand
20  2051362468
21  2141729364
22  2195156956
23  2233137559
24  2264054979
25  2291133383
26  2315721197
27  2338446356
28  2359621287
29  2379465912
30  2398153029
31  2415817729
32  2432570960
33  2448496460
34  2463648439
35  2478079917
36  2491830662
37  2504937376
38  2517438708
39  2529367465
40  2540757438
41  2551634696
42  2562023283
43  2571943680
44  2581425092
45  2590491397
46  2599161532
47  2607449229
48  2615371906
49  2622937438
50  2630157453
51  2637047521
52  2643619066
53  2649893267
54  2655882772
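
A minimal sketch of "pick one according to your read length", using a handful of rows copied from the table above. The function name and the fallback rule for read lengths that are not in the dictionary (use the nearest shorter length listed) are my own assumptions, not part of abk's post or of MACS.

```python
# Subset of the table above: read length -> unique mappable bases (+ strand).
UNIQUE_MAPPABLE_BASES = {
    20: 2051362468,
    25: 2291133383,
    36: 2491830662,
    50: 2630157453,
    54: 2655882772,
}


def effective_size_for(read_length, table=UNIQUE_MAPPABLE_BASES):
    """Return the table value for the longest listed length <= read_length.

    Assumption: for a read length missing from the table, the nearest
    shorter entry is used, which slightly underestimates mappability.
    """
    usable = [length for length in table if length <= read_length]
    if not usable:
        raise ValueError(f"no table entry at or below read length {read_length}")
    return table[max(usable)]


print(effective_size_for(36))
```

The returned number is what you would hand to the genome size option of MACS (-g / --gsize). With the full table pasted into the dictionary, the fallback only matters for reads shorter than 20 or longer than 54 bases.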

0
I saw it here: https://groups.google.com/forum/?fromgroups=#!topic/macs-announcement/4UCOxH8s0Cg. Look for the second answer from "abk". I am looking for hg19, not mm9 :)

0
0
Hi Sukhi, your answer says "Male Mouse" ;) Oh, so I can use 2.7e9 (the default value) and it won't make a big difference.

1
Ah, big typo. I thought you were pointing to the link. Yeah, it won't make a big difference, go ahead. :)

0
Thanks Sukhi, have a nice weekend :)

0
You too man :-)

0
Does someone care to tell me why my answer is downvoted? Because of the redirection, or what?

0
That does look like a strange downvote pattern. Perhaps Istvan can look into it.

0
I'll investigate; it could be some sort of bug. I can't see the reason for the downvotes.

0
It is a bug, I can confirm that: https://github.com/ialbert/biostar-central/issues/175

0
Thanks, Neil and Istvan, for looking into it. Did you find the cause? Was it automatic downvoting, or was there some other trigger? And what happens next: will the points be returned, or do I have to delete the answer?

0
Let's not discuss this here, as it will notify everyone else on the thread. Let's use the main issue tracker.

0
3.1 years ago
solo7773 ▴ 80

I encountered the same problem recently. Though the answers and replies above have given the answer implicitly, I am posting a concise and specific number, which (in my opinion) has actually been available since 2011 at

http://genomewiki.ucsc.edu/index.php/Genome_size_statistics

http://genomewiki.ucsc.edu/index.php/Hg19_Genome_size_statistics

name | total size    | non-N base size
hg18 | 3,107,677,273 | 2,881,421,696
hg19 | 3,137,161,264 | 2,897,310,462

As the effective size of hg18 in MACS2 is 2.7e9 (roughly 90% of 3.1e9) and the hg19 totals are nearly identical, the effective size of hg19 can also be taken as 2.7e9.
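
The arithmetic behind that conclusion can be checked with a short sketch. The sizes are the ones quoted from the UCSC genomewiki above and 2.7e9 is the MACS default mentioned in the question; the function and variable names are only for illustration.

```python
# (total size, non-N base size) as quoted from the UCSC genomewiki above.
SIZES = {
    "hg18": (3107677273, 2881421696),
    "hg19": (3137161264, 2897310462),
}
MACS_DEFAULT = 2.7e9  # default effective genome size mentioned in the question


def fractions(total, non_n, effective=MACS_DEFAULT):
    """Return (non-N fraction, effective-size fraction) of the total size."""
    return non_n / total, effective / total


for name, (total, non_n) in SIZES.items():
    non_n_frac, eff_frac = fractions(total, non_n)
    print(f"{name}: non-N {non_n_frac:.3f}, 2.7e9/total {eff_frac:.3f}")

# Relative growth of the assembly from hg18 to hg19.
growth = SIZES["hg19"][0] / SIZES["hg18"][0] - 1
print(f"hg19 is {growth:.2%} larger than hg18")
```

The two assemblies differ by less than 1% in total size, which is why reusing the hg18 default for hg19 changes so little.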