Question: macs3 peak calling usage
0
gravatar for arianc
4 days ago by
arianc0
arianc0 wrote:

Hey! I am new in CHIP-seq analysis so please be understanding. I have got CHIP-seq data and I want to do simply regular peak calling. I am typing:

macs3 callpeak -t A.bam -c input.bam -f BAM -g 483151142 -n A

and I have got:

#2 Total number of paired peaks: 0

Anyone know why? And this advise:

we suggest to use --nomodel and --extsize 147 or other fixed number instead.
macs3 callpeak • 112 views
ADD COMMENTlink modified 4 days ago • written 4 days ago by arianc0

In the simplest case you do not have peaks because the ChIP did not properly work. Did you check data on a genome browser such as the IGV to see how it looks by eye?

ADD REPLYlink written 4 days ago by ATpoint44k

I have checked in IGV, it has peaks

ADD REPLYlink written 4 days ago by arianc0

I noticed that it is depend on -g parameter. How to determine the correct one?

ADD REPLYlink written 4 days ago by arianc0
1

It is the effective (=mappable) genome size of your organism. What is your organism?

ADD REPLYlink written 4 days ago by ATpoint44k

Schizosaccharomyces pombe

ADD REPLYlink written 4 days ago by arianc0

It is quite disrespectful to delete threads that received help and then open another identical one. Biostars is a community driven by volunteers, this is no ad-hoc help service. This has also been asked before and the answers there tell you how to determine g.

MACS, effective genome size

Effective genome size of UCSC hg38

As I said already in a comment above, -g is the genome size of your organism, so count the number of nucleotides in the reference file and put that as -g. You used 483151142 above, that yeast has about 14Mb so 14000000, try that. It actually is the part of the genome that is mappable, so not repetitive but unique at the read length you have, but lets not make it too complicated and simply use the length of the genome as in the reference you mapped against.

ADD REPLYlink modified 3 days ago • written 3 days ago by ATpoint44k

I've deleted because I didn't want two with the same question but I will not do that again.

According to your advise, when I type: 1. 14000000 - 14 peaks 2. 12631379 (number of nucleotides in the reference file) - 14 peaks. 3. 140000000 - 671 peaks

MACS3 needs at least 100 paired peaks at + and - strand to build the model. Why there are more peaks if the g parameter is much bigger (wrong)? Besides, in IGV I can see that there are many more peaks than 14.

ADD REPLYlink written 3 days ago by arianc0
1

The larger the genome the smaller the change that reads accumulate randomly at a given location, hence p-values for a true peak would be smaller the larger the -g parameter is, based on my understanding of the macs algorithm.

ADD REPLYlink written 3 days ago by ATpoint44k

You can calculate mappability (effective genome size) by using these directions. You will need to relate it to the length of reads you have.

ADD REPLYlink written 3 days ago by GenoMax94k

I deleted the comments by mistake and I don't know how to restore them

ADD REPLYlink modified 3 days ago • written 3 days ago by arianc0
1

Should be restored now. Once a parent comment is deleted all child comments become invisible. So be careful with that.

ADD REPLYlink written 3 days ago by GenoMax94k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2217 users visited in the last hour
_