Question: ChIP-Seq identification of peaks
9 months ago, TulsaPrim wrote:

I have aligned reads from ChIP-Seq experiments for a histone modification, and want to develop a script that will identify "true" peaks (I know about MACS, SICER etc., but I need to get familiar with the pre-processing of the data using a custom script). No control data are available to assess the noise.

Which characteristics would form a set of features that allows for a simple procedure yet yields good enough results?

I think two features should be included: the ratio of reverse-strand reads to forward-strand reads, and the number of reads within a window.
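A minimal sketch of those two features, assuming the aligned reads have already been reduced to (chromosome, 5' position, strand) tuples. The function name `window_features`, the 200 bp window, and the pseudocount of 1 are illustrative assumptions, not taken from any existing tool.

```python
from collections import defaultdict

def window_features(reads, window=200):
    """Count reads per fixed-size window and compute a reverse/forward strand ratio.

    reads: iterable of (chrom, five_prime_pos, strand), strand being '+' or '-'.
    Returns {(chrom, window_start): (total, forward, reverse, rev_fwd_ratio)}.
    """
    counts = defaultdict(lambda: [0, 0])  # [forward, reverse] per window
    for chrom, pos, strand in reads:
        start = (pos // window) * window  # left edge of the window holding this read
        counts[(chrom, start)][0 if strand == "+" else 1] += 1

    features = {}
    for key, (fwd, rev) in counts.items():
        # Pseudocount of 1 (an assumption) avoids division by zero
        # in windows that contain no forward-strand reads.
        ratio = (rev + 1) / (fwd + 1)
        features[key] = (fwd + rev, fwd, rev, ratio)
    return features

# Toy input: three reads fall in window chr1:0-200, one in chr1:400-600.
reads = [("chr1", 105, "+"), ("chr1", 150, "-"), ("chr1", 180, "+"), ("chr1", 420, "-")]
feats = window_features(reads, window=200)
```

Windows with no reads simply do not appear in the result, so downstream code should treat a missing key as a count of zero.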

Thank you.


To be quite frank, if you want to develop your own peak caller then you shouldn't be asking these questions. Read the literature. Look at what other tools are already doing, then try to come up with something sensible yourself.

Reply written 9 months ago by Devon Ryan

I do read the literature but was looking for the extra input of experience. Thanks.

Reply written 9 months ago by TulsaPrim

8 months ago, Kevin Blighe (University College London Cancer Institute) wrote:

As with most things in bioinformatics, people develop programs to address a specific problem or part of a problem, or to improve on a previous solution based on newfound information/knowledge/research.

You need to ask yourself why you want to develop a new program and who would be the audience, i.e., why would anyone use your program instead of others?

In ChIP-seq, the main parameters to consider are:

  • quality control of raw reads, including adaptor trimming, eliminating reads below a certain length (e.g. 50 bp), and ensuring that each base in each read has a Phred quality score > 20

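  • performing unique mapping to a reference genome and setting a high mapping quality (MAPQ) cut-off to prevent false alignments (e.g. > 50 where the aligner's scale allows it; the maximum MAPQ is aligner-specific, 60 for BWA but only 42 for Bowtie 2, so a cut-off such as 30 is more portable)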

  • removing duplicate reads that result from the PCR amplification step

  • strand-specific alignments (depends on the NGS lab prep protocol and also the bioinformatic aligner and its parameters used)

  • read depth at each position / number of reads mapping at each position

  • maximum number of reads mapping to each position

  • the depth of coverage 'profile' and where to set the cut-off between signal and no signal, and for peak region identification (e.g. stop extending a region once the depth of coverage falls below 10x)

  • window size for moving along the genome and detecting peak regions

  • minimum/maximum peak starting size

  • maximum peak region size allowed

  • normalization of depth of coverage and type of normalization used (e.g. Reads Per Kilobase per Million mapped reads [RPKM], reads per genome coverage [RPGC], etc.)

  • P- and Q-values, and fold-changes, for identifying where a region ends
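Several of the items above (window size, a signal cut-off, region extension, minimum and maximum region size, P-values) can be combined into a rough sketch like the one below. It assumes per-window read counts for one chromosome are already available, and, since there is no control, it uses the sample's own mean count as a single Poisson background rate. That is a simplification: the peak windows themselves inflate the background, and no multiple-testing correction (Q-values) is applied. The names `poisson_sf` and `call_peaks` and all default values are hypothetical.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), by direct summation of the lower tail."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def call_peaks(counts, window=200, p_cutoff=1e-3, max_gap=1,
               min_size=200, max_size=10000):
    """Call enriched regions from consecutive per-window read counts.

    Returns a list of (start, end, max_count) tuples in base-pair coordinates.
    """
    if not counts:
        return []
    lam = sum(counts) / len(counts)  # crude background rate (no control sample)
    significant = [poisson_sf(c, lam) < p_cutoff for c in counts]

    # Join significant windows, bridging gaps of up to max_gap windows.
    regions, current = [], None
    for i, is_sig in enumerate(significant):
        if not is_sig:
            continue
        if current is not None and i - current[1] - 1 <= max_gap:
            current[1] = i           # extend the open region
        else:
            if current is not None:
                regions.append(current)
            current = [i, i]         # open a new region
    if current is not None:
        regions.append(current)

    # Convert to coordinates and apply the size limits.
    peaks = []
    for first, last in regions:
        start, end = first * window, (last + 1) * window
        if min_size <= end - start <= max_size:
            peaks.append((start, end, max(counts[first:last + 1])))
    return peaks

# Toy example: three enriched windows (12, 15, 14) separated by one low window.
counts = [1, 0, 2, 1, 12, 15, 1, 14, 0, 1, 1, 0, 2, 1, 0, 1, 1, 0, 1, 1]
peaks = call_peaks(counts)
```

For broad histone marks, a larger `window` and `max_gap` are the obvious knobs to try first; a locally estimated background rate would be the next refinement.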

Off the top of my head, current programs include HOMER, SICER, MACS, and bedCoverage.

Good luck! Kevin


Hi Kevin, thanks for the answer. Just wanna ask if MACS2 can perform reads per genome coverage [RPGC] normalization?

Reply written 4 months ago by chiefcat70

8 months ago, mforde84 wrote:

Here's a pipeline developed by ENCODE for narrow TF peaks. However, you may need to modify it a little to work with broad peaks, depending on which histone marks you're looking at.
