Question: ChIP-Seq identification of peaks
gravatar for TulsaPrim
12 weeks ago by
TulsaPrim0 wrote:

I have the results of ChIP-Seq experiments for histone modification - aligned reads, and want to develop a script that will identify "true" peaks (I know about MACS, SICER etc. but I need to get familiar with the pre-processing of the data using a custom script). No control data are available to assess the noise.

Which characteristics would form a set of features that will allow for a simple procedure yet yielding good enough results?

I think the ratio: no. reads in the reverse strand / no. reads in the forward strand, and the no. reads within a window should be included.

Thank you.

ADD COMMENTlink modified 11 weeks ago by mforde841000 • written 12 weeks ago by TulsaPrim0

To be quite frank, if you want to develop your own peak caller then you shouldn't be asking these questions. Read the literature. Look at what other tools are already doing, then try to come up with something sensible yourself.

ADD REPLYlink written 12 weeks ago by Devon Ryan73k

I do read the literature but was looking for the extra input of experience. Thanks.

ADD REPLYlink written 12 weeks ago by TulsaPrim0
gravatar for Kevin Blighe
11 weeks ago by
Kevin Blighe6.3k
Republic of Ireland (√Čire)
Kevin Blighe6.3k wrote:

As with most things in bioinformatics, people develop programs to address a specific problem or part of a problem, or to improve on a previous solution based on newfound information/knowledge/research.

You need to ask yourself why you want to develop a new program and who would be the audience, i.e., why would anyone use your program instead of others?

In ChIP-seq, the main parameters that are important are:

  • Quality control of raw reads, including adaptor trimming, eliminating reads below a certain length (e.g. 50bp), and ensuring that each base in each read is > 20 Phred score

  • performing unique mapping to a reference genome and setting the mapping quality (MAPQ) high (e.g. >50 Phred score) to prevent false alignments

  • removing duplicate reads that result from PCR amplification step

  • strand-specific alignments (depends on the NGS lab prep protocol and also the bioinformatic aligner and its parameters used)

  • read depth at each position / number of reads mapping at each position

  • maximum number of reads mapping to each position

  • the depth of coverage 'profile' and where to define a cut-off for signal / no signal, and for peak region identification (e.g. instruct to cease looking further for extending a region if the depth of coverage falls below 10x).

  • window size for moving along the genome and detecting peak regions

  • minimum/maximum peak starting size

  • maximum peak region allowed

  • normalization of depth of coverage and type of normalization used (e.g. Reads Per Kilobase per Million mapped reads [RPKM], reads per genome coverage [RPGC], etc.)

  • P- and Q-values, and fold-changes, for identifying where a region ends

Off the top of my head, current programs include HOMER, SICER, MACS, and bedCoverage.

Good luck! Kevin

ADD COMMENTlink modified 9 weeks ago • written 11 weeks ago by Kevin Blighe6.3k
gravatar for mforde84
11 weeks ago by
mforde841000 wrote:

Here's a pipeline developed by ENCODE for narrow TF peaks. However, you may need to modify it a little to work with broad peaks depending on what histone markers you're looking at.

ADD COMMENTlink written 11 weeks ago by mforde841000
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1012 users visited in the last hour