Tool: Tool to annotate ChIP-Seq peaks (hg19 or mm10) and find neighboring peaks from multiple experiments.
8.6 years ago

https://github.com/goxed/peak-tool

I wrote this C++ tool for internal use in our lab for some of our custom analyses, and thought it might be a good idea to share it with the community.

It's a simple program to annotate ChIP-Seq peak files aligned to human hg19 or mouse mm10. The tool can also take multiple ChIP-Seq peak files from different experiments; in that case it annotates the primary (first) peak file and reports the neighboring peaks found in the other files.

The tool parses the GENCODE annotation file and one or more lists of ChIP-Seq peaks in BED format, then reports detailed promoter / gene-body / intergenic / enhancer occupancy for human or mouse.

The multi-peak option reports neighboring peaks to help identify co-acting transcription factors. For example, with this feature you can correlate your peaks with ENCODE ChIP-Seq data or with multiple related ChIP-Seq data sets.

Compiling:

gunzip gencode.v19.annotation.gtf.gz
gunzip enhancers.bed.gz
make

Running:

20 GB of RAM is required on Linux.

Single peak file:

./peak_tool_multi ./test.bed > test.genes.txt

Output:

chr1 1778750 MACS_peak_51 102.12 INTRON GNB1 - GNB1-001 protein_coding 1822495 43745
chr1 1933483 MACS_peak_57 93.87 INTRON C1orf222 - C1orf222-007 retained_intron 1935276 1793
chr1 3446145 MACS_peak_91 85.75 INTRON MEGF6 - MEGF6-001 protein_coding 3448012 1867
chr1 4003155 MACS_peak_104 58.09 ENHANCER . . . . . .
chr1 5787471 MACS_peak_121 1325.16 INTERGENIC . . . . . .
chr1 6473142 MACS_peak_138 988.16 EXON HES2 - HES2-002 protein_coding 6484730 11588
chr1 7259083 MACS_peak_154 60.32 INTRON CAMTA1 + CAMTA1-001 protein_coding 6845384 413699
chr1 8031408 MACS_peak_181 750.45 EXON PARK7 + PARK7-004 protein_coding 8014351 17057
chr1 8319346 MACS_peak_195 3100 ENHANCER . . . . . .

The output has the following columns:

chrnum peak_mid peak_name peak_score peak_location gene_name strand isoform isoform_coding_type tx_start_site distance_tss
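
For downstream work, here is a minimal Python sketch that reads rows in this format. It assumes the columns are whitespace-separated and that "." marks a missing value, as in the sample output above; the names `PeakRecord`, `parse_peak_line` and `strand_aware_distance` are made up for illustration and are not part of the tool. Judging only from the sample rows, distance_tss looks strand-aware (peak_mid minus tx_start_site on the + strand, the reverse on the - strand); the last function checks that reading and should be treated as an inference, not documented behaviour.

```python
from typing import NamedTuple, Optional


class PeakRecord(NamedTuple):
    """One row of the single-file output (hypothetical container)."""
    chrom: str
    peak_mid: int
    peak_name: str
    peak_score: float
    peak_location: str
    gene_name: Optional[str]
    strand: Optional[str]
    isoform: Optional[str]
    isoform_coding_type: Optional[str]
    tx_start_site: Optional[int]
    distance_tss: Optional[int]


def _opt(value, cast=str):
    # "." is treated as "no value" (assumption based on the sample rows).
    return None if value == "." else cast(value)


def parse_peak_line(line: str) -> PeakRecord:
    f = line.split()  # assumes whitespace-separated columns
    if len(f) != 11:
        raise ValueError(f"expected 11 columns, got {len(f)}")
    return PeakRecord(f[0], int(f[1]), f[2], float(f[3]), f[4],
                      _opt(f[5]), _opt(f[6]), _opt(f[7]), _opt(f[8]),
                      _opt(f[9], int), _opt(f[10], int))


def strand_aware_distance(rec: PeakRecord) -> Optional[int]:
    # Inferred convention: positive when the peak lies downstream of the TSS.
    if rec.tx_start_site is None:
        return None
    delta = rec.peak_mid - rec.tx_start_site
    return delta if rec.strand == "+" else -delta


# Rows copied from the sample output above.
rows = [
    "chr1 1778750 MACS_peak_51 102.12 INTRON GNB1 - GNB1-001 protein_coding 1822495 43745",
    "chr1 4003155 MACS_peak_104 58.09 ENHANCER . . . . . .",
    "chr1 7259083 MACS_peak_154 60.32 INTRON CAMTA1 + CAMTA1-001 protein_coding 6845384 413699",
]
records = [parse_peak_line(r) for r in rows]
for rec in records:
    print(rec.peak_name, rec.peak_location, rec.gene_name, strand_aware_distance(rec))
```

For the annotated rows, the recomputed distance matches the distance_tss column; for ENHANCER and INTERGENIC rows it is simply missing.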

Multiple peak files:

./peak_tool_multi EXP1.bed EXP2.bed EXP3.bed EXP4.bed > EXP1_EXP2_EXP3_EXP4.genes.txt

Output:

chr1 714304 MACS_peak_2 158.84 INTERGENIC . . . . . . 714017 MACS_peak_1 68.78 -287 . 714039 MACS_peak_1 170.93 -265 . 714023 MACS_peak_1 245.28 -281 .
chr1 769360 MACS_peak_8 421.42 INTERGENIC . . . . . . 769283 MACS_peak_3 55.43 -77 . 769273 MACS_peak_5 154.26 -87 . 769292 MACS_peak_3 71.24 -68 .
chr1 840155 MACS_peak_10 96.66 INTERGENIC . . . . . . 840097 MACS_peak_6 165.5 -58 . 840026 MACS_peak_11 57.72 -129 . 840075 MACS_peak_8 134.82 -80 .
chr1 840738 MACS_peak_11 137.93 ENHANCER . . . . . . 840097 MACS_peak_6 165.5 -641 . 840026 MACS_peak_11 57.72 -712 . 840075 MACS_peak_8 134.82 -663 .
chr1 911680 MACS_peak_15 241.99 PROMOTER C1orf170 - C1orf170-002 retained_intron 912021 341 911740 MACS_peak_20 271.3 60 281 911707 MACS_peak_31 285.65 27 314 911709 MACS_peak_23 251.1 29 312
chr1 994706 MACS_peak_18 95.46 ENHANCER . . . . . . 994613 MACS_peak_35 66.76 -93 . 995290 MACS_peak_52 57.49 584 . 994655 MACS_peak_37 61.84 -51 .
chr1 1003263 MACS_peak_19 71.56 INTERGENIC . . . . . . 1003034 MACS_peak_37 252.84 -229 . 1003088 MACS_peak_54 281.96 -175 . 1003078 MACS_peak_39 643.19 -185 .
chr1 1003982 MACS_peak_20 104.28 ENHANCER . . . . . . 1003034 MACS_peak_37 252.84 -948 . 1003088 MACS_peak_54 281.96 -894 . 1003078 MACS_peak_39 643.19 -904 .
chr1 1098334 MACS_peak_23 149.14 INTERGENIC . . . . . . 1098987 MACS_peak_50 185.58 653 . 1098482 MACS_peak_70 130.48 148 . 1098535 MACS_peak_51 138.52 201 .
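
The column layout of the multi-file output is not spelled out in this post. From the sample rows it looks like the first 11 columns are the same as in the single-file output, followed by one group of 5 columns per additional peak file: neighbor peak_mid, neighbor name, neighbor score, offset from the primary peak_mid, and the neighbor's distance to the TSS (or "."). The Python sketch below parses a row under exactly that assumption; `parse_multi_line` and the dictionary keys are hypothetical names, not part of the tool.

```python
def parse_multi_line(line, n_primary=11, n_neighbor=5):
    """Split one multi-file row into the primary peak and its neighbors.

    The 11 + 5*k layout is inferred from the sample output, not documented.
    """
    f = line.split()  # assumes whitespace-separated columns
    primary, rest = f[:n_primary], f[n_primary:]
    if len(primary) < n_primary or len(rest) % n_neighbor != 0:
        raise ValueError(f"unexpected column count: {len(f)}")

    neighbors = []
    for i in range(0, len(rest), n_neighbor):
        mid, name, score, offset, tss_dist = rest[i:i + n_neighbor]
        neighbors.append({
            "peak_mid": int(mid),
            "peak_name": name,
            "peak_score": float(score),
            "offset": int(offset),  # neighbor peak_mid minus primary peak_mid
            "distance_tss": None if tss_dist == "." else int(tss_dist),
        })

    return {
        "chrom": primary[0],
        "peak_mid": int(primary[1]),
        "peak_name": primary[2],
        "peak_location": primary[4],
        "gene_name": None if primary[5] == "." else primary[5],
        "neighbors": neighbors,
    }


# Row copied from the sample output above (PROMOTER peak with three neighbors).
row = ("chr1 911680 MACS_peak_15 241.99 PROMOTER C1orf170 - C1orf170-002 "
       "retained_intron 912021 341 911740 MACS_peak_20 271.3 60 281 "
       "911707 MACS_peak_31 285.65 27 314 911709 MACS_peak_23 251.1 29 312")
parsed = parse_multi_line(row)
for n in parsed["neighbors"]:
    # The offset column should equal neighbor_mid - primary_mid.
    print(n["peak_name"], n["offset"], n["peak_mid"] - parsed["peak_mid"])
```

With the neighbors in a list, filtering for co-occurring peaks (for example, all neighbors within 100 bp) becomes a one-line comprehension.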
ChIP-Seq • 3.9k views
8.0 years ago
tamars ▴ 10

Hi, do you have an article on this tool? I would like to know the source of the enhancer list for mm10, and what the difference is between the two files (enhancers-mm10.bed, super-enhancers-mm10.bed). Thanks,


I used mouse enhancer data from: http://chromosome.sdsc.edu/mouse/download.html

Super enhancers are from http://bioinfo.au.tsinghua.edu.cn/dbsuper/ses.php?genome=hg19&cell_type=C_001

You may replace the enhancer BED files with your own enhancer annotation, which might be more suitable for your case; the tool will then annotate against it automatically.

Thank you!


Hi, Abhishek:

I tried the peak-tool and got the output correctly. But I noticed that the INTERGENIC, ENHANCER and ENHANCER-SUPER peaks do not have a correlated gene or a distance to TSS, whereas the PROMOTER annotations have all of this information. Is there a way to add this information? Thanks.

HY

chr1 10162 H3K27Ac_Dox_peak_1 69 INTERGENIC . . . . . .
chr1 13995 H3K27Ac_Dox_peak_2 45 INTERGENIC . . . . . .
chr1 29571 H3K27Ac_Dox_peak_3 380 INTERGENIC . . . . . .
chr1 323229 H3K27Ac_Dox_peak_4 40 INTERGENIC . . . . . .
chr1 442492 H3K27Ac_Dox_peak_5 152 INTERGENIC . . . . . .
chr1 540745 H3K27Ac_Dox_peak_6 106 ENHANCER . . . . . .
chr1 668941 H3K27Ac_Dox_peak_7 67 INTERGENIC . . . . . .
chr1 672980 H3K27Ac_Dox_peak_8 78 INTERGENIC . . . . . .
chr1 714250 H3K27Ac_Dox_peak_9 229 INTERGENIC . . . . . .
chr1 762728 H3K27Ac_Dox_peak_10 251 ENHANCER-SUPER . . . . . .
chr1 840330 H3K27Ac_Dox_peak_11 64 ENHANCER-SUPER . . . . . .
chr1 894324 H3K27Ac_Dox_peak_12 152 PROMOTER KLHL17 + KLHL17-001 protein_coding 895967 -1643
chr1 894324 H3K27Ac_Dox_peak_12 152 PROMOTER NOC2L - NOC2L-001 protein_coding 894670 346
chr1 896136 H3K27Ac_Dox_peak_13 112 PROMOTER KLHL17 + KLHL17-001 protein_coding 895967 169
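
One possible workaround, outside the tool itself, is to post-process the output and assign the nearest TSS to each peak that has no gene. The Python sketch below is only an illustration under stated assumptions: `tss_table` is a hypothetical, hand-made list holding the two TSS positions that appear in the PROMOTER rows above (a real run would build it per chromosome from the GENCODE annotation), `nearest_tss` is a made-up helper, and the sign convention (positive when the peak is downstream of the TSS on the gene's strand) is inferred from the sample rows.

```python
from bisect import bisect_left

# Hypothetical TSS table for chr1 only: (tss_position, gene_name, strand).
# Values are copied from the PROMOTER rows above; a real table would come
# from the annotation file and be kept separately for each chromosome.
tss_table = sorted([
    (894670, "NOC2L", "-"),
    (895967, "KLHL17", "+"),
])
positions = [t[0] for t in tss_table]


def nearest_tss(peak_mid):
    """Return (gene, tss, strand-aware distance) for the closest TSS."""
    i = bisect_left(positions, peak_mid)
    # Only the TSS just before and just after the peak can be the nearest.
    candidates = tss_table[max(i - 1, 0):i + 1]
    pos, gene, strand = min(candidates, key=lambda t: abs(t[0] - peak_mid))
    delta = peak_mid - pos
    return gene, pos, (delta if strand == "+" else -delta)


# ENHANCER-SUPER peak from the rows above that currently has no gene.
print(nearest_tss(840330))
# PROMOTER peak, as a sanity check against the tool's own distance column.
print(nearest_tss(896136))
```

Whether the nearest gene is a meaningful target for an enhancer or intergenic peak is a separate biological question, so any gene assigned this way is best read as a rough hint.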
