Question: HOMER annotatePeaks.pl problem
1
gravatar for Seq225
22 months ago by
Seq22590
Seq22590 wrote:

I am using HOMER to get read mapping around TE (well, I need to make a heatmap). I am using following command:

annotatePeaks.pl  XXX.bed  XXX(genome)   -bedGraph XXX.bedgraph -size 200 -hist 2 -ghist  -noadj -fragLength 0   > XXX.txt

Before running this command, I converted my bed file using changeNewLine.pl and configured my genome in HOMER. I have been using this tool and so this command for couple of weeks now. From terminal result it seems ok. Here is the result in terminal:

Peak file = XXX.bed
Genome = XXX
Organism = null
bedGraph Files:
XXX.bedgraph
Peak Region set to 200
-----------------------------------------------------
Histogram mode activated (bin size = 2 bp)
-----------------------------------------------------
Will create histogram for each gene
Will NOT normalize tag counts
Fragment Length set to 0
Peak/BED file conversion summary:
BED/Header formatted lines: 4203
peakfile formatted lines: 0
Duplicated Peak IDs: 0

Peak File Statistics:
Total Peaks: 4203
Redundant Peak IDs: 0
Peaks lacking information: 0 (need at least 5 columns per peak)
Peaks with misformatted coordinates: 0 (should be integer)
Peaks with misformatted strand: 0 (should be either +/- or 0/1)

Peak file looks good!

Resizing peaks...
Reading Positions...
-----------------------
Compiling per bp Histograms...
Finding Tags in Peaks from each directory...

Order of experiments in output file:
XXX.bedgraph

But, the number of rows in my output file and number in input bed file are not same. Number in output file is much less. Would you please tell me what went wrong and how can I get all rows in my output file?

Thank you very much and I am very sorry to bother.

rna-seq chip-seq homer • 3.1k views
ADD COMMENTlink modified 21 months ago • written 22 months ago by Seq22590

ChIPseeker R package is a good to option to analyze chipseq

ADD REPLYlink written 22 months ago by tarek.mohamed250

I have done it!! I used deeptools in galaxy and options were quite simple there.

Thank you very very much for suggesting deepTools!!

ADD REPLYlink written 21 months ago by Seq22590

Great news! I also eventually migrated from using HOMER to deepTools.

Best of luck.

ADD REPLYlink written 21 months ago by Kevin Blighe45k
5
gravatar for Kevin Blighe
22 months ago by
Kevin Blighe45k
Kevin Blighe45k wrote:

If you just want to get gene annotation for your BED regions, you only need to use annotatePeaks like this:

perl annotatePeaks.pl MyBED.bed mm9 -annStats AnnotationStats.txt > Annotation.tsv

This always produces an equal number of annotated regions as per my BED file. The correct genome has to be used of course.

HOMER will assume that the BED file regions define your peaks. I don't believe that you have t supply all of the other information such as the bedGraph, peak size, etc.

ADD COMMENTlink written 22 months ago by Kevin Blighe45k

Thanks. But I want to make an heatmap. That's the actual purpose. For heatmap -hist and -ghist are necessary. -size 200 is for defining the regions that I did data on.....

I am not sure how the genome file is working here. The bed file should work on the bedgraph file (which is the actual mapped file, which I created by bowtie2, samtools, and bedtools genomecov). bed file has the coordinates around which (defined by -size 200) I need the mapping numbers (probably defined by tags in the main homer manual).

It is highly likely though that I have misunderstood some options completely!!

ADD REPLYlink written 22 months ago by Seq22590
1

Ah, I see. I also used annotatePeaks when I was generating a heatmap, however, I used cluster3 to build the actual heamap based on annotatePeak's output. Here were my commands:

perl annotatePeaks.pl FinalPeaks.txt mm9 -size 2000000 -hist 5000 -norm 1000000 -ghist -d MyEmptyVectorTagDirectory/ MyPositiveControlTagDirectory/ MyTargetMarkerTagDirectory/ > ForHeamap.Blueprint.txt
cluster3 -f ForHeatmap.Blueprint.txt -g 2 -m c
cluster3 -f ForHeatmap.Blueprint.txt -k 10

I believe that FinalPeaks.txt is just the standard peaks file produced by HOMER, but cannot recall exactly.

Also, there was another recent question specifically about visualisation of ChIP-seq data, and in my answer I mentioned how heatmaps can also be done with deepTools: A: Visualization for ChIP-seq analysis

PS - my marker of interest covered very large regions, but your marker is obviously binding to more narrow regions

ADD REPLYlink modified 22 months ago • written 22 months ago by Kevin Blighe45k
1

I will try deepTools for sure.

However, I am also using cluster3. I am using the homer output file annotatePeaks.pl XXX.bed XXX(genome) -bedGraph XXX.bedgraph -size 200 -hist 2 -ghist -noadj -fragLength 0 > XXX.txt) XXX.txt in cluster3 to cluster them, which produce a .cdt file. I use the .cdt file for making heatmap in Java Treeview.

Cluster3 and Treeview are working fine. My problem is the XXX.txt file that homer gives me. It does not have all the entries of the bed file.

I will try your command to use homer and cluster3 together.

ADD REPLYlink written 22 months ago by Seq22590

Yes, we are using the function slightly differently: I just supply a single peaks file that contains my regions of interest (with first 3 columns in BED format), and then also a few tags directories for my samples. The function then generates the histogram information over my peaks using the tags information for each sample.

Please let me know if you have any success! I do recall having to work through trial and error generally with HOMER's functions. However, the images of Homer Simpson and the doughnut on the website made it easier.

Edit: the first thing that I suggest you change is -fragLength 0 to -fragLength auto, but perhaps you have already tried this

ADD REPLYlink modified 22 months ago • written 22 months ago by Kevin Blighe45k

Also, as you've supplied a bedGraph file and not a tag directory, I believe that HOMER will look for 'tags' (reads) in your bedGraph file for the regions defined in your input BED. If there are no tags, or the regions in both the BED and bedGraph don't overlap, then by all means some regions won't appear in the final output.

ADD REPLYlink modified 22 months ago • written 22 months ago by Kevin Blighe45k

change -fragLength 0 to -fragLength auto does not make any difference. Your point about bedgraph and bed overlap is what I thought initially. But I want all rows. Moreover, I checked the output file, there are actually some rows that do not have any tags........I don't know what is actually happening.

ADD REPLYlink written 22 months ago by Seq22590

Okay, apologies that they did not work.

Did you try using the tag directories like I did (instead of supplying the bedGraph)? I just checked my data and I have the same number of regions/rows in my input file as I do in the file then going into cluster3/Treeview

ADD REPLYlink written 22 months ago by Kevin Blighe45k
1

Please don't apologize, I guess you are directing me to the right pipeline. I am working on the tagdirectory option. However, I am getting different results, and also the number of rows is still different.

Here is what I used: makeTagDirectory tagdir -keepAll -single -flip a.bam

I need to figure out the right command for making the tag directory. My data is actually small-RNA seq data produced in Illumina platform in un-paired fashion. I am still reading the manual to get the right command!!

Thanks a lot for your input!!!!

ADD REPLYlink written 22 months ago by Seq22590
1

Okay, here's one final throw of the dice! Here are all of my commands that led up to the cluster3 step. I was using the mm9 genome, and in this example I allude to just 2 sample tag directories: MySampleTags/ and MyInputControlTags/

#Alignment
bowtie2 -p 7 -x mm9 -U MySample.fastq.gz > MySample.sam

#Remove PCR duplicates
samtools view -bS MySample.sam > MySample.bam
samtools sort MySample.bam MySample_Sorted
java -jar picard.jar MarkDuplicates INPUT=MySample_Sorted.bam OUTPUT=MySample_Sorted_PCRDuped.bam ASSUME_SORTED=true METRICS_FILE=MySample_Sorted_PCRDuped.txt VALIDATION_STRINGENCY=SILENT
samtools view -b -F 0x400 MySample_Sorted_PCRDuped.bam > MySample_Sorted_PCRDupesRemoved.bam
samtools index MySample_Sorted_PCRDupesRemoved.bam
samtools view -h MySample_Sorted_PCRDupesRemoved.bam > MySample_Sorted_PCRDupesRemoved.sam

#Make tag directory
makeTagDirectory MySampleTags/ MySample_Sorted_PCRDupesRemoved.sam -unique -mapq 30 -genome mm9 -checkGC -normGC default -precision 3

#Remove out of bounds reads (optional)
perl removeOutOfBoundsReads.pl MySampleTags/ mm9

#Identify peaks
findPeaks MySampleTags/ -i MyInputControlTags/ -gsize 3500000000 -style histone -size 1500 -minDist 15000 -region -nfr -fdr 0.00001 -F 2 -P 0.0001 -o MySampleTags/MySample_FinalPeaks.txt

#Annotate the peaks
perl annotatePeaks.pl MySampleTags/MySample_FinalPeaks.txt mm9 -annStats MySampleTags/MySample_FinalPeaks_AnnotStats.txt > MySampleTags/MySample_FinalPeaks_Annot.txt

#Plotting tag densities around defined peaks
perl annotatePeaks.pl MySampleTags/MySample_FinalPeaks.txt mm9 -size 15000 -hist 100 -norm 1000000 -d MySampleTags/ MyInputControlTags/ > PeaksDensities.txt

#Plot a 'blueprint' of the data, then cluster using Cluster 3.0 (sudo apt-get install cluster3) with Spearman rank correlation values and centroid linkage, and finally view in TreeView
perl annotatePeaks.pl MySampleTags/MySample_FinalPeaks.txt mm9 -size 2000000 -hist 5000 -norm 1000000 -ghist -d MySampleTags/ MyInputControlTags/ > HeatmapBlueprint.txt
cluster3 -f HeatmapBlueprint.txt -g 2 -m c
cluster3 -f HeatmapBlueprint.txt -k 10
ADD REPLYlink written 22 months ago by Kevin Blighe45k

Thanks a ton!

I will try your commands. I am also working on DeepTools. Hopefully, either one will work. Cheers!!

ADD REPLYlink written 22 months ago by Seq22590
1

No problem. I really like Homer but I also found it difficult to follow and piece together a working pipeline. I spent many long days skipping lunch with my head buried into my laptop.

Good luck!

ADD REPLYlink written 22 months ago by Kevin Blighe45k

@Kevin how do i see the occupancy of a gene with its target genes from this homer analysis ,idea is to see the occupancy of a gene to its target genes upstream and downstream of the TSS like 500 or 1000 bp

can you give me some direction ? how do i proceed after doing the peak annotation

ADD REPLYlink written 3 months ago by krushnach80530
1

Hey krushnach, what was your ChIP-seq target? - a transcription factor? If you have your peaksalready identified, then you could literally just add 500 or 1000 to the start and end co-ordinates and then annotate that.

ADD REPLYlink written 3 months ago by Kevin Blighe45k

okay may be i can drop you a mail ...with the info the experiment design

ADD REPLYlink written 3 months ago by krushnach80530
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1512 users visited in the last hour