Question: Annotatepeaks Function From Homer
7.8 years ago by
Dataminer (2.7k) wrote:


I am using the annotatePeaks function from HOMER to annotate my peaks to the nearest RefSeq gene. If you have used the tool, you will have noticed that it generates 10 columns. The columns I am interested in are the distance to TSS, the nearest promoter ID, the nearest RefSeq ID, and the gene name.

Problem: At times a peak is annotated with a nearest promoter ID and a distance to TSS, while the nearest RefSeq ID and gene name columns are blank.

I want to find the nearest gene to each peak, so the nearest RefSeq ID and gene name are the columns I care about. If these two columns are blank while the distance to the nearest TSS is not, am I in trouble? Or is it completely normal for the nearest promoter ID to contain an ID while the nearest RefSeq ID and gene name are blank?

I am completely confused.

Thank you. Please try to answer.


I haven't seen a case like that. Can you post an example file with 5-10 lines? There are cases where Nearest Ensembl, Gene Name, and Gene Alias are empty but the peak is still annotated with a distance to TSS and a Nearest PromoterID, along with a Nearest Refseq, which in most cases is the same as the PromoterID. Another possibility is that the RefSeq database has no annotation for the gene behind that PromoterID. Try looking at the region in the UCSC Genome Browser to get a clue.

written 7.8 years ago by Sukhdeep Singh (10k)

Hi Sukhdeep! Here is an example output file:

Chr     St          Stp         TSS      Promoter_ID   Nearest_RefSeq   gene_name
chr1    84743800    84744829    -248
chr2    120711360   120711771   -14111   NR_000034

In the first example only the distance to TSS is present and the rest of the columns are empty, while in the second example the distance to TSS and the promoter ID are present and the rest are empty.

I have used hg18.
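To see how many peaks are affected, rows like the two above can be picked out automatically. The following is a minimal Python sketch, assuming the output is tab-separated and uses the simplified column names from this example (`Chr`, `St`, `Stp`, `TSS`, `Promoter_ID`, `Nearest_RefSeq`, `gene_name`); the real HOMER output has different header names, so they would need to be adjusted. The function name `find_incomplete_rows` is made up for illustration.

```python
import csv
import io

def find_incomplete_rows(handle, distance_col="TSS",
                         id_cols=("Promoter_ID", "Nearest_RefSeq", "gene_name")):
    """Return (chr, start, stop, missing_columns) for every row that has a
    distance to TSS but at least one blank annotation column.

    Column names are assumptions taken from the simplified example table,
    not the real HOMER header.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    flagged = []
    for row in reader:
        # Short rows yield None for absent columns, so normalise to "".
        has_distance = (row.get(distance_col) or "").strip() != ""
        missing = [col for col in id_cols if not (row.get(col) or "").strip()]
        if has_distance and missing:
            flagged.append((row["Chr"], row["St"], row["Stp"], missing))
    return flagged

# The two example rows from this thread, written as tab-separated text.
example = (
    "Chr\tSt\tStp\tTSS\tPromoter_ID\tNearest_RefSeq\tgene_name\n"
    "chr1\t84743800\t84744829\t-248\t\t\t\n"
    "chr2\t120711360\t120711771\t-14111\tNR_000034\t\t\n"
)

for chrom, start, stop, missing in find_incomplete_rows(io.StringIO(example)):
    print(chrom, start, stop, "missing:", ", ".join(missing))
```

For a real file, the same function can be given an open file handle instead of the `io.StringIO` example.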

modified 7.8 years ago • written 7.8 years ago by Dataminer (2.7k)
7.8 years ago by
Sukhdeep Singh (10k) wrote:

The manual says:

By default, loads a file in the "/path-to-homer/data/genomes/<genome>/<genome>.tss" that contains the positions of RefSeq transcription start sites. It uses these positions to determine the closest TSS, reporting the distance (negative values mean upstream of the TSS, positive values mean downstream), and various annotation information linked to locus including alternative identifiers (unigene, entrez gene, ensembl, gene symbol etc.). This information is also used to link gene-specific information (see below) to a peak/region, such as gene expression.
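The sign convention in that passage can be illustrated with a small Python sketch. It assumes the distance is measured from the peak center to the TSS and that the sign is flipped for minus-strand genes so that negative always means upstream; this is my reading of the manual text, not code taken from HOMER. The TSS IDs and positions below are made up for illustration.

```python
def signed_distance_to_tss(peak_start, peak_end, tss_pos, strand):
    """Signed distance from the peak center to a TSS.

    Negative means upstream of the TSS, positive means downstream,
    taking the strand of the transcript into account.
    """
    center = (peak_start + peak_end) // 2
    offset = center - tss_pos
    # On the minus strand, higher coordinates are upstream, so flip the sign.
    return offset if strand == "+" else -offset

def nearest_tss(peak_start, peak_end, tss_list):
    """Pick the TSS with the smallest absolute distance.

    tss_list holds (tss_id, position, strand) tuples on the peak's chromosome.
    """
    candidates = (
        (tss_id, signed_distance_to_tss(peak_start, peak_end, pos, strand))
        for tss_id, pos, strand in tss_list
    )
    return min(candidates, key=lambda item: abs(item[1]))

# Hypothetical TSS entries, not real annotation.
toy_tss = [("TX_A", 1000, "+"), ("TX_B", 5000, "-")]

print(nearest_tss(700, 800, toy_tss))    # ('TX_A', -250): upstream of a plus-strand TSS
print(nearest_tss(5100, 5300, toy_tss))  # ('TX_B', -200): upstream of a minus-strand TSS
```

Note that this only needs TSS positions, which is why a distance can be reported even when no gene name is linked to the TSS entry.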

This .tss file holds the RefSeq accession number and its position on the chromosome. In most cases, when a peak is matched to one of these coordinates, the tool cross-references the RefSeq ID against the gene aliases in another file to fill in the 10-column output. So this may be a case where the RefSeq annotation is present but it is not in UCSC/Ensembl. I don't have hg18 with me, but try to grep NR_000034 in the genome .tss file and look up the same position in the browser to get the answer.
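If grep is not at hand, the same lookup can be sketched in Python. This assumes only that the .tss file is tab-separated text; it makes no assumption about which column holds the accession, since I can't check the exact layout here. The function name `find_accession` and the demo file contents are made up for illustration, and the demo coordinates are placeholders, not the real position of NR_000034.

```python
import tempfile
from pathlib import Path

def find_accession(tss_path, accession):
    """Return the fields of every line that has the accession as a whole field."""
    hits = []
    with open(tss_path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if accession in fields:
                hits.append(fields)
    return hits

# Demo on a throwaway file with placeholder content (not real hg18 data).
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "demo.tss"
    demo_file.write_text(
        "NR_000034\tchrN\t100\t200\n"
        "NM_EXAMPLE\tchrN\t300\t400\n",
        encoding="utf-8",
    )
    demo_hits = find_accession(demo_file, "NR_000034")

print(demo_hits)
```

For the real lookup, point `find_accession` at the file the manual describes, i.e. `<genome>.tss` under `/path-to-homer/data/genomes/<genome>/`, and then check the reported position in the browser.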

In the first case, a distance was calculated from some TSS, but there is no linked annotation. For the best answer, contact Chris Benner, who made the tool.



Sukhdeep: Thank you. I have written to Chris Benner and am still waiting for his reply. I did use the UCSC Genome Browser to see whether there was a gene (yes, a gene was present). Anyway, I will wait for the reply from Chris and will keep you all updated.

modified 7.8 years ago • written 7.8 years ago by Dataminer (2.7k)
7.8 years ago by
University Park, USA
Istvan Albert ♦♦ 84k wrote:

I agree with Sukhdeep, and I am adding this as an answer because I think it is the right one: the gene annotation is incomplete, so it is possible to have a promoter without a correspondingly annotated gene, or without one close enough to be included in the report.

4.7 years ago by
United States
bwshi2012 (0) wrote:

I ran into the same problem. I also sent Chris Benner a message, but have not received a reply.

I use mm9, and the arguments I passed to annotatePeaks were: wt2exp3_peaks.bed data/genomes/mm9 > wt2exp3_peakannotate.xls

and it gives me a result like the following:

Does anyone know how to solve this problem?
