Question: Annotate Chip-Peaks With Gene Symbol/Name
6.1 years ago by
GeneInfo20 wrote:

I want to use HOMER to annotate my ChIP-seq peaks. It works, but the columns for gene name etc. are empty.

Here is the result:

PeakID (cmd=sample1_uniq_peaks.bed /work/finger3/MotifHOMER/data/genomes/mm9/)    Chr    Start    End    Strand    Peak Score    Focus Ratio/Region Size    Annotation    Detailed Annotation    Distance to TSS    Nearest PromoterID    Entrez ID    Nearest Unigene    Nearest Refseq    Nearest Ensembl    Gene Name    Gene Alias    Gene Description    Gene Type
37    chr1    49627683    49629512    0    0    NA    Intergenic    NA    -1355770    NM_019790

I have checked the HOMER manual.

I am using mm9 data, and mm9.tss is included in the directory.

Is there any way to annotate the peaks with gene symbol/name? My gene expression file is annotated with gene names like p53.

6.1 years ago by
Sukhdeep Singh9.7k wrote:

For the sake of simplicity, use the Clone/Gene ID Converter.

Take your HOMER list and pull out the column containing the NM RefSeq IDs, using sed 1d file.tsv | cut -f10 > nm.txt (replace 10 in the cut with whichever column holds the IDs in your file).

Now open the text file, copy its contents, and paste them into the input box of the Clone/Gene ID Converter.

Select Mouse and RefSeq_RNA as the input type. Then, for the output, select Gene Name and output as txt.

So, now you have the corresponding gene names.

You can paste the gene names back into the HOMER list using the Linux utility paste.
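The extract-and-paste steps above can be sketched roughly as follows. This is only an illustration: the file names are hypothetical, the sample row is cut down from the output shown in the question, and GeneA is a placeholder for whatever the converter returns. It looks up the column by its header name rather than hard-coding a number, and it checks that both files have the same number of lines, because paste simply joins line by line.

```shell
# Sketch with hypothetical file names; the sample row stands in for real HOMER output.
workdir=$(mktemp -d)
printf 'PeakID\tChr\tStart\tEnd\tNearest PromoterID\n37\tchr1\t49627683\t49629512\tNM_019790\n' > "$workdir/file.tsv"

# Look up the column number from the header instead of hard-coding it.
col=$(head -n 1 "$workdir/file.tsv" | tr '\t' '\n' | grep -n -x 'Nearest PromoterID' | cut -d: -f1)

# Drop the header line and keep only the RefSeq ID column.
sed 1d "$workdir/file.tsv" | cut -f"$col" > "$workdir/nm.txt"

# Placeholder for the converter output: one gene name per line, in the same order as nm.txt.
printf 'GeneA\n' > "$workdir/gene_names.txt"

# paste joins line by line, so only proceed when both files have the same number of lines.
if [ "$(awk 'END { print NR }' "$workdir/nm.txt")" -eq "$(awk 'END { print NR }' "$workdir/gene_names.txt")" ]; then
    { printf 'Gene Name\n'; cat "$workdir/gene_names.txt"; } | paste "$workdir/file.tsv" - > "$workdir/annotated.tsv"
fi
```

If the converter drops or reorders IDs, the line counts or the order will no longer match, and a keyed lookup is safer than paste.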

For automated pipelines, I would recommend downloading the RefSeq gene list from UCSC and writing an R script that does the matching for you using %in%.
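As a rough sketch of the same lookup idea, here it is done with awk instead of R. It assumes you have already reduced the UCSC download to a two-column, tab-separated file (RefSeq ID, gene symbol). The file names, IDs and symbols below are placeholders, not real data.

```shell
# Sketch only: all file names, IDs and symbols here are placeholders.
workdir=$(mktemp -d)

# Assumed mapping file: RefSeq ID <tab> gene symbol.
printf 'NM_AAA\tGeneA\nNM_BBB\tGeneB\n' > "$workdir/refseq2symbol.tsv"

# Stand-in for the HOMER output, with the RefSeq ID in column 2.
printf 'PeakID\tNearest PromoterID\n1\tNM_BBB\n2\tNM_ZZZ\n3\tNM_AAA\n' > "$workdir/peaks.tsv"

awk -F'\t' -v OFS='\t' -v col=2 '
    # First file: build the ID-to-symbol lookup table.
    NR == FNR { symbol[$1] = $2; next }
    # Second file, header line: add the new column name.
    FNR == 1  { print $0, "Gene Name"; next }
    # Second file, data lines: append the symbol, or NA when the ID is unknown.
    { print $0, (($col in symbol) ? symbol[$col] : "NA") }
' "$workdir/refseq2symbol.tsv" "$workdir/peaks.tsv" > "$workdir/peaks_annotated.tsv"
```

Unlike paste, this does not depend on row order, and IDs missing from the mapping show up as NA instead of silently shifting the column.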



I used to use this tool, but it has not been updated since 2008. It still does a reasonable job, but a more up-to-date alternative should be used.

written 6.1 years ago by Ian

Oh, if that's the case, then it's an important point to note: if one is using mm10, this tool will fail to map another 65-100 NM transcripts, generating a discrepancy in the list.

written 6.1 years ago by Sukhdeep Singh

Thanks a lot for your answer. Yes, I tried the Clone ID Converter, but the web interface is just too slow when you have a long list of genes. Thanks anyway.

written 6.1 years ago by GeneInfo
6.1 years ago by
Milan, Italy
Arnaud Ceol840 wrote:

You may try the AnnotateGenomicRegions tool, which our group developed recently and which does exactly this: you submit a list of genomic regions (chromosome, start, end) and get back a list of selected annotations.

The tool is quite new so feedback is welcome.


6.1 years ago by
University of Manchester, UK
Ian5.4k wrote:

As people are posting their own solutions, here is ours :)


We use it to associate the coordinates of genes found to be differentially expressed by RNA-seq with ChIP-seq data, or to identify the gene(s) closest to ChIP-seq peaks. Basically, it can be used to compare any genomic feature that has genome coordinates to a set of genes. It also returns the distance from each feature to the closest edge of a gene or of its user-defined promoter region.
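To make the "distance to the closest edge of a gene" idea concrete, here is a minimal awk sketch. It is not the tool described above, only an illustration of the general technique. The file names, gene names and coordinates are invented, and it ignores strand, promoter regions and the half-open convention of BED coordinates.

```shell
# Illustration only: file names, gene names and coordinates are invented.
workdir=$(mktemp -d)

# Genes and peaks as chr <tab> start <tab> end <tab> name.
printf 'chr1\t1000\t2000\tGeneA\nchr1\t5000\t6000\tGeneB\nchr2\t100\t200\tGeneC\n' > "$workdir/genes.bed"
printf 'chr1\t2500\t2600\tpeak1\nchr1\t5500\t5600\tpeak2\nchr2\t900\t950\tpeak3\n' > "$workdir/peaks.bed"

awk -F'\t' -v OFS='\t' '
    # First file: remember every gene, forcing coordinates to numbers.
    NR == FNR { n++; gchr[n] = $1; gs[n] = $2 + 0; ge[n] = $3 + 0; gname[n] = $4; next }
    # Second file: scan all genes on the same chromosome and keep the nearest one.
    {
        s = $2 + 0; e = $3 + 0; best = "NA"; bestd = -1
        for (i = 1; i <= n; i++) {
            if (gchr[i] != $1) continue
            if (e < gs[i])      { d = gs[i] - e }   # peak ends before the gene starts
            else if (s > ge[i]) { d = s - ge[i] }   # peak starts after the gene ends
            else                { d = 0 }           # peak overlaps the gene
            if (bestd < 0 || d < bestd) { bestd = d; best = gname[i] }
        }
        print $4, best, (bestd < 0 ? "NA" : bestd)
    }
' "$workdir/genes.bed" "$workdir/peaks.bed" > "$workdir/closest.tsv"
```

The nested scan is quadratic, so for real peak sets a dedicated interval tool would be the better choice. The sketch is only meant to show what is being computed.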


Hi, can I use the RNA-seq file without the 'strand' information (i.e. with only 4 columns)? Thanks.

written 4.7 years ago by mdp07vm