Question

Annotate Chip-Peaks With Gene Symbol/Name

12.3 years ago

GeneInfo ▴ 30

I want to use annotatePeaks.pl from HOMER to annotate my ChIP-seq peaks. It runs, but the columns for gene name etc. are empty.

Here is the result:

PeakID (cmd=sample1_uniq_peaks.bed /work/finger3/MotifHOMER/data/genomes/mm9/)    Chr    Start    End    Strand    Peak Score    Focus Ratio/Region Size    Annotation    Detailed Annotation    Distance to TSS    Nearest PromoterID    Entrez ID    Nearest Unigene    Nearest Refseq    Nearest Ensembl    Gene Name    Gene Alias    Gene Description    Gene Type
37    chr1    49627683    49629512    0    0    NA    Intergenic    NA    -1355770    NM_019790

I checked the HOMER manual.

My data is mm9, and mm9.tss is included in the directory.

Is there any way to annotate the peaks with gene symbols/names? My gene expression file is annotated with gene names such as p53.

gene • 10k views

score 1 · Answer 1 · 2013-04-10

12.3 years ago

Sukhi Singh 11k

For the sake of simplicity, use the Clone/Gene ID Converter.

Take your HOMER list and pull out the column containing the NM RefSeq IDs, using sed 1d file.tsv | cut -f10 > nm.txt (replace 10 in the cut with whichever column holds the IDs; in the output shown in the question that is column 11, Nearest PromoterID).

Now open the text file, copy its contents, and paste them into the input box of the Clone/Gene ID Converter.

Select Mouse and RefSeq_RNA as the input. Then, for the output, select Gene Name and output as txt.

So, now you have the corresponding gene names.

You can paste the gene names back into the HOMER list using the Linux utility paste.
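The extract and paste-back steps can also be scripted instead of done by hand. Below is a minimal Python sketch, assuming a tab-separated HOMER output with a single header row. The function names `extract_column` and `append_column`, the three-column example table, and the placeholder gene name are made up for illustration; they are not part of HOMER.

```python
def extract_column(lines, column):
    """Return the values of a 1-based column, skipping the header row.

    Roughly what `sed 1d file.tsv | cut -fN` does; rows that are too
    short yield an empty string.
    """
    values = []
    for line in lines[1:]:  # lines[0] is the header
        fields = line.rstrip("\n").split("\t")
        values.append(fields[column - 1] if len(fields) >= column else "")
    return values


def append_column(lines, header, values):
    """Append one extra column to every row, like the `paste` utility."""
    out = [lines[0].rstrip("\n") + "\t" + header]
    for line, value in zip(lines[1:], values):
        out.append(line.rstrip("\n") + "\t" + value)
    return out


# Tiny in-memory example (a real HOMER file has many more columns).
homer_lines = [
    "PeakID\tChr\tNearest PromoterID",
    "37\tchr1\tNM_019790",
]
refseq_ids = extract_column(homer_lines, 3)

# PLACEHOLDER_NAME stands in for whatever the ID converter returns.
annotated = append_column(homer_lines, "Gene Name", ["PLACEHOLDER_NAME"])
```

As with `paste`, this relies on the converted names being in the same order, and the same number, as the rows of the HOMER list.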

For automated pipelines, I would recommend downloading the RefSeq gene list from UCSC and writing an R script that does this for you using %in%.
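The automated lookup could look something like the following, sketched here in Python rather than R. It assumes a tab-separated table laid out like UCSC's refGene.txt, where the second column (`name`) holds the RefSeq accession and the thirteenth column (`name2`) holds the gene symbol; check those positions against the file you actually download. The function names, the IDs `NM_000001`/`NM_000002`, and the symbols `GeneA`/`GeneB` are invented for the example.

```python
def load_refseq_to_symbol(lines, id_col=1, symbol_col=12):
    """Build a RefSeq accession -> gene symbol dict from table rows.

    Column indices are 0-based and ASSUMED to follow the refGene layout.
    """
    mapping = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(id_col, symbol_col):
            continue  # skip rows that are too short
        # Keep the first symbol seen for each accession.
        mapping.setdefault(fields[id_col], fields[symbol_col])
    return mapping


def annotate(refseq_ids, mapping, missing="NA"):
    """Look up each ID; IDs absent from the table get `missing`."""
    return [mapping.get(refseq_id, missing) for refseq_id in refseq_ids]


# Made-up rows in the assumed 16-column layout.
table = [
    "\t".join(["0", "NM_000001", "chr1", "+", "100", "200", "100", "200",
               "1", "100,", "200,", "0", "GeneA", "cmpl", "cmpl", "0,"]),
    "\t".join(["0", "NM_000002", "chr2", "-", "300", "400", "300", "400",
               "1", "300,", "400,", "0", "GeneB", "cmpl", "cmpl", "0,"]),
]
mapping = load_refseq_to_symbol(table)
symbols = annotate(["NM_000002", "NM_999999"], mapping)
```

Writing an explicit `NA` for unmapped IDs keeps the output aligned row for row with the HOMER list and makes any gaps easy to count.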

Cheers

I used to use this tool, but it has not been updated since 2008. It still does a reasonable job, but a more up-to-date alternative should be used.

Reply • 12.3 years ago by Ian 6.1k

Oh, if that's the case, then it's an important point to note: if one is using mm10, this tool will fail to map another 65-100 NM transcripts, generating a discrepancy in the list.

Reply • 12.3 years ago by Sukhi Singh 11k

Thanks a lot for your answer. Yes, I tried the Clone ID Converter, but the web interface is just too slow when you have a long list of genes. Thanks anyway.

Reply • 12.3 years ago by GeneInfo ▴ 30

score 0 · Answer 2 · 2013-04-10

You could try the AnnotateGenomicRegions tool that our group developed recently, which does exactly this: you submit a list of genomic regions (chromosome, start, end) and get back a list of selected annotations: http://bioserver.iit.ieo.eu/AnnotateGenomicRegions/

The tool is quite new so feedback is welcome.

cheers

score 0 · Answer 3 · 2013-04-11

12.3 years ago

Ian 6.1k

As people are posting their own solutions, here is ours :)

RnaChipIntegrator

We use it to associate the coordinates of genes found to be differentially expressed by RNA-seq with ChIP-seq data, or to take ChIP-seq data and identify the closest gene(s). Basically, it can be used to compare any genomic features that have genome coordinates to a set of genes. It will also return the distance from each feature to the closest edge of a gene or to its user-defined promoter region.
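To illustrate what "distance to the closest edge of a gene" means, here is a small Python sketch. It is not RnaChipIntegrator's code or interface: the function names, the tuple layout, and the coordinates are assumptions made for illustration, and measuring from the peak midpoint is just one possible convention.

```python
def edge_distance(position, gene_start, gene_end):
    """Distance from a position to the nearest edge of a gene.

    Returns 0 when the position lies inside the gene.
    """
    if gene_start <= position <= gene_end:
        return 0
    return min(abs(position - gene_start), abs(position - gene_end))


def nearest_genes(peak, genes, max_hits=1):
    """Rank genes on the peak's chromosome by edge distance.

    peak  : (chrom, start, end)
    genes : iterable of (name, chrom, start, end)
    Returns up to `max_hits` (distance, name) tuples, closest first.
    """
    chrom, start, end = peak
    midpoint = (start + end) // 2  # ASSUMPTION: measure from the midpoint
    hits = [
        (edge_distance(midpoint, g_start, g_end), name)
        for name, g_chrom, g_start, g_end in genes
        if g_chrom == chrom
    ]
    return sorted(hits)[:max_hits]


# Made-up coordinates, for illustration only.
genes = [
    ("geneA", "chr1", 2000, 3000),
    ("geneB", "chr1", 100, 600),
    ("geneC", "chr2", 1000, 1100),
]
closest = nearest_genes(("chr1", 1000, 1200), genes)
two_closest = nearest_genes(("chr1", 1000, 1200), genes, max_hits=2)
```

A linear scan like this is fine for a few thousand peaks; larger inputs would normally sort the genes per chromosome or use an interval index.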

Hi, can I use the RNA-seq file without the 'strand' information (i.e. with only 4 columns)? Thanks.

Reply • 10.8 years ago by mdp07vm ▴ 30