Question: How can I get a .bed file that has unique Entrez gene IDs (or symbols) for deepTools?
swordbreadjj (Korea, Republic Of) wrote 5.1 years ago:

Hi all, 

I want to get a .bed file that contains unique gene IDs (Entrez ID or gene symbol) to use with "computeMatrix reference-point" in deepTools, like this:

chr1    3204562 3661579 NM_001011874    Xkr4    -   
chr1    4481008 4486494 NM_011441   Sox17   -   
chr1    4763278 4775807 NM_001177658    Mrpl15  -   
chr1    4797973 4836816 NM_008866   Lypla1  + 


How or where can I get this file?




Devon Ryan (Freiburg, Germany) wrote 5.1 years ago:

If you aligned against a reference genome from UCSC, you can just download the RefSeq file from the Table Browser. This is the simplest way to get a BED12 file for computeMatrix.

If you aren't using a UCSC reference genome (we don't), you can construct the BED12 file as follows (this is actually part of my reference genome Makefile):

  1. Download a GTF file from Ensembl/Gencode/UCSC/etc.
  2. Run the following, assuming the GTF file is called "annotation.gtf" and the BED file should be called "annotation.bed". The doubled "$$" is Makefile escaping; use a single "$" when running the command directly in a shell:
awk '{if ($$3 != "gene") print $$0;}' annotation.gtf | grep -v "^#" | /package/UCSCtools/gtfToGenePred /dev/stdin /dev/stdout | /package/UCSCtools/genePredToBed stdin annotation.bed
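Outside a Makefile, the filtering step of the command above can be tried on its own. What follows is a minimal sketch, not the exact production recipe: the GTF records and the file names "sample.gtf" and "filtered.gtf" are made up for illustration, and the conversion to BED12 is only attempted if gtfToGenePred and genePredToBed are found on the PATH (an assumption; the original uses an absolute /package/UCSCtools/ path).

```shell
#!/bin/sh
# Build a tiny, made-up GTF (hypothetical records, for illustration only).
{
  printf '#!genome-build example\n'
  printf 'chr1\tsrc\tgene\t100\t500\t.\t+\t.\tgene_id "G1";\n'
  printf 'chr1\tsrc\ttranscript\t100\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
  printf 'chr1\tsrc\texon\t100\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
} > sample.gtf

# Drop "gene" records, then drop comment lines.
# Note the single "$" here, since this is plain shell and not a Makefile.
awk -F '\t' '$3 != "gene"' sample.gtf | grep -v '^#' > filtered.gtf

# Convert to BED12 only if the UCSC tools are installed (assumed to be on PATH).
if command -v gtfToGenePred >/dev/null 2>&1 && command -v genePredToBed >/dev/null 2>&1; then
  gtfToGenePred filtered.gtf /dev/stdout | genePredToBed stdin annotation.bed
fi
```

After the filter, "filtered.gtf" holds only the transcript and exon records, which is the input that gtfToGenePred receives in the original pipeline.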

Practically speaking, I can also provide appropriate files for the most common genomes (mouse, human, fly and worm). Since we developed deepTools, we have a lot of these, and I can probably just post whatever you need if you can't get it directly from UCSC.


Is it possible to use gtfToGenePred on a Gencode version?

Comment written 12 months ago by krushnach80850
Jorge Amigo (Santiago de Compostela, Spain) wrote 5.1 years ago:

You can get that particular output directly from Ensembl:

> select Ensembl genes, homo sapiens (reference dependent)
> go to the attributes section
> unselect ensembl gene id, transcript id
> select chromosome, gene start, gene end, associated transcript name, associated gene name, strand
> export results

Note that chromosome names won't have the "chr" prefix. Add it if needed.
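Adding "chr" to the first column can be done with awk. This is a rough sketch under stated assumptions: the file names "export.tsv" and "with_chr.bed" are hypothetical, the two rows reuse values from the question, the export is assumed to be tab-separated with no header line, and the strand is assumed to be exported as 1/-1, which the sketch also converts to +/-.

```shell
#!/bin/sh
# Hypothetical export without the "chr" prefix (file name and layout assumed).
printf '1\t3204562\t3661579\tNM_001011874\tXkr4\t-1\n' > export.tsv
printf '1\t4797973\t4836816\tNM_008866\tLypla1\t1\n' >> export.tsv

# Prefix column 1 with "chr" and map strand 1/-1 to +/- (assumed encoding).
awk 'BEGIN { FS = OFS = "\t" }
     {
       $1 = "chr" $1
       if ($6 == "1") $6 = "+"; else if ($6 == "-1") $6 = "-"
       print
     }' export.tsv > with_chr.bed
```

If the reference uses other naming differences (for example for the mitochondrial chromosome), those need separate handling; this sketch only adds the prefix.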
