Question

How can I get a BED file with unique Entrez gene IDs (or symbols) for deepTools?

Asked by swordbreadjj, 8.5 years ago

Hi all,

I want to get a BED file that contains a unique gene ID (Entrez ID or gene symbol) for use with "computeMatrix reference-point" in deepTools, like this:

chr1    3204562 3661579 NM_001011874    Xkr4    -
chr1    4481008 4486494 NM_011441   Sox17   -
chr1    4763278 4775807 NM_001177658    Mrpl15  -
chr1    4797973 4836816 NM_008866   Lypla1  +

How or where can I get this file?

Tag: ChIP-Seq

Answer 1 · 2015-10-25 · Devon Ryan

If you aligned against a UCSC reference genome, you can simply download the RefSeq genes as BED from the UCSC Table Browser. This is the simplest way to get a BED12 file for computeMatrix.
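
If you would rather script this than click through the Table Browser, the sketch below converts the UCSC refGene table into the six-column layout from the question (chrom, start, end, RefSeq ID, symbol, strand). It assumes the refGene.txt.gz dump from the UCSC download server for your assembly, i.e. genePred columns preceded by a "bin" column; the function name refgene_to_bed6 is hypothetical. Because one symbol can have several RefSeq transcripts, it keeps only the longest transcript per symbol, which is just one way of making the symbols unique. Strictly speaking, column 5 of BED6 is a numeric score, so putting the symbol there only mirrors the question's example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: UCSC refGene table (genePred + leading "bin" column)
# -> chrom, txStart, txEnd, RefSeq ID, gene symbol, strand (question's layout).
# genePred txStart is already 0-based, so no coordinate shift is needed.
# Assumption: one row per symbol, keeping that symbol's longest transcript.
refgene_to_bed6() {
  # refGene columns: $2 name, $3 chrom, $4 strand, $5 txStart, $6 txEnd, $13 name2
  awk -F'\t' -v OFS='\t' '
    {
      len = $6 - $5
      if (!($13 in best) || len > best[$13]) {
        best[$13] = len
        row[$13] = $3 OFS $5 OFS $6 OFS $2 OFS $13 OFS $4
      }
    }
    END { for (sym in row) print row[sym] }
  ' "$@" | sort -k1,1 -k2,2n   # awk's for-in order is unspecified, so sort
}

# Example usage (file names are assumptions):
#   zcat refGene.txt.gz | refgene_to_bed6 > refGene.unique.bed
```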

If you aren't using a UCSC reference genome (we don't), you can construct the BED12 file as follows (this is actually part of my reference genome Makefile):

1. Download a GTF file from Ensembl/GENCODE/UCSC/etc.
2. Assuming the GTF file is called "annotation.gtf" and the BED file should be called "annotation.bed", run the following:

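awk '{if ($3 != "gene") print $0;}' annotation.gtf | grep -v "^#" | /package/UCSCtools/gtfToGenePred /dev/stdin /dev/stdout | /package/UCSCtools/genePredToBed stdin annotation.bed

(In the Makefile the dollar signs are written as $$3 and $$0 because make treats $ specially; in a normal shell use a single $, as above. /package/UCSCtools/ is simply where the UCSC tools are installed locally, so adjust the path for your system.)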

Practically speaking, I can also provide appropriate files for most common genomes (mouse, human, fly and worm). Since we developed deepTools, we have a lot of these, and I can probably just post whatever you need if you can't get it directly from UCSC.

Is it possible to use gtfToGenePred on the GENCODE version?

Reply by 1769mkc, 4.4 years ago

The BED file produced here contains transcript IDs (e.g. ENST...). Is it possible to get gene IDs instead? I tried replacing $$3 != "gene" with $$3 == "gene", but that doesn't work. Do BED files have to be at the transcript level?

Reply by Arindam Ghosh, 2.8 years ago
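
For a gene-level file, one option is to skip gtfToGenePred and read the GTF "gene" records directly, which gives one line per gene. The sketch below assumes an Ensembl- or GENCODE-style GTF whose attribute column contains gene_id and, optionally, gene_name; the function name gtf_genes_to_bed6 is hypothetical. It writes BED6-style lines of chrom, start, end, gene_id, gene_name, strand, shifting the 1-based GTF start to BED's 0-based start, and falls back to the gene_id when a record has no gene_name. The result has no exon structure, unlike the BED12 from gtfToGenePred.

```shell
#!/usr/bin/env bash
# Hypothetical helper: GTF "gene" records -> chrom, start, end, gene_id,
# gene_name, strand. GTF starts are 1-based; BED starts are 0-based.
# Assumption: attributes look like  gene_id "X"; gene_name "Y";
gtf_genes_to_bed6() {
  awk -F'\t' -v OFS='\t' '
    # Return the quoted value of attribute key from column 9, or "".
    function attr(key,    re) {
      re = key " \"[^\"]*\""
      if (match($9, re))
        return substr($9, RSTART + length(key) + 2, RLENGTH - length(key) - 3)
      return ""
    }
    $3 == "gene" {   # comment lines are skipped: their column 3 is not "gene"
      id = attr("gene_id")
      name = attr("gene_name")
      if (name == "") name = id
      print $1, $4 - 1, $5, id, name, $7
    }
  ' "$@"
}

# Example usage (file names are assumptions):
#   gtf_genes_to_bed6 annotation.gtf > annotation.genes.bed
```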

Answer 2 · 2015-10-25

You can get that particular output directly from Ensembl BioMart at http://www.ensembl.org/biomart/martview/ as follows:

> select "Ensembl Genes" and your species, e.g. Homo sapiens (this depends on your reference)
> go to the Attributes section
> unselect Ensembl gene ID and transcript ID
> select chromosome, gene start, gene end, associated transcript name, associated gene name and strand
> export the results

Note that the chromosome names won't have the "chr" prefix; add it if needed.
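
The BioMart export is not quite BED yet: besides the missing "chr" prefix, BioMart reports 1-based start coordinates and gives strand as 1/-1, while BED uses 0-based starts and +/-. The sketch below assumes a tab-separated export with a header row and the columns in the order selected above (chromosome, gene start, gene end, transcript name, gene name, strand); the function name biomart_to_bed6 is hypothetical. It also renames Ensembl's MT to UCSC's chrM, which only matters if you align to a UCSC-style genome.

```shell
#!/usr/bin/env bash
# Hypothetical helper: BioMart TSV export -> BED6-style lines.
# Assumed column order: chromosome, gene start, gene end, transcript name,
# gene name, strand (1 / -1), with a header row.
biomart_to_bed6() {
  awk -F'\t' -v OFS='\t' '
    FNR == 1 { next }                          # skip the header row
    {
      chrom = ($1 ~ /^chr/) ? $1 : "chr" $1    # add the UCSC-style prefix
      if (chrom == "chrMT") chrom = "chrM"     # UCSC calls the mitochondrion chrM
      strand = ($6 == -1) ? "-" : "+"          # BioMart strand 1 / -1 -> + / -
      print chrom, $2 - 1, $3, $4, $5, strand  # 1-based start -> 0-based
    }
  ' "$@"
}

# Example usage (file names are assumptions):
#   biomart_to_bed6 mart_export.txt > genes.bed
```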