How can I get a .bed file that has unique Entrez gene IDs (or symbols) for deepTools?
8.5 years ago

Hi all,

I want to get a .bed file that contains unique gene IDs (Entrez ID or gene symbol) to use with "computeMatrix reference-point" in deepTools.

Like this:

chr1    3204562 3661579 NM_001011874    Xkr4    -
chr1    4481008 4486494 NM_011441   Sox17   -
chr1    4763278 4775807 NM_001177658    Mrpl15  -
chr1    4797973 4836816 NM_008866   Lypla1  +

How or where can I get this file?

ChIP-Seq • 5.0k views
8.5 years ago

If you happened to align against a reference genome from UCSC, then you can just download the RefSeq file from the Table Browser. This is the simplest method to get a BED12 file for computeMatrix.

If you aren't using a UCSC reference genome (we don't), you can construct the BED12 files as such (this is actually part of my reference genome Makefile):

  1. Download a GTF file from Ensembl/Gencode/UCSC/etc.
  2. I then use the following, assuming the GTF file is called "annotation.gtf" and the BED file should be called "annotation.bed" (the "$$" is Makefile escaping; use a single "$" when running this directly in a shell):
awk '{if ($$3 != "gene") print $$0;}' annotation.gtf | grep -v "^#" | /package/UCSCtools/gtfToGenePred /dev/stdin /dev/stdout | /package/UCSCtools/genePredToBed stdin annotation.bed

Practically speaking, I can also provide appropriate files for most common genomes (mouse, human, fly and worm). Since we developed deepTools, we have a lot of these and I can probably just post whatever you need if you can't get it directly from UCSC.


Is it possible to use gtfToGenePred on the GENCODE version?


The BED file produced here contains transcript IDs (e.g. ENST....). Is it possible to get gene IDs instead? I tried replacing $$3 != "gene" with $$3 == "gene", but that doesn't work. Do BED files have to be at the transcript level?
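
One possible way to get gene-level records, as a rough sketch rather than the method used above: skip gtfToGenePred and pull the "gene" rows out of the GTF with awk, writing a plain BED6 file (one interval per gene, no exon blocks). This assumes the GTF actually has "gene" rows with gene_id and gene_name attributes, which is not guaranteed for every source. The file names toy.gtf and genes.bed, the toy coordinates and the helper attr() are made up for illustration.

```shell
#!/bin/sh
# Toy GTF with made-up coordinates so the sketch runs on its own.
{
  printf '#!genome-build toy\n'
  printf 'chr1\ttoy\tgene\t101\t500\t.\t-\t.\tgene_id "G1"; gene_name "Abc1";\n'
  printf 'chr1\ttoy\ttranscript\t101\t500\t.\t-\t.\tgene_id "G1"; transcript_id "T1"; gene_name "Abc1";\n'
  printf 'chr1\ttoy\tgene\t1001\t2000\t.\t+\t.\tgene_id "G2"; gene_name "Xyz2";\n'
} > toy.gtf

# Keep only "gene" rows and write BED6: chrom, start, end, name, score, strand.
# key selects which attribute becomes the name column (gene_name or gene_id).
awk -F'\t' -v OFS='\t' -v key="gene_name" '
  # Hypothetical helper: return the value of attribute "name", or "" if absent.
  function attr(s, name,    n, i, parts, a) {
    n = split(s, parts, ";")
    for (i = 1; i <= n; i++) {
      a = parts[i]
      sub(/^[ \t]+/, "", a)                 # trim leading whitespace
      if (index(a, name " ") == 1) {
        a = substr(a, length(name) + 2)     # drop the attribute name and the space
        gsub(/"/, "", a)                    # drop the surrounding quotes
        return a
      }
    }
    return ""
  }
  !/^#/ && $3 == "gene" {
    id = attr($9, key)
    if (id == "") id = attr($9, "gene_id")  # fall back to gene_id
    # GTF is 1-based inclusive, BED is 0-based half-open: subtract 1 from start only.
    print $1, $4 - 1, $5, id, 0, $7
  }
' toy.gtf > genes.bed

cat genes.bed
```

Since there is exactly one "gene" row per gene, each gene appears once in the output. The trade-off is that the intervals carry no exon structure, which should not matter for reference-point mode but is worth keeping in mind.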

8.5 years ago

You can get that particular output directly from Ensembl BioMart at http://www.ensembl.org/biomart/martview/

> select Ensembl genes, homo sapiens (reference dependent)
> go to the attributes section
> unselect ensembl gene id, transcript id
> select chromosome, gene start, gene end, associated transcript name, associated gene name, strand
> export results

Note that chromosome names won't have the "chr" prefix. Add it if needed.
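
A minimal sketch of that clean-up step, assuming the export is a tab-separated file with a header line and the columns in the order selected above (chromosome, gene start, gene end, transcript name, gene name, strand), with strand written as 1 or -1. The file names biomart.tsv and biomart.bed and the toy rows are placeholders. It adds the "chr" prefix, converts the strand to +/-, shifts the start to 0-based, and keeps only the first row per gene name so each gene appears once.

```shell
#!/bin/sh
# Toy export with made-up rows; a real BioMart export would replace this file.
{
  printf 'Chromosome/scaffold name\tGene start (bp)\tGene end (bp)\tTranscript name\tGene name\tStrand\n'
  printf '1\t101\t500\tABC1-201\tABC1\t-1\n'
  printf '1\t101\t500\tABC1-202\tABC1\t-1\n'
  printf 'X\t1001\t2000\tXYZ2-201\tXYZ2\t1\n'
} > biomart.tsv

awk -F'\t' -v OFS='\t' '
  NR == 1 { next }                      # skip the header line
  seen[$5]++ { next }                   # keep only the first row for each gene name
  {
    # UCSC names the mitochondrial chromosome chrM, while Ensembl uses MT.
    chrom = ($1 == "MT") ? "chrM" : ("chr" $1)
    strand = ($6 == "-1") ? "-" : "+"
    # Ensembl coordinates are 1-based inclusive, BED is 0-based half-open.
    print chrom, $2 - 1, $3, $4, $5, strand
  }
' biomart.tsv > biomart.bed

cat biomart.bed
```

Unplaced scaffolds and patches generally do not map to UCSC names by prefixing alone, so it may be simplest to filter those rows out first.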
