Question: How to get promoter coordinates for hg19 from the UCSC Genome Browser?
jack (Germany), 4.1 years ago, wrote:

Hi all,

I need to get the promoter coordinates of all genes in the human genome, hg19 assembly.

Is it possible to get them from the UCSC Table Browser? I tried, but I was not successful.

Could someone help me with that?

Giovanni M Dall'Olio (London, UK), 4.1 years ago, wrote:

This should be a simple question, but in reality there are many approaches because there are multiple definitions of promoters.

The simplest way to do it is to go to the hg19 folder of the UCSC FTP site and download the upstream1000.fa.gz file, which contains the sequences 1,000 bases upstream of the annotated transcription starts of RefSeq genes.

If you are familiar with R, you can do it using the brand new AnnotationHub interface from Bioconductor. For more information, follow the tutorial here. In particular, this code is based on this video.

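> source("http://bioconductor.org/biocLite.R")   # older installer; current Bioconductor releases use BiocManager::install()
> biocLite("GenomicRanges")
> biocLite("AnnotationHub")
> biocLite("rtracklayer")
> library("GenomicRanges")
> library("AnnotationHub")
> ahub = AnnotationHub()     # create the hub object that query() searches
> qhs = query(ahub, c("RefSeq", "Homo sapiens", "hg19"))
> genes = qhs[[1]]           # first matching record
> proms = promoters(genes)   # defaults: upstream = 2000, downstream = 200
> proms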

UCSC track 'refGene'
UCSCData object with 50066 ranges and 5 metadata columns:
                       seqnames               ranges strand   |         name     score     itemRgb                thick
                          <Rle>            <IRanges>  <Rle>   |  <character> <numeric> <character>            <IRanges>
      [1]                  chr1 [66997825, 67000024]      +   |    NM_032291         0        <NA> [67000042, 67208778]
      [2]                  chr1 [ 8376145,  8378344]      +   | NM_001080397         0        <NA> [ 8378169,  8404073]
      [3]                  chr1 [50489427, 50491626]      -   |    NM_032785         0        <NA> [48999845, 50489468]
      [4]                  chr1 [16765167, 16767366]      +   | NM_001145277         0        <NA> [16767257, 16785491]
      [5]                  chr1 [16765167, 16767366]      +   | NM_001145278         0        <NA> [16767257, 16785385]
      ...                   ...                  ...    ... ...          ...       ...         ...                  ...
  [50062] chr19_gl000209_random     [ 55209,  57408]      +   |    NM_002255         0        <NA>     [ 57249,  67717]
  [50063] chr19_gl000209_random     [ 44646,  46845]      +   | NM_001258383         0        <NA>     [ 57132,  67717]
  [50064] chr19_gl000209_random     [ 96135,  98334]      +   |    NM_012313         0        <NA>     [ 98146, 112480]
  [50065] chr19_gl000209_random     [ 68071,  70270]      +   | NM_001083539         0        <NA>     [ 70108,  83979]
  [50066] chr19_gl000209_random     [129433, 131632]      +   |    NM_012312         0        <NA>     [131468, 145120]
                                                    blocks
                                             <IRangesList>
      [1] [    1,   227] [91706, 91769] [98929, 98953] ...
      [2]       [   1,  102] [6222, 6642] [7214, 7306] ...
      [3]       [   1, 1439] [2036, 2062] [6788, 6884] ...
      [4]       [   1,  182] [2961, 3061] [7199, 7303] ...
      [5]       [   1,  104] [2961, 3061] [7199, 7303] ...
      ...                                              ...
  [50062]       [   1,   80] [ 280,  315] [1182, 1466] ...
  [50063] [    1,    86] [10414, 10643] [10843, 10878] ...
  [50064]       [   1,   46] [1523, 1557] [4002, 4301] ...
  [50065]       [   1,   71] [1071, 1106] [1851, 2135] ...
  [50066]       [   1,   69] [ 862,  897] [3334, 3633] ...
  -------
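
Every range in this output is 2,200 bp wide, which is what the defaults of promoters() (upstream = 2000, downstream = 200) would produce. As a sanity check, the sketch below back-calculates the implied transcription start site for the first and third rows, assuming those defaults were used. The helper name implied_tss is made up for illustration, and the coordinates are copied from the output above.

```shell
#!/bin/sh
# Back-calculate width and implied TSS from 1-based, closed promoter ranges,
# assuming promoters() ran with upstream = 2000 and downstream = 200.
implied_tss() {
    # stdin: name start end strand (whitespace separated)
    awk -v up=2000 -v down=200 '{
        width = $3 - $2 + 1
        # + strand: the window starts at TSS - up
        # - strand: the window ends at TSS + up
        tss = ($4 == "+") ? $2 + up : $3 - up
        ok = (width == up + down) ? "yes" : "no"
        print $1, width, tss, ok
    }'
}

tss_table=$(printf 'NM_032291 66997825 67000024 +\nNM_032785 50489427 50491626 -\n' | implied_tss)
printf '%s\n' "$tss_table"
```

The columns printed are the transcript name, the window width, the implied TSS, and whether the width equals upstream + downstream. To get a different promoter definition in R, the upstream and downstream arguments of promoters() can be set explicitly.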

 

Chirag Nepal (Copenhagen), 4.1 years ago, wrote:

You should be able to download a promoter table from the UCSC browser. Alternatively, you can download the gene coordinates in .bed format and define the promoter yourself as a window upstream and downstream of the transcription start site, which is generally 500 or 1,000 nucleotides.

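upstream=500
downstream=500

awk -v up="$upstream" -v down="$downstream" 'BEGIN { OFS = "\t" }
# + strand: the TSS is chromStart ($2)
# - strand: the TSS is chromEnd ($3), and upstream points toward higher coordinates
$6 == "+" { s = $2 - up;   e = $2 + down }
$6 == "-" { s = $3 - down; e = $3 + up }
$6 == "+" || $6 == "-" { if (s < 0) s = 0; print $1, s, e, $4, $5, $6, $7, $8, $9, $10, $11, $12 }' ucscRefseq.bed > promoter.bed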
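
The strand handling in a script like the one above can be checked on a toy input. This is a minimal sketch: the three BED6 records are made up, the function name promoter_windows is hypothetical, and the window sizes are deliberately unequal (1000 upstream, 200 downstream) so that a strand mix-up would show in the output.

```shell
#!/bin/sh
# Toy check of strand-aware promoter windows; the records below are made up.
upstream=1000
downstream=200

promoter_windows() {
    # Reads BED6 on stdin, writes one promoter interval per record.
    awk -v up="$upstream" -v down="$downstream" 'BEGIN { OFS = "\t" }
    $6 == "+" { s = $2 - up;   e = $2 + down }   # TSS = chromStart
    $6 == "-" { s = $3 - down; e = $3 + up }     # TSS = chromEnd
    $6 == "+" || $6 == "-" {
        if (s < 0) s = 0                         # clamp at the chromosome start
        print $1, s, e, $4, $5, $6
    }'
}

result=$(printf 'chr1\t5000\t9000\tgeneA\t0\t+\nchr1\t5000\t9000\tgeneB\t0\t-\nchr1\t300\t900\tgeneC\t0\t+\n' | promoter_windows)
printf '%s\n' "$result"
```

For the same gene body (5000 to 9000), the + strand record gets a window of 4000 to 5200 and the - strand record gets 8800 to 10000, so the longer arm always points away from the gene. The third record shows the clamp: its window would start at -700 and is cut to 0.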

Trying the awk script and getting the following error:

awk: cmd. line:1:
^ unexpected newline or end of string

Any ideas? Thanks!

(reply written 2.5 years ago by rbronste)

There was one closing bracket missing. I have edited it, so try it now.

(reply written 2.5 years ago by Chirag Nepal)