Question

How Can I Extract All Bidirectional Promoters In The Human Genome From The UCSC Genome Browser?

5

14.0 years ago

Farhat ★ 2.9k

What I would like is a BED file of all regions that contain two genes on opposite strands whose TSSs are less than 1000 bp apart. I can do this using the entire gene table and some Python coding, but I wonder if there is a way to do it using just an SQL query.

genome ucsc • 5.1k views

updated 4.3 years ago by Ram 45k • written 14.0 years ago by Farhat ★ 2.9k

Ram · Answer 1 · 2011-07-07

12

14.0 years ago

Pierre Lindenbaum 166k

Using the UCSC MySQL server:

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19

mysql> select K1.chrom,K1.name,K2.name,K1.strand,K2.strand,
  LEAST(K1.txStart,K1.txEnd,K2.txStart,K2.txEnd) as L,
  GREATEST(K1.txStart,K1.txEnd,K2.txStart,K2.txEnd) as R
  from
     knownGene as K1,
     knownGene as K2
  where K1.chrom=K2.chrom and
   ( (K1.strand='+' and K2.strand='-'  and ABS(K1.txStart-K2.txEnd) < 1000) or
     (K1.strand='-' and K2.strand='+'  and ABS(K1.txEnd-K2.txStart) <1000) )
 ;

+-------+------------+------------+--------+--------+---------+---------+
| chrom | name       | name       | strand | strand | L       | R       |
+-------+------------+------------+--------+--------+---------+---------+
| chr1  | uc009vjn.1 | uc010nxx.1 | +      | -      |  761586 |  788902 | 
| chr1  | uc001abp.1 | uc010nxx.1 | +      | -      |  761586 |  788997 | 
| chr1  | uc001abq.1 | uc010nxx.1 | +      | -      |  761586 |  788997 | 
| chr1  | uc009vjo.1 | uc010nxx.1 | +      | -      |  761586 |  788997 | 
| chr1  | uc001abr.1 | uc010nxx.1 | +      | -      |  761586 |  789740 | 
| chr1  | uc001acz.1 | uc001acx.1 | +      | -      | 1108435 | 1121241 | 
| chr1  | uc001adk.2 | uc001adh.3 | +      | -      | 1152288 | 1170418 | 
| chr1  | uc001adk.2 | uc001adi.3 | +      | -      | 1152288 | 1170418 | 
| chr1  | uc001adk.2 | uc009vjv.2 | +      | -      | 1152288 | 1170418 | 
| chr1  | uc001adk.2 | uc009vjw.2 | +      | -      | 1152288 | 1170418 | 
(...)

Edit: I fixed a problem with my previous answer. In the UCSC tables, txStart is always the lower genomic coordinate (the 5' side on the forward strand), whatever the value of 'strand'. For a gene on the '-' strand the actual TSS is therefore txEnd, so you have to take into account whether your gene is on the '+' or the '-' strand.
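
For anyone who prefers to post-process the table in Python, as the question mentions, here is a minimal sketch of the same strand rule. The names `Transcript`, `tss` and `bidirectional_regions` are made up for illustration, the toy coordinates are invented and are not real annotations, and reporting the interval between the two TSSs is an assumption about what the BED region should be (the query above reports the full span of both transcripts instead).

```python
from typing import Iterable, List, NamedTuple, Tuple


class Transcript(NamedTuple):
    """One row of a knownGene-style table (hypothetical container)."""
    chrom: str
    name: str
    strand: str
    tx_start: int  # UCSC txStart: always the lower coordinate
    tx_end: int    # UCSC txEnd: always the higher coordinate


def tss(t: Transcript) -> int:
    """Strand-aware TSS: txStart on '+', txEnd on '-'."""
    return t.tx_start if t.strand == "+" else t.tx_end


def bidirectional_regions(
    transcripts: Iterable[Transcript], max_dist: int = 1000
) -> List[Tuple[str, int, int, str]]:
    """Return (chrom, start, end, name) for every +/- pair on the same
    chromosome whose TSSs are less than max_dist apart.

    The reported interval is the stretch between the two TSSs
    (an assumption; it can be zero-length if the TSSs coincide).
    """
    rows = list(transcripts)
    plus = [t for t in rows if t.strand == "+"]
    minus = [t for t in rows if t.strand == "-"]
    regions = set()
    for p in plus:
        for m in minus:
            if p.chrom != m.chrom:
                continue
            a, b = tss(p), tss(m)
            # Same test as ABS(K1.txStart - K2.txEnd) < 1000 in the query
            if abs(a - b) < max_dist:
                regions.add((p.chrom, min(a, b), max(a, b), f"{p.name}|{m.name}"))
    return sorted(regions)


if __name__ == "__main__":
    # Toy data only, not real knownGene rows
    toy = [
        Transcript("chr1", "txA", "+", 1000, 5000),
        Transcript("chr1", "txB", "-", 100, 800),
        Transcript("chr1", "txC", "-", 100, 3500),
    ]
    for chrom, start, end, name in bidirectional_regions(toy):
        print(chrom, start, end, name, sep="\t")
```

The nested loop is quadratic, so for the whole knownGene table you would probably want to group by chromosome and sort by TSS first.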

updated 4.3 years ago by Ram 45k • written 14.0 years ago by Pierre Lindenbaum 166k

2

Beauty! (just like that)

updated 4.3 years ago by Ram 45k • written 14.0 years ago by Adrian Cortes ▴ 550

0

Wow! That was quick.

updated 4.3 years ago by Ram 45k • written 14.0 years ago by Farhat ★ 2.9k

0

There is a problem with that query. Give me 5'...

updated 4.3 years ago by Ram 45k • written 14.0 years ago by Pierre Lindenbaum 166k

0

Fixed. See my comment.

updated 4.3 years ago by Ram 45k • written 14.0 years ago by Pierre Lindenbaum 166k

0

Thanks for the edit. It is clear now.

updated 4.3 years ago by Ram 45k • written 14.0 years ago by Farhat ★ 2.9k