Question: How Can I Extract All Bidirectional Promoters In The Human Genome From The UCSC Genome Browser?
Farhat (Pune, India) wrote, 7.4 years ago:

I would like a BED file of all regions in which two genes lie on opposite strands with their transcription start sites (TSSs) less than 1000 bp apart. I can do this with the entire gene table and some Python code, but I wonder whether it can be done with just an SQL query.
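
For reference, the Python route mentioned above could look roughly like the sketch below. It assumes the gene table has already been loaded as (name, chrom, strand, txStart, txEnd) tuples with txStart < txEnd on both strands, as in the UCSC tables. The function names, the label format and the toy gene names are made up for illustration, and the nested loop is quadratic per chromosome, so treat it as a starting point rather than a tuned implementation.

```python
from collections import defaultdict


def tss(strand, tx_start, tx_end):
    """TSS is txStart for '+' genes and txEnd for '-' genes."""
    return tx_start if strand == "+" else tx_end


def bidirectional_promoters(genes, max_dist=1000):
    """Return sorted BED-style tuples (chrom, start, end, label).

    genes: iterable of (name, chrom, strand, tx_start, tx_end).
    One tuple is produced per '+'/'-' pair on the same chromosome
    whose TSSs are less than max_dist apart.
    """
    plus, minus = defaultdict(list), defaultdict(list)
    for name, chrom, strand, tx_start, tx_end in genes:
        bucket = plus if strand == "+" else minus
        bucket[chrom].append((tss(strand, tx_start, tx_end), name))

    regions = []
    for chrom, plus_genes in plus.items():
        for p_tss, p_name in plus_genes:
            for m_tss, m_name in minus.get(chrom, []):
                if abs(p_tss - m_tss) < max_dist:
                    # The region is the interval between the two TSSs.
                    start, end = sorted((p_tss, m_tss))
                    regions.append((chrom, start, end, f"{m_name}|{p_name}"))
    return sorted(regions)


# Toy data (hypothetical genes, not real annotations).
genes = [
    ("geneA", "chr1", "-", 1000, 5000),    # TSS at 5000
    ("geneB", "chr1", "+", 5400, 9000),    # TSS at 5400
    ("geneC", "chr1", "+", 20000, 25000),  # TSS at 20000, too far away
]
for region in bidirectional_promoters(genes):
    print("\t".join(map(str, region)))
```

With the toy data only the geneA/geneB pair is reported, as the interval 5000-5400 on chr1.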

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 7.4 years ago:

Using the public UCSC MySQL server:

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19

mysql> select K1.chrom,K1.name,K2.name,K1.strand,K2.strand,
  -- L and R span both transcripts, not just the interval between the two TSSs
  LEAST(K1.txStart,K1.txEnd,K2.txStart,K2.txEnd) as L,
  GREATEST(K1.txStart,K1.txEnd,K2.txStart,K2.txEnd) as R
  from
     knownGene as K1,
     knownGene as K2
  where K1.chrom=K2.chrom and
   -- txStart < txEnd on both strands: the TSS is txStart for '+' and txEnd for '-'
   ( (K1.strand='+' and K2.strand='-'  and ABS(K1.txStart-K2.txEnd) < 1000) or
     (K1.strand='-' and K2.strand='+'  and ABS(K1.txEnd-K2.txStart) < 1000) )
 ;

+-------+------------+------------+--------+--------+---------+---------+
| chrom | name       | name       | strand | strand | L       | R       |
+-------+------------+------------+--------+--------+---------+---------+
| chr1  | uc009vjn.1 | uc010nxx.1 | +      | -      |  761586 |  788902 | 
| chr1  | uc001abp.1 | uc010nxx.1 | +      | -      |  761586 |  788997 | 
| chr1  | uc001abq.1 | uc010nxx.1 | +      | -      |  761586 |  788997 | 
| chr1  | uc009vjo.1 | uc010nxx.1 | +      | -      |  761586 |  788997 | 
| chr1  | uc001abr.1 | uc010nxx.1 | +      | -      |  761586 |  789740 | 
| chr1  | uc001acz.1 | uc001acx.1 | +      | -      | 1108435 | 1121241 | 
| chr1  | uc001adk.2 | uc001adh.3 | +      | -      | 1152288 | 1170418 | 
| chr1  | uc001adk.2 | uc001adi.3 | +      | -      | 1152288 | 1170418 | 
| chr1  | uc001adk.2 | uc009vjv.2 | +      | -      | 1152288 | 1170418 | 
| chr1  | uc001adk.2 | uc009vjw.2 | +      | -      | 1152288 | 1170418 | 
(...)

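Edit: I fixed a problem with my previous answer. In the UCSC tables, txStart is always the lower coordinate, i.e. the 5' side on the forward strand, whatever the value of 'strand'. For a gene on the '-' strand the transcription start site is therefore txEnd, so you have to take into account whether each gene is on the '+' or the '-' strand.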
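
To try the strand rule without connecting to the UCSC server, here is a small sketch that runs a variant of the query on a toy table in an in-memory SQLite database, driven from Python. Several things are assumptions and not part of the answer above: the table contents and gene names are invented, SQLite's multi-argument MIN/MAX stand in for MySQL's LEAST/GREATEST, the reported interval is the one between the two TSSs (closer to the BED file asked for), and K1 is restricted to '+' and K2 to '-' so that each pair is listed once instead of twice in mirrored order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE knownGene "
    "(name TEXT, chrom TEXT, strand TEXT, txStart INTEGER, txEnd INTEGER)"
)
# Toy rows (hypothetical genes); txStart < txEnd on both strands.
conn.executemany(
    "INSERT INTO knownGene VALUES (?, ?, ?, ?, ?)",
    [
        ("minusGene", "chr1", "-", 1000, 5000),   # TSS = txEnd   = 5000
        ("plusGene", "chr1", "+", 5400, 9000),    # TSS = txStart = 5400
        ("farGene", "chr1", "+", 20000, 25000),   # TSS = 20000, too far away
    ],
)

# K1 is the '+' gene (TSS = txStart), K2 the '-' gene (TSS = txEnd).
QUERY = """
SELECT K1.chrom,
       MIN(K1.txStart, K2.txEnd) AS L,
       MAX(K1.txStart, K2.txEnd) AS R,
       K2.name || '|' || K1.name AS label
FROM knownGene AS K1, knownGene AS K2
WHERE K1.chrom = K2.chrom
  AND K1.strand = '+' AND K2.strand = '-'
  AND ABS(K1.txStart - K2.txEnd) < 1000
ORDER BY K1.chrom, L
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    # Tab-separated chrom, start, end, name: the first four BED columns.
    print("\t".join(map(str, row)))
```

On the toy table this yields a single row, chr1 5000 5400 minusGene|plusGene; farGene is dropped because its TSS is 15000 bp from the '-' gene's TSS.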


Beauty! (just like that)

written 7.4 years ago by Adrian Cortes

Wow! That was quick.

written 7.4 years ago by Farhat

There is a problem with that query. Give me 5 minutes...

written 7.4 years ago by Pierre Lindenbaum

Fixed. See my comment.

written 7.4 years ago by Pierre Lindenbaum

Thanks for the edit. It is clear now.

written 7.4 years ago by Farhat