9.5 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Using the UCSC public MySQL server:
$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19
mysql> select K1.chrom,K1.name,K2.name,K1.strand,K2.strand,
         LEAST(K1.txStart,K1.txEnd,K2.txStart,K2.txEnd) as L,
         GREATEST(K1.txStart,K1.txEnd,K2.txStart,K2.txEnd) as R
       from
         knownGene as K1,
         knownGene as K2
       where K1.chrom=K2.chrom and
         ( (K1.strand='+' and K2.strand='-' and ABS(K1.txStart-K2.txEnd) < 1000) or
           (K1.strand='-' and K2.strand='+' and ABS(K1.txEnd-K2.txStart) < 1000) )
       ;
+-------+------------+------------+--------+--------+---------+---------+
| chrom | name       | name       | strand | strand | L       | R       |
+-------+------------+------------+--------+--------+---------+---------+
| chr1  | uc009vjn.1 | uc010nxx.1 | +      | -      |  761586 |  788902 |
| chr1  | uc001abp.1 | uc010nxx.1 | +      | -      |  761586 |  788997 |
| chr1  | uc001abq.1 | uc010nxx.1 | +      | -      |  761586 |  788997 |
| chr1  | uc009vjo.1 | uc010nxx.1 | +      | -      |  761586 |  788997 |
| chr1  | uc001abr.1 | uc010nxx.1 | +      | -      |  761586 |  789740 |
| chr1  | uc001acz.1 | uc001acx.1 | +      | -      | 1108435 | 1121241 |
| chr1  | uc001adk.2 | uc001adh.3 | +      | -      | 1152288 | 1170418 |
| chr1  | uc001adk.2 | uc001adi.3 | +      | -      | 1152288 | 1170418 |
| chr1  | uc001adk.2 | uc009vjv.2 | +      | -      | 1152288 | 1170418 |
| chr1  | uc001adk.2 | uc009vjw.2 | +      | -      | 1152288 | 1170418 |
(...)
Edit: I fixed a problem with my previous answer. In the UCSC tables, txStart is always the lower genomic coordinate and txEnd the higher one, whatever the value of 'strand'. So you have to take into account whether the gene is on the '+' or the '-' strand: the transcription start site is txStart for a '+' gene and txEnd for a '-' gene.
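
If you would rather do the same test on rows you have already downloaded, here is a minimal Python sketch of the strand-aware comparison. It is only an illustration, not the way the query above runs: the Transcript tuple, the tss() and divergent_pairs() helpers and the toy gene names are all made up for the example, and it assumes the same convention as knownGene (tx_start is the lower coordinate, tx_end the higher one).

```python
from typing import List, NamedTuple, Tuple


class Transcript(NamedTuple):
    """Hypothetical stand-in for one knownGene row."""
    name: str
    chrom: str
    strand: str    # '+' or '-'
    tx_start: int  # lower genomic coordinate, whatever the strand
    tx_end: int    # higher genomic coordinate, whatever the strand


def tss(t: Transcript) -> int:
    """Strand-aware transcription start site."""
    return t.tx_start if t.strand == '+' else t.tx_end


def divergent_pairs(transcripts: List[Transcript],
                    max_dist: int = 1000) -> List[Tuple[str, str]]:
    """Return ('+' name, '-' name) pairs on the same chromosome
    whose start sites are less than max_dist bases apart."""
    pairs = []
    for plus in transcripts:
        if plus.strand != '+':
            continue
        for minus in transcripts:
            if minus.strand != '-' or minus.chrom != plus.chrom:
                continue
            if abs(tss(plus) - tss(minus)) < max_dist:
                pairs.append((plus.name, minus.name))
    return pairs


# Toy data, invented for the example (not real UCSC records).
toy = [
    Transcript("geneA", "chr1", "+", 1000, 5000),  # TSS = 1000
    Transcript("geneB", "chr1", "-", 200, 900),    # TSS = 900, close to geneA
    Transcript("geneC", "chr1", "-", 4800, 9000),  # TSS = 9000, too far
    Transcript("geneD", "chr2", "-", 200, 900),    # other chromosome
]

print(divergent_pairs(toy))
```

Unlike the SQL above, this sketch lists each pair only once, with the '+' transcript first. The nested loop is quadratic, so it is fine for a small list but not for the whole knownGene table.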