Question: Distance Between Genes Of Interest
Asked 7.4 years ago by anuragm:

I have the UCSC RefSeq track with txStart sites for all the genes. I want to find the distance between each pair of consecutive genes, so that I can determine which consecutive genes lie within a particular range of each other. How can I do this using Python or R?

gene ucsc • 3.5k views
Answered 7.4 years ago by Devon Ryan (Freiburg, Germany):

You could probably do this relatively easily in R with GenomicRanges. Read in the RefSeq track, convert the transcripts to GRanges and then apply a function that computes the distance between a range and the output from nearest().

Edit: Here's an example:

library(GenomicRanges)
refGene <- read.delim("~/Downloads/refGene.txt", header=TRUE)
# UCSC txStart is 0-based, IRanges is 1-based, hence the +1
gr <- GRanges(seqnames=Rle(refGene$chrom),
    ranges=IRanges(start=refGene$txStart + 1, end=refGene$txEnd, names=refGene$name))
neighbors <- nearest(gr)   # index of the nearest other range; NA if there is none
keep <- !is.na(neighbors)  # drop ranges that have no neighbor
neighbor <- gr[neighbors[keep]]
gr <- gr[keep]
distances <- distance(gr, neighbor)

I just tried this on my laptop and it seems to work fine. If this isn't exactly what you want, you should be able to easily modify it.
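Since the question also mentions Python, here is a minimal sketch of the "consecutive pairs" reading of the problem using only the standard library. It assumes the records are already parsed into (chrom, txStart, txEnd, name) tuples in UCSC's 0-based, half-open coordinates, and that "distance" means the gap between the end of one gene and the start of the next one on the same chromosome. The function name `consecutive_gaps`, the `max_gap` parameter and the toy gene names are made up for illustration.

```python
from collections import defaultdict


def consecutive_gaps(records, max_gap):
    """Return (chrom, gene1, gene2, gap) for consecutive genes within max_gap bp.

    records: iterable of (chrom, tx_start, tx_end, name) tuples,
    assumed to be 0-based, half-open (UCSC style).
    """
    # Group genes by chromosome so that only same-chromosome pairs are compared.
    by_chrom = defaultdict(list)
    for chrom, start, end, name in records:
        by_chrom[chrom].append((start, end, name))

    pairs = []
    for chrom, genes in by_chrom.items():
        genes.sort()  # order by start, then end
        # Walk over neighbouring entries in the sorted list.
        for (s1, e1, n1), (s2, e2, n2) in zip(genes, genes[1:]):
            gap = max(0, s2 - e1)  # overlapping neighbours count as distance 0
            if gap <= max_gap:
                pairs.append((chrom, n1, n2, gap))
    return pairs


# Toy data, not real annotations.
records = [
    ("chr1", 100, 200, "geneA"),
    ("chr1", 250, 300, "geneB"),
    ("chr1", 5000, 6000, "geneC"),
    ("chr2", 10, 50, "geneD"),
    ("chr2", 40, 90, "geneE"),
]
close_pairs = consecutive_gaps(records, max_gap=100)
print(close_pairs)
```

Note that this only compares each gene with the next one by start position, so a long gene that spans several shorter ones is not handled specially, and multiple transcripts of the same gene would need to be collapsed first.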
