Question: Distance Between Genes Of Interest
0
5.5 years ago by
anuragm130
India
anuragm130 wrote:

I have the UCSC RefSeq track with the txStart sites for all genes. I want to compute the distance between each consecutive pair of genes, so that I can determine which consecutive genes lie within a given distance of each other. How can I do this using Python or R?

gene ucsc • 2.7k views
modified 5.5 years ago by Devon Ryan88k • written 5.5 years ago by anuragm130
3
5.5 years ago by
Devon Ryan88k
Freiburg, Germany
Devon Ryan88k wrote:

You could probably do this relatively easily in R with GenomicRanges. Read in the RefSeq track, convert the transcripts to a GRanges object, and then compute the distance between each range and the neighbor that nearest() returns for it.

Edit: Here's an example:

library("GenomicRanges")
# Assumes refGene.txt has a header row naming the chrom, strand, txStart, txEnd and name columns
refGene <- read.delim("~/Downloads/refGene.txt", header=TRUE)
gr <- GRanges(seqnames=Rle(refGene$chrom),
    ranges=IRanges(start=refGene$txStart, end=refGene$txEnd, names=refGene$name),
    strand=Rle(strand(refGene$strand)))
neighbors <- nearest(gr) # Index of each range's nearest neighbor; this can return NA
keep <- !is.na(neighbors) # Logical index, so nothing is dropped by mistake when there are no NAs
neighbor <- gr[neighbors[keep]]
gr <- gr[keep]
distances <- distance(gr, neighbor) # One distance per remaining range

I just tried this on my laptop and it seems to work fine. If this isn't exactly what you want, you should be able to easily modify it.
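
Since the question asks about consecutive genes and mentions Python, here is a minimal sketch of that variant using only the standard library. It is an illustration rather than part of the tested R example above: the function name `consecutive_tss_distances`, the `(chrom, tx_start, name)` tuple layout, and the `max_distance` parameter are all assumptions made for this sketch, and the toy records are made up. It also ignores strand and simply treats txStart as each gene's position.

```python
from collections import defaultdict


def consecutive_tss_distances(genes, max_distance=None):
    """Return (chrom, gene_a, gene_b, distance) for consecutive genes.

    genes: iterable of (chrom, tx_start, name) tuples (hypothetical layout).
    max_distance: if given, keep only pairs at most this far apart.
    """
    # Group start sites by chromosome so pairs never span two chromosomes.
    by_chrom = defaultdict(list)
    for chrom, tx_start, name in genes:
        by_chrom[chrom].append((tx_start, name))

    pairs = []
    for chrom in sorted(by_chrom):
        sites = sorted(by_chrom[chrom])  # order genes by start coordinate
        # Pair each gene with the next one along the chromosome.
        for (start_a, name_a), (start_b, name_b) in zip(sites, sites[1:]):
            dist = start_b - start_a
            if max_distance is None or dist <= max_distance:
                pairs.append((chrom, name_a, name_b, dist))
    return pairs


# Toy records for illustration only, not real RefSeq entries.
example = [
    ("chr1", 1000, "geneA"),
    ("chr1", 4500, "geneB"),
    ("chr1", 2000, "geneC"),
    ("chr2", 300, "geneD"),
]
result = consecutive_tss_distances(example, max_distance=2000)
```

With real data you would build the tuples from the chrom, txStart and name columns of the RefSeq table. If several transcripts of the same gene are present, you may want to collapse them to one entry per gene first, otherwise transcripts of one gene will be reported as consecutive neighbors.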

modified 5.5 years ago • written 5.5 years ago by Devon Ryan88k