Question: distance between two genes
3.8 years ago by
mika68910 wrote:


Is there a place where I can retrieve the distance between two genes on the same chromosome? I have a list of 100 genes, so it would be nice to retrieve this information from a database.


Care to give some examples?

written 3.8 years ago by 5heikki8.4k
3.8 years ago by
cyril-cros890 wrote:

You are in luck: I just wrote a short script to do that with olfactory receptor genes from the mouse genome. I was getting some fused genes, since they lie close together and have similar sequences.

  1. I use the UCSC browser to download the mm10 mouse genome Ensembl gene table as a BED file, and subset it using grep -f listOlfrGene.txt, where listOlfrGene.txt contains Ensembl transcript IDs gathered from BioMart (based on a GO term search for olfactory receptor function).
  2. The subset BED file is then sorted using bedtools sort -i bedFile > olfr_genes_sorted.bed.
  3. I run bedtools closest -s -d -io -N -a olfr_genes_sorted.bed -b olfr_genes_sorted.bed > output.bed. This gets me a new BED file in the format gene #1 bed data | closest gene #2 bed data | distance between #1 and #2. Here the closest gene has to be distinct, on the same strand of the same chromosome, and not overlapping (the -s -d -io -N options; read the manual).
  4. This file is simplified by running awk '{print $NF,"\t",$1,"\t",$4,"\t",$10}' output.bed > closestOlfrGenes.txt to get the data in the distance | chromosome | geneID #1 | geneID #2 format (which I find more convenient).
  5. sort -n closestOlfrGenes.txt | awk '$1 > 0 {print $0}' > sortedClosestOlfrGenes.txt gets me the values sorted by distance. I use the awk part to get rid of a couple of values that were at -10 for some reason.

You have here a sample from each file. Note that in the end result you will find paired lines in this format: distanceX gene1 gene2 \n distanceX gene2 gene1 \n
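Since each reciprocal pair shows up twice, the duplicate lines can be collapsed to one if that is more convenient. Here is a minimal awk sketch, assuming the four-column layout from step 4 (distance, chromosome, geneID #1, geneID #2); the function name dedup_pairs is made up for illustration and is not part of the original script:

```shell
# Keep one line per unordered gene pair.
# Assumes whitespace-separated columns: distance chromosome gene1 gene2
dedup_pairs() {
    awk '{
        # Build a key that is identical for (gene1, gene2) and (gene2, gene1)
        key = ($3 < $4) ? $3 SUBSEP $4 : $4 SUBSEP $3
        # Print only the first line seen for each pair
        if (!(key in seen)) { seen[key] = 1; print }
    }' "$@"
}

# Usage: dedup_pairs sortedClosestOlfrGenes.txt > uniquePairs.txt
```

Only truly reciprocal lines are merged: if gene A's closest neighbour is B but B's closest neighbour is C, both lines are kept.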

For visualization of the results, in R:

# Columns: distance (bp), chromosome, gene ID, ID of the closest gene
closestOlfr=read.csv(file="sortedClosestOlfrGenes.txt",sep="",header=FALSE,na.strings = ".",col.names=c("dist","chr","gene","closest"))
closestOlfr$dist=closestOlfr$dist/1000 # convert to kb
dist_wanted=25 # threshold in kb (example value; set as needed)
# Histogram of distances up to 100 kb
h<-hist(closestOlfr$dist[closestOlfr$dist<=100], breaks=100, col="red", xlab="Distance to closest olfactory gene (kb)", main="Relative proximity of olfactory genes (cut-off at 100kb)")
print(c("For this threshold (kb):",dist_wanted,"here is the number of close genes",sum(closestOlfr$dist<=dist_wanted)))

I conclude that, for RNA-Seq alignment, a maximum intron size of 25000 is still too high.

3.8 years ago by
abascalfederico1.1k wrote:

You could just obtain the coordinates of those genes and then do a simple arithmetic operation: max(start_gene1, start_gene2) - min(end_gene1, end_gene2), assuming "start" is the lowest coordinate regardless of strand orientation (so for a gene located on the minus strand, "end" would be its real start). If the genes overlap, you will get a negative number.
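As a quick illustration of that arithmetic, here is a small shell sketch. The function name gene_distance and the example coordinates are invented for the illustration; it assumes both genes are on the same chromosome and that each start is less than or equal to its end:

```shell
# Gap between two genes on the same chromosome.
# Arguments: start1 end1 start2 end2, lowest coordinate first for each gene
# (whatever the strand).
gene_distance() {
    gd_max_start=$(( $1 > $3 ? $1 : $3 ))
    gd_min_end=$(( $2 < $4 ? $2 : $4 ))
    # Negative when the genes overlap
    echo $(( gd_max_start - gd_min_end ))
}

gene_distance 100 200 300 450   # prints 100
gene_distance 100 300 250 400   # prints -50
```

The order of the two genes does not matter, since the max and min pick out the inner boundaries either way.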





This method may not work if genes have many exons, since that means one gene may have many start and end points.

written 3.2 years ago by syrttgump30
3.8 years ago by
Anima Mundi2.4k wrote:

Hello, you can download the genomic coordinates of your genes (e.g. from BioMart), sort the list according to chromosomal location and then measure the distances via scripting.
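A rough sketch of what that scripting could look like with sort and awk. The input layout (tab-separated, no header, columns gene, chromosome, start, end) is an assumption about how the coordinates were exported, and the function name adjacent_gene_distances is hypothetical:

```shell
# Input (assumed layout, no header): gene <TAB> chromosome <TAB> start <TAB> end
# Output: chromosome, gene, next gene on that chromosome, gap in bp
adjacent_gene_distances() {
    # Sort by chromosome, then numerically by start coordinate
    sort -t "$(printf '\t')" -k2,2 -k3,3n "$@" |
    awk -F '\t' -v OFS='\t' '
        # Same chromosome as the previous line: report the gap between the two
        $2 == chrom {
            min_end = (prev_end < $4) ? prev_end : $4
            print chrom, gene, $1, $3 - min_end   # negative if they overlap
        }
        # Remember the current gene for the next line
        { chrom = $2; gene = $1; prev_end = $4 }
    '
}

# Usage: adjacent_gene_distances gene_coords.tsv > adjacent_distances.tsv
```

This only reports each gene against its immediate neighbour in the sorted list; the distance between two arbitrary genes from the list would need the same arithmetic applied to that specific pair.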
