distance between two genes
9.2 years ago
mika6891 • 0

Hi,

Is there a place where I can retrieve the distance between two genes on the same chromosome? I have a list of 100 genes, so it would be nice to retrieve this information from a database.

genome • 7.8k views

Care to give some examples?

9.2 years ago
abascalfederico ★ 1.2k

You could just obtain the coordinates of those genes and then do a simple arithmetic operation: max(start_gene1, start_gene2) - min(end_gene1, end_gene2). This assumes that start is always the lower coordinate, regardless of strand orientation (for a gene on the minus strand, end is the real start of the gene). If the genes overlap, you will get a negative number.
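As a minimal sketch of that arithmetic in Python (the function name and the coordinates are made up for illustration; depending on whether your coordinates are 1-based inclusive or BED-style half-open, the value can differ by one from the number of intervening bases):

```python
def gene_gap(start1, end1, start2, end2):
    """Return the gap between two genes on the same chromosome.

    Assumes start <= end for each gene, whatever the strand.
    A negative value means the two genes overlap.
    """
    if start1 > end1 or start2 > end2:
        raise ValueError("start must not exceed end")
    return max(start1, start2) - min(end1, end2)


# Hypothetical coordinates, chosen only for illustration.
print(gene_gap(100, 200, 300, 400))  # 100
print(gene_gap(100, 300, 200, 400))  # -100 (overlap)
```

The order of the two genes does not matter, since max and min are symmetric.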


This method may not work if genes have many exons, which means that one gene may have many start points and end points.
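If you only have exon-level records, one possible workaround is to collapse each gene to a single span first (lowest exon start, highest exon end) and then apply the same max/min arithmetic to those spans. The sketch below assumes a simple (gene_id, chrom, start, end) tuple layout, which is made up for the example and is not a standard format:

```python
def gene_spans(exons):
    """Collapse exon records into one (chrom, start, end) span per gene.

    `exons` is an iterable of (gene_id, chrom, start, end) tuples; this
    layout is an assumption for the example, not a standard file format.
    """
    spans = {}
    for gene_id, chrom, start, end in exons:
        if gene_id in spans:
            _, s, e = spans[gene_id]
            spans[gene_id] = (chrom, min(s, start), max(e, end))
        else:
            spans[gene_id] = (chrom, start, end)
    return spans


# Made-up exon coordinates for two genes on the same chromosome.
exons = [
    ("geneA", "chr1", 1000, 1200),
    ("geneA", "chr1", 1500, 1800),
    ("geneB", "chr1", 2500, 2600),
    ("geneB", "chr1", 2900, 3100),
]
spans = gene_spans(exons)
_, a_start, a_end = spans["geneA"]
_, b_start, b_end = spans["geneB"]
print(max(a_start, b_start) - min(a_end, b_end))  # 700
```

Gene-level tables (for example from BioMart) usually already give one start and one end per gene, in which case this step is not needed.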

9.2 years ago
cyril-cros ▴ 950

You are in luck: I just wrote a short script to do that with olfactory receptor genes from the mouse genome. I got some fused genes because they are close together and have similar sequences.

  1. I use the UCSC browser to download the mm10 mouse genome Ensembl gene table as a bed file, and subset it using grep -f listOlfrGene.txt, where listOlfrGene.txt contains Ensembl transcript IDs gathered from BioMart (based on a GO term search for olfactory receptor function).
  2. The subset bed file is then sorted using bedtools sort -i bedFile > olfr_genes_sorted.bed (http://bedtools.readthedocs.org/en/latest/index.html).
  3. I run bedtools closest -s -d -io -N -a olfr_genes_sorted.bed -b olfr_genes_sorted.bed > output.bed. This gets me a new bed file in the format gene #1 bed data | closest gene #2 bed data | distance between #1 and #2. Here the closest gene has to be distinct, on the same strand of the same chromosome and not overlapping (-s -d -io -N options, read the manual).
  4. This file is simplified by running awk '{print $NF,"\t",$1,"\t",$4,"\t",$10}' output.bed > closestOlfrGenes.txt to get the data in the distance | chromosome | geneID #1 | geneID #2 format (which I find more convenient)
  5. sort -n closestOlfrGenes.txt | awk '$1 > 0 {print $0}' > sortedClosestOlfrGenes.txt gets me the values sorted by distance. I use the awk part to get rid of a couple of values that were at -10 for some reason.
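If bedtools is not at hand, the idea behind steps 3 to 5 can be approximated in plain Python. This is only a rough sketch under stated assumptions, not a replacement for bedtools closest: the gene names and coordinates are invented, the tuple layout is hypothetical, and the simple gap used here may differ by one from the distance bedtools reports.

```python
def closest_genes(genes):
    """For each gene, find the nearest non-overlapping gene on the same
    chromosome and strand that has a different name.

    `genes` is a list of (chrom, start, end, name, strand) tuples with
    start <= end. Returns sorted (distance, chrom, name, closest_name)
    tuples; genes without an eligible neighbour are skipped.
    The all-against-all loop is fine for a few hundred genes.
    """
    results = []
    for chrom, start, end, name, strand in genes:
        best = None
        for chrom2, start2, end2, name2, strand2 in genes:
            if chrom2 != chrom or strand2 != strand or name2 == name:
                continue
            gap = max(start, start2) - min(end, end2)
            if gap < 0:  # overlapping pair, skipped (similar in spirit to -io)
                continue
            if best is None or gap < best[0]:
                best = (gap, chrom, name, name2)
        if best is not None:
            results.append(best)
    return sorted(results)


# Invented genes, for illustration only.
genes = [
    ("chr2", 1000, 2000, "gene_a", "+"),
    ("chr2", 5000, 6000, "gene_b", "+"),
    ("chr2", 6500, 7500, "gene_c", "-"),
    ("chr2", 9500, 10400, "gene_d", "+"),
]
for row in closest_genes(genes):
    print(*row, sep="\t")
```

Here gene_c is dropped because no other gene shares its strand, and the output keeps the distance | chromosome | geneID #1 | geneID #2 column order from step 4.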

A sample from each file is available at http://pastebin.com/dMh7MQUU. Note that the end result contains paired lines in this format: distanceX gene1 gene2 \n distanceX gene2 gene1 \n
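If those reciprocal lines get in the way, they can be collapsed so that each gene pair appears once. A minimal Python sketch, assuming the rows have already been parsed into (distance, chromosome, gene, closest) tuples; the function name and the sample rows are hypothetical:

```python
def drop_reciprocal_pairs(rows):
    """Keep the first row seen for each unordered gene pair.

    `rows` is a list of (distance, chrom, gene, closest) tuples, i.e. the
    column order used for closestOlfrGenes.txt.
    """
    seen = set()
    unique = []
    for distance, chrom, gene, closest in rows:
        # frozenset makes (gene1, gene2) and (gene2, gene1) the same key.
        key = (chrom, frozenset((gene, closest)))
        if key in seen:
            continue
        seen.add(key)
        unique.append((distance, chrom, gene, closest))
    return unique


# Invented rows, for illustration only.
rows = [
    (3000, "chr2", "gene_a", "gene_b"),
    (3000, "chr2", "gene_b", "gene_a"),
    (3500, "chr2", "gene_d", "gene_b"),
]
for row in drop_reciprocal_pairs(rows):
    print(*row, sep="\t")
```

Keep in mind that this changes the counts: a histogram drawn from the de-duplicated rows counts pairs rather than genes.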

For visualization in R (the resulting plot is here: http://imgur.com/caNxDew):

# Read the whitespace-separated table from step 5; "." marks missing values
closestOlfr <- read.csv(file="sortedClosestOlfrGenes.txt", sep="", header=FALSE, na.strings=".", col.names=c("dist","chr","gene","closest"))
closestOlfr$dist <- closestOlfr$dist/1000 # convert bp to kb
# Histogram of the distances up to 100 kb
h <- hist(closestOlfr$dist[closestOlfr$dist<=100], breaks=100, col="red", xlab="Distance to closest olfactory gene (kb)", main="Relative proximity of olfactory genes (cut-off at 100kb)")
# Count the genes whose closest neighbour lies within the chosen threshold
dist_wanted <- 20
cat("For this threshold (kb):", dist_wanted, "here is the number of close genes:", sum(closestOlfr$dist<=dist_wanted, na.rm=TRUE), "\n")

I conclude that a maximum intron size of 25000 bp for RNA-Seq alignment is still too high.

9.2 years ago
Anima Mundi ★ 2.9k

Hello, you can download the genomic coordinates of your genes (e.g. from BioMart), sort the list according to chromosomal location and then measure the distances via scripting.
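A rough Python sketch of that sort-then-measure approach follows. The column names (gene, chrom, start, end) are assumptions made for the example; a real BioMart export uses its own header names, so adjust them accordingly.

```python
import csv
import io
from itertools import groupby


def adjacent_gaps(tsv_text):
    """Return (chrom, gene, next_gene, gap) for consecutive genes.

    Expects tab-separated text with a header line containing the
    (assumed) columns gene, chrom, start and end.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    genes = sorted(
        (row["chrom"], int(row["start"]), int(row["end"]), row["gene"])
        for row in reader
    )
    gaps = []
    # groupby works here because the list is already sorted by chromosome.
    for chrom, group in groupby(genes, key=lambda g: g[0]):
        group = list(group)
        for (_, s1, e1, n1), (_, s2, e2, n2) in zip(group, group[1:]):
            gaps.append((chrom, n1, n2, max(s1, s2) - min(e1, e2)))
    return gaps


# Invented table, for illustration only.
table = (
    "gene\tchrom\tstart\tend\n"
    "geneC\tchr1\t9000\t9500\n"
    "geneA\tchr1\t1000\t2000\n"
    "geneB\tchr1\t4000\t5000\n"
    "geneD\tchr2\t300\t800\n"
)
for row in adjacent_gaps(table):
    print(*row, sep="\t")
```

Only neighbouring genes from your own list are compared, so other genes lying between them in the genome are ignored; a negative gap again indicates an overlap.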
