Question: Distance Between The Genes.
1
8.4 years ago by
Ss50
Ss50 wrote:

Hey!

I want to calculate the distance between genes. I got the gene details from Ensembl BioMart. So, I have

`Gene_Name Start(bp) End(bp)`

I calculated the length of each gene simply as end minus start (bp). I then tried to get the distance between the genes as follows, for

Gene1 S1 E1

Gene2 S2 E2

Gene3 S3 E3

where S=Start(bp)

E =End(bp)

Distance between Gene1 and Gene2 = S2-E1 and

Distance between Gene2 and Gene3 = S3-E2 and so on ....

Is this an incorrect way of finding the distance? The distance values I get are quite large compared to what has been reported.

Thanks.

gene distance • 2.5k views
4

What you need to be specific about is your definition of distance. Unlike the distance between points, the distance between intervals is not standardized. It could be the distance between the 5' ends, the distance between the midpoints, the distance that is not covered by either gene, the maximal distance that the genes and their interstitial space cover, etc.
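To make those alternatives concrete, here is a minimal Python sketch. It assumes 1-based inclusive coordinates with start <= end (as Ensembl reports them) and both genes on the same chromosome. The names `five_prime` and `interval_distances` and the dictionary keys are made up for illustration; they are not from any library.

```python
def five_prime(start, end, strand):
    """5' end of a gene: its start on the + strand, its end on the - strand."""
    return start if strand >= 0 else end


def interval_distances(g1, g2):
    """Several possible 'distances' between two genes.

    g1, g2: (start, end, strand) tuples, 1-based inclusive, start <= end,
    strand given as +1 or -1. Both genes are assumed to be on one chromosome.
    """
    s1, e1, _ = g1
    s2, e2, _ = g2
    return {
        # distance between the two 5' ends
        "five_prime": abs(five_prime(*g1) - five_prime(*g2)),
        # distance between the two midpoints
        "midpoint": abs((s1 + e1) / 2 - (s2 + e2) / 2),
        # bases covered by neither gene (0 if the genes touch or overlap)
        "uncovered_gap": max(0, max(s1, s2) - min(e1, e2) - 1),
        # bases from the leftmost start to the rightmost end, inclusive
        "total_span": max(e1, e2) - min(s1, s2) + 1,
    }
```

For example, `interval_distances((1, 1000, 1), (1999, 3000, 1))` gives four different numbers for the same pair of genes, which is why the definition has to be fixed before comparing against reported values.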

1

Are you just trying to find the genomic separation in bp between the genes? If so, then yes, that looks like what you would do.

1

Beware overlapping genes. You might even find a small gene tucked inside of a large gene's intron.

1

Are these genes on the same chromosome? If not, the distance can be considered infinite.
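Pulling the points above together, here is a rough Python sketch of the S2-E1 calculation from the question that only compares genes on the same chromosome and sorts them by start first. It assumes each record carries a chromosome name in addition to the three columns in the question; the function name `consecutive_gaps` and the tuple layout are hypothetical.

```python
from collections import defaultdict


def consecutive_gaps(genes):
    """Distance S2 - E1 between neighbouring genes on each chromosome.

    genes: iterable of (name, chrom, start, end) with start <= end.
    Returns a list of (left_gene, right_gene, distance) tuples.
    A value <= 0 means the two neighbours overlap.
    """
    by_chrom = defaultdict(list)
    for name, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, name))

    result = []
    for chrom in sorted(by_chrom):
        # sort by start so each gene is compared with its right-hand neighbour
        ordered = sorted(by_chrom[chrom])
        for (s1, e1, n1), (s2, e2, n2) in zip(ordered, ordered[1:]):
            result.append((n1, n2, s2 - e1))
    return result
```

If the input is not sorted, or rows from different chromosomes are adjacent in the table, subtracting row by row can easily produce very large values. Note that this sketch only compares immediate neighbours, so a small gene nested inside a large one still needs separate handling.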

1

Do you have an example of a gene distance that you find to be much larger than the reported distance? And can you also point to where the reported value comes from?

The length of a gene is not |start-end| but |start-end|+1.

@Manu: Or maybe |start-end|-1, if you want to count only intergenic bases?

1
8.0 years ago by
ff.cc.cc1.3k
European Union
ff.cc.cc1.3k wrote:
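``````
def distance_and_overlap(s1, e1, s2, e2):
    """Return (D, Overlap) for two genes on one chromosome, with start <= end."""
    if s1 > s2:
        # swap(gene1, gene2) so that gene1 is always the leftmost gene
        s1, e1, s2, e2 = s2, e2, s1, e1
    if e1 < s2:
        # no overlap: gene1 ends before gene2 starts
        return s2 - e1, 0
    if e1 < e2:
        # partial overlap: gene2 starts inside gene1 and ends after it
        return 0, e1 - s2  # D could be s2 - s1 if more interesting to your study
    # gene2 lies entirely within gene1
    return 0, e2 - s2  # D could be s2 - s1
``````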
0
8.4 years ago by
Rm7.9k
Danville, PA
Rm7.9k wrote:

Yes, if both are on the same strand:

``````
___S1----->E1____________S2------->E2_____
``````

If gene1 is on the + strand and gene2 is on the - strand, then

``````
___S1----->E1____________E2<-------S2_____
``````

and the distance between the genes will be E2-E1.

3

No, this is not true. In Ensembl the start coordinate of a gene is by definition smaller than the end coordinate, irrespective of the strand. So, the way SS calculates the distances is correct.

2

Remember that distance is not equal to the number of bases between the genes. If E1=1000 and S2=1999, there are S2-E1-1, or 998, bases of intergenic sequence here. If S1=1 and E1=1000, the gene is 1000 (not 999) bp in length.
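The off-by-one arithmetic can be checked with a couple of lines of Python. This is only a sketch under the assumption of 1-based inclusive coordinates; the helper names `gene_length` and `intergenic_bases` are made up for illustration.

```python
def gene_length(start, end):
    """Number of bases in a gene with 1-based inclusive coordinates."""
    return abs(end - start) + 1


def intergenic_bases(e1, s2):
    """Bases strictly between the end of gene1 and the start of gene2."""
    return max(0, s2 - e1 - 1)
```

With the numbers above, `gene_length(1, 1000)` is 1000 and `intergenic_bases(1000, 1999)` is 998.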