Distance Between The Genes.
2
1
12.1 years ago
Ss ▴ 50

Hey!

I want to calculate the distance between genes. I got the gene details from Ensembl BioMart, so I have

Gene_Name Start(bp) End(bp)

I calculated the length of each gene by subtracting the start from the end (in bp). I then tried to get the distance between genes as follows, for

Gene1 S1 E1

Gene2 S2 E2

Gene3 S3 E3

where S=Start(bp)

E =End(bp)

Distance between Gene1 and Gene2 = S2-E1 and

Distance between Gene2 and Gene3 = S3-E2 and so on ....

Is this an incorrect way of finding the distance? The distance values I get are quite large compared to what has been reported.

Thanks.

gene distance • 3.8k views
4

What you need to be specific about is your definition of distance. Unlike the distance between points, the distance between intervals is not standardized. It could be the distance between 5' ends, the distance between midpoints, the distance that is not covered by either gene, the maximal distance that the genes and their interstitial space cover, etc.
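To make these definitions concrete, here is a minimal Python sketch of the four options. It assumes 1-based inclusive coordinates with start <= end regardless of strand, and a strand value of 1 or -1; the `Gene` tuple and the function names are hypothetical and chosen only for illustration.

```python
from collections import namedtuple

# Hypothetical record: coordinates are 1-based and inclusive, start <= end,
# strand is 1 (forward) or -1 (reverse).
Gene = namedtuple("Gene", "name start end strand")

def five_prime(g):
    # The 5' end is the start on the forward strand and the end on the reverse strand.
    return g.start if g.strand == 1 else g.end

def distance_5prime(a, b):
    return abs(five_prime(a) - five_prime(b))

def distance_midpoints(a, b):
    return abs((a.start + a.end) / 2 - (b.start + b.end) / 2)

def distance_gap(a, b):
    # Number of bases covered by neither gene; 0 if they overlap or are adjacent.
    return max(0, max(a.start, b.start) - min(a.end, b.end) - 1)

def distance_span(a, b):
    # Number of bases from the leftmost start to the rightmost end, inclusive.
    return max(a.end, b.end) - min(a.start, b.start) + 1

g1 = Gene("g1", 1, 1000, 1)
g2 = Gene("g2", 1999, 3000, -1)
print(distance_5prime(g1, g2), distance_midpoints(g1, g2),
      distance_gap(g1, g2), distance_span(g1, g2))
```

The same pair of genes gives four different numbers, so it matters which definition the reported values used.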

1

Are you just trying to find the genomic separation in bp between the genes? If so, then yes, that looks like what you would do.

1

Beware overlapping genes. You might even find a small gene tucked inside of a large gene's intron.

1

Are these genes on the same chromosome? If not, the distance can be considered to be infinite.
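One way to respect the chromosome boundary is to group the table by chromosome and sort by start before taking S(next) - E(previous). The sketch below assumes each gene is a (name, chromosome, start, end) tuple; the tuple layout and the function name `consecutive_distances` are hypothetical. A negative value means the two neighbouring genes overlap.

```python
from itertools import groupby
from operator import itemgetter

def consecutive_distances(genes):
    """genes: iterable of (name, chrom, start, end) tuples.

    Returns (gene_a, gene_b, start_b - end_a) for neighbouring genes
    on the same chromosome; pairs on different chromosomes are skipped.
    """
    result = []
    # Sort by chromosome, then by start coordinate.
    ordered = sorted(genes, key=itemgetter(1, 2))
    for chrom, group in groupby(ordered, key=itemgetter(1)):
        group = list(group)
        for prev, nxt in zip(group, group[1:]):
            result.append((prev[0], nxt[0], nxt[2] - prev[3]))
    return result

genes = [
    ("B", "1", 5000, 6000),
    ("A", "1", 1000, 2000),
    ("C", "2", 100, 200),
    ("D", "1", 5500, 5600),  # lies inside gene B
]
for row in consecutive_distances(genes):
    print(row)
```

If the input table is not sorted, or mixes chromosomes, subtracting row by row can give very large or meaningless values, so this is worth checking before comparing against reported distances.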

1

Do you have an example of a gene distance that for you is much larger than the reported distance? And can you also refer to where you got this reference from?

0

The length of a gene is not |start-end| but |start-end|+1.

0

@Manu: Or maybe |start-end|-1, if you want to count only intergenic bases?

1
11.7 years ago
ff.cc.cc ★ 1.3k
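beginning:
if (S1 <= S2) {
    if (E1 < S2) {            // gene1 ends before gene2 starts
        D = S2 - E1;
        Overlap = 0;
    } else if (E1 < E2) {     // gene1 ends inside gene2: partial overlap
        D = 0;                // or S2-S1 if more interesting to your study
        Overlap = E1 - S2;
    } else {                  // E1 >= E2: gene2 lies inside gene1
        D = 0;                // or S2-S1
        Overlap = E2 - S2;
    }
} else {                      // gene2 starts first: swap and retry
    swap(gene1, gene2);
    goto beginning;
}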

0
12.1 years ago
Rm 8.2k

Yes, if both are on the same strand:

___S1----->E1____________S2------->E2_____


If gene1 is on the + strand and gene2 is on the - strand, then

___S1----->E1____________E2<-------S2_____


the distance between the genes will be E2-E1.

3

No, this is not true. In Ensembl the start coordinate of a gene is by definition smaller than the end coordinate, irrespective of the strand. So, the way SS calculates the distances is correct.

2

Remember that distance is not equal to the number of bases between the genes. If E1=1000 and S2=1999, there are S2-E1-1 or 998 bases of intergenic sequence here. If S1=1 and E1=1000, the gene is 1000 (not 999) bp in length.
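The two counts above can be checked with a couple of lines of Python. This is only a sketch assuming 1-based inclusive coordinates; the function names `gene_length` and `intergenic_bases` are hypothetical.

```python
def gene_length(start, end):
    # Both the start and the end base belong to the gene, hence the +1.
    return end - start + 1

def intergenic_bases(end1, start2):
    # Bases strictly between the end of gene1 and the start of gene2;
    # clamped to 0 when the genes are adjacent or overlap.
    return max(0, start2 - end1 - 1)

print(gene_length(1, 1000))          # length of a gene with S1=1, E1=1000
print(intergenic_bases(1000, 1999))  # bases between E1=1000 and S2=1999
```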

0

@bret; thanks for the info...