Distance between two gene sets
9.5 years ago
The ▴ 180

Hi,

Can anyone tell me how the functional distance between two gene-sets can be measured? Things that immediately come to mind are GO Term similarities, network distance etc. Any review article is very welcome.

Thank you

distance gene-set • 4.0k views

Could you maybe elaborate on how distance is defined in this context please? Is it the sum of differences between the gene sequences, or a measure of differences in their ontology, or something else?


You can also ask him to define "gene" :-)


Can you elaborate on your comment a bit? Say a 'gene' is defined the same way as it is in a gene set: how does that provide an answer to my question?


I assumed OP was referring to the gene as a structural or functional unit. Structural distance can be quantified; functional "distance", not so much. Come to think of it, could we follow the kind of algorithm that Amazon or Netflix use to tailor recommendations based on previous choices, in order to find functionally homologous genes to a gene under study?


Something akin to "measure of differences in their ontology". In essence, I'm trying to capture the differences in their biological functions.


Oh, OK. The problem is, there is no concrete measure of functional similarity between genes. Converting ontology information to quantitative difference information would be arbitrary, at best. For example, in the simplest terms, each gene might have descriptive terms in 3 GO categories. Would you then count differences between the two sets? How would one determine whether term A is similar to or different from term B?

Is there any precedent for such work? Any place where distances between genes (let alone gene sets) are mentioned in quantitative terms? This information would help a lot.


There is already a lot of work on GO term semantic similarity measures. I'm looking for other methods, such as distances between the networks the genes constitute.
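
For readers unfamiliar with the idea, here is a minimal sketch of one simple family of term similarity measures: compare two terms by the overlap of their ancestor sets in the ontology graph. The tiny hierarchy and the term names below are made up for illustration (they are not real GO identifiers), and published measures are considerably more refined than this.

```python
# Toy "is_a" hierarchy: child term -> set of parent terms.
# Hypothetical term names for illustration only; not real GO IDs.
PARENTS = {
    "root": set(),
    "metabolism": {"root"},
    "signaling": {"root"},
    "lipid_metabolism": {"metabolism"},
    "sugar_metabolism": {"metabolism"},
    "kinase_signaling": {"signaling"},
}


def ancestors(term):
    """Return the term itself plus every term reachable by following parents."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def term_similarity(a, b):
    """Jaccard index of the two ancestor sets (1.0 means identical terms)."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


# Siblings under "metabolism" share more ancestors than terms in different branches.
print(term_similarity("lipid_metabolism", "sugar_metabolism"))
print(term_similarity("lipid_metabolism", "kinase_signaling"))
```

Terms that share a deeper common parent score higher than terms that only meet at the root, which is the intuition most semantic similarity measures build on.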


Oh. Gene networks are beyond me. I guess someone more conversant will help you out. In the meantime, might I suggest updating your question with some detail on how you are looking to gauge similarity and what you expect out of this exercise? Where open discussion might be involved, descriptive questions are always better than one-liners.


I don't think you can physically/quantitatively capture the differences. But can't you just throw the lists in DAVID or a gene set enrichment program and see how the function differs?


This question seems to be quite open-ended. Should we maybe make this a forum discussion?

Devon Ryan

Istvan Albert

Pierre Lindenbaum

9.5 years ago
Manvendra Singh ★ 2.2k

If you have two gene sets in BED format, then you could just run a tool from bedtools:

closestBed -a geneset1.bed -b geneset2.bed -d > output.bed

-d would tell you the distance between them.

You can do it with BEDOPS also:

closest-features --closest geneset1.bed geneset2.bed > output.bed

hth


I'm not sure what bedtools does, but I gather from your post that it somehow calculates similarity between two or more sequences.


No, it calculates distances between two sets of genomic coordinates (which I had thought your genes represented).

It would be easier for us if you always put your questions in an understandable way.


I was confused too, hence my comment above :-)


No. closestBed will tell you, for each feature/gene in set A, the closest one in set B. However, I am not sure what you mean by 'distance' between two gene sets. If there are 100 genes in A and 50 genes in B, do you want the distance between each possible pair in A and B, so that your resulting file has 100*50 entries? You should add more details to your question and try to be clear (give an example, maybe?).


I don't know about bedtools, nor in which kinds of biological problems one would use the distance between genomic coordinates. Just curious: if two genes are on different chromosomes, what output does bedtools provide?

What I have in mind is something similar to what you use when calculating the distance between clusters in different clustering algorithms. As genes belong to networks, the average shortest-path distance between members of the two gene sets can be taken as a distance measure, but I guess better methods are available. The simplest distance measure would be the member overlap between the two gene sets.
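
The two measures mentioned here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene names and the network are invented, the network is treated as unweighted and undirected, and pairs with no connecting path are simply skipped when averaging (one could instead penalize them).

```python
from collections import deque


def jaccard_distance(set_a, set_b):
    """1 minus (shared members / all members); 0.0 if identical, 1.0 if disjoint."""
    a, b = set(set_a), set(set_b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def bfs_distances(graph, source):
    """Hop counts from source to every reachable node in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def mean_shortest_path(graph, set_a, set_b):
    """Average hop count over all connected (a, b) pairs; None if no pair is connected."""
    lengths = []
    for a in set_a:
        dist = bfs_distances(graph, a)
        lengths.extend(dist[b] for b in set_b if b in dist)
    return sum(lengths) / len(lengths) if lengths else None


# Hypothetical interaction network as an adjacency dict (made-up gene names).
NETWORK = {
    "g1": {"g2"},
    "g2": {"g1", "g3"},
    "g3": {"g2", "g4"},
    "g4": {"g3"},
    "g5": set(),  # isolated gene
}
set_a = {"g1", "g2"}
set_b = {"g2", "g4"}

print(jaccard_distance(set_a, set_b))
print(mean_shortest_path(NETWORK, set_a, set_b))
```

Note that a gene present in both sets contributes a path length of zero, so heavily overlapping sets score as close under both measures.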


BEDOPS and BedTools applications work with files in BED format.

BEDOPS works with BED data in sorted form, in order to get additional speed and memory benefits. Your genes could be described with at least four columns: chromosome, start and stop positions, and a gene name.

A BEDOPS tool like closest-features will report, for each gene, the nearest upstream and downstream genes (or other features). You can add a --dist option to report the numerical distance between the nearest edges of the two features.
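
To make "distance between the nearest edges" concrete, here is a rough Python sketch of the idea. It is not how BEDOPS or bedtools are implemented (they work on sorted input rather than scanning every candidate), the coordinates are invented, and conventions for touching or overlapping features differ between tools, so the numbers may not match their output exactly.

```python
def edge_distance(a, b):
    """Gap in bases between two (chrom, start, end, name) features.

    Returns 0 if the features overlap or touch, and None if they are on
    different chromosomes (no genomic distance is defined in that case).
    """
    if a[0] != b[0]:
        return None
    return max(0, b[1] - a[2], a[1] - b[2])


def closest_feature(query, candidates):
    """Return (candidate, distance) for the nearest same-chromosome candidate, or None."""
    best = None
    for cand in candidates:
        d = edge_distance(query, cand)
        # Strict "<" means the first candidate seen wins a tie.
        if d is not None and (best is None or d < best[1]):
            best = (cand, d)
    return best


# Hypothetical gene coordinates, BED-style (0-based start, end exclusive).
SET_A = [("chr1", 100, 200, "geneA1"), ("chr2", 50, 80, "geneA2")]
SET_B = [("chr1", 500, 600, "geneB1"), ("chr1", 250, 300, "geneB2")]

for gene in SET_A:
    print(gene[3], closest_feature(gene, SET_B))
```

In this sketch a gene with no partner on the same chromosome simply gets no match; check each tool's documentation for how it reports that case.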


Add --dist to closest-features to report the distance value of the --closest element.


I thought closest-features --closest would report the distance value of the --closest element, similar to bedtools closest with the -d option.


The default is to report both the nearest upstream and downstream elements. Using --closest picks the nearer of the two, with one picked at random in case of ties. The --dist option tells you the numerical distance between target feature and the upstream and/or downstream feature, depending on additional options. Check out --help or the online docs for more info.


Yes, --closest picks the genes the way you describe, but I think it would also give the distance between them in the last column of the output.
