Question: Distance between two gene sets
The100 (United States) wrote 4.5 years ago:

Hi,

Can anyone tell me how the functional distance between two gene sets can be measured? Things that immediately come to mind are GO term similarities, network distance, etc. Any review article is very welcome.

Thank you

 


Could you maybe elaborate on how distance is defined in this context please? Is it the sum of differences between the gene sequences, or a measure of differences in their ontology, or something else?

written 4.5 years ago by RamRS

You can also ask him to define "gene" :-)

written 4.5 years ago by PoGibas

Can you elaborate on your comment a bit? Say a 'gene' is defined the same way as it is in a gene set: how does that answer my question?

written 4.5 years ago by The100

I assumed OP was referring to the gene as a structural or functional unit. Structural distance can be quantified; functional "distance", not so much. Come to think of it, could we follow the kind of algorithm that Amazon or Netflix use to tailor recommendations based on previous choices, to find genes functionally homologous to a gene under study?

written 4.5 years ago by RamRS

Something akin to a "measure of differences in their ontology". In essence, I'm trying to capture the differences in their biological functions.

written 4.5 years ago by The100

Oh, OK. The problem is, there is no concrete measure of functional similarity between genes. Converting ontology information to quantitative difference information would be arbitrary, at best. For example, in the simplest terms, each gene might have descriptive terms in 3 GO categories. Would you then count differences between the two sets? How would one determine if term A is similar or different from term B?

Is there any precedent for such work? Any place where distances between genes (let alone gene sets) are mentioned in quantitative terms? This information would help a lot.

written 4.5 years ago by RamRS

There is already a lot of work on GO term semantic similarity measures. I'm looking for other methods, such as distances between the networks the gene sets constitute.

written 4.5 years ago by The100

Oh. Gene networks are beyond me. I guess someone more conversant will help you out. In the meantime, might I suggest updating your question with some detail on how you are looking to gauge similarity and what you expect out of this exercise? Where open discussion might be involved, descriptive questions are always better than one-liners.

written 4.5 years ago by RamRS

I don't think you can physically/quantitatively capture the differences. But can't you just throw the lists in DAVID or a gene set enrichment program and see how the function differs? 

written 4.5 years ago by komal.rathi
Manvendra Singh (Berlin, Germany) wrote 4.5 years ago:

If you have two gene sets in BED format, then you could just run a tool from bedtools:

closestBed -a geneset1.bed -b geneset2.bed -d > output.bed

The -d option reports the distance between them.

You can also do it with BEDOPS:

closest-features --closest geneset1.bed geneset2.bed > output.bed

hth


I'm not sure what bedtools does, but I guess from your post that it somehow calculates similarity between two or more sequences.

written 4.5 years ago by The100

No, it calculates distances between two sets of genomic coordinates (which I had thought your genes represented).

It would be easier for us if you always put your questions in an understandable way.

written 4.5 years ago by Manvendra Singh

I was confused too, hence my comment above :-)

written 4.5 years ago by RamRS

No. closestBed will tell you, for each feature/gene in set A, the closest feature/gene in set B. However, I am not sure what you mean by 'distance' between two gene sets. If there are 100 genes in A and 50 genes in B, do you want to find the distance between each possible pair in A & B, such that your resulting file will have 100*50 entries? You should add more details to your question and try to be clear (give an example, maybe?).

written 4.5 years ago by komal.rathi

I don't know about bedtools, nor do I know which kinds of biological problems call for the distance between genomic coordinates. Just curious: if two genes are on different chromosomes, what output does bedtools provide?

What I have in mind is something similar to what you use when calculating the distance between clusters in different clustering algorithms. As genes belong to networks, the average shortest distance between members of the two gene sets can be taken as a distance measure. But I guess better methods are available. The simplest distance measure would be calculating the member overlap between the two sets of genes.
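
To make the two measures just mentioned concrete, here is a minimal Python sketch under stated assumptions: member overlap is expressed as a Jaccard distance (one common way to turn overlap into a distance, not the only one), and the network is an unweighted, undirected graph stored as a plain adjacency dictionary. The gene names, the toy network, and the function names are all hypothetical; a real analysis would load an interaction network from a database and would probably use a graph library.

```python
from collections import deque


def jaccard_distance(set_a, set_b):
    """Return 1 - |A and B| / |A or B|: 0 for identical sets, 1 for no shared genes."""
    a, b = set(set_a), set(set_b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def shortest_path_lengths(network, source):
    """Breadth-first search: hop counts from source to every reachable gene."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in network.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def average_shortest_distance(network, set_a, set_b):
    """Mean hop count over all connected (a, b) pairs; None if no pair is connected."""
    lengths = []
    for a in set_a:
        reachable = shortest_path_lengths(network, a)
        lengths.extend(reachable[b] for b in set_b if b in reachable)
    return sum(lengths) / len(lengths) if lengths else None


# Hypothetical toy network (undirected adjacency lists) and gene sets.
toy_network = {
    "G1": ["G2"],
    "G2": ["G1", "G3"],
    "G3": ["G2", "G4"],
    "G4": ["G3"],
}
set_a = {"G1", "G2"}
set_b = {"G2", "G4"}

print(jaccard_distance(set_a, set_b))
print(average_shortest_distance(toy_network, set_a, set_b))
```

Note that in this sketch a gene shared by both sets contributes a path length of 0, and unconnected pairs are simply skipped; both are design choices that would need to be decided deliberately for real data.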

written 4.5 years ago by The100

BEDOPS and BedTools applications work with files in BED format.

BEDOPS works with BED data in sorted form, in order to get additional speed and memory benefits. Your genes could be described with at least four columns: chromosome, start and stop positions, and a gene name.

A BEDOPS tool like closest-features will report, for each gene, the nearest upstream or downstream gene (and features). You can add a --dist option to report the numerical distance between the nearest edges of the two features.
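
As a rough illustration of what "distance between the nearest edges" means for four-column features, here is a small Python sketch. It is not how BEDOPS or bedtools are implemented, and its conventions (a gap of 0 for overlapping features, `None` when no feature shares the chromosome) are assumptions of this sketch; the actual tools may count bases and report missing neighbours differently. The feature names and coordinates are hypothetical.

```python
def edge_distance(a, b):
    """Gap in bases between the nearest edges of two (chrom, start, end, name) features.

    Returns 0 for overlapping features and None for different chromosomes.
    """
    if a[0] != b[0]:
        return None
    return max(0, a[1] - b[2], b[1] - a[2])


def closest_feature(query, candidates):
    """Return (candidate, distance) for the nearest same-chromosome candidate, or None."""
    scored = [(edge_distance(query, c), c) for c in candidates]
    scored = [(d, c) for d, c in scored if d is not None]
    if not scored:
        return None
    d, c = min(scored, key=lambda pair: pair[0])
    return c, d


# Hypothetical features: (chromosome, start, end, gene name).
geneset1 = [("chr1", 100, 200, "geneA"), ("chr2", 50, 80, "geneB")]
geneset2 = [("chr1", 500, 600, "geneX"), ("chr1", 250, 300, "geneY")]

for gene in geneset1:
    print(gene[3], closest_feature(gene, geneset2))
```

In this sketch, a gene with no partner on the same chromosome simply has no defined genomic distance, which is one reason genomic proximity is a poor stand-in for functional distance.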

modified 4.5 years ago • written 4.5 years ago by Alex Reynolds

Add --dist to closest-features to report the distance value of the --closest element.

written 4.5 years ago by Alex Reynolds

I thought closest-features --closest would report the distance value of the --closest element, similar to bedtools closest with the -d option.

written 4.5 years ago by Manvendra Singh

The default is to report both the nearest upstream and downstream elements. Using --closest picks the nearer of the two, with one picked at random in case of ties. The --dist option tells you the numerical distance between target feature and the upstream and/or downstream feature, depending on additional options. Check out --help or the online docs for more info.

modified 4.5 years ago • written 4.5 years ago by Alex Reynolds

Yes, --closest picks the genes in the way you described, but I thought it would also give the distance between them in the last column of the output.

written 4.5 years ago by Manvendra Singh
RamRS (Houston, TX) wrote 4.5 years ago:

This question seems to be quite open-ended. Should we maybe make this a forum discussion?

Devon Ryan

Istvan Albert

Pierre Lindenbaum
