Gene set size effect on Gene Ontology semantic similarity score
4.3 years ago
ash3m21 • 0

Hello everyone,

My name is Ravi and I am a doctoral student studying the biological processes involved in human ageing. Recently, we decided to complement this work with a bioinformatic analysis. I am trying to understand what effect gene set size has when computing GO semantic similarity scores with the R package 'GOSemSim'.

I have a fixed data set containing about 2000 genes, labelled TraitA.

I compute the semantic similarity between TraitA and several other traits, labelled Trait_Random. Each Trait_Random gene set contains anywhere from 10 to 2000 genes.

How does this difference in gene set size affect the score that I get?

Also, if the generated scores are biased by set size, is there a statistical method I could use to correct for that?
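One general way to check for (and adjust for) a size effect, not something proposed in this thread, is an empirical size-matched null: draw many random gene sets of the same size as each Trait_Random set from a background universe, score each one against TraitA, and compare the observed score with that distribution. The sketch below is language-neutral Python under stated assumptions: `score_fn` is a hypothetical stand-in for the real GO semantic similarity call (in practice that would come from GOSemSim in R), and the toy Jaccard score and `G0`...`G199` gene names in the usage lines are placeholders only, not GO similarities.

```python
import random
from statistics import mean, stdev
from typing import Callable, Sequence


def size_matched_null(
    trait_a: Sequence[str],
    observed_set: Sequence[str],
    universe: Sequence[str],
    score_fn: Callable[[Sequence[str], Sequence[str]], float],
    n_perm: int = 1000,
    seed: int = 0,
) -> dict:
    """Compare an observed set score with scores of random sets of the same size.

    score_fn is a hypothetical stand-in for a gene-set semantic similarity.
    """
    rng = random.Random(seed)
    pool = list(universe)
    k = len(observed_set)
    observed = score_fn(trait_a, observed_set)
    # Null distribution: random sets drawn from the universe, matched on size k
    null = [score_fn(trait_a, rng.sample(pool, k)) for _ in range(n_perm)]
    # Empirical one-sided p-value; the +1 keeps it from ever being exactly 0
    p = (1 + sum(s >= observed for s in null)) / (n_perm + 1)
    sd = stdev(null)
    z = (observed - mean(null)) / sd if sd > 0 else float("nan")
    return {"observed": observed, "null_mean": mean(null), "z": z, "p": p}


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Toy placeholder score (NOT a GO semantic similarity)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


# Toy usage: a 10-gene set lying entirely inside a 50-gene "TraitA"
universe = [f"G{i}" for i in range(200)]
trait_a = universe[:50]
observed_set = universe[:10]
res = size_matched_null(trait_a, observed_set, universe, jaccard, n_perm=200)
print(res)
```

Because the null is rebuilt for each set size, a z-score or empirical p-value computed this way can be compared across Trait_Random sets of different sizes more fairly than the raw scores.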

Any thoughts or inputs on this would be very helpful. Thank you so much for your time.

4.3 years ago
Guangchuang Yu ★ 2.4k

There should be no bias with respect to gene set size. Please refer to the vignette, which describes the calculation in detail.
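For context on how set-level scores are combined: GOSemSim's gene-cluster similarity (`clusterSim`) merges pairwise gene-gene similarities with a `combine` method, and best-match average (BMA), to my knowledge the default there, takes each gene's best match in the other set and averages over all m + n genes, so the score stays on the same scale whatever the set sizes. The Python sketch below reimplements only that averaging step on a made-up 2 x 3 matrix; it is not GOSemSim's code. Note that each maximum is still taken over the other set, so a larger set offers more candidates to match, which is why an empirical check on one's own data can be worthwhile.

```python
from typing import Sequence


def best_match_average(sim: Sequence[Sequence[float]]) -> float:
    """Best-match average (BMA) of an m x n similarity matrix.

    sim[i][j] is the similarity between gene i of set 1 and gene j of set 2.
    """
    m = len(sim)
    n = len(sim[0])
    # Best match in set 2 for each gene of set 1 (row maxima)
    row_best = [max(row) for row in sim]
    # Best match in set 1 for each gene of set 2 (column maxima)
    col_best = [max(sim[i][j] for i in range(m)) for j in range(n)]
    # Average over all m + n genes, not a sum
    return (sum(row_best) + sum(col_best)) / (m + n)


# Illustrative matrix with made-up values, not real GO similarities
toy = [
    [0.9, 0.2, 0.4],
    [0.1, 0.6, 0.3],
]
score = best_match_average(toy)
print(score)
```

Here the row maxima are 0.9 and 0.6 and the column maxima are 0.9, 0.6 and 0.4, giving 3.4 / 5.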


