Gene set enrichment analysis differences between 2020 and 2024
28 days ago


I performed GSEA on my RNA-seq data in 2020 and obtained a set of enriched pathways. When I rerun the analysis now on the same dataset, I no longer find the same enriched pathways.

Could anyone explain that to me, please?

Thank you for your help


Gene-ontology
28 days ago
ATpoint 82k

Different input data, code, software versions, or pathway annotations: any one of these, or a combination. It is impossible to know without seeing any code or data. You should always version-control your code, record your software versions, and keep backups of the external resources (e.g. pathway annotations) you use.
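As a minimal sketch of what "recording versions and resources" could look like, assuming a Python-based workflow: the function below writes down the interpreter version, the installed version of each named package, and a SHA-256 checksum of each external resource file. The function names, package names and file list are hypothetical placeholders, not part of any GSEA tool. In R, `sessionInfo()` serves a similar purpose for package versions.

```python
import hashlib
import json
import platform
from importlib import metadata


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(packages, resource_files):
    """Collect interpreter version, package versions and resource checksums."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package not installed in this environment
    return {
        "python": platform.python_version(),
        "packages": versions,
        "resources": {str(p): sha256_of(p) for p in resource_files},
    }


if __name__ == "__main__":
    # Hypothetical inputs: replace with the packages and annotation files you use.
    manifest = build_manifest(["numpy", "pandas"], [])
    print(json.dumps(manifest, indent=2))
```

Saving this output next to the results (and committing it with the code) makes it possible to tell later whether a changed result coincides with a changed package version or a changed annotation file.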

27 days ago

We release roughly monthly to keep our data accurate and to reflect current understanding. In 4 years there have been about 40 releases, so some variation is absolutely expected. The GO, as well as any annotations, are always evolving. We strongly recommend recording and reporting release information with any work, and this is especially important if you might publish the work for others to try to reproduce later.
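One way to capture release information, sketched under the assumption that your annotations come as a GAF file whose leading comment lines start with `!`: collect every `!key: value` header line and store the result with your analysis. The function name is hypothetical, and the header keys in the example are illustrative only; which keys are present depends on the file you downloaded.

```python
def read_header_metadata(lines):
    """Collect '!key: value' pairs from the leading comment lines of a GAF file."""
    meta = {}
    for line in lines:
        if not line.startswith("!"):
            break  # header ends at the first non-comment line
        key, sep, value = line[1:].partition(":")
        if sep and key.strip():
            meta[key.strip()] = value.strip()
    return meta


if __name__ == "__main__":
    # Illustrative header lines, not copied from a real release.
    example = [
        "!gaf-version: 2.2\n",
        "!date-generated: 2024-01-17T00:00\n",
        "UniProtKB\tP00001\tGENE1\n",
    ]
    print(read_header_metadata(example))
```

For a real file, pass an open file handle instead of the example list, e.g. `read_header_metadata(open(path))`.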

From our FAQs:

Sometimes the number of GO annotations changes significantly over a short period of time. Why?

Most annotations in association files are electronically inferred (IEA). As with all types of annotations, IEAs change over time, with an overall increasing trend. However, in the specific case of IEAs, significant fluctuations in numbers may sometimes be observed over a short period of time. Nearly always, these are not due to bugs, but rather to the following reasons and/or to a combination thereof:

  • All IEA annotations that are over one year old are removed from association files. This is part of quality control procedures. Another procedure, which the GO started implementing in mid-2014, is taxonomic checks.
  • Electronic annotations are provided to UniProt-GOA by various groups, including Ensembl, InterPro and UniProt. UniProt-GOA then includes these in the annotation files that they submit to the GO Consortium. There are numerous reasons why electronic annotations can fluctuate; e.g., InterPro may have changed a mapping that affected a large number of annotations; a mapping between a GO term and a UniProt keyword may have been added or removed; Ensembl may have changed their orthology sets; new quality checking procedures may have been introduced; a supplying group may have had a problem providing the annotations. Since electronic annotations tend to hit a large number of proteins, larger fluctuations are more likely than in a manual annotation set. UniProt-GOA aims to record all the known changes to the datasets they provide in their release notes.
  • New genome assemblies for various species are periodically released, and that may contribute to changes in gene annotations.
  • Changes are good. Our knowledge base keeps growing, and information continues to be added based on existing, older literature.

Relevant paper: Understanding how and why the Gene Ontology and its annotations evolve: the GO within UniProt.

However, if you think that an observed change in the size of an annotation file cannot be explained by any of the above, and suspect a bug, please contact us.
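To check whether a change between two annotation snapshots is dominated by electronic (IEA) annotations, as described in the FAQ above, the two files can be compared directly. The following is a rough sketch, assuming both snapshots are tab-delimited GAF files in which column 2 is the object ID, column 5 the GO ID and column 7 the evidence code. The function names and the example rows are made up for illustration.

```python
from collections import Counter


def load_annotations(lines):
    """Return a set of (object_id, go_id, evidence_code) triples from GAF lines."""
    annotations = set()
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # skip header comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 7:
            continue  # skip malformed rows
        annotations.add((cols[1], cols[4], cols[6]))
    return annotations


def diff_by_evidence(old, new):
    """Count added and removed annotations per evidence code."""
    added = Counter(code for _, _, code in new - old)
    removed = Counter(code for _, _, code in old - new)
    return added, removed


if __name__ == "__main__":
    # Made-up rows standing in for a 2020 and a 2024 snapshot.
    old_lines = [
        "!gaf-version: 2.2",
        "UniProtKB\tP00001\tGENE1\t\tGO:0000001\tREF:1\tIEA",
        "UniProtKB\tP00002\tGENE2\t\tGO:0000002\tREF:2\tIDA",
    ]
    new_lines = [
        "!gaf-version: 2.2",
        "UniProtKB\tP00002\tGENE2\t\tGO:0000002\tREF:2\tIDA",
        "UniProtKB\tP00003\tGENE3\t\tGO:0000003\tREF:3\tIEA",
    ]
    added, removed = diff_by_evidence(
        load_annotations(old_lines), load_annotations(new_lines)
    )
    print("added:", dict(added))
    print("removed:", dict(removed))
```

If most of the added and removed triples carry the IEA code, the difference is likely explained by the routine turnover of electronic annotations rather than by a bug.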


