Gene Cluster Analysis and Normalised Enrichment Score
2.6 years ago

Hello,

I am fairly new to bioinformatics. I am doing an immunology research project in which I am learning to use R. I am now supposed to use the DOSE package from Bioconductor to analyse a dataset downloaded from an interaction network I created in STRING from a list of selected proteins.

I have been asked to filter the genes by normalised enrichment score (NES) with the DOSE package, but I have no idea how to obtain it from the data downloaded from STRING. NES does not appear anywhere in the Excel files I downloaded from STRING and imported (the ones containing the interaction network between my proteins, found under the "clusters" option).

What I am trying to obtain is a list of the proteins that appear most often in the pathway lists returned by STRING ("Biological Process", "WikiPathways", etc.), but I am a bit confused about how to get them.

Does anyone know how to do it?

Thank you

Tags: analysis, enrichment, dose, r, gene
2.6 years ago

If you just want to find the most frequent proteins/genes, count how often each one appears in the 'matching.proteins.in.your.network..labels' column, restricted to the significant gene sets. In R you can do this with the function table(). You do not need DOSE for that, unless you want to recalculate the enrichment for disease-related gene sets.
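For readers working outside R, here is a minimal Python sketch of the same counting idea. It assumes the STRING enrichment export is a tab-separated file with columns named "false discovery rate" and "matching proteins in your network (labels)" (the latter is what R turns into `matching.proteins.in.your.network..labels`), with labels separated by commas. The 0.05 cutoff, the function name `gene_frequency` and the rows in `example` are illustrative assumptions and made-up toy data.

```python
import csv
import io
from collections import Counter

# Assumed STRING export layout (check against your own file header).
LABEL_COL = "matching proteins in your network (labels)"
FDR_COL = "false discovery rate"

def gene_frequency(tsv_text, fdr_cutoff=0.05):
    """Count in how many significant terms each gene/protein label appears."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter()
    for row in reader:
        if float(row[FDR_COL]) > fdr_cutoff:
            continue  # keep only the significant gene sets
        labels = {g.strip() for g in row[LABEL_COL].split(",") if g.strip()}
        counts.update(labels)  # each gene counted once per term
    return counts

# Made-up toy table with the assumed column names.
example = (
    "#term ID\tterm description\tfalse discovery rate\t"
    "matching proteins in your network (labels)\n"
    "T1\tterm 1\t0.001\tGENE_A,GENE_B,GENE_C\n"
    "T2\tterm 2\t0.01\tGENE_A,GENE_C\n"
    "T3\tterm 3\t0.2\tGENE_A,GENE_D\n"
)

for gene, n in gene_frequency(example).most_common():
    print(gene, n)
```

In R, a roughly equivalent one-liner, after filtering `df` to the significant rows, would be `sort(table(unlist(strsplit(df$matching.proteins.in.your.network..labels, ","))), decreasing = TRUE)`.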

Below is an example of such a frequency plot for the top 25 gene sets. The bars are also weighted by gene set size and logFC, so larger gene sets are penalized and genes with a small logFC count less. This plot was made with the Omics Playground in the 'enrichment/top enriched' module.

[Figure: gene frequency plot]
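The exact weighting Omics Playground uses is not given in this thread, so the sketch below uses an assumed scheme purely for illustration: each gene in a gene set adds |logFC| divided by that gene set's size, summed over the top-ranked gene sets. The logFC values have to come from your own differential expression results, since the STRING files do not contain them. The function name `weighted_gene_frequency` and the toy data are hypothetical.

```python
from collections import defaultdict

def weighted_gene_frequency(gene_sets, logfc, top_n=25):
    """Rank genes by a size- and logFC-weighted frequency.

    gene_sets: list of (term_name, [genes]) ordered by significance.
    logfc: dict gene -> log fold change; genes missing from it get weight 0.
    Assumed weighting: each gene in a term adds |logFC| / term size, so
    large terms and small fold changes contribute less.
    """
    scores = defaultdict(float)
    for _term, genes in gene_sets[:top_n]:
        unique = set(genes)
        if not unique:
            continue
        for g in unique:
            scores[g] += abs(logfc.get(g, 0.0)) / len(unique)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Made-up toy input.
gene_sets = [
    ("term 1", ["GENE_A", "GENE_C"]),
    ("term 2", ["GENE_A", "GENE_C", "GENE_B", "GENE_D"]),
]
logfc = {"GENE_A": 2.0, "GENE_C": 0.5, "GENE_B": 1.0, "GENE_D": -3.0}
ranking = weighted_gene_frequency(gene_sets, logfc)
print(ranking)
```

Taking the absolute logFC lets strongly down-regulated genes (GENE_D here) rank as high as up-regulated ones; drop `abs()` if only one direction matters.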


Thank you, it was really helpful!

2.6 years ago

Not sure exactly what you want or need to do, but NES is the normalised enrichment score from GSEA (gene set enrichment analysis). I am guessing NES is one of the output columns of the DOSE results?
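It may help to see how GSEA computes NES, because this shows why NES cannot be read off the STRING cluster or enrichment tables. GSEA needs every gene ranked by a score such as logFC. It walks down that ranking to get an enrichment score (ES), then divides ES by the mean |ES| of random gene sets of the same size whose ES has the same sign. The following is a minimal Python sketch of the standard weighted running-sum ES with a gene-permutation null. It is not DOSE's implementation, and the toy ranked list is made up. In DOSE, the GSEA-style functions (for example `gseDO()`) take such a ranked gene list as input.

```python
import random
from statistics import mean

def enrichment_score(ranked, gene_set, p=1.0):
    """GSEA-style weighted running-sum enrichment score.

    ranked: list of (gene, score) pairs sorted by score, descending.
    gene_set: set of gene names; genes not in `ranked` are ignored.
    """
    hits = [g in gene_set for g, _ in ranked]
    n, n_hit = len(ranked), sum(hits)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    weights = [abs(s) ** p for _, s in ranked]
    hit_norm = sum(w for w, h in zip(weights, hits) if h) or 1.0
    miss_step = 1.0 / (n - n_hit)
    running = best = 0.0
    for w, h in zip(weights, hits):
        running += w / hit_norm if h else -miss_step
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero
    return best

def normalized_es(ranked, gene_set, n_perm=1000, seed=0):
    """NES = ES / mean(|ES_null|) over null scores with the same sign as ES.

    The null comes from random gene sets of the same size (gene
    permutation), as in preranked GSEA.
    """
    genes = [g for g, _ in ranked]
    members = set(gene_set) & set(genes)
    es = enrichment_score(ranked, members)
    rng = random.Random(seed)
    null = [enrichment_score(ranked, set(rng.sample(genes, len(members))))
            for _ in range(n_perm)]
    same_sign = [abs(x) for x in null if (x >= 0) == (es >= 0)]
    if not same_sign:
        raise ValueError("no null scores with the same sign; increase n_perm")
    return es / mean(same_sign)

# Toy ranking: 20 genes with scores 20, 19, ..., 1.
ranked = [(f"G{i}", float(20 - i)) for i in range(20)]
top_set = {"G0", "G1", "G2"}        # sits at the top of the ranking
bottom_set = {"G17", "G18", "G19"}  # sits at the bottom
print(normalized_es(ranked, top_set, n_perm=200))
print(normalized_es(ranked, bottom_set, n_perm=200))
```

A positive NES means the gene set is concentrated at the top of the ranking and a negative NES means it sits at the bottom. That is why a plain list of proteins, with no per-gene score, cannot give an NES.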


I am very confused about how DOSE works. This is what I have:

  • the cluster analysis file downloaded from my STRING interaction network
  • the enrichment tables downloaded from STRING (e.g., Biological Process, KEGG pathways, etc.)

What I want to see is which of the genes listed in the STRING enrichment tables occur most often across the pathways that STRING detected and listed in those tables.

To make this clearer (it may sound confusing without seeing it), I have attached a screenshot of my dataset.

[Screenshot: downloaded dataset]

Thank you so much!
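For the goal described in this reply (ranking genes across several STRING categories), one possible approach is sketched below. It pools the significant terms from each downloaded enrichment table and counts, for each gene, how many terms and how many categories it appears in. It assumes the tables have already been loaded and filtered into (term, genes) pairs. The function name `genes_across_categories` and the toy data are hypothetical.

```python
from collections import defaultdict

def genes_across_categories(tables):
    """Summarise how widespread each gene is across STRING categories.

    tables: dict mapping a category name (e.g. "Biological Process") to a
    list of (term, genes) pairs, already filtered for significance.
    Returns (gene, n_terms, categories) rows, most widespread genes first.
    """
    terms = defaultdict(set)       # gene -> set of (category, term)
    categories = defaultdict(set)  # gene -> set of category names
    for category, rows in tables.items():
        for term, genes in rows:
            for g in set(genes):
                terms[g].add((category, term))
                categories[g].add(category)
    summary = [(g, len(terms[g]), sorted(categories[g])) for g in terms]
    # Most terms first, then most categories, then alphabetical for ties.
    summary.sort(key=lambda r: (-r[1], -len(r[2]), r[0]))
    return summary

# Made-up toy input.
tables = {
    "Biological Process": [
        ("term 1", ["GENE_A", "GENE_B", "GENE_C"]),
        ("term 2", ["GENE_C", "GENE_D"]),
    ],
    "KEGG": [("term 3", ["GENE_C", "GENE_D", "GENE_A"])],
}
for row in genes_across_categories(tables):
    print(row)
```

Counting distinct (category, term) pairs avoids double-counting a gene that is listed twice within one term.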
