What should be done with newly found (unannotated) DEGs in over-representation analysis?
16 months ago
ghs101 • 0

Hi,

I have about 20 newly found differentially expressed genes in my dataset (460 DEGs in total) that have no Ensembl ID. Should I keep them in or remove them from the downstream analysis? What is the correct procedure for over-representation analysis in this case?

GSE analysis enrichment over-representation • 488 views
16 months ago
ATpoint 82k

In my opinion the "correct" way of doing any enrichment/overrepresentation (for example against REACTOME terms) is to define the "universe" or "background" correctly.

  • The test set is all DEGs, filtered for genes that have an annotation in the database.
  • The background/universe is all genes eligible for DEG analysis, again filtered for genes that have an annotation in the database. With something like DESeq2, that would be the genes surviving independent filtering (i.e. not having NA in the padj column); with edgeR, it would be the genes retained after applying filterByExpr.
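
To make the two definitions above concrete, here is a minimal sketch in plain Python. Everything in it is hypothetical and only for illustration: the gene names, the `padj` values (with `None` standing in for NA after independent filtering), the two toy pathways, and the helper `build_sets`. It is not how DESeq2, edgeR, or clusterProfiler do it internally, just the set logic.

```python
# Hypothetical toy data: gene -> adjusted p-value.
# None mimics an NA in the padj column (gene dropped by independent filtering).
padj = {
    "geneA": 0.001,
    "geneB": 0.20,
    "geneC": None,
    "geneD": 0.03,
    "novel1": 0.004,  # newly found gene with no database annotation
    "geneE": 0.60,
}

# Hypothetical annotation database: pathway -> member genes.
pathways = {
    "pathway1": {"geneA", "geneB", "geneC", "geneX"},
    "pathway2": {"geneD", "geneE"},
}


def build_sets(padj, pathways, alpha=0.05):
    """Return (test_set, universe) for an over-representation test."""
    # Genes that appear in at least one pathway of the database.
    annotated = set().union(*pathways.values())
    # Genes that were eligible for differential testing (padj is not NA).
    eligible = {gene for gene, p in padj.items() if p is not None}
    # Universe: eligible AND annotated.
    universe = eligible & annotated
    # Test set: significant genes, restricted to the universe.
    test_set = {gene for gene in universe if padj[gene] < alpha}
    return test_set, universe


test_set, universe = build_sets(padj, pathways)
print(sorted(test_set))
print(sorted(universe))
```

Note that `novel1` is significant but ends up in neither set, because it has no annotation to test against.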

Functions like enricher() from clusterProfiler support such an analysis. Setting an appropriate background is critical for obtaining meaningful statistics. It obviously makes a difference whether you test against, for example, 8,000 genes that meet the criteria for your "universe" (annotated in a database such as REACTOME and eligible in your analysis, i.e. expressed) or simply against all ~50,000 annotated genes regardless of expression status. The latter gives far more lenient, inflated statistics and a lot of false positives.
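
To illustrate why the background matters, here is a rough sketch of the one-sided hypergeometric test that over-representation analysis is typically based on, using only the Python standard library. All the counts are made up for illustration (440 annotated DEGs, 15 of them in a pathway, and pathway sizes of 100 and 150 within the two backgrounds), and `hypergeom_pvalue` is a hypothetical helper, not a function from any enrichment package.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """Upper-tail probability P(X >= k) of a hypergeometric distribution.

    N: genes in the universe
    K: pathway genes within the universe
    n: genes in the test set (DEGs)
    k: DEGs that belong to the pathway
    """
    total = comb(N, n)
    upper = min(n, K)
    # Sum exact integer counts first, then divide once to limit rounding error.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Assumed numbers, for illustration only.
n_degs = 440      # annotated DEGs in the test set
overlap = 15      # DEGs that fall into the pathway

# Background restricted to expressed, annotated genes.
p_expressed = hypergeom_pvalue(overlap, n_degs, K=100, N=8_000)

# Background of all annotated genes, regardless of expression.
p_all = hypergeom_pvalue(overlap, n_degs, K=150, N=50_000)

print(f"expressed universe: p = {p_expressed:.3g}")
print(f"all annotated genes: p = {p_all:.3g}")
```

With these assumed counts, the same overlap of 15 genes looks far more significant against the 50,000-gene background than against the 8,000-gene one, which is the inflation described above.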

That said, if a gene has no annotation, you cannot get any enrichment results for it in terms of known pathways anyway, so I'd remove it.
