Setting the Path to the FeatureCounts Count Matrix

Question

ORA with clusterProfiler

0

Entering edit mode

4 months ago

sansan_96 ▴ 80

Hello everyone,

I am trying to do an enrichment analysis of Arabidopsis data, however I am still wondering how to build it or what to use as a background (universe), could you guide me? I am working with this example.

diff_genes <- read_delim(file = "differential_genes.tsv", delim = "\t")

# library("biomartr") (if not loaded already)
biomartr::organismBM(organism = "Arabidopsis thaliana")

arabido_attributes = 
  biomartr::organismAttributes("Arabidopsis thaliana") %>% 
  filter(dataset == "athaliana_eg_gene")
arabido_attributes

attributes_to_retrieve = c("tair_symbol", "entrezgene_id")

result_BM <- biomartr::biomart( genes      = diff_genes$genes,                  # genes were retrieved using biomartr::getGenome()
                                mart       = "plants_mart",                     # marts were selected with biomartr::getMarts()
                                dataset    = "athaliana_eg_gene",               # datasets were selected with biomartr::getDatasets()
                                attributes = attributes_to_retrieve,            # attributes were selected with biomartr::getAttributes()
                                filters =   "ensembl_gene_id" )# query key
head(result_BM)  




# building the universe!
all_arabidopsis_genes <- read.delim("counts.txt", header = T, stringsAsFactors = F)[,1] # directly selects the gene column

# we want the correspondence of TAIR/Ensembl symbols with NCBI Entrez gene ids
attributes_to_retrieve = c("tair_symbol", "uniprotswissprot","entrezgene_id")

# Query the Ensembl API
all_arabidopsis_genes_annotated <- biomartr::biomart(genes = all_arabidopsis_genes,
                                                     mart       = "plants_mart",                 
                                                     dataset    = "athaliana_eg_gene",           
                                                     attributes = attributes_to_retrieve,        
                                                     filters =  "ensembl_gene_id" )  

# for compatibility with enrichGO universe
# genes in the universe need to be characters and not integers (Entrez gene id)
all_arabidopsis_genes_annotated$entrezgene_id = as.character(
  all_arabidopsis_genes_annotated$entrezgene_id)

clusterProfiler ORA • 613 views

ADD COMMENT • link updated 4 months ago by ultraanfibio ▴ 30 • written 4 months ago by sansan_96 ▴ 80

Ram · Answer 1 · 2023-12-12

2

Entering edit mode

4 months ago

ultraanfibio ▴ 30

You should use as background all GENES that were incuded in the differential expression analysis.

Go to your DGEA results, all genes that pass the predefined cutoffs are considered differentially expressed (DEGs), all genes that were compared are your universe.

Then run the overrepresentation analysis with:

results_ora <- enricher(gene = DEG_GENES, universe = ALL_GENES, TERM2GENE = GENESETS_OF_INTEREST)
results_enrichr = results_ora@result

If you want to convert between gene IDs in the results (e.g., entrez to Gene Symbol) you may want to use setReadable() from DOSE package

ADD COMMENT • link updated 4 months ago by Ram 43k • written 4 months ago by ultraanfibio ▴ 30

1

Entering edit mode

You should use as background all GENES that were incuded in the differential expression analysis.

Agreed! And OP should review the following as well: Urgent need for consistent standards in functional enrichment analysis

ADD REPLY • link 4 months ago by jv ★ 1.8k

0

Entering edit mode

Thank you very much for your valuable answer, so could I use the genes from my counts matrix after filtering?

# Loading the Library for Differential Expression Analysis

library(DESeq2)

Setting the Path to the FeatureCounts Count Matrix

setwd("C:/project_2023/quantification_featureCounts")

Viewing Files in the Directory

list.files()

Reading the Count Matrix

countData <- read.delim("./matriz_arabidopsis_2023.txt", header = TRUE, row.names = 1) head(countData)

Description of Samples

condition <- factor(c("Control", "Control", "Control", "Treatment", "Treatment", "Treatment")) colData <- data.frame(row.names = colnames(countData), condition) head(colData)

Creating a DESeqDataSet

dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~ condition) dds

Filtering Genes with Very Low Expression (less than 10 reads)

dds <- dds[rowSums(counts(dds)) > 10, ]

dds

Performing Standard Differential Expression Analysis

dds <- DESeq(dds) dds

Extracting Results

res <- results(dds) res

Summary of Differential Gene Expression

summary(res)

Sorting the Summary List by Adjusted p-value

res <- res[order(res$padj), ] head(res)

ADD REPLY • link 4 months ago by sansan_96 ▴ 80

1

Entering edit mode

Yes exactly

Your universe would be: ALL_GENE_IDS <- row.names(res)

ADD REPLY • link 4 months ago by ultraanfibio ▴ 30