Question: GOexpress analysis, removing genes with a score of 0?
gravatar for draper
2.7 years ago by
draper0 wrote:

Hello, I am working in a lab and am trying to negotiate R studio. Currently I am using a program called GOexpress. I have created an expression set. During this analysis it appears that many genes have a score of 0, I am wondering if there is a way to filter out such genes so that the GOterm score (which is the average of the scores of its genes) is not lowered by many genes with a score of 0. I am not very computer literate or very good with R studio so detailed instructions would be most appreciated! Thank you in advance

I have created an expression set using these commands:

Assay data

library("Biobase") exprs<-as.matrix(cts) class (exprs) [1] Matrix dim(exprs) [1]5101 34 colnames(exprs) head(exprs[,1:5]) minimalSet <- ExpressionSet(assayData=exprs)

Phenotypic Data

pData <- coldata dim(pData) rownames(pData) summary(pData) all(rownames(pData)==colnames(exprs)) [1] TRUE names(pData) [1] "condition" "timepoint" sapply(pData, class) condition timepoint "factor" "factor" pData[c(15, 20), c("condition", "timepoint")] condition timepoint LN_1.5h_2 LN 1.5h LP_24h_1 LP 24h

Meta Data

metadata <- data.frame(labelDescription= c("CM growth conditions", "growth time"), row.names=c("condition", "timepoint")) phenoData <- new("AnnotatedDataFrame", data=pData, varMetadata=metadata) phenoData An object of class 'AnnotatedDataFrame' rowNames: RM_0.5h_1 RM_0.5h_2 ... LN_49h_3 (34 total) varLabels: condition timepoint varMetadata: labelDescription head(pData(phenoData)) condition timepoint RM_0.5h_1 RM 0.5h RM_0.5h_2 RM 0.5h RM_0.5h_3 RM 0.5h LP_0.5h_2 LP 0.5h LP_0.5h_3 LP 0.5h LN_0.5h_2 LN 0.5h phenoData[c("RM_0.5h_1","LN_49h_3"),"condition"] An object of class 'AnnotatedDataFrame' rowNames: RM_0.5h_1 LN_49h_3 varLabels: condition varMetadata: labelDescription


experimentData <- new("MIAME",name="Kian Draper",lab="Stephen Rader Lab",contact="",title="C merolae growth in LN",other=list(notes="Created from text files"))

exampleSet <-ExpressionSet(assayData=exprs,phenoData=phenoData,experimentData=experimentData)

exampleSet is the expression set I have made I then performed the GOexpress analysis using these commands:



Prepare group factor to analyze

is.factor(exampleSet$"condition") [1] TRUE exampleSet$condition <- factor(exampleSet$condition)

Custom Annotations

library(biomaRt) listMarts(host='')

Update ensembl for the specified species and mart

ensembl <- useMart(biomart = "plants_mart", dataset = "cmerolae_eg_gene",host = "") Determine which attributes and filters you need filters = listFilters(ensembl) filters attributes = listAttributes(ensembl) attributes

Put into tables with names

AllGO = getBM(attributes = c("go_id", "name_1006", "namespace_1003"), mart=ensembl) GOgenes = AllGenes = getBM(attributes = c("ensembl_gene_id", "go_id"), mart=ensembl) AllGenes = getBM(attributes = c("ensembl_gene_id","external_gene_name", "description"), mart=ensembl)

change column names

colnames(GOgenes)[1] = 'gene_id' colnames(AllGenes)[1] = ‘gene_id'

Run random forest algorithm

set.seed(4543) exampleSet_results <- GO_analyse(eSet = exampleSet, f = “condition", GO_genes = GOgenes, all_GO = AllGO, all_genes = AllGenes)

Permutation for P value based on ontologies

exampleSet_results.pVal = pValue_GO(result=exampleSet_results, N=1000)

Filtering the results

BP.5 <- subset_scores(result = exampleSet_results.pVal,namespace = "biological_process",total = 5, p.val=0.05) MF.10 <- subset_scores(result = exampleSet_results.pVal,namespace = "molecular_function",total = 10, p.val=0.05) CC.15 <- subset_scores(result = exampleSet_results.pVal,namespace = "cellular_component", total = 15, p.val=0.05)

Determine the top ranking GO terms & make a heat map for top ranking GO term

head(BP.5$GO) heatmap_GO(go_id = "GO:0033014", result = BP.5, eSet=exampleSet, cexRow=0.4, cexCol=1, cex.main=1, main.Lsplit=30) heatmap_GO(go_id = "GO:0007018", result = BP.5, eSet=exampleSet, cexRow=1, cexCol=1, cex.main=1, main.Lsplit=20)

Genes associated with GO term of interest

table_genes(go_id = "GO:0033014", result = BP.5)[,c(1:3)] expression_plot(gene_id = "CME132C", result = BP.5, eSet=exampleSet, x_var = "timepoint", title.size=1.5,legend.title.size=10, legend.text.size=10, legend.key.size=15, ylim=c(0,10000))

At this point many of the genes from table_genes have a score of 0, which seems to be bringing down the average score which is the score for the GOterm. I understand that you can specify the minimum number of genes in a GOterm with "total= " but I am wondering if there is a way to remove all genes with a score of 0?

ADD COMMENTlink written 2.7 years ago by draper0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2392 users visited in the last hour