I am trying to perform GO analysis in R on RNA-seq data I have generated. My DEG list of interest is called GRL4v3VisData, and my background list contains the Ensembl IDs of all genes that have at least one transcript. This should give the statistics more power than using every gene in the zebrafish genome as the background.
Install ViSEAGO using:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("ViSEAGO")
Load the data tables from the working directory:
GRL4v3VisData<-data.table::fread("GRL4v3ResultsViseaGO.txt", select = c("ensembl","padj"))
background <- data.table::fread("GRBgViseago.txt", select = c("ensembl","padj"))
Input Data: GRL4v3VisData is 2836 observations of 2 variables
> head(GRL4v3VisData)
ensembl padj
1: ENSDARG00000028396 1.76e-115
2: ENSDARG00000023587 2.46e-58
3: ENSDARG00000075666 5.46e-47
4: ENSDARG00000043154 3.98e-44
5: ENSDARG00000039682 9.37e-42
6: ENSDARG00000022631 3.62e-40
background is 18726 observations of 1 variable
> head(background)
ensembl
1: ENSDARG00000113107
2: ENSDARG00000084828
3: ENSDARG00000093924
4: ENSDARG00000102104
5: ENSDARG00000113105
6: ENSDARG00000103050
Create an object holding all GO annotations from Ensembl:
Ensembl <- ViSEAGO::Ensembl2GO(biomart = "genes", host = "www.ensembl.org", version = NULL)
Annotate the zebrafish genome with the GO annotations from Ensembl:
myGENE2GO<-ViSEAGO::annotate("drerio_gene_ensembl", Ensembl)
Create the topGOdata object for Biological Process. The inputs are the gene selection, the gene background, the GO category to use (MF, BP, or CC), and the minimum number of annotated genes per GO term (nodeSize):
BP<-ViSEAGO::create_topGOdata(geneSel=GRL4v3VisData, allGenes=background, gene2GO=myGENE2GO, ont="BP", nodeSize=5)
And the result is:
Error in .local(.Object, ...) : allGenes must be a factor with 2 levels
I am wondering whether this is because it is comparing the row names of the two tables, which are integers, rather than the Ensembl IDs themselves. I understand the factor should have the levels 0 and 1 (FALSE or TRUE), but I am unsure how to resolve this issue.
Please help!
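To make the "factor with 2 levels" wording concrete, here is a minimal base R sketch of what such a factor looks like when it is built from the Ensembl ID columns rather than from whole tables. The two small data frames are toy stand-ins that reuse a few IDs from the head() output above; putting the selected IDs into the toy background is an assumption made for illustration, and the names sel_ids, bg_ids, missing_ids and gene_factor are hypothetical.

```r
# Toy stand-ins for the two tables (the real ones have 2836 and 18726 rows).
# Assumption: the selected genes are also present in the background.
GRL4v3VisData <- data.frame(
  ensembl = c("ENSDARG00000028396", "ENSDARG00000023587"),
  padj    = c(1.76e-115, 2.46e-58)
)
background <- data.frame(
  ensembl = c("ENSDARG00000113107", "ENSDARG00000084828",
              "ENSDARG00000028396", "ENSDARG00000023587")
)

# Pull the ID columns out as plain character vectors,
# so the comparison is on Ensembl IDs and not on row numbers.
sel_ids <- unique(GRL4v3VisData$ensembl)
bg_ids  <- unique(background$ensembl)

# Selected genes that are absent from the background (ideally none).
missing_ids <- setdiff(sel_ids, bg_ids)

# Named two-level factor over the background:
# 1 = gene of interest, 0 = background only.
gene_factor <- factor(as.integer(bg_ids %in% sel_ids), levels = c(0, 1))
names(gene_factor) <- bg_ids

table(gene_factor)
```

If create_topGOdata builds this factor internally from two vectors of IDs, then passing sel_ids and bg_ids in place of the full tables may be what it needs. That is an assumption based on the error message, not something verified against the package here.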