Question: GOseq analysis with non-native organism
0
gravatar for Bin
14 months ago by
Bin0
Bin0 wrote:

Hello, everyone. I am a beginner using goseq for gene enrichment analysis. The organism I am working on is mouse, which should be able to be done using mm10 or mm9 packages. However, it seems that the UCSC has changed their links or something, and now goseq is not able to use the library automatically. So I could not follow the standard instruction in manuell to make an analysis. Instead, I have exacted the information needed for goseq by myself and turned it into a non-native organism,using following codes:

library("goseq")
library("geneLenDataBase")
library("org.Mm.eg.db")
library("mygene")
library("GenomicFeatures")
library("biomaRt")
library("ensembldb")
database<-read.table("Mouse Ensembl IDs and Transcript Lengths.txt",header=FALSE,fill=TRUE)
geneset<-read.csv("DCs_Glc-DAG induced genes dependant on clec4e.csv")
#get information about all the genes in the array, save the name of the genes as a character vector
AllGenes<-geneset$Gene.Name
DifGenes<-AllGenes[1:90]
AllGenes.vector<-c(t(as.character(AllGenes)))
DifGenes.vector<-c(t(as.character(DifGenes)))
#find ensemnr of the genes
Ensemb.AllGenes<-queryMany(AllGenes.vector,scope="symbol",fields="ensembl.gene",species="mouse")
Ensemb.DifGenes<-queryMany(DifGenes.vector,scopes="symbol",fields="ensembl.gene",species="mouse")
Ensnr.AllGenes<-Ensemb.AllGenes$ensembl.gene
Ensnr.DifGenes<-Ensemb.DifGenes$ensembl.gene

#gene.vector=as.integer(AllGenes.vector%in%DifGenes.vector)
#produce a txdb database for mouse, automatically if getting access to UCSC mm10
txsByGene=transcriptsBy(TxDb.Mmusculus.UCSC.mm10.ensGene,"gene")
alllengthData=median(width(txsByGene))
txsByGene.frame<-data.frame(txsByGene)
#produce a vector of integer 0 and 1 in order to see whether the gene wanted was expressed
gene.vector=as.integer(Ensnr.AllGenes%in%Ensnr.DifGenes)
alllengthData<-as.data.frame(alllengthData)
#use match to find out all the length data needed for the test
length<-alllengthData[match(Ensnr.AllGenes,row.names(alllengthData)),]
#Ensnr.AllGenes_sub<-Ensnr.AllGenes[Ensnr.AllGenes%in%rownames(alllengthData)]
#produce a pwf to show how good was the genes fitted
pwf=nullp(gene.vector,bias.data=length,'ensGene',plot.fit=TRUE)
#produce a dataset of GO terms and add it to the goseq function
mmouse = useDataset(dataset="mmusculus_gene_ensembl",mart=useMart ("ensembl"))
GOmap = getBM (filters = "ensembl_gene_id", attributes = c("ensembl_gene_id", "go_id"),values= Ensnr.AllGenes,mart = mmouse)
#-----------------------------everything goes well until here----------------------------------#
GO.wall=goseq(pwf,gene2cat =GOmap_BP)
enriched.GO=GO.wall$category[GO.wall$over_represented_pvalues<0.05]
#write out the terms interested
library(Go.db)
capture.output(for(go in enriched.GO[1:length(enriched.GO)]{
  print(GOTERM[go])
  cat("____________")
},file="Analysis, mouse project.txt")

until pwf was everything OK and I got a fitted plot showing the Probability Weighting Function.In GOmap step, the result was a dataframe of 2 variables, also as expected. However, in the next step, GO.wall and its function goseq kept telling me there was an error

"Using manually entered categories.
Error in `[.default`(summary(map), , 1) : incorrect number of dimensions
In addition: Warning message:
In goseq(pwf, gene2cat = GOmap_BP) :
  Gene column could not be identified in gene2cat conclusively, using the one headed ensembl_gene_id"

Does anyone have some idea about why it happens and how to solve it? Thanks in advance.

rna-seq R software error • 788 views
ADD COMMENTlink modified 14 months ago • written 14 months ago by Bin0

By non-native I suppose you mean non-model organism?

ADD REPLYlink written 14 months ago by h.mon25k

Naja, somehow. I mean, in fact it is a model organism (mouse), so the library mm10 I am using was in fact right, they should contain all the information I needed. But goseq package now has problem in using mm10 or mm9 Length library. So I have to get the lengths needed for testing and comparing firstly by myself (that was the codes before pwf things was talking about). But luckily the problem was solved, just in another way: they at least still have access to the database I wanted when using goseq function. So I do not really have to bother defining my own compare database and worry about the dimension problem. However, I would be quite appreciative if you happen to know the reason why this error occurred. Thank you for replying.

ADD REPLYlink written 14 months ago by Bin0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1796 users visited in the last hour