GOSeq gene ontology enrichment analysis in rice
1
1
Entering edit mode
3.9 years ago
Bioinfonext ▴ 460

I need your help for Goseq R code for gene ontology enrichment analysis for differentially expressed gene identified by DESeq2.

I got the differentially expressed genes and I can also download the mapping file from biomart for all the rice gene ID like below:

Gene stableID   Transcript stable ID           GO term accession
BGIOSGA013239  BGIOSGA013239-TA                   GO:0009098
BGIOSGA013239  BGIOSGA013239-TA                   GO:0003862
BGIOSGA013239  BGIOSGA013239-TA                   GO:0009082
BGIOSGA013239  BGIOSGA013239-TA                   GO:0016616
BGIOSGA013239  BGIOSGA013239-TA                   GO:0051287

.....................

Goseq code:
d <- read.csv("deseq2res.csv", header=T, row.names=1)
all_genes <- row.names(d)
 DE_genes <- all_genes[d$padj<0.05]

I am not sure how should I proceed further after this? I am not able to understand how should I get the genes.vector and length.vector.names for the below code and then GO_data.frame?

pwf <- nullp(genes.vector,bias.data=length.vector.names)
head(pwf)

# calculate GO enrichment using default method
GO.WALL <- goseq(pwf, gene2cat=GO_data.frame)

Many thanks, Bioinfonext

goseq R • 1.6k views
ADD COMMENT
0
Entering edit mode

Not a GOSeq solution but since you are working with Oryza sativa indica you could use AgriGO v2.

The input file will be a list of gene_id i.e, the gene_id of your differentially expressed genes

ADD REPLY
0
Entering edit mode
3 months ago

For the genes vector you need a named vector with one entry for each gene in your experiment, and a 1 if it is DE and a 0 if it is not. The names should be the gene names.

From your question this could be generated using the vectors you have:

genes.vector <- as.numeric(d$padj < 0.05)
names(genes.vector) <- all_genes

For the GO categories, you want not a frame, but a list, with each entry named for a gene, and containing a vector of all the go categories associated with that gene.

Your can geneate that from the mapping file:

go_df <- read.delim("biomart_go_output.txt")
genes2cat <- split(go_df$GO.accession.term,go_df$Gene.stableID)

For the length.vector.names you need a named vector of the gene lengths, with the names being the geneID of the genes. I can't tell you where to get this for rice, but BioMart might be a good place to start - I suspect you can download a table of exonic lengths (perhaps it would be as the length of each exon), that can then be converted to the vector you need.

ADD COMMENT

Login before adding your answer.

Traffic: 1781 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6