Hi there Biostars, I'm working on a GO enrichment analysis for some wheat RNA-seq data. I'd like to use the goseq package for this and have been following the vignette. The package requires three inputs: (1) a vector covering all the genes in the transcriptome, with '1' denoting DE genes and '0' denoting non-DE genes; (2) a vector with the length of each of those genes; and (3) a gene-to-GO mapping, given either as a two-column data frame of gene IDs and GO terms (each gene can have several GO terms, so gene IDs repeat across rows) OR as a list of lists (a named list in which each element is named for a gene and holds that gene's GO terms).
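To make the three inputs concrete, here is a minimal toy sketch in R. The gene IDs, lengths, GO terms, and the column names `gene_id` and `go_id` are all made up for illustration; the object names `DEgenes`, `my_length_vector`, and `wheat_GO_terms` follow the calls used in this post.

```r
# Toy gene IDs (hypothetical; real wheat IDs would come from the annotation)
genes <- c("geneA", "geneB", "geneC", "geneD")

# (1) Named 0/1 vector: 1 = differentially expressed, 0 = not
DEgenes <- c(1, 0, 1, 0)
names(DEgenes) <- genes

# (2) Gene lengths, in the same order as DEgenes (made-up values)
my_length_vector <- c(1200, 850, 2300, 640)

# (3a) Two-column data frame: one row per gene/GO-term pair, so genes repeat
wheat_GO_terms <- data.frame(
  gene_id = c("geneA", "geneA", "geneB", "geneC"),
  go_id   = c("GO:0003677", "GO:0006355", "GO:0005524", "GO:0003677"),
  stringsAsFactors = FALSE
)

# (3b) The same mapping as a named list, one element per gene
gene2cat_list <- list(
  geneA = c("GO:0003677", "GO:0006355"),
  geneB = "GO:0005524",
  geneC = "GO:0003677"
)
```

Note that geneD has no GO annotation in this toy example, so it appears in the first two inputs but not in the mapping.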
I had no problem fitting the Probability Weighting Function (PWF) with: pwf = nullp(DEgenes, bias.data = my_length_vector)
The GO terms I downloaded for wheat from BioMart are in the two-column data frame format, so that's what I tried first, with the code:
GO.wall = goseq(pwf, gene2cat = wheat_GO_terms)
but I get three errors:
Error: node stack overflow.
Error during wrapup: node stack overflow.
Error: no more error handlers available (recursive errors?); invoking 'abort' restart
Does anyone know how to overcome these errors in GOseq?
I manually created a very short list of lists to see if that works, and it does, but I am struggling to create the full list of lists from the two-column data frame with its repeating gene names. The goseq manual indicates the data frame approach should work.
I'd love to hear from you if you've had success with the data frame input format. OR, if you can help with converting a data frame that has repeating gene names in the first column and their GO terms in the second into a list of lists, where each list is named for a gene and contains that gene's GO terms, that would be fantastic. Thank you!!
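One possible way to do that conversion is base R's `split()`, which groups one vector by the values of another. The sketch below is only an illustration under assumptions: the toy data and the column names `gene_id` and `go_id` are hypothetical stand-ins for the BioMart download, and whether this avoids the node stack overflow on the real data is untested here.

```r
# Hypothetical stand-in for the BioMart table; column names are assumptions
wheat_GO_terms <- data.frame(
  gene_id = c("geneA", "geneA", "geneB", "geneC", "geneC"),
  go_id   = c("GO:0003677", "GO:0006355", "GO:0005524", "GO:0003677", ""),
  stringsAsFactors = FALSE
)

# Drop rows with a missing or empty GO ID, then drop duplicate pairs
keep <- !is.na(wheat_GO_terms$go_id) & wheat_GO_terms$go_id != ""
go_clean <- unique(wheat_GO_terms[keep, ])

# Group GO IDs by gene: the result is a named list with one
# character vector of GO terms per gene
gene2cat_list <- split(go_clean$go_id, go_clean$gene_id)

# The list could then be passed in place of the data frame
# (left commented out because it needs goseq and a fitted pwf):
# GO.wall <- goseq(pwf, gene2cat = gene2cat_list)
```

Genes whose only rows had an empty GO ID would drop out of the list entirely, which is the same as leaving them unannotated.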