Question: goseq for non-native species infinite recursion
nsl24 wrote, 14 months ago:

I'm trying to use the results of a differential expression analysis to look for enriched GO terms using goseq, but I'm having a beast of a time even getting a trial run working for my non-native species.

I have:

  • Gene lengths downloaded from biomart as a numeric vector (Length)
  • A gene.vector covering all of the surveyed genes, with 1 or 0 depending on DE (built from my output file, named DE)
  • A data frame containing gene IDs and their associated GO terms, taken from biomart (named GOT)

My test code:

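# all genes that were tested, and the subset called differentially expressed
assayed.genes=DE$assayed.genes
de.genes=DE$de.genes
# 1 if the gene is DE, 0 otherwise, named by gene ID
gene.vector=as.integer(assayed.genes%in%de.genes)
names(gene.vector)=assayed.genes
# gene lengths, used as the bias data
Length = LEN$genelength
head(gene.vector)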

and I see output like

Cre09.g414550.t1.2.v5.5 0
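
One thing worth checking at this step is that the lengths line up with the genes. As far as I know, nullp pairs gene.vector and bias.data by position, so a Length vector in a different order would silently attach the wrong length to each gene. Below is a minimal sketch on toy data of reordering the lengths with match(). The gene IDs, the lengths and the gene_id column of LEN are made up for illustration; only genelength appears in the code above.

```r
# Toy stand-ins for the objects above; all IDs and lengths are invented.
assayed.genes <- c("geneA", "geneB", "geneC", "geneD")
de.genes <- c("geneB", "geneD")

# Hypothetical layout for LEN: one row per gene, in a different order.
LEN <- data.frame(
  gene_id = c("geneD", "geneA", "geneC", "geneB"),  # assumed column name
  genelength = c(900, 1500, 2100, 600)
)

# 1 = differentially expressed, 0 = not, named by gene ID.
gene.vector <- as.integer(assayed.genes %in% de.genes)
names(gene.vector) <- assayed.genes

# Reorder the lengths so that element i belongs to gene i of gene.vector.
# Genes missing from LEN would come back as NA here.
Length <- LEN$genelength[match(names(gene.vector), LEN$gene_id)]
names(Length) <- names(gene.vector)

# Both vectors must have the same length and the same gene order.
stopifnot(length(Length) == length(gene.vector))
print(gene.vector)
print(Length)
```

If LEN was exported in exactly the same gene order as DE, the match() step changes nothing, so it is cheap insurance rather than a required fix.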

When I try to make the pwf and run goseq:

pwf = nullp(gene.vector, bias.data=Length)
go = goseq(pwf, gene2cat = GOT)

The pwf step works and produces a plot, but when I run goseq I get hit with an infinite recursion error:

Error: evaluation nested too deeply: infinite recursion / options(expressions=)?

Followed by

"Error during wrapup:" repeated

Tweaks and googling haven't turned anything up, so I was hoping someone might be able to spot a glaring error in my approach or offer advice.

rna-seq goseq software error
Ruben (the Netherlands, Amsterdam, Vrije Universiteit) wrote, 7 months ago:

Hi nsl24,

I know you have probably moved on, but I had the identical problem and your question was the only one that popped up in my search. So, for people struggling with this in the future, here is how I solved the issue in my code.

The solution for me was very simple: my gene-to-GO mapping was a tibble (from the tibble package) and I had forgotten about that. The fix is either to convert it to a named list or to coerce the tibble into a plain data frame with as.data.frame(). In either case, some genes will probably map to many GO terms and others to none. So you should end up with either a named list in which a gene ID appears as a name many times, each time pointing at one of its GO terms, or a data frame in which a gene ID is repeated down one column with its GO terms in the other. This is also pointed out in the package documentation, so it is just something to keep in mind. Your data frame GOT from your example could be altered as follows:

NamedList = GOT$GOterms
names(NamedList) = GOT$GeneIDs
go = goseq(pwf, gene2cat = NamedList)
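
An alternative shape for the same fix, as a sketch only: the goseq documentation describes gene2cat as either a two-column data frame or a list whose names are genes and whose entries are vectors of categories, and base R's split() builds that second form directly. The toy GOT below is invented, and the column names GeneIDs and GOterms are carried over from the example above as assumptions about the real data.

```r
# Toy mapping in long format: one row per gene/GO-term pair.
# Column names follow the example above and are assumptions.
GOT <- data.frame(
  GeneIDs = c("geneA", "geneA", "geneB", "geneC", "geneD"),
  GOterms = c("GO:0006412", "GO:0005840", "GO:0006412", "", NA),
  stringsAsFactors = FALSE
)

# as.data.frame() turns a tibble into a plain data frame
# (it leaves an ordinary data frame unchanged).
GOT <- as.data.frame(GOT)

# Drop genes with no annotation (empty string or NA).
annotated <- GOT[!is.na(GOT$GOterms) & GOT$GOterms != "", ]

# One list element per gene, each holding all of that gene's GO terms.
NamedList <- split(annotated$GOterms, annotated$GeneIDs)

str(NamedList)

# Needs goseq and the pwf from nullp(), so it is left commented out here:
# go <- goseq(pwf, gene2cat = NamedList)
```

With this form each gene name appears once and unannotated genes are simply absent from the list.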

Cheers, Ruben
