Question

Is there a way to get the name or IDs of genes with the goseq table?

5.3 years ago

unawaz ▴ 60

Hi all,

I've performed a GO enrichment analysis using goseq. The output of goseq() lists the enriched categories and how many significant genes fall in each, but it doesn't give the IDs of the genes in those categories.

Obviously you can use getgo() to retrieve the GO terms for genes of interest, but is there a way to get a column listing the genes in each enriched GO term from the goseq() output?

So my desired output would be:

category    over_represented_pvalue  under_represented_pvalue  numDEInCat  numInCat  term        ontology  Ensembl ID
GO:0000786  2.143408e-16             1                         12          43        nucleosome  CC        ENSG00000112655, ENSG00000158483, etc.

The "Ensembl ID" column is the one I'm looking to add.

Any help would be greatly appreciated!

RNA-Seq gene ontology goseq • 5.9k views

ADD COMMENT • link updated 14 months ago by MYX • 0 • written 5.3 years ago by unawaz ▴ 60

score 8 · Accepted Answer · 2018-12-19

5.3 years ago

darklings ▴ 570

Yes, please refer to this post: https://support.bioconductor.org/p/102273/

For my own case, I modified the function as follows:

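library(goseq)  # provides getgo()

# Get the gene lists behind "numDEInCat" in the GO.wall report
getGeneLists <- function(pwf, goterms, genome, ids){
  # Map every gene in pwf to its GO categories
  gene2cat <- getgo(rownames(pwf), genome, ids)
  # Invert the mapping: GO category -> genes annotated with it
  cat2gene <- split(rep(names(gene2cat), lengths(gene2cat)),
                    unlist(gene2cat, use.names = FALSE))
  out <- list()
  for(term in goterms){
    # Rows of pwf for the genes in this category
    tmp <- pwf[cat2gene[[term]],]
    # Keep only the differentially expressed genes
    tmp <- rownames(tmp[tmp$DEgenes > 0, ])
    out[[term]] <- tmp
  }
  out
}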
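The inversion step inside the function (gene-to-category list turned into a category-to-gene list) can be tried on its own without goseq. This is a minimal sketch on made-up data: the gene names and GO IDs are hypothetical, and the toy list only imitates the shape of what getgo() returns.

```r
# Toy gene -> GO mapping (hypothetical IDs, shaped like a getgo() result).
gene2cat <- list(
  geneA = c("GO:0000001", "GO:0000002"),
  geneB = "GO:0000002",
  geneC = NULL  # a gene with no GO annotation
)

# Repeat each gene name once per category it belongs to, then group
# the repeated names by category.
cat2gene <- split(rep(names(gene2cat), lengths(gene2cat)),
                  unlist(gene2cat, use.names = FALSE))

print(cat2gene)
```

Genes with no annotation have length zero, so they are repeated zero times and drop out of the inverted list.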

This returns a list of the GO terms in my GO.wall report, each with its associated Ensembl IDs:

goList <- getGeneLists(pwf, GO.wall$category, "hg19", "ensGene")

> head(goList, 1)
$`GO:0140014`
 [1] "ENSG00000040275" "ENSG00000117724" "ENSG00000198901" "ENSG00000156970"
 [5] "ENSG00000186185" "ENSG00000143228" "ENSG00000090889" "ENSG00000125538"
 [9] "ENSG00000158402" "ENSG00000175063" "ENSG00000121152" "ENSG00000169679"

If you want to add the additional column to your report, you can just run:

GO.wall$EnsemblID <- sapply(GO.wall$category, function(x) paste0(goList[[x]], collapse = ","))
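The column-adding step can be checked on toy data before running it on a real report. The sketch below assumes a stand-in GO.wall with only two columns and a hand-written goList (all IDs are made up). It swaps sapply() for vapply() so the result is always a character vector, and it compares the number of IDs per term against numDEInCat, which should normally agree when the same gene-to-GO mapping was used for both.

```r
# Toy stand-ins for the goseq report and the gene lists (hypothetical IDs).
GO.wall <- data.frame(category   = c("GO:0000001", "GO:0000002", "GO:0000003"),
                      numDEInCat = c(1L, 2L, 0L),
                      stringsAsFactors = FALSE)
goList <- list("GO:0000001" = "geneA",
               "GO:0000002" = c("geneA", "geneB"),
               "GO:0000003" = character(0))

# One comma-separated string per category; empty categories give "".
GO.wall$EnsemblID <- vapply(GO.wall$category,
                            function(x) paste(goList[[x]], collapse = ","),
                            character(1), USE.NAMES = FALSE)

# Sanity check: number of listed genes versus the reported DE count.
counts_match <- all(lengths(goList[GO.wall$category]) == GO.wall$numDEInCat)

print(GO.wall)
print(counts_match)
```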

ADD COMMENT • link 4.3 years ago by darklings ▴ 570

Exactly what I needed! Thank you!

ADD REPLY • link 5.3 years ago by unawaz ▴ 60

Hi all, I am trying to adapt this code for my work but have not been able to. I am working on rice, which is a non-native organism for goseq. I am new to R; can anyone suggest how to modify this code for a non-native organism? goseq itself ran smoothly for rice, but adding the gene column is tricky for me. I have the category mapping, pwf, and GO.wall objects.

ADD REPLY • link 4.2 years ago by sanjay • 0

I am not sure; maybe getgo() doesn't support your genome. Could you provide a few more details about your species and your code/error?

ADD REPLY • link 4.2 years ago by darklings ▴ 570

Hello there,

I am commenting to ask if you ever found a solution to this problem. I am currently working my way through a goseq analysis of a non-native organism myself (Nicotiana benthamiana), and my last hurdle is retrieving the differentially expressed genes associated with the GO terms that are of interest to us.

Thanks

ADD REPLY • link 3.8 years ago by thomas.welch ▴ 50

This code defines a function called "getGeneLists", which takes four arguments:

"pwf": the data frame returned by nullp(), with gene IDs as row names and a DEgenes column marking the differentially expressed genes.
"goterms": a vector of Gene Ontology (GO) terms of interest.
"genome": a character string naming the genome used to map genes to GO terms (e.g. "hg19").
"ids": a character string naming the gene ID format used for that mapping (e.g. "ensGene").

The function uses "getgo" to map the genes in "pwf" to their GO terms for the selected genome and ID format. It then inverts that mapping into "cat2gene", a list from each GO term to the genes annotated with it.

The function then loops through each GO term in "goterms", looks up the genes annotated with that term in "cat2gene", selects the rows of "pwf" for those genes, keeps only the rows with DEgenes > 0, and stores their gene IDs. It returns a list with one vector of gene IDs per GO term.

In summary, the function extracts the differentially expressed genes associated with each of a set of GO terms, given the "pwf" data frame and the GO terms of interest.

ADD REPLY • link 14 months ago by MYX • 0

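My data is also from a non-native organism. I had already built a gene2go data frame before running goseq(), so I didn't use getGeneLists(), only the code below.

goterm is the gene2go data frame (columns GENE and GO)
enriched.GO is the data frame returned by goseq()

out <- list()
for (term in enriched.GO$category) {
  # Genes annotated with this term; as.character() guards against factor columns
  genes <- as.character(goterm[goterm$GO %in% term, "GENE"])
  # Keep only genes present in pwf so the row lookup cannot produce NA rows
  tmp <- pwf[intersect(genes, rownames(pwf)), ]
  # Retain the differentially expressed genes
  out[[term]] <- rownames(tmp[tmp$DEgenes > 0, ])
}

out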
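The same idea can be wrapped in a function for organisms that getgo() does not cover. This is a sketch under stated assumptions, not part of goseq: getGeneListsCustom is a hypothetical helper, the column names GENE and GO are assumed from the data frame described above, and the toy data is made up.

```r
# Hypothetical helper: DE genes per GO term from a custom gene-to-GO table.
getGeneListsCustom <- function(pwf, goterms, gene2go,
                               gene_col = "GENE", go_col = "GO") {
  de_genes <- rownames(pwf)[pwf$DEgenes > 0]
  genes <- as.character(gene2go[[gene_col]])
  terms <- as.character(gene2go[[go_col]])
  # Keep annotation rows for DE genes in the requested terms only.
  keep <- genes %in% de_genes & terms %in% goterms
  # Fixing the factor levels keeps terms with no DE genes as empty vectors.
  out <- split(genes[keep], factor(terms[keep], levels = unique(goterms)))
  lapply(out, unique)
}

# Toy pwf shaped like nullp() output, plus a toy gene2go table.
pwf <- data.frame(DEgenes = c(1, 0, 1), bias.data = c(1200, 800, 1500),
                  pwf = c(0.4, 0.3, 0.5),
                  row.names = c("geneA", "geneB", "geneC"))
gene2go <- data.frame(GENE = c("geneA", "geneB", "geneC", "geneC"),
                      GO   = c("GO:0000001", "GO:0000001",
                               "GO:0000001", "GO:0000002"),
                      stringsAsFactors = FALSE)

res <- getGeneListsCustom(pwf,
                          c("GO:0000001", "GO:0000002", "GO:0000003"),
                          gene2go)
print(res)
```

With real data, the second argument would be the category column of the goseq() result, and the resulting list can be collapsed into a report column in the same way as for the native-genome case.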

ADD REPLY • link 14 months ago by MYX • 0
