Question: Hypergeometric Test R/Bioconductor
Thaman (Finland) wrote, 9.4 years ago:

Hi,

I am getting interested in R/Bioconductor packages and trying to learn them. I want to perform a hypergeometric test for over-representation against GO and KEGG. I have two text files: Back.txt and genes.txt. To run the hypergeometric test, I wrote the following code in R. I would like the result as a data.frame, visualized as a GO graph or KEGG pathway.

library(topGO)
library(GOstats)

universe <- read.table("Back.txt", sep=",")  # background file: Entrez IDs only, no header line
tbl <- read.table("genes.txt", sep=",")  # selected genes, with the header Probes_id, entrez_gene_id, symbols, P.Value and F.C
selected <- tbl$V2  # keep only the second column of tbl, which holds entrez_gene_id

param <- new("GOHyperGParams", geneIds=selected,
             universeGeneIds=universe, annotation="org.Hs.eg.db",
             ontology="BP", pvalueCutoff=0.1, conditional=FALSE, testDirection="over")

But it doesn't work. Creating param gives this error:

Error in makeValidParams(.Object) :
  geneIds and universeGeneIds must have the same mode
  geneIds: NULL
  universeGeneIds: integerFALSE
In addition: Warning message:
In makeValidParams(.Object) :
  converting univ from list to atomic vector via unlist

Running the test then fails as well:

hyp <- hyperGTest(param)

Error in is(object, Cl) :
  error in evaluating the argument 'p' in selecting a method for function 'hyperGTest'

Am I missing something here? Do I need to read more to clear up my understanding? If so, where can I find a worked R/Bioconductor hypergeometric test example with all the needed packages?

I have also loaded all the libraries and packages shown in this link: http://pastebin.com/i735EUWp

Thank you
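For reference, the number a hypergeometric over-representation test reports for one GO term or KEGG pathway can be sketched in base R. This is a minimal illustration, not the GOstats code: the helper name hyper_over_p and all counts are hypothetical, and it assumes the plain (conditional=FALSE) test, where each term is tested on its own.

```r
# Minimal sketch: one-sided (over-representation) hypergeometric p-value
# for a single term. hyper_over_p is a hypothetical helper; the counts
# below are invented for illustration.
hyper_over_p <- function(k, n, m, N) {
  # k: selected genes annotated to the term
  # n: total selected genes (all of them must also be in the universe)
  # m: universe genes annotated to the term
  # N: total universe genes
  # P(X >= k) for X ~ Hypergeometric(m white, N - m black, n drawn)
  phyper(k - 1, m, N - m, n, lower.tail = FALSE)
}

# 10 of 200 selected genes hit a term that covers 100 of 10000 universe genes
p <- hyper_over_p(k = 10, n = 200, m = 100, N = 10000)
print(p)
```

The same value comes out of a one-sided Fisher's exact test on the 2x2 table (selected / not selected versus in term / not in term), which is why the two names are often used interchangeably for this kind of enrichment.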

Tags: R, bioconductor. Written 9.4 years ago by Thaman; modified 6.7 years ago by Biostar.

Just a comment: reading a file in R with readLines when there are functions like read.delim and read.table (more robust and flexible) or scan (more efficient) is almost always a bad idea. Also, please provide example input files or put the files online.

(reply written 9.4 years ago by Michael Dondrup)
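Following up on the point about scan(): a minimal sketch, assuming Back.txt holds one Entrez ID per line with no header, of reading it as a plain character vector. A textConnection with placeholder IDs stands in for the file so the example runs on its own; with the real file the call would be scan("Back.txt", what = character(), quiet = TRUE).

```r
# Sketch: read a one-column, header-less list of Entrez IDs with scan().
# The IDs are placeholders standing in for the contents of Back.txt.
con <- textConnection(c("1", "10", "100", "1000"))
universe <- scan(con, what = character(), quiet = TRUE)
close(con)

# scan() returns a plain character vector rather than a data.frame,
# so there is no list to unlist before using it as universeGeneIds
str(universe)
```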

I also think an example file would be necessary.

(reply written 9.4 years ago by D. Puthier)
Brad Chapman (Boston, MA) wrote, 9.4 years ago:

The error message (geneIds: NULL) indicates that your selected set is empty: it contains no Entrez IDs at all, so none of them can be matched against the universe set.

What do 'genes.txt' and 'selected' look like? From your comment, it appears that genes.txt is a CSV file with the Entrez gene IDs in the second column. If so, extract only the Entrez IDs:

tbl <- read.table("genes.txt", sep=",")
selected <- tbl$V2

Additionally, are all of the Entrez IDs in selected also in universe? Check that

all(selected %in% universe$V1)

returns TRUE. Because read.table returns a data.frame, compare against its column universe$V1 rather than against universe itself.

(answer written 9.4 years ago by Brad Chapman; modified 6 months ago by RamRS)
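The membership check can be extended into a small cleanup step before building the parameters. This is a sketch under assumptions: the IDs are placeholders, universe is assumed to be the single-column data.frame (column V1) that read.table() returns for Back.txt, and converting both vectors to character is one way, assumed here rather than taken from the thread, to satisfy the "same mode" requirement in the error message.

```r
# Sketch: find selected IDs missing from the universe and give both
# vectors the same mode. All IDs are placeholders.
universe <- data.frame(V1 = c(1L, 10L, 100L, 1000L))  # shape read.table() returns
selected <- c(10L, 100L, 9999L)

# Compare against the column (a vector), not the data.frame itself
in_universe <- selected %in% universe$V1
print(in_universe)                      # one TRUE/FALSE per selected ID
print(setdiff(selected, universe$V1))   # IDs absent from the universe

# Keep only selected IDs present in the universe, then coerce both to
# character so geneIds and universeGeneIds share one mode
selected_ids <- unique(as.character(selected[in_universe]))
universe_ids <- unique(as.character(universe$V1))
```

selected_ids and universe_ids would then be passed as geneIds and universeGeneIds.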

@Brad, sorry, my description of the file contents was confusing. Actually, my universe file (back.txt) contains only Entrez IDs with no header. The selected-genes file (genes.txt) has a tab-separated header: Probes_id, entrez_gene_id, symbols, P.Value and F.C. Yes, entrez_gene_id is in the second column, as you said. I tried the check again, but it returns an empty list().

(reply written 9.4 years ago by Thaman)

If they are separated by tabs, use sep="\t" (as in D. Puthier's answer) instead of sep=",". Otherwise the columns will not be split correctly and you will have only one column; that is why tbl$V2 (the second column) is NULL.

(reply written 9.4 years ago by Brad Chapman)
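A self-contained illustration of the separator problem, using two placeholder lines in place of genes.txt (the values are invented, and the header line of the real file is left out for brevity):

```r
# Sketch: why tbl$V2 is NULL when a tab-separated file is read with sep=",".
lines <- c("probe_a\t10\tGENEA\t0.01\t2.1",
           "probe_b\t100\tGENEB\t0.03\t-1.7")

wrong <- read.table(text = lines, sep = ",")
print(ncol(wrong))         # 1: each whole line lands in a single column
print(is.null(wrong$V2))   # TRUE: there is no second column

right <- read.table(text = lines, sep = "\t")
print(right$V2)            # the Entrez IDs from the second column
```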

I have done the enrichment analysis, but I am not sure whether the result is what I need; I will check it against DAVID for reference. I want to turn the summary of my hyperGTest result into my own data.frame with the columns GO_term_id/KEGG ID, GO_term_name/KEGG name, P-value and the number of associated genes from my genes.txt file.

(reply written 9.4 years ago by Thaman)
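One way to build the data.frame described above is to reshape the table that summary() returns for the hyperGTest result. This sketch assumes the column names GOBPID, Pvalue, OddsRatio, ExpCount, Count, Size and Term for a GO BP result (KEGGID in place of GOBPID for a KEGG result); check names(summary(hyp)) on your own output before relying on it. tidy_hyperg is a hypothetical helper and the toy rows are invented, not real results.

```r
# Sketch: reshape a hyperGTest summary table into term ID, term name,
# p-value and number of selected genes in the term.
# ASSUMPTION: summary() column names as listed in the lead-in.
tidy_hyperg <- function(res, id_col = "GOBPID") {
  out <- data.frame(
    term_id   = res[[id_col]],
    term_name = res[["Term"]],
    p_value   = res[["Pvalue"]],
    n_genes   = res[["Count"]],   # selected genes annotated to the term
    stringsAsFactors = FALSE
  )
  out[order(out$p_value), , drop = FALSE]  # most significant first
}

# Toy stand-in for summary(hyp); IDs and values are invented
toy <- data.frame(
  GOBPID = c("GO:fake1", "GO:fake2"),
  Pvalue = c(0.04, 0.001),
  OddsRatio = c(2.5, 6.0),
  ExpCount = c(1.2, 0.8),
  Count = c(3L, 4L),
  Size = c(40L, 25L),
  Term = c("toy term A", "toy term B"),
  stringsAsFactors = FALSE
)
print(tidy_hyperg(toy))
```

With a real result the call would be tidy_hyperg(summary(hyp)), or tidy_hyperg(summary(hyp), id_col = "KEGGID") for a KEGG test, assuming those column names hold.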
D. Puthier (France/Marseille/Inserm) wrote, 9.4 years ago:

The format of your files is not very clear. If these are tab-delimited files, read them with read.table: your selected object is a vector, but it should not be if the file has multiple columns (expressed genes with Probes_id, entrez_gene_id, symbols, P.Value and F.C). So as a first attempt, try something like:

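selected <- read.table("genes.txt", header=TRUE, sep="\t", quote="")  # header=FALSE if there is no header line
# set sep to the character that actually separates your columns
# Your Probes_ids should be in the first column of "selected"
selected[,1]
# and the Entrez gene IDs in the second column
selected[,2]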
(answer written 9.4 years ago by D. Puthier; modified 6 months ago by RamRS)
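A sketch of the same idea that also picks the Entrez IDs by column name, assuming genes.txt has the tab-separated header line described in the question. Four placeholder rows stand in for the file; the repeated ID shows that several probes can map to one gene, and dropping duplicates before the test is an assumption about what you want, not something stated in the thread.

```r
# Sketch: read a tab-delimited file with a header and extract Entrez IDs
# by name. For the real file:
#   read.table("genes.txt", header = TRUE, sep = "\t", quote = "")
lines <- c("Probes_id\tentrez_gene_id\tsymbols\tP.Value\tF.C",
           "probe_a\t10\tGENEA\t0.01\t2.1",
           "probe_b\t100\tGENEB\t0.03\t-1.7",
           "probe_c\t10\tGENEA\t0.02\t1.9")
genes <- read.table(text = lines, header = TRUE, sep = "\t", quote = "")

# Several probes can map to the same gene, so drop duplicate IDs and keep
# them as character to match a character universe
selected <- unique(as.character(genes$entrez_gene_id))
print(selected)
```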
