Question: Goseq: Pwf: Error In If (Hi <= Low) { : Missing Value Where True/False Needed
0
gravatar for liupfskygre
5.2 years ago by
liupfskygre190
United States
liupfskygre190 wrote:

Hi, all

Dose anyone familiar with the use of goseq package here, I always get quick reply here, so I post my questions here, hope for some suggestions.

I am trying to use goseq for RNAseq data analysis of a non-native bacteria, I fetch the genelength data and GO term data by using ensembl Perl API. Differential expression was done by DESeq2.

After data run pwf of goseq, error occurred as following:

###

Error in if (hi <= low) { : missing value where TRUE/FALSE needed

It seems to be the problem of the names or ID problem, but I could not solved it till now.

Have any one met this before? How could I solved this problem?

Thanks!

My workflow were as following

##   import the genes ID which I fetch from Ensembl

HZ254gene_data <- read.delim("E:/HZ_GO_enrichment/HZ254gene_data.txt", quote="")

colnames(HZ254gene_data)[1]="locus_tag"

##  get assayed gene vector

assayed.genes=HZ254gene_data$locus_tag

assayed.genes1=as.vector(assayed.genes)

is.vector(assayed.genes1)

[1] TRUE

head (assayed.genes1)

[1] "Mtc_0001" "Mtc_0002" "Mtc_0003" "Mtc_0004" "Mtc_0005" "Mtc_0006"

str(assayed.genes1)

chr [1:2512] "Mtc_0001" "Mtc_0002" "Mtc_0003" "Mtc_0004" "Mtc_0005" ...


## get gene length data for pwf bias.data

length=HZ254gene_data$length

is.vector(length)

[1] TRUE

names(length)=assayed.genes1

head(length)

Mtc_0001 Mtc_0002 Mtc_0003 Mtc_0004 Mtc_0005 Mtc_0006 
     768      654     1899     1770      351      654

#
str(length)

 Named int [1:2512] 768 654 1899 1770 351 654 903 72 966 1071 ...
 - attr(*, "names")= chr [1:2512] "Mtc_0001" "Mtc_0002" "Mtc_0003" "Mtc_0004" ...



##import results from DESeq2 and get the DEgenes 

HZ254DifList_Mer <- read.delim("E:/HZ_GO_enrichment/HZ254DifList_Mer.txt")

de.genes1=as.integer(HZ254DifList_Mer$padj<0.05)

##get the genes for pwf 

gene.vector=as.integer(de.genes1)

names(gene.vector)=assayed.genes1

head(gene.vector)

##import the GO_gene association table from ensembl API 

HZ254gene_go <- read.delim("E:/HZ_GO_enrichment/HZ254gene_go.txt", quote="")

go.terms=HZ254gene_go


## run pwf, 

pwf=nullp(gene.vector,bias.data=length)

##got errors

Error in if (hi <= low) { : missing value where TRUE/FALSE needed

sessionInfo()

R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936 
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936   
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C                                                   
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] goseq_1.14.0            geneLenDataBase_0.99.12 BiasedUrn_1.06.1       
[4] BiocInstaller_1.12.0   

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.24.0   Biobase_2.22.0         BiocGenerics_0.8.0    
 [4] biomaRt_2.18.0         Biostrings_2.30.1      bitops_1.0-6          
 [7] BSgenome_1.30.0        DBI_0.2-7              GenomicFeatures_1.14.2
[10] GenomicRanges_1.14.4   grid_3.0.2             IRanges_1.20.6        
[13] lattice_0.20-24        Matrix_1.1-2           mgcv_1.7-28           
[16] nlme_3.1-113           parallel_3.0.2         RCurl_1.95-4.1        
[19] Rsamtools_1.14.3       RSQLite_0.11.4         rtracklayer_1.22.3    
[22] stats4_3.0.2           tools_3.0.2            XML_3.98-1.1          
[25] XVector_0.2.0          zlibbioc_1.8.0
R format data • 12k views
ADD COMMENTlink modified 19 months ago by IrK20 • written 5.2 years ago by liupfskygre190
1

Hi, liupfskygre, It's hard to know exactly what the issue is without a reproducible example, but perhaps the assignment of values to variable names, which are also base R functions, could be a problem. Try replacing length=HZ254gene_data$length with lgth=HZ254gene_data$length. If the nullp function calls length (the function), your use of this name as a variable might be a conflict.

ADD REPLYlink modified 5.2 years ago • written 5.2 years ago by KKeenan0270

thanks for your reminding! I recheck the data and found where the problems is, there were some genes in the dataset that do not have expression data, which were assigned as NA!

ADD REPLYlink written 5.2 years ago by liupfskygre190
0
gravatar for Michael Dondrup
5.1 years ago by
Bergen, Norway
Michael Dondrup46k wrote:

To not leave this question appear unanswered:

The first thing you have do after seeing such an error is to call traceback() and add the output to the error report.

Error in if (hi <= low) { : missing value where TRUE/FALSE needed

means, as it says, there is a missing value where if requires a TRUE/FALSE value, <= and other comparison operators in R yield 'NA' if and only if at least one of the values is missing itself. In conclusion, your input data contains missing values which goseq does not handle.

In case that happens, missing values need to be removed (e.g. by using complete.cases) or imputed, see http://www.statmethods.net/input/missingdata.html for a quick tutorial.

ADD COMMENTlink modified 5.1 years ago • written 5.1 years ago by Michael Dondrup46k

Thanks for your explanations!

ADD REPLYlink modified 5.1 years ago • written 5.1 years ago by liupfskygre190
0
gravatar for Jeremy Leipzig
5.1 years ago by
Philadelphia, PA
Jeremy Leipzig18k wrote:
de.genes1=as.integer(HZ254DifList_Mer$padj<0.05)`

This looks like a logical vector casted into an integer - can you print de.genes1 out

ADD COMMENTlink written 5.1 years ago by Jeremy Leipzig18k

thanks, as for my case, missing data "NA" in the de.genes1 caused the problem! At first I tried to convert de.gene1 into numeric, but failed, recheck the data found NA existed.

ADD REPLYlink written 5.1 years ago by liupfskygre190
0
gravatar for keith.hughitt
4.9 years ago by
keith.hughitt250
United States
keith.hughitt250 wrote:

In case anyone else runs into this problem, one way you can end up with the same error is if the gene lengths are all the same.

ADD COMMENTlink modified 18 months ago • written 4.9 years ago by keith.hughitt250
0
gravatar for IrK
19 months ago by
IrK20
Australia
IrK20 wrote:

I had the same error message and in my case it a binary vector had NAs, so basically:

> genes = as.integer(DESeqoutput$padj <= 0.05 & > !is.na(DESeqoutput$padj))   # Give all values less than 0.05 and no NAs

However, when I call for nullp() I get following warning:

Can't find mm10/ensGene length data in genLenDataBase...
Found the annotaion package, TxDb.Mmusculus.UCSC.mm10.ensGene
Trying to get the gene lengths from it.
Warning messages:
1: In library() : library ‘/usr/lib/R/site-library’ contains no packages
2: In pcls(G) : initial point very close to some inequality constraints

It is warning message but I would like to interpret it. What does it mean? Thanks

ADD COMMENTlink written 19 months ago by IrK20
1

that R is gibberish. What do you think the gene vector should look like?

ADD REPLYlink written 19 months ago by Jeremy Leipzig18k

To be honest, I checked that NAs are absent in gene IDs vector, but forgot that numeric values also can be represented as NAs in DESeqoutput$padj vector.

ADD REPLYlink written 19 months ago by IrK20

Show us what genes looks like after your faulty code above.

ADD REPLYlink written 19 months ago by Jeremy Leipzig18k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 995 users visited in the last hour