Question: [NOISeq] Error in noiseqbio - filter out low counts
gustavoborin01 (University of Campinas, Brazil) wrote 5.8 years ago:

Hi,

I'm trying to filter out low-count features from my RNA-Seq data with the noiseqbio function of the NOISeq package before running the WGCNA package to construct a co-regulatory network, but I get the error shown below. Can anyone help me solve this?

# rpkm = matrix with more than 9,000 genes and 7 conditions (2 biological replicates each)

rpkm<-read.csv("rpkm_all.csv")

head(rpkm)                 

                   F24h_1     F24h_2      C6h_1  ....
e_gw1.1.1022.1 10.6933092 8.91526912 7.24161321  ....
e_gw1.1.104.1   0.0000000 0.02118639 0.02090429  ....
e_gw1.1.1046.1  0.1131807 0.15213278 0.16165381  ....

myfactors=data.frame(condicao=c("F24h","F24h","C6h","C6h","C12h","C12h","C24h","C24h","B6h","B6h","B12h","B12h","B24h","B24h"),replicas= c("F24h_1","F24h_2","C6h_1","C6h_2","C12h_1","C12h_2","C24h_1","C24h_2","B6h_1","B6h_2","B12h_1","B12h_2","B24h_1","B24h_2"))

head(myfactors)
  condicao replicas
1     F24h   F24h_1
2     F24h   F24h_2
3      C6h    C6h_1
4      C6h    C6h_2
5     C12h   C12h_1
6     C12h   C12h_2

mydata<-readData(data=rpkm, factors=myfactors,length = NULL,biotype = NULL,chromosome = NULL,gc = NULL)

mydata

ExpressionSet (storageMode: lockedEnvironment)
assayData: 9852 features, 14 samples
  element names: exprs
protocolData: none
phenoData
  sampleNames: F24h_1 F24h_2 ... B24h_2 (14
    total)
  varLabels: condicao replicas
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation:

mynoiseqbio=noiseqbio(mydata,k=0.5,norm="rpkm",factor=myfactors$condicao, lc=0, r=50, adj=1.5, plot=TRUE, a0per=0.9, random.seed=12345,filter=1)

Error in `[.data.frame`(input@phenoData@data, , factor) :
  undefined columns selected

Tags: rpkm rna-seq noiseq low counts R
modified 5.8 years ago by komal.rathi • written 5.8 years ago by gustavoborin01

komal.rathi (Children's Hospital of Philadelphia, Philadelphia, PA) wrote 5.8 years ago:

Try this:

mynoiseqbio = noiseqbio(mydata, k = 0.5, norm = "rpkm", factor = "condicao", lc = 0, r = 50, adj =1.5, plot = TRUE, a0per = 0.9, random.seed = 12345, filter = 1)

EDIT: When the factor has more than two conditions, you need to specify the two conditions you wish to compare through the conditions argument, in this case F24h and C6h:

mynoiseqbio = noiseqbio(mydata, k = 0.5, norm = "rpkm", factor = "condicao", conditions = c('F24h','C6h'), lc = 0, r = 50, adj =1.5, plot = TRUE, a0per = 0.9, random.seed = 12345, filter = 1)
modified 5.7 years ago • written 5.8 years ago by komal.rathi
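
The original error comes from how the factor argument is used: as the traceback shows, noiseqbio indexes the columns of the sample annotation (input@phenoData@data) with it, so it needs a column name such as "condicao" rather than the column's values. A minimal base-R sketch of that indexing, using a small stand-in data frame (pheno is a hypothetical name, not part of NOISeq):

```r
# Stand-in for the sample annotation stored inside the NOISeq object.
pheno <- data.frame(
  condicao = c("F24h", "F24h", "C6h", "C6h"),
  replicas = c("F24h_1", "F24h_2", "C6h_1", "C6h_2"),
  stringsAsFactors = FALSE
)

# Indexing by column name works and returns the condition labels.
by_name <- pheno[, "condicao"]

# Indexing by the column's values asks for columns named "F24h" and "C6h",
# which do not exist, so `[.data.frame` raises an error. The message is
# captured here instead of stopping the script.
by_values <- tryCatch(
  pheno[, pheno$condicao],
  error = function(e) conditionMessage(e)
)

print(by_name)
print(by_values)
```

This is why factor = "condicao" gets past the first error while factor = myfactors$condicao does not.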

Thanks for your answer, Komal, but when I run this, I get another error message:

mynoiseqbio=noiseqbio(mydata,k=0.5,norm="rpkm",factor="condicao",lc=0,r=50,adj=1.5,plot=TRUE,a0per=0.9,random.seed=12345,filter=1)
[1] "Computing Z values..."
Error in allMDbio(input, factor, k = k, norm = norm, conditions = conditions,  :
  Error. You must specify which conditions you wish to compare when the factor has two or more conditions.

I have also tried the options below, but each of them gave another error message.

factor=rpkm[0,c(1:14)]  

Error in .subset(x, j) : invalid subscript type 'list'

factor=c("F24h_1","F24h_2")

Error in `[.data.frame`(input@phenoData@data, , factor) : undefined columns selected

factor=c("F24h_1","C6h_1")

Error in `[.data.frame`(input@phenoData@data, , factor) : undefined columns selected

So, do you have another suggestion, Komal? Thank you again.

written 5.8 years ago by gustavoborin01

I have updated my answer. As the error says, you need to specify which conditions you want to compare. You can do that with the conditions parameter. It should be "a vector containing the two conditions to be compared by the differential expression algorithm (needed when the factor contains more than 2 different conditions)". As an example, I have specified F24h and C6h as the conditions to be compared.

modified 5.7 years ago • written 5.7 years ago by komal.rathi

Sorry about my inexperience Komal, but it still doesn't work.

mynoiseqbio=noiseqbio(mydata,k=0.5,norm="rpkm",factor="condicao",conditions =c('F24h','C6h'),lc=0,r=50,adj=1.5,plot=TRUE,a0per=0.9,random.seed=12345,filter=1)
[1] "Computing Z values..."
Error in allMDbio(input, factor, k = k, norm = norm, conditions = conditions,  :
  The conditions specified don't exist for the factor specified.

So I tried this, but I did not have success.

mynoiseqbio=noiseqbio(mydata,k=0.5,norm="rpkm",factor="replicas",conditions =c('F24h_1','F24h_2'),lc=0,r=50,adj=1.5,plot=TRUE,a0per=0.9,random.seed=12345,filter=1)
Error in noiseqbio(mydata, k = 0.5, norm = "rpkm", factor = "replicas",  :
  ERROR: To run NOISeqBIO at least to replicates for each condition are needed.
         Please, run NOISeq if there are not replicates enough in your experiment.

written 5.7 years ago by gustavoborin01
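
Both messages above concern the experimental design rather than the expression values: the conditions must be values found in the chosen factor column, and each of them needs at least two replicates. Using replicas as the factor fails because every replicate label occurs only once. A rough pre-flight check in base R, assuming the myfactors layout shown in the question (check_design is a hypothetical helper, not a NOISeq function):

```r
# Rebuild the design from the question: 7 conditions x 2 replicates.
conds <- c("F24h", "C6h", "C12h", "C24h", "B6h", "B12h", "B24h")
myfactors <- data.frame(
  condicao = rep(conds, each = 2),
  replicas = paste0(rep(conds, each = 2), "_", 1:2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns a character vector of problems (empty if none).
check_design <- function(factors, factor, conditions) {
  if (!factor %in% colnames(factors)) {
    return(sprintf("factor '%s' is not a column of the factors table", factor))
  }
  labels <- as.character(factors[[factor]])
  problems <- character(0)

  # Conditions that never appear in the chosen factor column.
  unknown <- setdiff(conditions, labels)
  if (length(unknown) > 0) {
    problems <- c(problems, paste("unknown condition:", unknown))
  }

  # Conditions that appear, but with fewer than two samples.
  counts <- table(labels)[intersect(conditions, labels)]
  few <- names(counts)[counts < 2]
  if (length(few) > 0) {
    problems <- c(problems, paste("fewer than 2 replicates:", few))
  }
  problems
}

print(check_design(myfactors, "condicao", c("F24h", "C6h")))
print(check_design(myfactors, "replicas", c("F24h_1", "F24h_2")))
```

Running the same check against the annotation actually stored in the object (the traceback shows it as input@phenoData@data) would help rule out differences introduced when the data were read in.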

Wait, you do have C6h in your conditions, right?

modified 5.7 years ago • written 5.7 years ago by komal.rathi

Right. 

head(myfactors)

  condicao replicas
1     F24h   F24h_1
2     F24h   F24h_2
3      C6h    C6h_1
4      C6h    C6h_2
5     C12h   C12h_1
6     C12h   C12h_2

written 5.7 years ago by gustavoborin01

This is what I did and it is working:

rpkm <- matrix(rnorm(137928),9852,14) # simulated stand-in data with the same dimensions (9852 features x 14 samples)
colnames(rpkm) <- c("F24h_1","F24h_2","C6h_1","C6h_2","C12h_1","C12h_2","C24h_1","C24h_2","B6h_1","B6h_2","B12h_1","B12h_2","B24h_1","B24h_2")

myfactors <- data.frame(condicao = c("F24h","F24h","C6h","C6h","C12h","C12h","C24h","C24h","B6h","B6h","B12h","B12h","B24h","B24h"),
                     replicas = c("F24h_1","F24h_2","C6h_1","C6h_2","C12h_1","C12h_2","C24h_1","C24h_2","B6h_1","B6h_2","B12h_1","B12h_2","B24h_1","B24h_2"))

mydata <- readData(data = rpkm, 
                 factors = myfactors,
                 length = NULL,
                 biotype = NULL,
                 chromosome = NULL,
                 gc = NULL)

mynoiseqbio <- noiseqbio(input = mydata, k = 0.5, norm = "rpkm", 
                      factor = "condicao", conditions = c('F24h','C6h'), 
                      lc = 0, r = 50, adj = 1.5, plot = TRUE, a0per = 0.9, 
                      random.seed = 12345, filter = 1)
modified 5.7 years ago • written 5.7 years ago by komal.rathi

Komal, I was rereading the NOISeq tutorial and wondering whether it is really necessary to apply this function, because there is also a filtered.data function that seems to do the same or a similar job as noiseqbio. Have you ever used it?

filtered.data(dataset, factor, norm = TRUE, depth = NULL, method = 1, cv.cutoff = 100, cpm = 1) 

written 5.7 years ago by gustavoborin01

Umm, I thought your aim was to compute differential expression. There is a difference between the two functions: noiseqbio computes differential expression in addition to filtering out low-count features, whereas filtered.data only filters out the low-count features. If you just want to filter out low-count features and then move on to some other method for differential expression, you can use the filtered.data function instead of noiseqbio.

written 5.7 years ago by komal.rathi

Thank you so much, Komal. Your script now works for me. I really appreciate your answers. I am now wondering whether I will have to run this script for each set of biological duplicates I have in order to exclude the low counts. If so, I think the filtered.data function is more appropriate, don't you agree?

modified 5.7 years ago • written 5.7 years ago by gustavoborin01

You could use filtered.data first to remove low count features across all samples, and then use noiseqbio with the argument filter = 0 so that it does not perform any filtering.

modified 5.7 years ago • written 5.7 years ago by komal.rathi
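
The two-step approach suggested above might look like the sketch below. It reuses the filtered.data signature and the noiseqbio arguments quoted earlier in the thread. Passing the condition labels to filtered.data as a vector, setting plot = FALSE, and wrapping the steps in a function (filter_then_compare, a hypothetical name) are assumptions on my part, and the sketch has not been run on the original data.

```r
# Sketch only. Assumes library(NOISeq) has been loaded, that `rpkm` is a
# numeric matrix (features x samples), and that the rows of `myfactors`
# are in the same order as the columns of `rpkm`.
filter_then_compare <- function(rpkm, myfactors, conditions) {
  # Step 1: remove low-count features once, across all samples.
  # norm = TRUE states that the values are already normalized.
  filtered <- filtered.data(rpkm, factor = myfactors$condicao,
                            norm = TRUE, depth = NULL, method = 1,
                            cv.cutoff = 100, cpm = 1)

  # Step 2: build the NOISeq object from the filtered matrix.
  mydata <- readData(data = filtered, factors = myfactors)

  # Step 3: compare one pair of conditions; filter = 0 switches off
  # noiseqbio's own low-count filtering, since step 1 already did it.
  noiseqbio(mydata, k = 0.5, norm = "rpkm", factor = "condicao",
            conditions = conditions, lc = 0, r = 50, adj = 1.5,
            plot = FALSE, a0per = 0.9, random.seed = 12345, filter = 0)
}

# Example call, left commented out because it needs NOISeq and the
# objects from the thread:
# res <- filter_then_compare(as.matrix(rpkm), myfactors, c("F24h", "C6h"))
```

If the only goal is a filtered matrix for WGCNA, step 1 alone may be enough. The NOISeq documentation also describes norm = "n" for data that need no further normalization; whether that or "rpkm" is the right setting depends on how rpkm_all.csv was produced.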