Hi,

I'm trying to filter out low-count features from my RNA-Seq data with the noiseqbio function of the NOISeq package before I run the WGCNA package to construct a co-regulatory network, but I get the error below when I try. Can anyone help me solve this?

# rpkm = matrix with more than 9,000 genes and 7 conditions (2 biological replicates)

rpkm<-read.csv("rpkm_all.csv")

head(rpkm)

F24h_1 F24h_2 C6h_1 ....

e_gw1.1.1022.1 10.6933092 8.91526912 7.24161321 ....

e_gw1.1.104.1 0.0000000 0.02118639 0.02090429 ....

e_gw1.1.1046.1 0.1131807 0.15213278 0.16165381 ....

myfactors=data.frame(condicao=c("F24h","F24h","C6h","C6h","C12h","C12h","C24h","C24h","B6h","B6h","B12h","B12h","B24h","B24h"),replicas= c("F24h_1","F24h_2","C6h_1","C6h_2","C12h_1","C12h_2","C24h_1","C24h_2","B6h_1","B6h_2","B12h_1","B12h_2","B24h_1","B24h_2"))

head(myfactors)

condicao replicas

1 F24h F24h_1

2 F24h F24h_2

3 C6h C6h_1

4 C6h C6h_2

5 C12h C12h_1

6 C12h C12h_2

mydata<-readData(data=rpkm, factors=myfactors,length = NULL,biotype = NULL,chromosome = NULL,gc = NULL)

mydata

ExpressionSet (storageMode: lockedEnvironment)

assayData: 9852 features, 14 samples

element names: exprs

protocolData: none

phenoData

sampleNames: F24h_1 F24h_2 ... B24h_2 (14 total)

varLabels: condicao replicas

varMetadata: labelDescription

featureData: none

experimentData: use 'experimentData(object)'

Annotation:

mynoiseqbio=noiseqbio(mydata,k=0.5,norm="rpkm",factor=myfactors$condicao, lc=0, r=50, adj=1.5, plot=TRUE, a0per=0.9, random.seed=12345,filter=1)

Error in `[.data.frame`(input@phenoData@data, , factor) :

undefined columns selected
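
The traceback shows that noiseqbio uses factor to pick a column of the phenoData table, so it appears to expect the column name ("condicao") rather than the column's values. A minimal base-R sketch of the same indexing, using a shortened stand-in for myfactors (the name pheno is made up for this illustration):

```r
# Shortened stand-in for the factors table from the question (hypothetical name).
pheno <- data.frame(
  condicao = c("F24h", "F24h", "C6h", "C6h"),
  replicas = c("F24h_1", "F24h_2", "C6h_1", "C6h_2"),
  stringsAsFactors = FALSE
)

# Indexing by column name works and returns the condition of each sample.
by_name <- pheno[, "condicao"]
print(by_name)

# Indexing by the column's values looks for columns called "F24h", "C6h", ...
# and fails; the error message is captured instead of stopping the script.
by_values <- tryCatch(
  pheno[, pheno$condicao],
  error = function(e) conditionMessage(e)
)
print(by_values)
```

With the real object, this suggests passing factor="condicao" rather than factor=myfactors$condicao.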

Thanks, Komal, for your answer, but when I type this, I get a different error message:

mynoiseqbio=noiseqbio(mydata,k=0.5,norm="rpkm",factor="condicao",lc=0,r=50,adj=1.5,plot=TRUE,a0per=0.9,random.seed=12345,filter=1)

[1] "Computing Z values..."

Error in allMDbio(input, factor, k = k, norm = norm, conditions = conditions, :

Error. You must specify which conditions you wish to compare when the factor has two or more conditions.

I have also tried the options below, but I got further error messages.

factor=rpkm[0,c(1:14)]

Error in .subset(x, j) : invalid subscript type 'list'

factor=c("F24h_1","F24h_2")

Error in `[.data.frame`(input@phenoData@data, , factor) : undefined columns selected

factor=c("F24h_1","C6h_1")

Error in `[.data.frame`(input@phenoData@data, , factor) : undefined columns selected

So, do you have another suggestion, Komal? Thank you again.

I have updated my answer. Like the error says, you need to specify which conditions you want to compare. You can do that in the conditions parameter. It should be "a vector containing the two conditions to be compared by the differential expression algorithm (needed when the factor contains more than 2 different conditions)". As an example, I have specified F24h and C6h as the conditions to be compared.

Sorry about my inexperience, Komal, but it still doesn't work.

mynoiseqbio=noiseqbio(mydata,k=0.5,norm="rpkm",factor="condicao",conditions =c('F24h','C6h'),lc=0,r=50,adj=1.5,plot=TRUE,a0per=0.9,random.seed=12345,filter=1)

[1] "Computing Z values..."

Error in allMDbio(input, factor, k = k, norm = norm, conditions = conditions, :

The conditions specified don't exist for the factor specified.

So I tried this, but without success.

mynoiseqbio=noiseqbio(mydata,k=0.5,norm="rpkm",factor="replicas",conditions =c('F24h_1','F24h_2'),lc=0,r=50,adj=1.5,plot=TRUE,a0per=0.9,random.seed=12345,filter=1)

Error in noiseqbio(mydata, k = 0.5, norm = "rpkm", factor = "replicas", :

ERROR: To run NOISeqBIO at least to replicates for each condition are needed.

Please, run NOISeq if there are not replicates enough in your experiment.

Wait, you do have C6h in your conditions, right?
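
One way to check this, as a sketch: tabulate the condition column that was passed to readData and compare it with the requested conditions. This uses only base R on a rebuilt copy of the myfactors table from the question. The name wanted is invented here, and trimws() is applied only as a precaution against stray whitespace in the labels, not because that is known to be the cause.

```r
# Rebuild the factors table from the question: 7 conditions, 2 replicates each.
myfactors <- data.frame(
  condicao = rep(c("F24h", "C6h", "C12h", "C24h", "B6h", "B12h", "B24h"), each = 2),
  stringsAsFactors = FALSE
)
myfactors$replicas <- paste0(myfactors$condicao, "_", 1:2)

# Conditions requested in the noiseqbio call (hypothetical variable name).
wanted <- c("F24h", "C6h")

# Number of samples per condition; noiseqbio needs at least two per condition.
reps <- table(trimws(as.character(myfactors$condicao)))
print(reps)

# TRUE for every requested condition that is present in the factor.
present <- wanted %in% names(reps)
print(present)
```

On the actual NOISeq object, the same check could be run on pData(mydata)$condicao, assuming Biobase is loaded.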

Right.

head(myfactors)

condicao replicas

1 F24h F24h_1

2 F24h F24h_2

3 C6h C6h_1

4 C6h C6h_2

5 C12h C12h_1

6 C12h C12h_2

This is what I did and it is working:

Komal, I was reading the NOISeq tutorial again and wondering whether it is really necessary to apply this function, because there is also a filtered.data function that seems to do the same or a similar job as noiseqbio. Have you ever used this function?

filtered.data(dataset, factor, norm = TRUE, depth = NULL, method = 1, cv.cutoff = 100, cpm = 1)

Umm, I thought your aim was to compute differential expression. There is a difference between the two functions: noiseqbio computes differential expression in addition to filtering out low-count features, whereas filtered.data just filters out the low-count features. If you just want to filter out low-count features and then move on to some other method for differential expression, then you can use the filtered.data function instead of noiseqbio.

Thank you so much, Komal. Your script has now worked for me. I really appreciate your answers. I was wondering whether I will have to run this script for each set of biological duplicates I have in order to exclude the low counts. If so, I think the filtered.data function is more appropriate, don't you agree?

You could use filtered.data first to remove low-count features across all samples, and then use noiseqbio with the argument filter = 0 so that it does not perform any filtering.
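
A sketch of that two-step workflow, under several assumptions: the RPKM table is replaced by simulated values so the snippet stands alone, the filtering cutoffs simply repeat the defaults quoted earlier in the thread, norm = "n" assumes the values need no further normalization, and the pair of conditions is only an example. The NOISeq calls run only if the package is installed.

```r
set.seed(1)
conds <- rep(c("F24h", "C6h", "C12h", "C24h", "B6h", "B12h", "B24h"), each = 2)
myfactors <- data.frame(condicao = conds, replicas = paste0(conds, "_", 1:2))

# Simulated stand-in for the RPKM table (2000 genes x 14 samples), not real data.
rpkm <- matrix(rexp(2000 * 14, rate = 0.1), ncol = 14,
               dimnames = list(paste0("gene", 1:2000), myfactors$replicas))
rpkm[1:200, ] <- rpkm[1:200, ] / 1000  # make some features low everywhere

if (requireNamespace("NOISeq", quietly = TRUE)) {
  # Step 1: drop low-count features across all samples at once.
  rpkm_filt <- NOISeq::filtered.data(rpkm, factor = myfactors$condicao,
                                     norm = TRUE, depth = NULL, method = 1,
                                     cv.cutoff = 100, cpm = 1)

  # Step 2: rebuild the NOISeq object from the filtered matrix.
  mydata_filt <- NOISeq::readData(data = rpkm_filt, factors = myfactors)

  # Step 3: differential expression for one pair of conditions,
  # with filter = 0 so that nothing is filtered a second time.
  res <- NOISeq::noiseqbio(mydata_filt, k = 0.5, norm = "n",
                           factor = "condicao",
                           conditions = c("F24h", "C6h"),
                           lc = 0, r = 50, adj = 1.5, plot = FALSE,
                           a0per = 0.9, random.seed = 12345, filter = 0)
}
```

If the goal is only a filtered matrix for WGCNA, step 1 alone may be enough, since it filters all 14 samples in a single call rather than one pair of conditions at a time.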