[NOISeq] Error in noiseqbio - filter out low counts
6.4 years ago

Hi,

I'm trying to filter out low-count features from my RNA-Seq data with the noiseqbio function of the NOISeq package before running the WGCNA package to construct a co-regulatory network, but I get the error shown below. Can anyone help me solve this?

# rpkm = matrix with more than 9,000 genes and 7 conditions (2 biological replicates)

                F24h_1      F24h_2      C6h_1       ....
e_gw1.1.1022.1  10.6933092  8.91526912  7.24161321  ....
e_gw1.1.104.1    0.0000000  0.02118639  0.02090429  ....
e_gw1.1.1046.1   0.1131807  0.15213278  0.16165381  ....

myfactors=data.frame(condicao=c("F24h","F24h","C6h","C6h","C12h","C12h","C24h","C24h","B6h","B6h","B12h","B12h","B24h","B24h"),replicas= c("F24h_1","F24h_2","C6h_1","C6h_2","C12h_1","C12h_2","C24h_1","C24h_2","B6h_1","B6h_2","B12h_1","B12h_2","B24h_1","B24h_2"))

condicao replicas
1     F24h   F24h_1
2     F24h   F24h_2
3      C6h    C6h_1
4      C6h    C6h_2
5     C12h   C12h_1
6     C12h   C12h_2

mydata<-readData(data=rpkm, factors=myfactors,length = NULL,biotype = NULL,chromosome = NULL,gc = NULL)

mydata

ExpressionSet (storageMode: lockedEnvironment)
assayData: 9852 features, 14 samples
element names: exprs
protocolData: none
phenoData
sampleNames: F24h_1 F24h_2 ... B24h_2 (14 total)
varLabels: condicao replicas
featureData: none
experimentData: use 'experimentData(object)'
Annotation:

mynoiseqbio=noiseqbio(mydata,k=0.5,norm="rpkm",factor=myfactors$condicao, lc=0, r=50, adj=1.5, plot=TRUE, a0per=0.9, random.seed=12345,filter=1)

Error in [.data.frame(input@phenoData@data, , factor) :
undefined columns selected

R RNA-Seq low counts RPKM noiseq
6.4 years ago
komal.rathi ★ 3.8k

Try this (factor takes the name of the column as a string, not the column itself):

mynoiseqbio = noiseqbio(mydata, k = 0.5, norm = "rpkm", factor = "condicao", lc = 0, r = 50, adj =1.5, plot = TRUE, a0per = 0.9, random.seed = 12345, filter = 1)
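The original error message shows the lookup `input@phenoData@data[, factor]`, so the value passed as factor is used to pick a column from the sample annotation. Here is a minimal base-R sketch of why a column name works and a vector of condition labels does not. The small data frame is only a toy stand-in for the annotation, not the real NOISeq object.

```r
# Toy stand-in for the sample annotation stored in the NOISeq object.
myfactors <- data.frame(
  condicao = c("F24h", "F24h", "C6h", "C6h"),
  replicas = c("F24h_1", "F24h_2", "C6h_1", "C6h_2")
)

# A column name selects that one column.
by_name <- myfactors[, "condicao"]

# A vector of condition labels is treated as a set of column names.
# None of them exist, so the lookup fails; capture the message.
by_values <- tryCatch(
  myfactors[, as.character(myfactors$condicao)],
  error = function(e) conditionMessage(e)
)

print(by_name)
print(by_values)
```

The second lookup fails with the same "undefined columns selected" message as in the question.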

EDIT: When the factor has more than two conditions, you need to specify which two you wish to compare, in this case F24h and C6h:

mynoiseqbio = noiseqbio(mydata, k = 0.5, norm = "rpkm", factor = "condicao", conditions = c('F24h','C6h'), lc = 0, r = 50, adj =1.5, plot = TRUE, a0per = 0.9, random.seed = 12345, filter = 1)

Thanks for your answer, Komal, but when I type this I get another error message:

[1] "Computing Z values..."
Error in allMDbio(input, factor, k = k, norm = norm, conditions = conditions,  :
Error. You must specify which conditions you wish to compare when the factor has two or more conditions.

I have also tried the options below, but I got other error messages.

factor=rpkm[0,c(1:14)]

Error in .subset(x, j) : invalid subscript type 'list'

factor=c("F24h_1","F24h_2")

Error in [.data.frame(input@phenoData@data, , factor) : undefined columns selected

factor=c("F24h_1","C6h_1")

Error in [.data.frame(input@phenoData@data, , factor) : undefined columns selected

So, do you have another suggestion, Komal? Thank you again.


I have updated my answer. Like the error says, you need to specify which conditions you want to compare. You can do that in the conditions parameter. It should be "a vector containing the two conditions to be compared by the differential expression algorithm (needed when the factor contains more than 2 different conditions)". As an example, I have specified F24h and C6h as the conditions to be compared.


Sorry about my inexperience Komal, but it still doesn't work.

[1] "Computing Z values..."
Error in allMDbio(input, factor, k = k, norm = norm, conditions = conditions,  :
The conditions specified don't exist for the factor specified.

So I tried factor = "replicas" instead, but without success:

Error in noiseqbio(mydata, k = 0.5, norm = "rpkm", factor = "replicas",  :
ERROR: To run NOISeqBIO at least to replicates for each condition are needed.
Please, run NOISeq if there are not replicates enough in your experiment.


Wait, you do have C6h in your conditions, right?
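One quick way to check, using only base R: confirm that the requested conditions are levels of the chosen column and count the samples per level. This is a sketch built from the myfactors table posted in the question. The names `wanted`, `present`, `reps_condicao` and `reps_replicas` are illustrative only.

```r
samples <- c("F24h_1", "F24h_2", "C6h_1", "C6h_2", "C12h_1", "C12h_2",
             "C24h_1", "C24h_2", "B6h_1", "B6h_2", "B12h_1", "B12h_2",
             "B24h_1", "B24h_2")
myfactors <- data.frame(
  condicao = sub("_[12]$", "", samples),  # "F24h_1" -> "F24h"
  replicas = samples
)

# Are both requested conditions present in the column used as factor?
wanted  <- c("F24h", "C6h")
present <- wanted %in% unique(as.character(myfactors$condicao))
print(present)

# Samples per level: "condicao" has two per condition, while every level
# of "replicas" has a single sample, which is why factor = "replicas"
# triggers the "not enough replicates" error.
reps_condicao <- table(myfactors$condicao)
reps_replicas <- table(myfactors$replicas)
print(reps_condicao)
print(reps_replicas)
```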


Right.

condicao replicas
1     F24h   F24h_1
2     F24h   F24h_2
3      C6h    C6h_1
4      C6h    C6h_2
5     C12h   C12h_1
6     C12h   C12h_2


This is what I did and it is working:

rpkm <- matrix(rnorm(137928), 9852, 14) # simulated data with the same dimensions as yours
colnames(rpkm) <- c("F24h_1","F24h_2","C6h_1","C6h_2","C12h_1","C12h_2","C24h_1","C24h_2","B6h_1","B6h_2","B12h_1","B12h_2","B24h_1","B24h_2")

myfactors <- data.frame(condicao = c("F24h","F24h","C6h","C6h","C12h","C12h","C24h","C24h","B6h","B6h","B12h","B12h","B24h","B24h"),
replicas = c("F24h_1","F24h_2","C6h_1","C6h_2","C12h_1","C12h_2","C24h_1","C24h_2","B6h_1","B6h_2","B12h_1","B12h_2","B24h_1","B24h_2"))

mydata <- readData(data = rpkm,
factors = myfactors,
length = NULL,
biotype = NULL,
chromosome = NULL,
gc = NULL)

mynoiseqbio <- noiseqbio(input = mydata, k = 0.5, norm = "rpkm",
factor = "condicao", conditions = c('F24h','C6h'),
lc = 0, r = 50, adj = 1.5, plot = TRUE, a0per = 0.9,
random.seed = 12345, filter = 1)

Komal, I was rereading the NOISeq tutorial and wondering whether it is really necessary to use this function, because there is also a filtered.data function that seems to do the same or a similar job as noiseqbio. Have you ever used it?

filtered.data(dataset, factor, norm = TRUE, depth = NULL, method = 1, cv.cutoff = 100, cpm = 1)


Umm, I thought your aim was to compute differential expression. The two functions differ: noiseqbio computes differential expression in addition to filtering out low-count features, whereas filtered.data only filters out the low-count features. If you just want to filter out low-count features and then move on to some other method for differential expression, you can use filtered.data instead of noiseqbio.


Thank you so much, Komal. Your code now works for me, and I really appreciate your answers. I was wondering whether I will have to run this script for each pair of biological replicates in order to exclude the low counts. If so, I think the filtered.data function is more appropriate, don't you agree?


You could use filtered.data first to remove low count features across all samples, and then use noiseqbio with the argument filter = 0 so that it does not perform any filtering.
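A rough sketch of that two-step approach follows. It uses a simulated matrix in place of the real RPKM data, so the simulation, the object names `rpkm_filt` and `mydata_filt`, and the choice of plot = FALSE are assumptions. The other arguments are the ones already used in this thread. The NOISeq calls only run if the package is installed.

```r
set.seed(12345)
samples <- c("F24h_1", "F24h_2", "C6h_1", "C6h_2", "C12h_1", "C12h_2",
             "C24h_1", "C24h_2", "B6h_1", "B6h_2", "B12h_1", "B12h_2",
             "B24h_1", "B24h_2")

# Simulated, non-negative stand-in for the real RPKM matrix (assumption).
rpkm <- matrix(rexp(2000 * 14, rate = 1 / 20), nrow = 2000, ncol = 14,
               dimnames = list(paste0("gene_", 1:2000), samples))
rpkm[1:500, ] <- rpkm[1:500, ] / 1000  # make a quarter of the genes low-expressed

myfactors <- data.frame(condicao = sub("_[12]$", "", samples),
                        replicas = samples)

if (requireNamespace("NOISeq", quietly = TRUE)) {
  library(NOISeq)

  # Step 1: filter once across all samples. Here factor is the vector of
  # condition labels, and norm = TRUE says the data are already normalized.
  rpkm_filt <- filtered.data(rpkm, factor = myfactors$condicao, norm = TRUE,
                             depth = NULL, method = 1, cv.cutoff = 100, cpm = 1)

  # Step 2: rebuild the NOISeq object from the filtered matrix and run
  # noiseqbio with filter = 0 so that it does not filter again.
  mydata_filt <- readData(data = rpkm_filt, factors = myfactors)
  mynoiseqbio <- noiseqbio(mydata_filt, k = 0.5, norm = "rpkm",
                           factor = "condicao", conditions = c("F24h", "C6h"),
                           lc = 0, r = 50, adj = 1.5, plot = FALSE,
                           a0per = 0.9, random.seed = 12345, filter = 0)
}
```

If the goal is only to prepare input for WGCNA, step 1 alone may be enough: the filtered matrix can be passed on without running noiseqbio at all.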