Question: NA values of count matrix in class DESeqDataSet
maleknias wrote (2.7 years ago):

Dear all

Hi

I downloaded a data set of class "RangedSummarizedExperiment" from "https://jhubiostatistics.shinyapps.io/recount/". I want to find differentially expressed genes. My code is:

load("~/Downloads/rse_gene.Rdata")

class(rse_gene)

[1] "RangedSummarizedExperiment"
attr(,"package")
[1] "SummarizedExperiment"

data=colData(rse_gene)

names= names(colData(rse_gene))

write.table(data,file="colData.csv", col.names=names,sep="\t",row.names=FALSE)

data1=fread("~/Downloads/colData.txt")

colData(rse_gene) =DataFrame(data1)

colData(rse_gene)$disease.status = as.factor(colData(rse_gene)$disease.status)

dds <- DESeqDataSet(rse_gene, design = ~ disease.status)

converting counts to integer mode
Error in validObject(.Object) :
  invalid class “DESeqDataSet” object: NA values are not allowed in the count matrix
In addition: Warning message:
In mde(x) : NAs introduced by coercion to integer range

I tried two solutions for this problem, but neither of them worked:

1- Keep only rows with non-zero counts:

rse_gene <- rse_gene[rowSums(assay(rse_gene)) != 0, ]

2- Replace the NA values with -9:

countdata <- assay(rse_gene)

replace(countdata,countdata==0,-9)

coldata <- colData(rse_gene)

ddsMat <- DESeqDataSetFromMatrix(countData = countdata, colData = coldata, design = ~ disease.status)

I would appreciate it if anyone could help me!
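Note on the warning above: "NAs introduced by coercion to integer range" suggests that the NA values may be created while the counts are converted to integers, rather than being present in the downloaded matrix. That would also explain why filtering rows for NA has no effect. Below is a minimal base R sketch of that mechanism. The values are made up for illustration, and whether this is the actual cause here is an assumption the thread does not confirm.

```r
# Made-up coverage-style counts; the second value exceeds the integer range
x <- c(10, 3e9, 250)

# Largest value an R integer can hold
int_max <- .Machine$integer.max

# Flag values that cannot be represented as integers
too_big <- x > int_max

# as.integer() turns out-of-range doubles into NA and normally warns
# "NAs introduced by coercion to integer range"; suppressed here for brevity
xi <- suppressWarnings(as.integer(x))

anyNA(x)    # the input itself contains no NA
is.na(xi)   # the NA appears only after coercion
```

If this is what happens with the recount object, the counts would need to be brought into integer range before building the DESeqDataSet. The recount package documents a scale_counts() function intended for scaling its coverage-based counts; applying it here is a suggestion, not something tested in this thread.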

Tags: rna-seq, recount

Try this:

library(DESeq2)
load("rse_gene.Rdata")
ddsse=DESeqDataSet(rse_gene,design=~disease.status)

Btw, what is the accession number of the data?

Reply written 2.6 years ago by cpad0112
Kevin Blighe wrote (2.7 years ago):

Your command using rowSums won't work. What I normally do is something like this:

test <- data.frame(c(1,2,3,4,5), c(1,2,3,4,5), c(1,NA,3,4,5), c(NA,NA,NA,4,5), c(1,2,3,4,5))
colnames(test) <- c("a","b","c","d","e")
test
  a b  c  d e
1 1 1  1 NA 1
2 2 2 NA NA 2
3 3 3  3 NA 3
4 4 4  4  4 4
5 5 5  5  5 5

test[apply(test, 1, function(x) sum( is.na(x) ))==0,]
  a b c d e
4 4 4 4 4 4
5 5 5 5 5 5

I originally used this way of filtering for removing transcripts that had zero counts across 5 or more samples, using something like this:

apply(test, 1, function(x) sum(x==0))<5

Instead of

test[apply(test, 1, function(x) sum( is.na(x) ))==0,]

you could use the built-in function complete.cases, e.g.:

test[complete.cases(test), ]
Reply written 2.7 years ago by e.rempel

Thanks e.rempel - I knew that the function existed but could not remember the name at the time of writing!

Reply written 2.7 years ago by Kevin Blighe
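For reference, the two NA filters discussed in this exchange can be checked against each other on the same toy data frame. This is a small base R sketch; the rowSums() variant is an extra alternative added for illustration and is not something proposed in the thread.

```r
# Same toy data as in the answer above
test <- data.frame(a = c(1, 2, 3, 4, 5),
                   b = c(1, 2, 3, 4, 5),
                   c = c(1, NA, 3, 4, 5),
                   d = c(NA, NA, NA, 4, 5),
                   e = c(1, 2, 3, 4, 5))

# Row-wise count of NA values via apply()
keep_apply <- apply(test, 1, function(x) sum(is.na(x))) == 0

# Built-in helper: TRUE for rows without any NA
keep_cc <- complete.cases(test)

# Vectorised alternative (illustrative addition)
keep_rs <- rowSums(is.na(test)) == 0

# All three select the same rows
identical(unname(keep_apply), keep_cc)
identical(unname(keep_rs), keep_cc)

# Rows 4 and 5 remain
filtered <- test[keep_cc, ]
```

One caveat for the zero-count filter: sum(x == 0) returns NA for any row that contains an NA, so NA rows should be removed first or handled with na.rm = TRUE.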
maleknias wrote (2.6 years ago):

Dear all

I tried all of your suggestions, but unfortunately they all give the same error as before!

first:

assay(rse_gene)=assay(rse_gene)[apply(assay(rse_gene), 1, function(x) sum( is.na(x) ))==0,]

dds <- DESeqDataSet(rse_gene, design = ~ disease.status)

second:

assay(rse_gene)=assay(rse_gene)[complete.cases(assay(rse_gene)),]

dds <- DESeqDataSet(rse_gene, design = ~ disease.status)

I would appreciate it if anyone could help me!


Try to remove the NA values before you run that function. I'm not sure that the code you have above is going to behave in the way that you expect.

I don't know why you need to use the assay() function before DESeqDataSet()

Can you not just remove NA values from your raw counts matrix before you do anything with DESeq?

Reply written 2.6 years ago by Kevin Blighe
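To illustrate the concern raised in the reply above: assigning a row-filtered matrix back with assay(rse_gene) = ... only touches the assay, while the row annotations of the object keep their original length, so the dimensions can disagree. Subsetting the whole object keeps everything in sync. The sketch below uses a small made-up SummarizedExperiment rather than the recount data; the object names and values are hypothetical, and it assumes the Bioconductor package SummarizedExperiment is installed.

```r
library(SummarizedExperiment)

# Toy count matrix: 5 genes x 4 samples (made-up values);
# gene g2 has an NA in sample s1 and gene g3 has an NA in sample s2
counts <- matrix(c(1, NA, 3, 4, 5,
                   1, 2, NA, 4, 5,
                   1, 2, 3, 4, 5,
                   1, 2, 3, 4, 5),
                 nrow = 5,
                 dimnames = list(paste0("g", 1:5), paste0("s", 1:4)))

# Hypothetical sample annotation with a two-level condition
se <- SummarizedExperiment(
  assays  = list(counts = counts),
  colData = DataFrame(disease.status = factor(c("a", "a", "b", "b")),
                      row.names = colnames(counts))
)

# Decide which genes to keep from the assay ...
keep <- complete.cases(assay(se))

# ... but subset the object itself, so assay and row data stay aligned
se_clean <- se[keep, ]

dim(se_clean)
anyNA(assay(se_clean))
```

With the real data, the equivalent call would be rse_gene[keep, ] followed by DESeqDataSet(). If complete.cases() finds no NA rows at all, the NA values are probably introduced later, during the conversion to integers.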
> rse_gene <- data.frame(c(1,2,3,4,5), c(1,2,3,4,5), c(1,NA,3,4,5), c(NA,NA,NA,4,5), c(1,2,3,4,5))
> colnames(rse_gene) <- c("a","b","c","d","e")
> rse_gene[is.na(rse_gene)] <- 0
> rse_gene
  a b c d e
1 1 1 1 0 1
2 2 2 0 0 2
3 3 3 3 0 3
4 4 4 4 4 4
5 5 5 5 5 5
> dds <- DESeqDataSet(rse_gene, design = ~ disease.status)
Reply written 2.6 years ago by ioannis
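A side note on the replace() call in the original question: replace() returns a modified copy and leaves its input unchanged unless the result is assigned, and the condition countdata==0 selects zeros rather than NA values. The base R sketch below, on a made-up matrix, contrasts that with the indexed assignment used in the reply above. Whether filling NA with 0 is appropriate for a given analysis is a separate question that this sketch does not address.

```r
# Small made-up matrix with one NA and one zero
m <- matrix(c(1, NA, 0, 4), nrow = 2)

# replace() returns a new object; m is untouched unless the result is assigned
m_copy <- replace(m, is.na(m), 0)
anyNA(m)       # TRUE: original still has the NA
anyNA(m_copy)  # FALSE: only the returned copy was changed

# Indexed assignment modifies m directly
m[is.na(m)] <- 0
anyNA(m)       # FALSE
```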