Error massage when instantiating DEseq Dataset
0
0
Entering edit mode
3.3 years ago
Jakpa ▴ 50

Hello Everyone, I am trying to perform differential expression. I started by importing the featurecounts data down to removing unwanted columns and performing the Matrix. After running the code for the DESeqDataSet, I got an error massage: "Warning message: In DESeqDataSet(se, design = design, ignoreRank) : all genes have equal values for all samples. will not be able to perform differential analysis"

I a kind of ignored it because I could not really understood the meaning. I went ahead to do sizefactors and i got a value of 1 for all the samples. Below is a view of what i did:

library(DESeq2)
  library(Biobase)
#load the data file from featureCounts.txt
countData<- read.table("/home/mlsi/RNASeq/countTable/featureCounts.txt", head=T,row.names = 1)

#delete column 1-5
deleteColumnCountdata<- countData[-c(1,2,3,4:5)]

colnames(deleteColumnCountdata)
# romove .bam or .sam from filename
colnames(deleteColumnCountdata) <- gsub ("\\X.home.mlsi.RNASeq.mapping.","",colnames(deleteColumnCountdata))  
colnames(deleteColumnCountdata) <- gsub ("\\.UHR_[123].bam","",colnames(deleteColumnCountdata))
colnames(deleteColumnCountdata) <- gsub ("\\.HBR_[123].bam","",colnames(deleteColumnCountdata))
colnames(deleteColumnCountdata)
class(deleteColumnCountdata)
head(deleteColumnCountdata)

# convert 'deleColumnCountdata' to matrix
newCountsData<-as.matrix(deleteColumnCountdata)
head(newCountsData)

group<- factor(c(rep("UHR",3), rep("HBR",3)))
con<- factor(c(rep("cancer",3), rep("normal",3)))

# contruct a data frame
countDataDataFrame<- data.frame(row.names = colnames(newCountsData), group , con)
head(countDataDataFrame)
#instantiate the DESeq dataset
dds<- DESeqDataSetFromMatrix(countData =newCountsData, colData =countDataDataFrame, design = ~ con)

"Warning message:
In DESeqDataSet(se, design = design, ignoreRank) :
  all genes have equal values for all samples. will not be able to perform differential analysis"

dds<-estimateSizeFactors(dds)
sizeFactors(dds)
HBR_! HBr_2 HBR_3 UHR_1 UHR_2 UHR_3    all the samples have 1

is this correct? because I read that sizefactors are almost 1

then, I went ahead to do Pre-filtering: dds<- dds [rowSums(counts(dds)) > 1, ]

I tried rlogTransformation because I want to have a heatmap/clustering of the data:

rld<- rlogTransformation(dds)

I got this erroe:

Error in estimateDispersionsFit(object, fitType, quiet = TRUE) :
  all gene-wise dispersion estimates are within 2 orders of magnitude
  from the minimum value, and so the standard curve fitting techniques will not work.
  One can instead use the gene-wise estimates as final estimates:
  dds <- estimateDispersionsGeneEst(dds)
  dispersions(dds) <- mcols(dds)$dispGeneEst
  ...then continue with testing using nbinomWaldTest or nbinomLRT
I also tried rlog, log but same error. I tried to follow the suggestion outlined in the error massage, but I dont think I will achieve my goal at the end.
I even tried DESeq(dds);  i got this error: estimating size factors
estimating dispersions
Error in .local(object, ...) :
  all genes have equal values for all samples. will not be able to perform differential analysis

Where am I getting wrong?

I will appreciate solutions and suggestions.

regards, Anthony

differential Expression sizefactors DESeq R • 1.6k views
ADD COMMENT
0
Entering edit mode

What is the output of head(countData, 10)?

all genes have equal values for all samples

This error suggests that all values (counts) are the same all across the count matrix, this needs to be investigated before going any further.

There is a markdown code option (the 10101 button), please use it from now on to highlight code.

ADD REPLY
0
Entering edit mode
enter code here

  >head(countData)
                                  Chr
U2                                 22
FRG1FP     22;22;22;22;22;22;22;22;22
CU104787.1                      22;22
BAGE5                           22;22
ACTR3BP6                        22;22
5_8S_rRNA                          22
                                                                                      Start
U2                                                                                 10736171
FRG1FP     10939388;10940597;10941691;10944967;10947304;10949212;10950049;10959067;10961283
CU104787.1                                                                11065974;11067335
BAGE5                                                                     11066501;11067985
ACTR3BP6                                                                  11124337;11124508
5_8S_rRNA                                                                          11249809
                                                                                        End
U2                                                                                 10736283
FRG1FP     10939423;10940707;10941780;10945053;10947418;10949269;10950174;10959136;10961338
CU104787.1                                                                11066015;11067346
BAGE5                                                                     11066515;11068089
ACTR3BP6                                                                  11124379;11125705
5_8S_rRNA                                                                          11249959
                      Strand Length X.home.mlsi.RNASeq.mapping.HBR_1.HBR_1.bam
U2                         -    113                                          0
FRG1FP     -;-;-;-;-;-;-;-;-    749                                          0
CU104787.1               -;-     54                                          0
BAGE5                    +;+    120                                          0
ACTR3BP6                 +;+   1241                                          0
5_8S_rRNA                  -    151                                          0
           X.home.mlsi.RNASeq.mapping.HBR_2.HBR_2.bam X.home.mlsi.RNASeq.mapping.HBR_3.HBR_3.bam
U2                                                  0                                          0
FRG1FP                                              0                                          0
CU104787.1                                          0                                          0
BAGE5                                               0                                          0
ACTR3BP6                                            0                                          0
5_8S_rRNA                                           0                                          0
           X.home.mlsi.RNASeq.mapping.UHR_1.UHR_1.bam X.home.mlsi.RNASeq.mapping.UHR_2.UHR_2.bam
U2                                                  0                                          0
FRG1FP                                              0                                          0
CU104787.1                                          0                                          0
BAGE5                                               0                                          0
ACTR3BP6                                            0                                          0
5_8S_rRNA                                           0                                          0
           X.home.mlsi.RNASeq.mapping.UHR_3.UHR_3.bam
U2                                                  0
FRG1FP                                              0
CU104787.1                                          0
BAGE5                                               0
ACTR3BP6                                            0
5_8S_rRNA                                           0enter code here

ADD REPLY
0
Entering edit mode

@ ATpoint, Thanks for the response. Thats the output of head(countData)

ADD REPLY
0
Entering edit mode

Looks like featureCounts output. Now you have to find out whether all entries are indeed zero and if so why. Maybe the GTF you used is different from the reference genome you aligned against in terms of chromosome names?

ADD REPLY
0
Entering edit mode

@ ATpoint, i ran the below command on ubuntu to see the last 10 reads and as you can see all the entries are not zero

enter code here
enter code here

ase) mlsi@ubuntu:~/RNASeq/countTable$ tail featureCounts.txt
ERCC-00157 ERCC-00157 1 1019 + 1019 9 9 9 9 9 9
ERCC-00158 ERCC-00158 1 1027 + 1027 0 0 0 0 0 0
ERCC-00160 ERCC-00160 1 743 + 743 7 7 7 7 7 7
ERCC-00162 ERCC-00162 1 523 + 523 26 26 26 26 26 26
ERCC-00163 ERCC-00163 1 543 + 543 4 4 4 4 4 4
ERCC-00164 ERCC-00164 1 1022 + 1022 0 0 0 0 0 0
ERCC-00165 ERCC-00165 1 872 + 872 32 32 32 32 32 32
ERCC-00168 ERCC-00168 1 1024 + 1024 2 2 2 2 2 2
ERCC-00170 ERCC-00170 1 1023 + 1023 1 1 1 1 1 1
ERCC-00171 ERCC-00171 1 505 + 505 864 864 864 864 864 864

ADD REPLY
0
Entering edit mode

Also, after deleting the unwanted columns and converting to matrix, this the output:

> tail(newCountsData)
           HBR_1 HBR_2 HBR_3 UHR_1 UHR_2 UHR_3
ERCC-00163     4     4     4     4     4     4
ERCC-00164     0     0     0     0     0     0
ERCC-00165    32    32    32    32    32    32
ERCC-00168     2     2     2     2     2     2
ERCC-00170     1     1     1     1     1     1
ERCC-00171   864   864   864   864   864   864

ADD REPLY
0
Entering edit mode

If you want to check if there are any genes that don't have a count of 0 across all samples.

library("tidyverse")

countData %>%
  rownames_to_column("gene") %>%
  rowwise %>%
  filter(any(c_across(ends_with(".bam")) != 0))
ADD REPLY
0
Entering edit mode

Hi rpolicastro, Thanks for your response. I tried to install the package "tidyverse" but the installation failed. I tried severally it failed to install. I was thinking I have old version of R maybe thats was the reason the installation failed but I have the new version. However, I was able to read the last 10 lines from ubuntu command line as you can see above

ADD REPLY

Login before adding your answer.

Traffic: 2680 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6