DESeq2 count table columns do not normalize
2.2 years ago

Hi everyone,

I am a student working on an RNA-seq project and I have a problem with DESeq2 count normalization: one or more columns in my count table do not appear to be normalized (see the example images before and after normalization). I varied the input by taking subsets of the count table, but the same columns remained unnormalized. For another dataset, DESeq2 worked as expected and produced normalized tables. I used the same annotation files for both datasets and featureCounts as the counting program. I see the same problem on Galaxy (Galaxy Version 2.11.40.7+galaxy0) and in R (DESeq2_1.32.0).

My question: does anyone have an idea why this could happen? I can provide more information if needed.

[Images: pre_normalization, post_normalization]

Best regards,

Patrick

counts DESeq2 normalization • 829 views
2.2 years ago
nn ▴ 30

It appears that somewhere along the way DESeq2 fails to compute a normalization/size factor for the E2 and I2 libraries. Did you have a look at your samples' size factors via

sizeFactors(dds)

where dds is your DESeqDataSet object


Hi nn,

sizeFactors(dds)

NULL

sizeFactors(dds) returns NULL. I also get this error message when running DESeqDataSetFromMatrix:


Warning message:
In DESeqDataSet(se, design = design, ignoreRank) :
  some variables in design formula are characters, converting to factors


I can't find the meaning of this error, but it might be the cause. I think I am using exactly the same approach as the DESeq2 vignette example in the section "Count matrix input" (http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#about-the-pasilla-dataset):

           condition  type
treated1fb treated    single-read
treated2fb treated    paired-end
treated3fb treated    paired-end


See my own coldata data frame below.

Because I used featureCounts, I decided to create my own count_matrix and used DESeqDataSetFromMatrix() to create the dds.


library(DESeq2)

# Sample annotation: one row per column of count_matrix, in the same order
condition <- c("bead_enriched", "bead_enriched", "Input_control", "Input_control")
type <- c("single_end", "single_end", "single_end", "single_end")
coldata <- data.frame(condition, type)
rownames(coldata) <- c("E1", "E2", "I1", "I2")

# coldata has to exist before the dataset is built from it
dds <- DESeqDataSetFromMatrix(countData = count_matrix, colData = coldata, design = ~ condition)


My coldata and count_matrix (sorry, I don't know how to format tables properly in Biostars yet, so I used a screenshot).

2.2 years ago

The warning message is a warning, not an error. It's giving you a (mild) complaint about your colData, and it's fixing that for you. That's not the issue.
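If you want the warning to go away, one option is to convert the column to a factor yourself before building the dataset. This is only a sketch based on the coldata posted above. The choice of "Input_control" as the reference level is my assumption, not something stated in the thread.

```r
# Sample annotation as posted in the question (condition column only)
condition <- c("bead_enriched", "bead_enriched", "Input_control", "Input_control")
coldata <- data.frame(
  condition = condition,
  row.names = c("E1", "E2", "I1", "I2"),
  stringsAsFactors = FALSE
)

# Convert explicitly so no character-to-factor conversion is needed later.
# The first level acts as the reference; picking "Input_control" here is an assumption.
coldata$condition <- factor(
  coldata$condition,
  levels = c("Input_control", "bead_enriched")
)

str(coldata)
```

Setting the levels explicitly also fixes the direction of the comparison, which otherwise follows alphabetical order.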

See what sizeFactors(dds) shows you. The simplest explanation is that the software is setting the size factors of those two samples at exactly 1. Which would be odd, but maybe it's because your counts seem rather low?
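Note that sizeFactors(dds) stays NULL until estimateSizeFactors(dds) or DESeq(dds) has been run and the result assigned back to dds. After that, counts(dds, normalized = TRUE) gives the normalized table. To make clear what a size factor is, here is a rough base-R sketch of the median-of-ratios idea that DESeq2 uses by default. The count values and the helper name median_of_ratios are made up for illustration, and this is not DESeq2's own code.

```r
# Toy count matrix (hypothetical values): genes in rows, samples in columns
count_matrix <- matrix(
  c(100,  200,  50, 100,
     30,   60,  15,  30,
    500, 1000, 250, 500,
      0,    4,   1,   2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), c("E1", "E2", "I1", "I2"))
)

# Hypothetical helper sketching the median-of-ratios estimator
median_of_ratios <- function(counts) {
  log_counts <- log(counts)
  # Per-gene log geometric mean; -Inf for any gene with a zero count
  log_geo_means <- rowMeans(log_counts)
  usable <- is.finite(log_geo_means)
  if (!any(usable)) stop("every gene contains at least one zero count")
  # Per sample: median log ratio to the gene-wise geometric mean
  apply(log_counts[usable, , drop = FALSE], 2, function(col) {
    exp(median(col - log_geo_means[usable]))
  })
}

size_factors <- median_of_ratios(count_matrix)

# Normalized counts: divide each column by its size factor
normalized <- sweep(count_matrix, 2, size_factors, "/")

print(size_factors)
print(normalized)
```

A sample whose size factor comes out at exactly 1 keeps its raw counts after the division, so a column that looks "not normalized" is not necessarily broken.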

Also, DESeq2 is designed to work on gene-level counts, not transcripts. It won't cope well with many transcripts mapping to the same gene. There's other software, like DEXSeq I think, which will handle differential exon expression.
