Question: DESeq2 analysis: huge number of outliers and refitting
VHahaut (Belgium) wrote, 2.2 years ago:

Hello!

I am quite new to bioinformatics, so I hope my question will be clear enough.

I am trying to run a DESeq2 analysis on 25 bovine tumor samples. Among them I have two technical replicates of my only control (I know this is not ideal), and most of my "treated" samples have one technical replicate too. Before any DESeq2 analysis I had to drop a few samples because the quality of the RNA-seq was not good enough.

design = ~Group

Overview of colData

row.names  sample   Group
sample1    sample1  treated
sample2    sample1  treated
sample3    sample2  control
sample4    sample2  control

I tried two different approaches: either running the DESeq analysis without specifying that I had technical replicates (dds), or using the collapseReplicates function on the "sample" column of colData to merge the reads of technical replicates (ddsCollapsed).

dds <- DESeqDataSetFromMatrix(countData = matrix, colData = colData, design = design)

ddsCollapsed <- collapseReplicates(dds, groupby = colData(dds)$sample, renameCols = TRUE)
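To make the effect of collapseReplicates concrete, here is a minimal sketch on simulated data. The layout (6 runs from 4 samples) and the names s1 to s4 and run1 to run6 are assumptions for illustration only, not the poster's data. It shows that the counts of technical replicates are summed within each level of the grouping variable.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in: 200 genes, 6 sequencing runs (hypothetical layout)
dds <- makeExampleDESeqDataSet(n = 200, m = 6)

# Runs 1-2 come from s1, run 3 from s2, runs 4-5 from s3, run 6 from s4
dds$sample <- factor(c("s1", "s1", "s2", "s3", "s3", "s4"))
dds$run <- paste0("run", 1:6)

# Sum the counts of technical replicates within each level of 'sample'
ddsCollapsed <- collapseReplicates(dds, groupby = dds$sample,
                                   run = dds$run, renameCols = TRUE)

# The collapsed column for s1 equals the sum of its two runs
s1_sum <- rowSums(counts(dds)[, dds$sample == "s1"])
all(counts(ddsCollapsed)[, "s1"] == s1_sum)

# One row per collapsed sample; 'runsCollapsed' lists the merged runs
colData(ddsCollapsed)
```

After collapsing, each biological sample contributes exactly one column, so the number of samples per group can be much smaller than in the uncollapsed object.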

My problem lies in the DESeq analysis:

DESeq(dds)

estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
-- replacing outliers and refitting for 6787 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions
fitting model and testing

 

DESeq(ddsCollapsed)

estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
-- replacing outliers and refitting for 24224 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions
fitting model and testing

I am working with bovine ENSEMBL annotation which contains ~24660 entries...

I was really surprised by the number of outliers. Moreover, the MA plots from those two analyses are really not great (I attach the one from ddsCollapsed to this post):

[Figure: MA plot from DESeq (ddsCollapsed)]

I have already read the supplementary data about Cook's distance.

So my questions are the following:

1) Do I have to worry about such a high number of outliers? Is it common? What could be the reasons leading to those numbers?

2) If yes to (1), what can I do to overcome this problem?

3) An unrelated question: Is it possible to put missing values (NA) in the colData table? I tried and got this error:

Error in t(hatmatrix %*% t(y)) : 

"error in evaluating the argument 'x' in selecting a method for function 't': Error in hatmatrix %*% t(y) : non-conformable arguments"

 


Thanks for reading this long post! Any advice would be appreciated! Vincent

Michael Love (United States) wrote, 2.2 years ago:

The count outlier flagging is useful when only a minority of counts in the dataset are outliers, but as you have noted, something else is going on here with so many genes flagged. There are two possible reasons for so many genes being flagged as outliers: either the method for flagging outliers is not appropriate for the distribution of counts in your data and should be turned off (by setting minReplicatesForReplace=Inf in DESeq() and cooksCutoff=FALSE in results()), or you have a sample which is a count outlier in almost every gene (which could be found using plotPCA as in the vignette). My recommendation: if you don't find an obvious outlier sample which is contributing to most of these filtered genes, turn off the filtering and inspect the top genes using plotCounts.
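The steps above can be sketched as follows. This is a minimal illustration on simulated data, not the poster's dataset: the object built with makeExampleDESeqDataSet, its "condition" column, and the choice of 16 samples are assumptions. The variance stabilizing transformation before plotPCA and the boxplot of Cook's distances follow the approach shown in the DESeq2 vignette.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in: 1000 genes, 16 samples in two conditions (assumption)
dds <- makeExampleDESeqDataSet(n = 1000, m = 16, betaSD = 1)

# Fit without replacing outlier counts
dds <- DESeq(dds, minReplicatesForReplace = Inf)

# Extract results without Cook's-distance filtering of p-values
res <- results(dds, cooksCutoff = FALSE)

# Look for a sample that stands apart from all others
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)
plotPCA(vsd, intgroup = "condition")

# Per-sample distribution of Cook's distances: a sample whose box sits
# consistently higher than the rest is a candidate outlier sample
boxplot(log10(assays(dds)[["cooks"]]), range = 0, las = 2)

# Inspect the gene with the smallest p-value
top <- rownames(res)[which.min(res$pvalue)]
plotCounts(dds, gene = top, intgroup = "condition")
```

With the filtering turned off, a single extreme count can drive a small p-value, which is why looking at the top genes with plotCounts matters here.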

No, you can't include NA in the columns which are used for modeling. We need complete covariate information.
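One way to work with this constraint is to drop samples that lack a value for any design variable before building the dataset. The sketch below uses simulated counts and a hypothetical colData with one missing Group value; the object names counts, coldata, and keep are assumptions for illustration.

```r
library(DESeq2)

set.seed(1)
# Simulated counts: 100 genes x 6 samples (hypothetical data)
counts <- matrix(rnbinom(600, mu = 100, size = 5), nrow = 100,
                 dimnames = list(paste0("gene", 1:100),
                                 paste0("sample", 1:6)))
storage.mode(counts) <- "integer"

# sample3 has no group annotation
coldata <- data.frame(
  Group = c("control", "control", NA, "treated", "treated", "treated"),
  row.names = colnames(counts)
)

# Keep only samples with complete information for the design variable
keep <- !is.na(coldata$Group)
coldata <- coldata[keep, , drop = FALSE]
coldata$Group <- factor(coldata$Group, levels = c("control", "treated"))

dds <- DESeqDataSetFromMatrix(countData = counts[, keep],
                              colData = coldata,
                              design = ~ Group)
dds
```

Columns of colData that are not used in the design do not take part in the modeling, so only the variables in the design formula need to be complete.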


VHahaut replied, 2.2 years ago:

Thanks, these were exactly the answers I needed!