Entering edit mode
                    3.1 years ago
        ta_awwad
        
    
        ▴
    
    350
    Hi everyone,
first, I am not expert in using DEXseq. I am trying to analyze DEU from two different datasets. I am just worried about the batch effect. I tried to use svaseq function as suggested by others to remove this effect but ended up with an error message:
> sampleAnno
file condition libType type
1 ES.1.txt ESC paired-end 1
2 ES.2.txt ESC paired-end 1
3 LV_rep1.txt LV paired-end 2
4 LV_rep2.txt LV paired-end 2
> dxd = DEXSeqDataSetFromHTSeq(
  countfiles = sampleAnno$file, 
  sampleData = sampleAnno,
  design= ~ sample + exon + condition:exon,
  flattenedfile = gffFile)
> dxd = estimateSizeFactors( dxd )
> dxd <- dxd[rowMeans(counts(dxd))>1,]
> dat <- counts(dxd, normalized=TRUE)
> library(sva)
> mod1 <- model.matrix(~ sample + exon , colData(dxd))
> mod0 <- model.matrix(~ 1, colData(dxd))
> svseq <- svaseq(dat, mod1, mod0, n.sv=2, B=5)
> dxdsva = dxd
> dxdsva$SV1=svseq$sv[,1]
> dxdsva$SV2=svseq$sv[,2]
> design(dxdsva) <- ~ SV1 + SV2 + sample + exon + condition:exon
> dxdsva <- estimateDispersions( dxdsva )
Error: BiocParallel errors
  1 remote errors, element index: 1
  0 unevaluated and other errors
  first remote error:
Error in estimateDispersionsGeneEst(x, maxit = maxit, quiet = quiet, modelMatrix = modelMatrix, : the number of samples and the number of model coefficients are equal,
  i.e., there are no replicates to estimate the dispersion.
  use an alternate design formula
sessionInfo( )
R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.2.1
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     
other attached packages:
 [1] sva_3.44.0                  genefilter_1.78.0           mgcv_1.8-40                 nlme_3.1-159               
 [5] forcats_0.5.2               stringr_1.4.1               dplyr_1.0.10                purrr_0.3.5                
 [9] readr_2.1.3                 tidyr_1.2.1                 tibble_3.1.8                ggplot2_3.3.6              
[13] tidyverse_1.3.2             DEXSeq_1.42.0               RColorBrewer_1.1-3          AnnotationDbi_1.58.0       
[17] DESeq2_1.36.0               SummarizedExperiment_1.26.1 GenomicRanges_1.48.0        GenomeInfoDb_1.32.4        
[21] IRanges_2.30.1              S4Vectors_0.34.0            MatrixGenerics_1.8.1        matrixStats_0.62.0         
[25] Biobase_2.56.0              BiocGenerics_0.42.0         BiocParallel_1.30.3        
loaded via a namespace (and not attached):
 [1] fs_1.5.2               bitops_1.0-7           lubridate_1.8.0        bit64_4.0.5            filelock_1.0.2        
 [6] progress_1.2.2         httr_1.4.4             tools_4.2.0            backports_1.4.1        utf8_1.2.2            
[11] R6_2.5.1               DBI_1.1.3              colorspace_2.0-3       withr_2.5.0            tidyselect_1.1.2      
[16] prettyunits_1.1.1      bit_4.0.4              curl_4.3.3             compiler_4.2.0         rvest_1.0.3           
[21] cli_3.4.1              xml2_1.3.3             DelayedArray_0.22.0    scales_1.2.1           rappdirs_0.3.3        
[26] digest_0.6.29          Rsamtools_2.12.0       XVector_0.36.0         pkgconfig_2.0.3        limma_3.52.4          
[31] dbplyr_2.2.1           fastmap_1.1.0          readxl_1.4.1           rlang_1.0.6            rstudioapi_0.14       
[36] RSQLite_2.2.18         generics_0.1.3         jsonlite_1.8.2         hwriter_1.3.2.1        vroom_1.6.0           
[41] googlesheets4_1.0.1    RCurl_1.98-1.9         magrittr_2.0.3         GenomeInfoDbData_1.2.8 Matrix_1.5-1          
[46] Rcpp_1.0.9             munsell_0.5.0          fansi_1.0.3            lifecycle_1.0.3        edgeR_3.38.4          
[51] stringi_1.7.8          zlibbioc_1.42.0        BiocFileCache_2.4.0    grid_4.2.0             blob_1.2.3            
[56] parallel_4.2.0         crayon_1.5.2           lattice_0.20-45        haven_2.5.1            Biostrings_2.64.1     
[61] splines_4.2.0          annotate_1.74.0        hms_1.1.2              KEGGREST_1.36.3        locfit_1.5-9.6        
[66] pillar_1.8.1           geneplotter_1.74.0     codetools_0.2-18       biomaRt_2.52.0         reprex_2.0.2          
[71] XML_3.99-0.11          glue_1.6.2             modelr_0.1.9           tzdb_0.3.0             png_0.1-7             
[76] vctrs_0.4.2            cellranger_1.1.0       gtable_0.3.1           assertthat_0.2.1       cachem_1.0.6          
[81] xtable_1.8-4           broom_1.0.1            survival_3.4-0         googledrive_2.0.0      gargle_1.2.1          
[86] memoise_2.0.1          statmod_1.4.37         ellipsis_0.3.2
any help?
many many thanks TA
This error from
DESeq2::estimateDispersionsis saying that there aren't any replicate samples in your design and so an average dispersion value for a given condition can't be calculated. If you Google the error message you will get some more information, some good examples include https://support.bioconductor.org/p/134505/ and https://support.bioconductor.org/p/89746/Based on
sampleAnnoit does look like you have replicates so my guess is thatdesign(dxdsva) <- ~ SV1 + SV2 + sample + exon + condition:exonis not producing an appropriate model or perhaps the model design is confounded?The other thing to note is that you appear to have performed
estimateSizeFactorsbefore runningsva. I believe the recommendation is to use raw counts data forsvathen run throughestimateSizeFactors,estimateDispersions, etc. with the new model design https://bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#using-sva-with-deseq2 (though I'm not sure if the recommendation is different for DEXSeq)