Question

DEXSeq: Output Table Has Twice As Many Columns As There Are Samples

2.2 years ago

Y ▴ 10

I am trying to use DEXSeq, and my supervisor told me I could output normalized counts with the following method:

library("DEXSeq")

# Create the DEXSeqDataSet object from the HTSeq count files
dxd <- DEXSeqDataSetFromHTSeq(
  countsFiles,
  sampleData = sampleTable,
  design = ~ sample + exon + condition:exon,
  flattenedfile = flattenedFile
)

# Normalize
# Random row-by-column normalization factors in [0.5, 1.5],
# rescaled so that each row has a geometric mean of 1
normFactors <- matrix(runif(nrow(dxd) * ncol(dxd), 0.5, 1.5),
                      ncol = ncol(dxd), nrow = nrow(dxd),
                      dimnames = list(1:nrow(dxd), 1:ncol(dxd)))

normFactors <- normFactors / exp(rowMeans(log(normFactors)))
normalizationFactors(dxd) <- normFactors

dxd <- estimateSizeFactors(dxd)

# Divide each column of the count matrix by its size factor
normalizedCounts <- t(t(counts(dxd)) / sizeFactors(dxd))

# Write a table
write.table(normalizedCounts, "normalizedDexSeq.txt", sep = "\t", row.names = TRUE)

My session info is below if required:

R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.5

Matrix products: default
BLAS:   /path/to/libRblas.0.dylib 
LAPACK: /path/to/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] C/UTF-8/C/C/C/C

time zone: -
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] GenomicAlignments_1.36.0    Rsamtools_2.16.0           
 [3] Biostrings_2.68.1           XVector_0.40.0             
 [5] DEXSeq_1.46.0               RColorBrewer_1.1-3         
 [7] DESeq2_1.40.2               SummarizedExperiment_1.30.2
 [9] MatrixGenerics_1.12.3       matrixStats_1.0.0          
[11] BiocParallel_1.34.2         GenomicFeatures_1.52.1     
[13] AnnotationDbi_1.62.2        Biobase_2.60.0             
[15] GenomicRanges_1.52.0        GenomeInfoDb_1.36.1        
[17] IRanges_2.34.1              S4Vectors_0.38.1           
[19] BiocGenerics_0.46.0        

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0        IRdisplay_1.1           dplyr_1.1.2            
 [4] blob_1.2.4              filelock_1.0.2          bitops_1.0-7           
 [7] fastmap_1.1.1           RCurl_1.98-1.12         BiocFileCache_2.8.0    
[10] XML_3.99-0.14           digest_0.6.33           lifecycle_1.0.3        
[13] statmod_1.5.0           survival_3.5-7          KEGGREST_1.40.0        
[16] RSQLite_2.3.1           genefilter_1.82.1       magrittr_2.0.3         
[19] compiler_4.3.1          rlang_1.1.1             progress_1.2.2         
[22] tools_4.3.1             utf8_1.2.3              yaml_2.3.7             
[25] rtracklayer_1.60.0      prettyunits_1.1.1       S4Arrays_1.0.5         
[28] bit_4.0.5               curl_5.0.2              DelayedArray_0.26.7    
[31] xml2_1.3.5              repr_1.1.6              abind_1.4-5            
[34] pbdZMQ_0.3-9            hwriter_1.3.2.1         grid_4.3.1             
[37] fansi_1.0.4             xtable_1.8-4            colorspace_2.1-0       
[40] ggplot2_3.4.3           scales_1.2.1            biomaRt_2.56.1         
[43] cli_3.6.1               crayon_1.5.2            generics_0.1.3         
[46] httr_1.4.7              rjson_0.2.21            DBI_1.1.3              
[49] cachem_1.0.8            stringr_1.5.0           splines_4.3.1          
[52] zlibbioc_1.46.0         parallel_4.3.1          restfulr_0.0.15        
[55] base64enc_0.1-3         vctrs_0.6.3             Matrix_1.6-1           
[58] jsonlite_1.8.7          geneplotter_1.78.0      hms_1.1.3              
[61] bit64_4.0.5             locfit_1.5-9.8          annotate_1.78.0        
[64] glue_1.6.2              codetools_0.2-19        gtable_0.3.3           
[67] stringi_1.7.12          BiocIO_1.10.0           munsell_0.5.0          
[70] tibble_3.2.1            pillar_1.9.0            rappdirs_0.3.3         
[73] htmltools_0.5.6         IRkernel_1.3.2          GenomeInfoDbData_1.2.10
[76] R6_2.5.1                dbplyr_2.3.3            evaluate_0.21          
[79] lattice_0.21-8          png_0.1-8               memoise_2.0.1          
[82] Rcpp_1.0.11             uuid_1.1-0              pkgconfig_2.0.3

However, when I open the resulting table in Excel, the number of columns is double the number of samples being processed. I am processing 6 samples: 3 experimental and 3 control. But I have 12 columns, not including the Ensembl ID column. What does each of these 12 columns represent, given that I only have 6 samples?
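To make the symptom concrete, here is a minimal base-R sketch with made-up numbers (no DEXSeq needed to run it). It assumes the layout described in the DEXSeq vignette, where the count matrix of a DEXSeqDataSet has two blocks of columns: the first block holds each exon's own counts ("this") and the second holds the summed counts of the other exons of the same gene ("others"). The names `toyCounts`, `exonLabel` and `toySizeFactors` are hypothetical stand-ins, not objects from my session, and I have not confirmed that this is what is happening in my data.

```r
# Toy stand-in for counts(dxd) with 6 samples; all numbers are made up.
samples <- c("ctrl1", "ctrl2", "ctrl3", "exp1", "exp2", "exp3")
n <- length(samples)

# Assumed layout: first n columns = "this" (the exon itself),
# last n columns = "others" (sum over the other exons of the same gene).
exonLabel <- rep(c("this", "others"), each = n)  # mimics colData(dxd)$exon
toyCounts <- matrix(
  seq_len(3 * 2 * n), nrow = 3,
  dimnames = list(paste0("gene1:E00", 1:3), rep(samples, times = 2))
)

# Hypothetical per-sample size factors, repeated for both column blocks
toySizeFactors <- rep(c(0.8, 1.0, 1.2, 0.9, 1.1, 1.0), times = 2)

# Same arithmetic as in the script above: divide each column by its size factor
toyNormalized <- t(t(toyCounts) / toySizeFactors)

# Keep only the per-exon columns, one per sample
exonOnly <- toyNormalized[, exonLabel == "this", drop = FALSE]

dim(toyNormalized)  # 3 rows, 12 columns
dim(exonOnly)       # 3 rows, 6 columns

# On the real object, the analogous checks would be:
#   table(colData(dxd)$exon)
#   featureCounts(dxd, normalized = TRUE)
```

If that assumption holds for my object, the 12 columns would be 6 "this" columns followed by 6 "others" columns, and `featureCounts(dxd, normalized = TRUE)` should return just the 6 per-sample exon columns.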

R Jupyter DEXseq • 560 views
