I am trying to use DEXSeq, and my supervisor told me I could output normalized counts using the following method:
library("DEXSeq")

# Create the DEXSeqDataSet object
dxd <- DEXSeqDataSetFromHTSeq(
    countsFiles,
    sampleData = sampleTable,
    design = ~ sample + exon + condition:exon,
    flattenedfile = flattenedFile)

# Normalize
normFactors <- matrix(runif(nrow(dxd) * ncol(dxd), 0.5, 1.5),
                      ncol = ncol(dxd), nrow = nrow(dxd),
                      dimnames = list(1:nrow(dxd), 1:ncol(dxd)))
normFactors <- normFactors / exp(rowMeans(log(normFactors)))
normalizationFactors(dxd) <- normFactors
dxd <- estimateSizeFactors(dxd)

### pairs
normalizedCounts <- t(t(counts(dxd)) / sizeFactors(dxd))

# Write a table
write.table(normalizedCounts, "normalizedDexSeq.txt", sep = "\t", row.names = TRUE)
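For reference, here is a minimal base-R sketch of what the `t( t(x) / s )` step does, using a made-up toy matrix rather than my real data. The names `toyCounts` and `toySizeFactors` and all the values are hypothetical and only illustrate the arithmetic: each column is divided by its own size factor, and the result has the same dimensions as the input.

```r
# Toy count matrix: 3 features (rows) x 2 samples (columns); values are made up.
toyCounts <- matrix(c(10, 20, 30,
                      40, 80, 120),
                    nrow = 3,
                    dimnames = list(c("f1", "f2", "f3"), c("s1", "s2")))

# Hypothetical size factors, one per column of toyCounts.
toySizeFactors <- c(s1 = 1, s2 = 4)

# R recycles a vector down the rows of a matrix, so the matrix is transposed
# to put samples in rows, divided, and then transposed back.
toyNormalized <- t(t(toyCounts) / toySizeFactors)

# Equivalent, more explicit form: divide along columns (MARGIN = 2).
toyNormalized2 <- sweep(toyCounts, 2, toySizeFactors, "/")

print(toyNormalized)
print(dim(toyNormalized))  # same dimensions as toyCounts
```

As far as I can tell, this step cannot add or remove columns, so the output table should have exactly as many columns as `counts(dxd)` does.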
My session info is below if required:
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.5
Matrix products: default
BLAS: /path/to/libRblas.0.dylib
LAPACK: /path/to/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] C/UTF-8/C/C/C/C
time zone: -
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] GenomicAlignments_1.36.0 Rsamtools_2.16.0
[3] Biostrings_2.68.1 XVector_0.40.0
[5] DEXSeq_1.46.0 RColorBrewer_1.1-3
[7] DESeq2_1.40.2 SummarizedExperiment_1.30.2
[9] MatrixGenerics_1.12.3 matrixStats_1.0.0
[11] BiocParallel_1.34.2 GenomicFeatures_1.52.1
[13] AnnotationDbi_1.62.2 Biobase_2.60.0
[15] GenomicRanges_1.52.0 GenomeInfoDb_1.36.1
[17] IRanges_2.34.1 S4Vectors_0.38.1
[19] BiocGenerics_0.46.0
loaded via a namespace (and not attached):
[1] tidyselect_1.2.0 IRdisplay_1.1 dplyr_1.1.2
[4] blob_1.2.4 filelock_1.0.2 bitops_1.0-7
[7] fastmap_1.1.1 RCurl_1.98-1.12 BiocFileCache_2.8.0
[10] XML_3.99-0.14 digest_0.6.33 lifecycle_1.0.3
[13] statmod_1.5.0 survival_3.5-7 KEGGREST_1.40.0
[16] RSQLite_2.3.1 genefilter_1.82.1 magrittr_2.0.3
[19] compiler_4.3.1 rlang_1.1.1 progress_1.2.2
[22] tools_4.3.1 utf8_1.2.3 yaml_2.3.7
[25] rtracklayer_1.60.0 prettyunits_1.1.1 S4Arrays_1.0.5
[28] bit_4.0.5 curl_5.0.2 DelayedArray_0.26.7
[31] xml2_1.3.5 repr_1.1.6 abind_1.4-5
[34] pbdZMQ_0.3-9 hwriter_1.3.2.1 grid_4.3.1
[37] fansi_1.0.4 xtable_1.8-4 colorspace_2.1-0
[40] ggplot2_3.4.3 scales_1.2.1 biomaRt_2.56.1
[43] cli_3.6.1 crayon_1.5.2 generics_0.1.3
[46] httr_1.4.7 rjson_0.2.21 DBI_1.1.3
[49] cachem_1.0.8 stringr_1.5.0 splines_4.3.1
[52] zlibbioc_1.46.0 parallel_4.3.1 restfulr_0.0.15
[55] base64enc_0.1-3 vctrs_0.6.3 Matrix_1.6-1
[58] jsonlite_1.8.7 geneplotter_1.78.0 hms_1.1.3
[61] bit64_4.0.5 locfit_1.5-9.8 annotate_1.78.0
[64] glue_1.6.2 codetools_0.2-19 gtable_0.3.3
[67] stringi_1.7.12 BiocIO_1.10.0 munsell_0.5.0
[70] tibble_3.2.1 pillar_1.9.0 rappdirs_0.3.3
[73] htmltools_0.5.6 IRkernel_1.3.2 GenomeInfoDbData_1.2.10
[76] R6_2.5.1 dbplyr_2.3.3 evaluate_0.21
[79] lattice_0.21-8 png_0.1-8 memoise_2.0.1
[82] Rcpp_1.0.11 uuid_1.1-0 pkgconfig_2.0.3
However, when I open the resulting table in Excel, the number of columns is double the number of samples being processed. I am processing 6 samples (3 experimental and 3 control), but the table has 12 columns, not including the Ensembl ID column. What does each of these 12 columns stand for, given that I only have 6 samples?