I think this is probably a very simple question, but I have encountered some problems with data analysis when I have multiple conditions in one data set. Below I present an example from a DEXSeq analysis, but the same problem also occurs in a DESeq analysis with more than one condition in the data set.
In my workflow I read all the files into R:
BaseDir = "/home/USER/dexseq"
countFiles = list.files(BaseDir, pattern="MW.*.txt$", full.names=TRUE)
countFiles
"/home/USER/dexseq/MW10.SE.txt"
"/home/USER/dexseq/MW11.SE.txt"
"/home/USER/dexseq/MW12.PE.PE.txt"
"/home/USER/dexseq/MW12.SE.txt"
"/home/USER/dexseq/MW13.SE.txt"
"/home/USER/dexseq/MW14.SE.txt"
...
"/home/USER/dexseq/MW7.SE.txt"
"/home/USER/dexseq/MW8.PE.PE.txt"
"/home/USER/dexseq/MW8.SE.txt"
"/home/USER/dexseq/MW9.SE.txt"
My metadata file, though, is sorted by condition:
sample.Names     ShortName  condition  libraryType
MW1.SE.txt       MW1        ES         singleEnd
MW8.PE.PE.txt    MW8        ES         PairedEnd
MW8.SE.txt       MW8        ES         singleEnd
MW16.SE.txt      MW16       ES         singleEnd
MW19.SE.txt      MW19       ES         singleEnd
MW7.SE.txt       MW7        EB9        singleEnd
MW15.PE.PE.txt   MW15       EB9        PairedEnd
MW15.SE.txt      MW15       EB9        singleEnd
MW6.SE.txt       MW6        EB8        singleEnd
...
MW10.SE.txt      MW10       EB4        singleEnd
MW9.SE.txt       MW9        EB3        singleEnd
MW18.SE.txt      MW18       EB3        singleEnd
MW21.SE.txt      MW21       EB3        singleEnd
MW17.SE.txt      MW17       EB2        singleEnd
MW20.SE.txt      MW20       EB2        singleEnd
When I want to compare, for example, my control (ES) with EB2, I do the following:
metaData <- read_tsv("metadata.txt")
metaData <- metaData[order(metaData$condition, decreasing = TRUE),]
EB2.ES <- subset(metaData, subset = metaData$condition %in% c("EB2", "ES"))
sampleTable <- data.frame(row.names= EB2.ES$sample.Names, condition= EB2.ES$condition, lib.type=EB2.ES$libraryType)
> sampleTable
               condition  lib.type
MW16.SE.txt    ES         singleEnd
MW19.SE.txt    ES         singleEnd
MW1.SE.txt     ES         singleEnd
MW8.PE.PE.txt  ES         PairedEnd
MW8.SE.txt     ES         singleEnd
MW17.SE.txt    EB2        singleEnd
MW20.SE.txt    EB2        singleEnd
counts1<- countFiles[basename(countFiles) %in% row.names(sampleTable)]
counts1
"/home/USER/dexseq/MW16.SE.txt"
"/home/USER/dexseq/MW17.SE.txt"
"/home/USER/dexseq/MW19.SE.txt"
"/home/USER/dexseq/MW1.SE.txt"
"/home/USER/dexseq/MW20.SE.txt"
"/home/USER/dexseq/MW8.PE.PE.txt"
"/home/USER/dexseq/MW8.SE.txt"
dxd1 = DEXSeqDataSetFromHTSeq(
    counts1,
    sampleData=sampleTable,
    design= ~ sample + exon + condition:exon,
    flattenedfile=flattenedFile )
As you can see, the order of the samples in sampleTable, which I would like to use as sampleData, is not identical to the order of the count files in counts1.
Is there an automatic way to ensure that these two objects contain the same files in the same order?
Can I somehow subset the complete list of count files and/or the metadata so that I can be sure they still refer to the same files?
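To make the goal concrete, here is a minimal sketch of one possible way to line the two objects up with base R's match(). It assumes that the file basenames are identical to the row names of sampleTable; the vectors below are shortened toy copies of the objects above, not the full data, and idx is just an illustrative name.

```r
# Toy stand-ins for the objects in the question (shortened; MW9 is an extra
# file that is not part of the ES vs EB2 comparison).
countFiles <- c("/home/USER/dexseq/MW16.SE.txt",
                "/home/USER/dexseq/MW17.SE.txt",
                "/home/USER/dexseq/MW19.SE.txt",
                "/home/USER/dexseq/MW1.SE.txt",
                "/home/USER/dexseq/MW20.SE.txt",
                "/home/USER/dexseq/MW8.PE.PE.txt",
                "/home/USER/dexseq/MW8.SE.txt",
                "/home/USER/dexseq/MW9.SE.txt")
sampleTable <- data.frame(
  row.names = c("MW16.SE.txt", "MW19.SE.txt", "MW1.SE.txt", "MW8.PE.PE.txt",
                "MW8.SE.txt", "MW17.SE.txt", "MW20.SE.txt"),
  condition = c("ES", "ES", "ES", "ES", "ES", "EB2", "EB2"),
  lib.type  = c("singleEnd", "singleEnd", "singleEnd", "PairedEnd",
                "singleEnd", "singleEnd", "singleEnd")
)

# For each row of sampleTable, look up the position of its file in countFiles.
idx <- match(rownames(sampleTable), basename(countFiles))

# Stop early if any sample in the table has no matching count file.
stopifnot(!anyNA(idx))

# Count files, subset and reordered to follow the rows of sampleTable.
counts1 <- countFiles[idx]

# Final sanity check before building the DEXSeq (or DESeq) object.
stopifnot(identical(basename(counts1), rownames(sampleTable)))
```

With this ordering, counts1 follows sampleTable row by row, and the last stopifnot() fails loudly if the two ever get out of sync. The opposite direction (reordering the sample table to follow the files) should work the same way with the arguments of match() swapped.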