Question

Compare rows of colData and count matrix

7.3 years ago

Assa Yeroslaviz ★ 1.8k

Hi all,

I think this is probably a very simple question, but I have encountered some problems with data analysis when I have multiple conditions in one data set. Below I present one example from a DEXSeq analysis, but the same problem also occurs in a DESeq analysis with more than one condition in the data set.

In my workflow I read all the files into R:

    BaseDir = "/home/USER/dexseq"
    countFiles = list.files(BaseDir, pattern="MW.*.txt$", full.names=TRUE)
    countFiles
    [1] "/home/USER/dexseq/MW10.SE.txt"
     [2] "/home/USER/dexseq/MW11.SE.txt"
     [3] "/home/USER/dexseq/MW12.PE.PE.txt"
     [4] "/home/USER/dexseq/MW12.SE.txt"
     [5] "/home/USER/dexseq/MW13.SE.txt"
     [6] "/home/USER/dexseq/MW14.SE.txt"
    ...
    [21] "/home/USER/dexseq/MW7.SE.txt"
    [22] "/home/USER/dexseq/MW8.PE.PE.txt"
    [23] "/home/USER/dexseq/MW8.SE.txt"
    [24] "/home/USER/dexseq/MW9.SE.txt"

My metadata file, though, is sorted by condition:

    sample.Names    ShortName  condition  libraryType
    MW1.SE.txt      MW1        ES         singleEnd
    MW8.PE.PE.txt   MW8        ES         PairedEnd
    MW8.SE.txt      MW8        ES         singleEnd
    MW16.SE.txt     MW16       ES         singleEnd
    MW19.SE.txt     MW19       ES         singleEnd
    MW7.SE.txt      MW7        EB9        singleEnd
    MW15.PE.PE.txt  MW15       EB9        PairedEnd
    MW15.SE.txt     MW15       EB9        singleEnd
    MW6.SE.txt      MW6        EB8        singleEnd
    ...
    MW10.SE.txt     MW10       EB4        singleEnd
    MW9.SE.txt      MW9        EB3        singleEnd
    MW18.SE.txt     MW18       EB3        singleEnd
    MW21.SE.txt     MW21       EB3        singleEnd
    MW17.SE.txt     MW17       EB2        singleEnd
    MW20.SE.txt     MW20       EB2        singleEnd

When I try to compare, for example, my control (ES) with EB2, I do the following:

    metaData <- read_tsv("metadata.txt")
    metaData <- metaData[order(metaData$condition, decreasing = TRUE),]
    EB2.ES <- subset(metaData, subset = metaData$condition %in% c("EB2", "ES"))

    sampleTable <- data.frame(row.names= EB2.ES$sample.Names, condition= EB2.ES$condition, lib.type=EB2.ES$libraryType)
    sampleTable
                  condition  lib.type
    MW16.SE.txt          ES singleEnd
    MW19.SE.txt          ES singleEnd
    MW1.SE.txt           ES singleEnd
    MW8.PE.PE.txt        ES PairedEnd
    MW8.SE.txt           ES singleEnd
    MW17.SE.txt         EB2 singleEnd
    MW20.SE.txt         EB2 singleEnd

    counts1 <- countFiles[basename(countFiles) %in% row.names(sampleTable)]
    counts1
    [1] "/home/USER/dexseq/MW16.SE.txt"
    [2] "/home/USER/dexseq/MW17.SE.txt"
    [3] "/home/USER/dexseq/MW19.SE.txt"
    [4] "/home/USER/dexseq/MW1.SE.txt"
    [5] "/home/USER/dexseq/MW20.SE.txt"
    [6] "/home/USER/dexseq/MW8.PE.PE.txt"
    [7] "/home/USER/dexseq/MW8.SE.txt"

    dxd1 = DEXSeqDataSetFromHTSeq(
      counts1,
      sampleData=sampleTable,
      design= ~ sample + exon + condition:exon,
      flattenedfile=flattenedFile )

As you can see, the order of the samples in sampleTable, which I would like to use as sampleData, is not identical to the order of the count files in counts1. Is there an automatic way to ensure that these two objects contain the same files in the same order? Can I somehow subset the complete list of count files and/or the metadata and be sure that they still refer to the same files?
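
One possible way to line the two objects up is base R's `match()`, which returns, for each row name of the sample table, the position of the corresponding file in the path vector. The following is only a minimal sketch: the file names are copied from the output above, and the hand-built `sampleTable` is a stand-in for the one created from the metadata file.

```r
# File paths in the alphabetical order returned by list.files()
counts1 <- c("/home/USER/dexseq/MW16.SE.txt",
             "/home/USER/dexseq/MW17.SE.txt",
             "/home/USER/dexseq/MW19.SE.txt",
             "/home/USER/dexseq/MW1.SE.txt",
             "/home/USER/dexseq/MW20.SE.txt",
             "/home/USER/dexseq/MW8.PE.PE.txt",
             "/home/USER/dexseq/MW8.SE.txt")

# Stand-in sample table in metadata order (rows named by count-file name)
sampleTable <- data.frame(
  row.names = c("MW16.SE.txt", "MW19.SE.txt", "MW1.SE.txt", "MW8.PE.PE.txt",
                "MW8.SE.txt", "MW17.SE.txt", "MW20.SE.txt"),
  condition = c("ES", "ES", "ES", "ES", "ES", "EB2", "EB2"),
  lib.type  = c("singleEnd", "singleEnd", "singleEnd", "PairedEnd",
                "singleEnd", "singleEnd", "singleEnd")
)

# For each row of sampleTable, find the position of its file in counts1
idx <- match(rownames(sampleTable), basename(counts1))

# Stop early if any sample has no matching count file
stopifnot(!anyNA(idx))

# Reorder the file paths so that they follow the rows of sampleTable
counts1 <- counts1[idx]

# Both objects now list the same files in the same order
stopifnot(identical(basename(counts1), rownames(sampleTable)))
```

Reordering the paths rather than the table keeps the condition ordering chosen for the metadata; the final `stopifnot()` is a cheap guard worth keeping right before the data set is constructed.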

thanks

Assa

dexseq deseq matrix • 1.6k views

The whole point of making a sample table is to not run into this problem (e.g., by using the DESeqDataSetFromHTSeqCount function in DESeq2, where each row of the table names its own count file). If you're going to assemble the list of count files or the count matrix yourself, then you're responsible for ensuring that it's in the same order as the sample table.
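
In the same spirit, the file paths can be derived from the sample table instead of from `list.files()`, so that the order follows the table rows by construction. This is a rough sketch only: `paths_from_table()` is a hypothetical helper, not part of DEXSeq or DESeq2, and the demonstration uses empty placeholder files in a temporary directory rather than real count files.

```r
# Hypothetical helper: build count-file paths from the row names of the
# sample table, and fail loudly if any of those files does not exist.
paths_from_table <- function(sample_table, base_dir) {
  paths <- file.path(base_dir, rownames(sample_table))
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Count files not found: ", paste(missing, collapse = ", "))
  }
  paths
}

# Demonstration with empty placeholder files in a temporary directory
demo_dir <- file.path(tempdir(), "dexseq_demo")
dir.create(demo_dir, showWarnings = FALSE)

demo_table <- data.frame(
  row.names = c("MW16.SE.txt", "MW1.SE.txt", "MW17.SE.txt"),
  condition = c("ES", "ES", "EB2")
)
file.create(file.path(demo_dir, rownames(demo_table)))

demo_paths <- paths_from_table(demo_table, demo_dir)

# The paths follow the row order of the table
stopifnot(identical(basename(demo_paths), rownames(demo_table)))
```

With the real data, the returned vector would take the place of `counts1` in the call that builds the data set, assuming the row names of the sample table are the count-file names as in the question.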

7.3 years ago by Devon Ryan 104k