Question: Error building a DESeqDataSet from HTSeq counts (DESeqDataSetFromHTSeqCount)
mahejabeen.nidhi wrote (4 months ago):

As my lab does not have a lot of computational power, I used Galaxy for alignment and HTSeq. To produce better graphs for downstream analysis, I had to switch to RStudio.

I am using the six samples from GEO series GSE132732 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132732). There are two treatment conditions: IL4 and M0.
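With two conditions, it is worth knowing which one DESeq2 will treat as the baseline. R orders factor levels alphabetically by default, so "IL4" would become the first level, and DESeq2 compares against the first level of the design factor. A minimal sketch, assuming M0 is meant to be the baseline (an assumption, not something stated in the question) and using a made-up sample order:

```r
# Hypothetical condition labels for the six samples; the real order must
# match the order of the count files in sampleTable.
sampleCondition <- c("IL4", "M0", "IL4", "M0", "IL4", "M0")

# Default behaviour: levels are sorted alphabetically, so "IL4" comes first.
print(levels(factor(sampleCondition)))

# Assumption: M0 is the baseline, so list it first explicitly.
condition <- factor(sampleCondition, levels = c("M0", "IL4"))
print(levels(condition))
```

relevel(factor(sampleCondition), ref = "M0") would give the same level order.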

I also got DESeq2 results in Galaxy, but I thought that running DESeq2 myself on the HTSeq counts would give me the consistent formatting I need later on.

In Galaxy, the HTSeq output in the tabular datatype has a header, but when I convert it to a .csv file, it no longer has one. So after downloading the csv file, I added the header manually.
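As far as I know, DESeqDataSetFromHTSeqCount expects the raw htseq-count layout (two columns, feature ID and count, separated by whitespace, with no header line), so a comma-separated file with a hand-added header may not parse as intended. A minimal base-R sketch, using a made-up file with hypothetical gene names, of converting such a CSV back to that layout:

```r
# Hypothetical example: a counts file exported from Galaxy as CSV
# and then given a header by hand.
csv_file <- tempfile(fileext = ".csv")
writeLines(c("gene_id,count", "geneA,10", "geneB,0"), csv_file)

# Read the CSV, using its header row as column names.
counts <- read.csv(csv_file, stringsAsFactors = FALSE)

# Write it back in htseq-count layout: tab-separated, two columns,
# no header line, no quotes, no row names.
tsv_file <- tempfile(fileext = ".txt")
write.table(counts, tsv_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

cat(readLines(tsv_file), sep = "\n")
```

Downloading Galaxy's tabular output directly, instead of converting it to CSV, would avoid the round trip altogether.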

Below is the sorry excuse for code that I attempted for DESeq2. I think, or rather I know, that sampleCondition is where I went very wrong, but I don't know how to correct it.

# path to the directory containing the htseq-count files
directory <- "/Users/mahejabeennidhi/Documents/DrKwan/rnaseq6samples/htseqcounts"
directory <- system.file("extdata", package = "pasilla", mustWork = TRUE)
sampleFiles <- grep("count",list.files(directory), value = TRUE)
sampleCondition <- c("IL4","M0")
sampleTable <- data.frame(sampleName = sampleFiles, fileName = sampleFiles, condition = sampleCondition)
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory,design=~condition)
ddsHTSeq

This is the error:

Error in `.rowNamesDF<-`(x, value = value) : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names': ‘pasilla_gene_counts.tsv’ 
> ddsHTSeq
Error: object 'ddsHTSeq' not found

Very very grateful for your insight.

Tags: rna-seq, galaxy, htseq
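The duplicate 'row.names' error in the question can likely be reproduced in base R. Because directory is reassigned to the pasilla example data, grep("count", ...) appears to match only pasilla_gene_counts.tsv (the file named in the warning). A minimal sketch of what data.frame() then does with a one-element sampleFiles and a two-element sampleCondition:

```r
# As in the original code: one matched file, two condition labels.
sampleFiles <- "pasilla_gene_counts.tsv"
sampleCondition <- c("IL4", "M0")

# data.frame() recycles the length-1 vectors to two rows, so the same
# file name now appears twice in the sampleName column.
sampleTable <- data.frame(sampleName = sampleFiles,
                          fileName = sampleFiles,
                          condition = sampleCondition)
print(sampleTable)

# Index of the first duplicated sample name (0 would mean no duplicates).
print(anyDuplicated(sampleTable$sampleName))
```

Since the first column of sampleTable supplies the sample names, a repeated name is most likely what the 'duplicate row.names' error is complaining about.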

I assume something is wrong with sampleTable: for example, the condition column has only 2 entries, but shouldn't it have 6? What is the output of printing sampleTable on its own?

Edit: it looks like you forgot to remove the line

directory <- system.file("extdata", package = "pasilla", mustWork = TRUE)

You first assign directory to "/Users/mahejabeennidhi/Documents/DrKwan/rnaseq6samples/htseqcounts" (the htseq counts are there, right?), and then you overwrite it with the path to the pasilla example data.

(written 4 months ago by e.rempel)
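Once the pasilla line is removed, it may help to check the path and the number of files before calling DESeq2. A minimal sketch with a hypothetical helper, build_sample_table(), that stops with a readable message if the directory is missing or the number of condition labels does not match the number of count files. The file names and the alternating IL4/M0 order are made up for the demo; with the real files, the labels must follow the alphabetical order returned by list.files().

```r
# Hypothetical helper: build a DESeq2 sampleTable from the htseq-count files
# in `directory`, failing early if something is off.
build_sample_table <- function(directory, conditions, pattern = "count") {
  if (!dir.exists(directory)) {
    stop("Directory does not exist: ", directory)
  }
  # list.files() returns file names sorted alphabetically.
  files <- list.files(directory, pattern = pattern)
  if (length(files) != length(conditions)) {
    stop("Found ", length(files), " count files but ",
         length(conditions), " condition labels")
  }
  data.frame(sampleName = sub("\\.[^.]*$", "", files),  # drop file extension
             fileName = files,
             condition = factor(conditions))
}

# Demo with a throwaway directory and made-up file names.
demo_dir <- file.path(tempdir(), "htseqcounts_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, sprintf("sample%d_counts.txt", 1:6))))
sampleTable <- build_sample_table(demo_dir, rep(c("IL4", "M0"), times = 3))
print(sampleTable)

# With the real data, the next step would be, for example:
# ddsHTSeq <- DESeq2::DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
#                                                directory = directory,
#                                                design = ~ condition)
```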

I made the edits, but I am still getting an error:

> #make directory with htseq-counts
> directory <- "/Users/mahejabeennidhi/Documents/DrKwan/rnaseq6samples/htseqcounts"
> sampleFiles <- grep("count",list.files(directory), value = TRUE)
> sampleCondition <- c("IL4","M0","IL4","M0","IL4","M0")
> sampleTable <- data.frame(sampleName = sampleFiles, fileName = sampleFiles, condition = sampleCondition)
Error in data.frame(sampleName = sampleFiles, fileName = sampleFiles,  : 
  arguments imply differing number of rows: 0, 6
> ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory,design=~condition)
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file '/Users/mahejabeennidhi/Documents/DrKwan/rnaseq6samples/htseqcounts/pasilla_gene_counts.tsv': No such file or directory
> ddsHTSeq
Error: object 'ddsHTSeq' not found
(written 4 months ago by mahejabeen.nidhi)

My guess is that you are loading the genes as an extra column in the table; that's why the number of columns does not match.

(written 4 months ago by biofalconch)

arguments imply differing number of rows: 0, 6

This means that the object sampleFiles is probably empty (it has length 0, hence the 0 rows). What is the output of list.files(directory)?

(written 4 months ago by e.rempel)
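The "0, 6" message can be reproduced without any count files: list.files() on a path that does not exist (or that holds no matching files) returns an empty character vector instead of raising an error. A minimal sketch using a hypothetical, non-existent directory. As a side note, because the failed data.frame() call never reassigns sampleTable, the following "cannot open file ... pasilla_gene_counts.tsv" error most likely comes from the stale sampleTable left over from the first attempt.

```r
# A directory that does not exist (hypothetical path under tempdir()).
missing_dir <- file.path(tempdir(), "no_such_directory")

# list.files() silently returns character(0), so grep() does too.
sampleFiles <- grep("count", list.files(missing_dir), value = TRUE)
print(length(sampleFiles))

# Combining 0 file names with 6 condition labels then fails in data.frame().
result <- tryCatch(
  data.frame(sampleName = sampleFiles, fileName = sampleFiles,
             condition = rep(c("IL4", "M0"), times = 3)),
  error = function(e) conditionMessage(e)
)
print(result)
```

Checking dir.exists(directory) and list.files(directory) first shows whether the path or the "count" pattern is the problem.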
Powered by Biostar version 2.3.0