DESeq DataSet from HTSeqCount Error
3.8 years ago

As my lab does not have a lot of computational power, I used Galaxy for alignment and HTSeq. To produce better graphs for downstream analysis, I had to switch to RStudio.

I am using the six samples from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132732. There are two treatment conditions, IL4 and M0.

I also got the DESeq2 results using Galaxy, but I thought that running DESeq2 myself from the HTSeq counts would give me the consistent formatting required later on.

The HTSeq output in Galaxy's tabular datatype has a header, but when I convert it to a .csv file, it no longer has one. So after downloading the .csv file, I added the header manually.

Below is the sorry excuse for code I attempted for DESeq2. I think, or rather I know, that sampleCondition is where I went very wrong, but I don't know how to correct it.

#make directory with htseq-counts
directory <- "/Users/mahejabeennidhi/Documents/DrKwan/rnaseq6samples/htseqcounts"
directory <- system.file("extdata", package = "pasilla", mustWork = TRUE)
sampleFiles <- grep("count",list.files(directory), value = TRUE)
sampleCondition <- c("IL4","M0")
sampleTable <- data.frame(sampleName = sampleFiles, fileName = sampleFiles, condition = sampleCondition)
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory,design=~condition)
ddsHTSeq

The following is the error:

Error in `.rowNamesDF<-`(x, value = value) : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names': ‘pasilla_gene_counts.tsv’ 
> ddsHTSeq
Error: object 'ddsHTSeq' not found

Very very grateful for your insight.

RNA-Seq Galaxy HTSeq • 1.6k views

I assume there is something wrong with sampleTable. For example, the condition column has only 2 entries, but shouldn't it have 6? What is the output of just sampleTable?

Edit: it looks like you forgot to remove the line

directory <- system.file("extdata", package = "pasilla", mustWork = TRUE)

First you assign directory to "/Users/mahejabeennidhi/Documents/DrKwan/rnaseq6samples/htseqcounts" (the htseq counts are there, right?), then you overwrite it with the path to the pasilla example data.
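For what it's worth, the warning in the question names only pasilla_gene_counts.tsv, which suggests that grep matched a single file in the pasilla folder and data.frame recycled it across the two condition labels, giving two rows with the same name. Here is a minimal sketch of building the table with exactly one label per file. The temp folder, the file names, and the alternating IL4/M0 order are all made up for illustration, so substitute the real folder and the real design. It uses base R only and stops before the DESeq2 call.

```r
# Hypothetical stand-in for the real htseq-count folder: a temp directory
# holding six empty files. The file names are invented for illustration.
directory <- file.path(tempdir(), "htseqcounts_demo")
dir.create(directory, showWarnings = FALSE)
demoFiles <- paste0("sample", 1:6, "_htseq_count.tsv")
invisible(file.create(file.path(directory, demoFiles)))

# Same selection step as in the question
sampleFiles <- grep("count", list.files(directory), value = TRUE)

# One condition label per file, in the same order as sampleFiles.
# The alternating order is an assumption; use the actual sample design.
sampleCondition <- rep(c("IL4", "M0"), times = 3)

# Fail early if the two vectors disagree in length, instead of letting
# data.frame() recycle values or throw a less obvious error
stopifnot(length(sampleFiles) == length(sampleCondition))

sampleTable <- data.frame(
  sampleName = sub("_htseq_count\\.tsv$", "", sampleFiles),
  fileName   = sampleFiles,
  condition  = factor(sampleCondition)
)
print(sampleTable)
```

If the printed table has six rows with unique sample names, it should be in the shape that DESeqDataSetFromHTSeqCount expects for sampleTable.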


I made the edits and am still getting an error:

> #make directory with htseq-counts
> directory <- "/Users/mahejabeennidhi/Documents/DrKwan/rnaseq6samples/htseqcounts"
> sampleFiles <- grep("count",list.files(directory), value = TRUE)
> sampleCondition <- c("IL4","M0","IL4","M0","IL4","M0")
> sampleTable <- data.frame(sampleName = sampleFiles, fileName = sampleFiles, condition = sampleCondition)
Error in data.frame(sampleName = sampleFiles, fileName = sampleFiles,  : 
  arguments imply differing number of rows: 0, 6
> ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory,design=~condition)
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file '/Users/mahejabeennidhi/Documents/DrKwan/rnaseq6samples/htseqcounts/pasilla_gene_counts.tsv': No such file or directory
> ddsHTSeq
Error: object 'ddsHTSeq' not found

My guess is that you are loading the genes into the table as an extra column, and that's why the number of columns does not match.


arguments imply differing number of rows: 0, 6

This means that the object sampleFiles is likely empty (it has length 0), while sampleCondition has 6 entries. What is the output of list.files(directory)?
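A quick way to narrow this down is sketched below, in base R only. The path is a hypothetical placeholder that deliberately does not exist, so swap in the real folder. Note also that the second error in the log above still mentions pasilla_gene_counts.tsv, which suggests the sampleTable from the earlier run was still in the workspace when data.frame failed, so it may help to clear it with rm(sampleTable) before rerunning.

```r
# Hypothetical path for illustration; replace it with the real folder
directory <- file.path(tempdir(), "htseqcounts_missing")

# FALSE here means the path itself is wrong (typo, different working machine)
print(dir.exists(directory))

# list.files() returns character(0) both for an empty and a missing folder
allFiles <- list.files(directory)
print(allFiles)

# The pattern only keeps names containing "count"; files downloaded from
# Galaxy may be named differently, in which case nothing matches
sampleFiles <- grep("count", allFiles, value = TRUE)
if (length(sampleFiles) == 0) {
  message("No file name contains 'count'; check the path and the pattern.")
}
```

If dir.exists returns TRUE but sampleFiles is still empty, the file names probably do not contain "count", and the pattern passed to grep needs to match whatever the downloaded files are actually called.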
