As my lab does not have a lot of computational power, I used Galaxy for the alignment and for HTSeq. To produce better graphs for downstream analysis, I had to switch to RStudio.
I am using the six samples from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132732. There are two treatment conditions, IL4 and M0.
I also got DESeq2 results from Galaxy, but I thought that running DESeq2 myself from the HTSeq counts would give me the consistent formatting I need later on.
The HTSeq output in Galaxy's tabular datatype has a header, but when I convert it to a .csv file, it no longer has one. So after I downloaded the csv files, I added the header manually.
Below is the sorry excuse for code that I attempted for DESeq2. I think, or rather I know, that sampleCondition is where I went very wrong, but I don't know how to correct it.
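A note on the file format, as a hedged aside: as far as I can tell, `DESeqDataSetFromHTSeqCount` reads each count file with `read.table` and no header, so a header line added by hand may end up being treated as a data row. The sketch below is only a sanity check under that assumption. The helper name `looks_like_htseq` and the gene IDs are hypothetical, and the files are written to a temporary location so that the snippet runs anywhere.

```r
# Hypothetical helper: TRUE if a file parses as header-less, two-column
# htseq-count output (gene ID, then a numeric count).
looks_like_htseq <- function(path) {
  tab <- read.table(path, sep = "\t", stringsAsFactors = FALSE)
  ncol(tab) == 2 && is.numeric(tab[[2]])
}

# Made-up example without a header row, as htseq-count itself writes it.
plain <- tempfile(fileext = ".txt")
writeLines(c("geneA\t10", "geneB\t0", "__no_feature\t5"), plain)

# The same made-up counts with a manually added header line.
with_header <- tempfile(fileext = ".txt")
writeLines(c("gene\tcount", "geneA\t10", "geneB\t0"), with_header)

ok_plain  <- looks_like_htseq(plain)
ok_header <- looks_like_htseq(with_header)
print(c(plain = ok_plain, with_header = ok_header))
```

If the check fails for a downloaded file, removing the added header (or downloading the tabular output directly instead of the csv) is probably the simpler route.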
#make directory with htseq-counts
directory <- "/Users/mahejabeennidhi/Documents/DrKwan/rnaseq6samples/htseqcounts"
directory <- system.file("extdata", package = "pasilla", mustWork = TRUE)
sampleFiles <- grep("count", list.files(directory), value = TRUE)
sampleCondition <- c("IL4", "M0")
sampleTable <- data.frame(sampleName = sampleFiles, fileName = sampleFiles, condition = sampleCondition)
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory, design = ~condition)
ddsHTSeq
The following is the error:
Error in `.rowNamesDF<-`(x, value = value) :
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names': ‘pasilla_gene_counts.tsv’
> ddsHTSeq
Error: object 'ddsHTSeq' not found
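One possible reading of the error: the warning names `pasilla_gene_counts.tsv`, which suggests that the second `directory` assignment (the `system.file(...)` line from the pasilla example) overwrote the path to the real counts. `grep` then found a single file, and `data.frame` recycled it against the two-element `sampleCondition`, producing two rows with the same sample name. The sketch below shows one way the sample table could be built with one condition entry per file. The file names, the three-versus-three split, and the helper `build_sample_table` are all hypothetical; the real names and the true condition of each sample have to come from the actual files and the GEO record.

```r
# Hypothetical helper: builds the three-column table that
# DESeqDataSetFromHTSeqCount expects (sample name, file name, condition).
# Assumes each file name starts with its condition label followed by "_".
build_sample_table <- function(sampleFiles) {
  sampleCondition <- sub("_.*$", "", sampleFiles)   # "IL4_rep1_..." -> "IL4"
  data.frame(
    sampleName = sub("\\.[^.]*$", "", sampleFiles), # file name without extension
    fileName   = sampleFiles,
    # Making M0 the reference level is an assumption, not something the data dictates.
    condition  = factor(sampleCondition, levels = c("M0", "IL4"))
  )
}

# Made-up file names standing in for list.files(directory).
sampleFiles <- c("IL4_rep1_htseqcount.txt", "IL4_rep2_htseqcount.txt",
                 "IL4_rep3_htseqcount.txt", "M0_rep1_htseqcount.txt",
                 "M0_rep2_htseqcount.txt",  "M0_rep3_htseqcount.txt")
sampleTable <- build_sample_table(sampleFiles)

# One row per file and unique sample names, which avoids the row.names error.
stopifnot(nrow(sampleTable) == length(sampleFiles),
          !anyDuplicated(sampleTable$sampleName))
print(sampleTable)

# With the real files, keep only the first `directory` line and then call:
# library(DESeq2)
# sampleFiles <- grep("count", list.files(directory), value = TRUE)
# ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = build_sample_table(sampleFiles),
#                                        directory = directory, design = ~ condition)
```

If the real file names do not encode the condition, `sampleCondition` can instead be written out by hand as a six-element vector in the same order as `sampleFiles`.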
Very very grateful for your insight.