I'm new to R and DESeq2 and I'm trying to run differential expression as below
library(DESeq2)
count_file_names <- grep("counts",list.files("HTSeq_counts"),value=T)
host_type < c("Damaged","Control")
sample_information <-data.frame(sampleName = count_file_names, fileName = count_file_names, condition = host_type)
DESeq_data <- DESeqDataSetFromHTSeqCount(sampleTable = sample_information, directory = "HTSeq_counts", design = ~condition)
colData(DESeq_data)$condition <- factor(colData(DESeq_data)$condition,levels = c('Damaged','Control'))
rld <- rlogTransformation(DESeq_data, blind=T)
When I look at the rld matrix, it has got row names that aren't annotated and hence, not directly useful; note- I intend to generate plots based on this matrix in downstream steps and so, the corresponding actual gene names are required
print(rld)
class: DESeqTransform
dim: 33219 38
metadata(1): version
assays(1): ''
rownames(33219): g100.t1 g1000.t1 ... g9998.t1 g9999.t1
rowData names(7): baseMean baseVar ... dispFit rlogIntercept
colnames(38): Damaged_R1.counts Damaged_R2.counts ...
Control_R4.counts Control_R5.counts
colData names(2): condition sizeFactor
I have a separate "Annotations.csv" which has the gene name information
gene_names <- read.csv("Annotations.csv", header=TRUE, sep=",", stringsAsFactors=FALSE)
head(gene_names)
gene_id name product
1 g38227.t1 Stk10 Serine/threonine-protein kinase 10
2 g38227.t2 Stk10 Serine/threonine-protein kinase 10
3 g1000.t1 Ccnh Cyclin-H
4 g46237.t1 fam136a Protein FAM136A
5 g100.t1 H2B Histone H2B
6 g40390.t1 STAC2 SH3 and cysteine-rich domain-containing protein 2
and I would like to replace rownames in rld matrix according with the correct corresponding gene name as specified in gene_names. For example, the rld matrix rownames(33219) should appear as names "H2B CCnh ... and so on" instead of "g100.t1 g1000.t1 ... and so on". I tried the below R code
new.rld <- cbind(name=gene_names$name[ match(rownames(rld), gene_names$gene_id) ], rld)
but it gave me the this error:
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘bindCOLS’ for signature ‘"character"’
Any help in generating correctly the new.rld matrix would be very much appreciated!
Thanks a zillion Michael for pointing to me about the multiple transcript issue. I went for the transcript option instead of the standard summarization at the gene level due to the inability to relate the gene ids with the gene names on the original gtf file. Below are the first few lines of the gtf file that I'm using. I was wondering if by any chance you would have any ideas on addressing this? Apologies for asking too much here; I seem to have hit a brick wall here :-( Thanks again for sharing your super-important insight earlier
This is resolved (no more hitting on the wall!). Went for the featureCounts pipeline (since it seems to be better anyways for my paired-end read data)
Great to hear, I am using featureCounts myself for ease of use and speed. Choice of either program should not affect the outcome of DE analysis and this should now be straight-forward.