differential expression analysis
6.1 years ago
bioinfo456 ▴ 150

I have my data in Excel format (i.e. gene IDs, samples, and the corresponding gene counts). Can somebody please explain how I can feed this to DESeq2? I have imported the Excel file into RStudio. What next? Any help would be much appreciated. Thanks.

RNA-Seq deseq2 differential expression analysis • 1.5k views

How exactly is your data structured? Do you have one Excel file per treatment?

6.1 years ago
caggtaagtat ★ 1.9k

Hi,

I would load the data into R and write it out as one file per sample (two columns: gene ID and count), e.g. with write.table(myfile, file="myfile.tab", sep="\t", quote=FALSE, row.names=FALSE, col.names=FALSE), all in one directory and each with the ending ".tab".
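A minimal sketch of that export step, assuming the imported sheet is a data frame with the gene IDs in a first column and one count column per sample. The names counts_df, gene_id, out_dir and the sample names are hypothetical and only there for illustration; swap in your own.

```r
# Hypothetical stand-in for the table imported from Excel:
# one gene ID column plus one count column per sample
counts_df <- data.frame(
  gene_id = c("geneA", "geneB", "geneC"),
  ctl_1   = c(10L, 0L, 25L),
  ctl_2   = c(12L, 1L, 30L),
  trt_1   = c(50L, 0L, 5L),
  trt_2   = c(45L, 2L, 8L)
)

# Output directory (a temporary one here; use your own path)
out_dir <- file.path(tempdir(), "counts_per_sample")
dir.create(out_dir, showWarnings = FALSE)

# Write one two-column file (gene ID, count) per sample, without header
sample_cols <- setdiff(names(counts_df), "gene_id")
for (s in sample_cols) {
  write.table(counts_df[, c("gene_id", s)],
              file = file.path(out_dir, paste0(s, ".tab")),
              sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
}

list.files(out_dir)
```

With file names like these, the first three characters ("ctl", "trt") can serve as the condition label in the sample table built below.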

Now you could do the following:

#Define directory
setwd("Path_to_your/Files")
directory <- getwd()

#First you need a sample Table for DESeq2 which holds information about where your data is and what condition it stands for

#Grab all files ending in .tab (DESeqDataSetFromHTSeqCount reads plain-text count files, not .xlsx)
sampleFiles <- list.files(directory, pattern="\\.tab$")

#Name the samples in sampleTable after the characters before ".tab";
#here the first three characters of each file name are taken as the condition
sampleCondition <- sub("\\.tab$","",sampleFiles)
sampleCondition <- substr(sampleCondition,1,3)

#Create the sample table
sampleTable <- data.frame(sampleName = sampleFiles,
                          fileName = sampleFiles,
                          condition = sampleCondition)

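#Now you have to install DESeq2 if you haven't done that already
#(biocLite() has been retired; current Bioconductor releases use BiocManager)
if (!requireNamespace("BiocManager", quietly=TRUE)) install.packages("BiocManager")
BiocManager::install("DESeq2")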

#Get it in your library
library("DESeq2")

#Create DESeqDataSet from your data
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                       directory = directory,
                                       design= ~ condition)

#Now remove genes with fewer than 1 read in total, to get rid of noise
keep <- rowSums(counts(ddsHTSeq)) >= 1
dds <- ddsHTSeq[keep,]

#To establish the reference for the comparisons, define one condition as the reference level;
#"untreated" is a placeholder and must match one of levels(dds$condition)
dds$condition <- relevel(dds$condition, ref="untreated")

#The following is the core function of the DESeq2 package; it estimates size factors and dispersions and fits the model
dds <- DESeq(dds)

#Now you can access the results with results(); resultsNames(dds) lists the available coefficient names, for example
res <- results(dds, name="condition_treated_vs_untreated")

#Some other analysis steps could follow

#Transforming the counts for visualization
vsd <- vst(dds, blind=FALSE)
rld <- rlog(dds, blind=FALSE)
head(assay(vsd), 3)
ntd <- normTransform(dds)

#Creating heatmap
library("pheatmap")
select <- order(rowMeans(counts(dds,normalized=TRUE)),
                decreasing=TRUE)[1:20]
df <- as.data.frame(colData(dds) )
pheatmap(assay(ntd)[select,], cluster_rows=FALSE, show_rownames=FALSE,
         cluster_cols=FALSE, annotation_col=df)

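#Heatmap of sample-to-sample distances
sampleDists <- dist(t(assay(vsd)))

library("RColorBrewer")
sampleDistMatrix <- as.matrix(sampleDists)
#The sample table above has no "type" column, so label rows with condition and sample name
rownames(sampleDistMatrix) <- paste(vsd$condition, colnames(vsd), sep="-")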
colnames(sampleDistMatrix) <- NULL
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
pheatmap(sampleDistMatrix,
         clustering_distance_rows=sampleDists,
         clustering_distance_cols=sampleDists,
         col=colors)


#Doing PCA
plotPCA(vsd, intgroup=c("condition"))

These are just some basic applications of the DESeq2 package. There are many other interesting things to do, which can be found on various help sites and blogs, for example the vignette or the workflow on Bioconductor.
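Since the question describes a single table with gene IDs and per-sample counts, it may be simpler to skip the per-sample files and build the dataset directly with DESeqDataSetFromMatrix. The sketch below assumes a small made-up table; counts_df, the sample names and the condition assignment are hypothetical and need to be replaced with your own.

```r
library("DESeq2")

# Hypothetical stand-in for the table imported from Excel
counts_df <- data.frame(
  gene_id = c("geneA", "geneB", "geneC"),
  ctl_1   = c(10L, 0L, 25L),
  ctl_2   = c(12L, 1L, 30L),
  trt_1   = c(50L, 0L, 5L),
  trt_2   = c(45L, 2L, 8L)
)

# Count matrix: genes in rows, samples in columns, gene IDs as row names
count_mat <- as.matrix(counts_df[, -1])
rownames(count_mat) <- counts_df$gene_id

# Sample information; row names must be in the same order as the matrix columns.
# The condition labels are assumptions for this toy example.
col_data <- data.frame(
  condition = factor(c("untreated", "untreated", "treated", "treated"),
                     levels = c("untreated", "treated")),
  row.names = colnames(count_mat)
)

dds <- DESeqDataSetFromMatrix(countData = count_mat,
                              colData   = col_data,
                              design    = ~ condition)
dds
```

From here the filtering, DESeq() and results() steps are the same as above. Three genes are far too few to fit the model, so the toy example stops once the object is built.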
