Question

Requesting help with RNA-Seq differential gene analysis

0

Entering edit mode

6.7 years ago

sagnik ▴ 50

Hello,

I am performing RNA-Seq data analysis to find differentially expressed genes in my data. The data is from inoculated leaves of Barley. The data has 5 mutants and 6 time-points. Each of the mutants and each time-point has 3 technical replicates brining up the total number of samples to 90. There are roughly 40,000 genes in Barley that I wish to probe for differential expression. I have obtained the gene read counts using RSEM. I have some questions about the analysis.

1) My goal is to carry out differential gene analysis, gene expression clustering and construct gene regulatory network from the expression data. For the DGE analysis, I want to find genes which are differentially expressed within each mutant across different time-points. Also I wish to find genes which are differentially expressed among different mutants at a given time-point. I have used the function rsem-generate-data-matrix to combine the counts from the 90 samples. I have read this count data in R. But I cannot understand how I will obtain the desired differential expression p-values and log2FC. Here is the code I have written.

samples <- read.csv("counts.csv", header=TRUE,row.names=1)

# Selecting genes by counts
counts = subset(samples, rowSums(samples > 1) >= 3)
time=c(0, 16, 20, 24, 32, 48)
gen=c("bln1", "dm", "mla6", "rar3", "wt")
cols=data.frame(genotype=rep(gen, each=3*6), timepoint=rep(rep(time, each=3),5), replication=rep(c(1,2,3),30))
cols$timepoint = as.factor(cols$timepoint)
cols$replication = as.factor(cols$replication)
rownames(cols) = colnames(counts)

dds=DESeqDataSetFromMatrix(countData = counts, colData = cols, design = ~ genotype * timepoint * replication)
dds=DESeq(dds)
results(dds)

# Extracting and writing normalized counts
temp = counts(dds, normalized=TRUE)
norm_counts = data.frame(gene = rownames(temp), temp, row.names = NULL)

I think I have to alter the design for each kind of analysis, but I am unsure how to do so. Does the DESeq() function automatically perform normalization? In the code above, I have created the variable "dds" using the "counts" which is not normalized. Do I have to perform normalization and then create the "dds" variable? Here is the link for the counts file.

2) For gene expression clustering and gene regulatory network should we perform rst or vst? Also should the transformation be performed on the raw count or on the normalized counts?

3) I am also not quite sure what conclusion can be drawn from the MA plot.

4) What is the need for shrinkage?

Thank you

differential-gene-expression deseq2 • 2.2k views

ADD COMMENT • link updated 7 months ago by Ram 44k • written 6.7 years ago by sagnik ▴ 50

0

Entering edit mode

Answering some of your questions (you had too many, not a good practice):

Yes, DESeq() automatically perform normalization, you create the DESeqDataSet with counts, as shown on DESeq2 vignette.

For clustering, I use rlog(). rlog() transforms count data, but as it is a DESeq2 function, it also accepts a DESeqDataSet object. I don't know what kind of "gene regulatory network" analysis you have in mind, so I don't know which transformation (if any) is best.

You said you used RSEM to obtain the counts. RSEM primary goal is to obtain transcript level cunts, but DESeq2 is designed to perform differential gene (not transcript) expression. Did you summarize your RSEM results into gene counts?

P.S.: look at the time-course design from this BioConductor RNAseq workflow.

ADD REPLY • link 6.7 years ago by h.mon 35k

0

Entering edit mode

Thank you for your reply.

RSEM generates both counts at the gene level and also at the transcript level. For this experiment I am planning to use only the gene counts and not the transcripts counts.

ADD REPLY • link 6.7 years ago by sagnik ▴ 50

score 0 · Answer 1 · 2018-02-16

0

Entering edit mode

6.7 years ago

theobroma22 ★ 1.2k

Check out the Limma package available on Bioconductor.org. You can normalize your counts using the voom or Limma-trend method. Then, you can make a design and contrast matrices to do your cross comparisons to get your DEGs once you fit the model and perform eBayes. There are tons of Biostars posts already out there on this. If you’re still having trouble make a new post, and paste your code. Hope this helps.