Question: Requesting help with RNA-Seq differential gene analysis
0
gravatar for sagnik
19 months ago by
sagnik0
sagnik0 wrote:

Hello,

I am performing RNA-Seq data analysis to find differentially expressed genes in my data. The data is from inoculated leaves of Barley. The data has 5 mutants and 6 time-points. Each of the mutants and each time-point has 3 technical replicates brining up the total number of samples to 90. There are roughly 40,000 genes in Barley that I wish to probe for differential expression. I have obtained the gene read counts using RSEM. I have some questions about the analysis.

1) My goal is to carry out differential gene analysis, gene expression clustering and construct gene regulatory network from the expression data. For the DGE analysis, I want to find genes which are differentially expressed within each mutant across different time-points. Also I wish to find genes which are differentially expressed among different mutants at a given time-point. I have used the function rsem-generate-data-matrix to combine the counts from the 90 samples. I have read this count data in R. But I cannot understand how I will obtain the desired differential expression p-values and log2FC. Here is the code I have written.

samples <- read.csv("counts.csv", header=TRUE,row.names=1)

# Selecting genes by counts
counts = subset(samples, rowSums(samples > 1) >= 3)
time=c(0, 16, 20, 24, 32, 48)
gen=c("bln1", "dm", "mla6", "rar3", "wt")
cols=data.frame(genotype=rep(gen, each=3*6), timepoint=rep(rep(time, each=3),5), replication=rep(c(1,2,3),30))
cols$timepoint = as.factor(cols$timepoint)
cols$replication = as.factor(cols$replication)
rownames(cols) = colnames(counts)

dds=DESeqDataSetFromMatrix(countData = counts, colData = cols, design = ~ genotype * timepoint * replication)
dds=DESeq(dds)
results(dds)

# Extracting and writing normalized counts
temp = counts(dds, normalized=TRUE)
norm_counts = data.frame(gene = rownames(temp), temp, row.names = NULL)

I think I have to alter the design for each kind of analysis, but I am unsure how to do so. Does the DESeq() function automatically perform normalization? In the code above, I have created the variable "dds" using the "counts" which is not normalized. Do I have to perform normalization and then create the "dds" variable? Here is the link for the counts file.

2) For gene expression clustering and gene regulatory network should we perform rst or vst? Also should the transformation be performed on the raw count or on the normalized counts?

3) I am also not quite sure what conclusion can be drawn from the MA plot.

4) What is the need for shrinkage?

Thank you.

ADD COMMENTlink modified 19 months ago by Kevin Blighe48k • written 19 months ago by sagnik0

Answering some of your questions (you had too many, not a good practice):

Yes, DESeq() automatically perform normalization, you create the DESeqDataSet with counts, as shown on DESeq2 vignette.

For clustering, I use rlog(). rlog() transforms count data, but as it is a DESeq2 function, it also accepts a DESeqDataSet object. I don't know what kind of "gene regulatory network" analysis you have in mind, so I don't know which transformation (if any) is best.

You said you used RSEM to obtain the counts. RSEM primary goal is to obtain transcript level cunts, but DESeq2 is designed to perform differential gene (not transcript) expression. Did you summarize your RSEM results into gene counts?

P.S.: look at the time-course design from this BioConductor RNAseq workflow.

ADD REPLYlink modified 19 months ago • written 19 months ago by h.mon27k

Thank you for your reply.

RSEM generates both counts at the gene level and also at the transcript level. For this experiment I am planning to use only the gene counts and not the transcripts counts.

ADD REPLYlink written 19 months ago by sagnik0
0
gravatar for theobroma22
19 months ago by
theobroma221.1k
theobroma221.1k wrote:

Check out the Limma package available on Bioconductor.org. You can normalize your counts using the voom or Limma-trend method. Then, you can make a design and contrast matrices to do your cross comparisons to get your DEGs once you fit the model and perform eBayes. There are tons of Biostars posts already out there on this. If you’re still having trouble make a new post, and paste your code. Hope this helps.

ADD COMMENTlink written 19 months ago by theobroma221.1k

Thank you for you reply. I will surely try out voom and limma.

ADD REPLYlink written 19 months ago by sagnik0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 971 users visited in the last hour