Question: Differential gene expression by DESeq2
gravatar for umeshtanwar2
21 months ago by
umeshtanwar220 wrote:

Hi, I am working on RNA-Seq data for Arabidopsis plant. I have samples from 4 genotypes (WT, KO1, KO2, OE) with 3 replicates for each. I used STAR for alignment with reference genome and annotation file in gff3 format. Then I obtained count.txt files for each sample by using featureCounts. Now I am doing the DGE analysis by DESeq2 as:

countdata <- read.table("counts.txt", header=TRUE, row.names=1)
countdata <- countdata[ ,6:ncol(countdata)]
colnames(countdata) <- gsub("\\.[sb]am$", "", colnames(countdata))
countdata <- as.matrix(countdata)
(condition <- factor(c(rep("WT", 3), rep("KO1", 3), rep("KO2", 3), rep("OE", 3))))
(coldata <- data.frame(row.names=colnames(countdata), condition))
dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~condition)
dds <- DESeq(dds)

I have following questions: 1. I used annotation file gff3 for alignment and for featureCounts, I converted gff3 to gtf format. Is it the right way or it would make any impact for DGE analysis? 2. I used SAM files in featureCounts. Is it okay or I should use BAM files? 3. In DESeq2 I am getting the results for each replicate individually. How can I merge the replicates for one genotype to compare with the other genotypes. I am not sure if the code I am using is right?

Any guidance from you will be very helpful.

Thank you

rna-seq • 818 views
ADD COMMENTlink modified 21 months ago by i.sudbery9.7k • written 21 months ago by umeshtanwar220
gravatar for i.sudbery
21 months ago by
Sheffield, UK
i.sudbery9.7k wrote:
  1. This should be fine.
  2. As should this.
  3. You need a final step in your analysis, which it to get the statistical results from your analysis. This is achieved using the "results" function. As in:

    deseq_results <- results(dds, alpha=0.05, threshold=....., .....)

    where threshold is the minimum fold change you are interested in . You will also need to specify the contrast you are intersted in. This will depend on whether you are interested in asking if either: a) genotype has an effect on expression overall, or b) if each genotype is different from a control. See the DESeq2 manual for details.

ADD COMMENTlink written 21 months ago by i.sudbery9.7k

Thank you @i.sudbery. It worked. Its showing:

out of 25686 with nonzero total read count adjusted p-value < 0.05 LFC > 1.00 (up) : 627, 2.4% LFC < -1.00 (down) : 2473, 9.6% outliers [1] : 0, 0% low counts [2] : 2944, 11% (mean count < 1) [1] see 'cooksCutoff' argument of ?results [2] see 'independentFiltering' argument of ?results

Can you please tell me how can I extract the upregulated and downregulated genes?

ADD REPLYlink written 21 months ago by umeshtanwar220

the deseq_results object should have two columns: one for adjusted pvalues (I can't remember if its called padj or adjustedPVal off the top of my head) and one for log2FoldChange. The up genes will be those with adjustedPVal < 0.05 and log2FoldChange > 1, and the down genes will be those with adjustedPVal < 0,05 and log2FoldChange < -1.

ADD REPLYlink written 21 months ago by i.sudbery9.7k

Thank you @i.sudberry. It was really helpful for me.

ADD REPLYlink written 21 months ago by umeshtanwar220
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1005 users visited in the last hour