Hi, I am working on RNA-Seq data from Arabidopsis plants. I have samples from four genotypes (WT, KO1, KO2 and OE), with three replicates each. I aligned the reads to the reference genome with STAR, using the annotation file in GFF3 format, and then obtained the read counts for each sample with featureCounts (counts.txt). Now I am doing the differential gene expression (DGE) analysis with DESeq2 as follows:
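library(DESeq2)
# featureCounts output: Geneid, Chr, Start, End, Strand, Length, then one count column per sample
countdata <- read.table("counts.txt", header=TRUE, row.names=1)
# Keep only the sample count columns (drop Chr, Start, End, Strand, Length)
countdata <- countdata[ ,6:ncol(countdata)]
# Remove the .sam/.bam extension from the sample names
colnames(countdata) <- gsub("\\.[sb]am$", "", colnames(countdata))
countdata <- as.matrix(countdata)
head(countdata)
# Columns must be in the order WT x3, KO1 x3, KO2 x3, OE x3; WT is set as the reference level
(condition <- factor(c(rep("WT", 3), rep("KO1", 3), rep("KO2", 3), rep("OE", 3)), levels=c("WT", "KO1", "KO2", "OE")))
(coldata <- data.frame(row.names=colnames(countdata), condition))
dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~condition)
dds
dds <- DESeq(dds)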
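The script above selects the count columns by position and relies on the samples appearing in WT, KO1, KO2, OE order. A minimal sketch that instead drops the featureCounts annotation columns by name and derives each sample's genotype from its name is shown below. It assumes the standard featureCounts table layout and hypothetical sample names of the form `<genotype>_<replicate>` (e.g. `WT_1`, `KO1_2`); `read_fc_counts` and `condition_from_names` are illustrative helpers, not part of DESeq2.

```r
# Hypothetical helper: read a featureCounts table and keep only the count columns.
# Assumes the usual layout: a "#" command line, then Geneid, Chr, Start, End,
# Strand, Length, followed by one column per input SAM/BAM file.
read_fc_counts <- function(file) {
  tab <- read.table(file, header = TRUE, row.names = 1, sep = "\t",
                    comment.char = "#", check.names = FALSE)
  annot_cols <- c("Chr", "Start", "End", "Strand", "Length")
  counts <- as.matrix(tab[, setdiff(colnames(tab), annot_cols), drop = FALSE])
  # Strip the directory and the .sam/.bam extension from each sample name
  colnames(counts) <- sub("\\.[sb]am$", "", basename(colnames(counts)))
  counts
}

# Hypothetical helper: derive the genotype from names like "KO1_2",
# with WT as the first (reference) level.
condition_from_names <- function(sample_names,
                                 levels = c("WT", "KO1", "KO2", "OE")) {
  f <- factor(sub("_[0-9]+$", "", sample_names), levels = levels)
  if (anyNA(f)) stop("unrecognised genotype prefix in sample names")
  f
}
```

With the real file this would be used as `countdata <- read_fc_counts("counts.txt")` followed by `coldata <- data.frame(row.names = colnames(countdata), condition = condition_from_names(colnames(countdata)))`; a check such as `stopifnot(all(rownames(coldata) == colnames(countdata)))` before `DESeqDataSetFromMatrix()` then confirms that every count column is paired with the right genotype.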
I have the following questions:
1. I used the GFF3 annotation file for the alignment, but for featureCounts I converted the GFF3 file to GTF format. Is this the right approach, or could it affect the DGE analysis?
2. I ran featureCounts on SAM files. Is that okay, or should I use BAM files?
3. In DESeq2 I am getting results for each replicate individually. How can I combine the replicates of each genotype to compare it with the other genotypes? I am also not sure whether the code I am using is right.
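Regarding question 3: with `design = ~condition`, DESeq2 already treats the three replicates of a genotype as one group, so genotypes are compared with `results()` and a `contrast` rather than by merging columns (`collapseReplicates()` is intended for technical replicates, not biological ones). The sketch below assumes WT is the reference level and, so that it runs on its own, uses simulated counts from `makeExampleDESeqDataSet()` as a hypothetical stand-in for the real data. The per-genotype mean of normalized counts at the end is only a summary for plotting; it is not what the tests use.

```r
library(DESeq2)

# Hypothetical stand-in for the real dataset: simulated counts for 12 samples,
# relabelled as 4 genotypes x 3 replicates with WT as the reference level.
set.seed(42)
dds <- makeExampleDESeqDataSet(n = 500, m = 12)
dds$condition <- factor(rep(c("WT", "KO1", "KO2", "OE"), each = 3),
                        levels = c("WT", "KO1", "KO2", "OE"))
dds <- DESeq(dds, quiet = TRUE)

# Coefficient names reflect the reference level
resultsNames(dds)

# Each genotype vs WT; all three replicates of each group enter every test
genotypes <- c("KO1", "KO2", "OE")
res_vs_wt <- lapply(genotypes, function(g)
  results(dds, contrast = c("condition", g, "WT"), alpha = 0.05))
names(res_vs_wt) <- paste0(genotypes, "_vs_WT")

# Two non-reference genotypes are compared with the same contrast syntax
res_ko1_vs_ko2 <- results(dds, contrast = c("condition", "KO1", "KO2"))

summary(res_vs_wt$OE_vs_WT)

# For plotting only: mean normalized count per genotype (genes x genotypes)
norm <- counts(dds, normalized = TRUE)
mean_by_genotype <- sapply(levels(dds$condition), function(g)
  rowMeans(norm[, dds$condition == g, drop = FALSE]))
head(mean_by_genotype)
```

On the real `dds` only the `results()` and `counts()` calls would be needed; the simulation lines exist solely to make the sketch self-contained.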
Any guidance from you will be very helpful.
Thank you