R analysis for read counts using edgeR
0
1
Entering edit mode
2.0 years ago
Liftedkris ▴ 20

Hi fellows,

I performed RNAseq analysis of human neutrophils infected by Aspergillus fumigatus. Everything seemed to work fine but during the differential expression analysis using EdgeR, I used the standard p.value cut off of 0.05 but surprisingly no genes were found for the conditions at that p. Value .

My question is, what could be the problem?

I have 24 samples, in three groups, each group consisting of 8 samples 4 each for control and test i did alignment with Hisat2 and sorted with samtools and used HTseq to generate read counts. at this point everything seemed to work fine.

getwd()

group <- factor(c(rep("host.30hpi_control", 4), rep("host.30hpi_infected", 4),
rep("host.90hpi_control", 4), rep("host.90hpi_infected", 4), rep("host.180hpi_control", 4), rep("host.180hpi_infected", 4) ))

library(edgeR)

counts.host <- readDGE(list.files(pattern = ".count"), data, columns = c(1,2))
counts.host$counts <- counts.host$counts[1:(nrow(counts.host$counts)-5), ] counts.host$counts <- counts.host$counts[rowSums(counts.host$counts > 3) > 2, ]
head(counts.host$counts, 20) dim(counts.host$counts)

counts.host <- calcNormFactors(counts.host)
design <- model.matrix(~0 + group)
rownames(design) <- colnames(counts.host$counts) contrasts <- makeContrasts("Host_30hpi" = grouphost.30hpi_infected - grouphost.30hpi_control, "Host_90hpi" = grouphost.90hpi_infected - grouphost.90hpi_control, "Host_180hpi" = grouphost.180hpi_infected - grouphost.180hpi_control, levels = design) library(limma) png("host_voom.png") y <- voomWithQualityWeights(counts = counts.host, design = design, plot = TRUE) dev.off() plot.colors <- c(rep("blue", 4), rep("red", 4), rep("orange", 4), rep("black", 4)) png("host_MDS.png") plotMDS(counts.host, main = "MDS Plot for Count Data", labels = colnames(counts.host$counts), col = plot.colors, cex = 0.9, xlim=c(-2,5))
dev.off()

counts.host.mod <- t(cpm(counts.host))
dist <- dist(counts.host.mod)
png("host_HC.png")
plot(hclust(dist), main="Hierarchical Clustering Dendrogram")
dev.off()

fit <- lmFit(y, design)
fit <- contrasts.fit(fit, contrasts)
fit <- eBayes(fit)
top_30hpi <- topTable(fit, coef = "Host_30hpi", adjust = "fdr", number = "Inf", p.value = 0.05, sort.by = "P")
top_90hpi <- topTable(fit, coef = "Host_90hpi", adjust = "fdr", number = "Inf", p.value = 0.05, sort.by = "P")
top_180hpi <- topTable(fit, coef = "Host_180hpi", adjust = "fdr", number = "Inf", p.value = 0.05, sort.by = "P")
write.table(top_30hpi, file = "Host_DEG_annotated30.csv", sep = ",", col.names = NA)
write.table(top_90hpi, file = "Host_DEG_annotated90.csv", sep = ",", col.names = NA)
write.table(top_180hpi, file = "Host_DEG_annotated180.csv", sep = ",", col.names = NA)
RNA-Seq edgeR limma • 1.3k views
0
Entering edit mode

Do you realize that you are not using edgeR at all for differential analysis but limma? I am not adept with limma so I cannot contribute too much but why don't you simply follow exactly the edgeR manual and then see what comes out?

0
Entering edit mode

• lmFit is a function from limma, which is presumably why you chose to use the voom-normalized values for the DE analysis
• what is this line supposed to do: counts.host$counts <- counts.host$counts[1:(nrow(counts.host\$counts)-5), ], i.e. why are you removing the last five rows?
• what do the MDS and Hclust plots actually look like?
• I'm not sure what your group and design objects look like, do you mind sharing them?