Question: R analysis for read counts using edgeR
0
gravatar for Liftedkris
4 weeks ago by
Liftedkris10
Germany
Liftedkris10 wrote:

Hi fellows,

I performed RNAseq analysis of human neutrophils infected by Aspergillus fumigatus. Everything seemed to work fine but during the differential expression analysis using EdgeR, I used the standard p.value cut off of 0.05 but surprisingly no genes were found for the conditions at that p. Value .

My question is, what could be the problem?

I have 24 samples, in three groups, each group consisting of 8 samples 4 each for control and test i did alignment with Hisat2 and sorted with samtools and used HTseq to generate read counts. at this point everything seemed to work fine.

getwd()

data <- setwd("C:/Users/HP Gamer/Desktop/RNA_seq_read_counts")
group <- factor(c(rep("host.30hpi_control", 4), rep("host.30hpi_infected", 4), 
rep("host.90hpi_control", 4), rep("host.90hpi_infected", 4), rep("host.180hpi_control", 4), rep("host.180hpi_infected", 4) )) 

library(edgeR) 

counts.host <- readDGE(list.files(pattern = ".count"), data, columns = c(1,2)) 
counts.host$counts <- counts.host$counts[1:(nrow(counts.host$counts)-5), ] 
counts.host$counts <- counts.host$counts[rowSums(counts.host$counts > 3) > 2, ]     
head(counts.host$counts, 20) 
dim(counts.host$counts)

counts.host <- calcNormFactors(counts.host) 
design <- model.matrix(~0 + group)     
rownames(design) <- colnames(counts.host$counts)
contrasts <- makeContrasts("Host_30hpi" = grouphost.30hpi_infected - grouphost.30hpi_control, "Host_90hpi" = grouphost.90hpi_infected - grouphost.90hpi_control, "Host_180hpi" = grouphost.180hpi_infected - grouphost.180hpi_control, levels = design)

library(limma)

png("host_voom.png")
y <- voomWithQualityWeights(counts = counts.host, design = design, plot = TRUE)     
dev.off() 

plot.colors <- c(rep("blue", 4), rep("red", 4), rep("orange", 4), rep("black", 4)) 
png("host_MDS.png")
plotMDS(counts.host, main = "MDS Plot for Count Data", labels = colnames(counts.host$counts), col = plot.colors, cex = 0.9, xlim=c(-2,5))
dev.off()

counts.host.mod <- t(cpm(counts.host)) 
dist <- dist(counts.host.mod) 
png("host_HC.png") 
plot(hclust(dist), main="Hierarchical Clustering Dendrogram") 
dev.off() 

fit <- lmFit(y, design) 
fit <- contrasts.fit(fit, contrasts) 
fit <- eBayes(fit) 
top_30hpi <- topTable(fit, coef = "Host_30hpi", adjust = "fdr", number = "Inf", p.value = 0.05, sort.by = "P") 
top_90hpi <- topTable(fit, coef = "Host_90hpi", adjust = "fdr", number = "Inf", p.value = 0.05, sort.by = "P") 
top_180hpi <- topTable(fit, coef = "Host_180hpi", adjust = "fdr", number = "Inf", p.value = 0.05, sort.by = "P") 
write.table(top_30hpi, file = "Host_DEG_annotated30.csv", sep = ",", col.names = NA)
write.table(top_90hpi, file = "Host_DEG_annotated90.csv", sep = ",", col.names = NA)
write.table(top_180hpi, file = "Host_DEG_annotated180.csv", sep = ",", col.names = NA)
limma edger rna-seq • 159 views
ADD COMMENTlink modified 4 weeks ago by h.mon25k • written 4 weeks ago by Liftedkris10

Do you realize that you are not using edgeR at all for differential analysis but limma? I am not adept with limma so I cannot contribute too much but why don't you simply follow exactly the edgeR manual and then see what comes out?

ADD REPLYlink written 4 weeks ago by ATpoint16k

Just a couple of comments:

  • lmFit is a function from limma, which is presumably why you chose to use the voom-normalized values for the DE analysis
  • what is this line supposed to do: counts.host$counts <- counts.host$counts[1:(nrow(counts.host$counts)-5), ], i.e. why are you removing the last five rows?
  • what do the MDS and Hclust plots actually look like?
  • I'm not sure what your group and design objects look like, do you mind sharing them?
ADD REPLYlink written 4 weeks ago by Friederike4.2k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1102 users visited in the last hour