I estimated DEGs from RNASeq using edgeR method. But the final DEG list has several genes with low counts and are having high Fold change
logFC logCPM F PValue FDR Ctrl1 Ctrl2 Str1 Str2 Gene1 7.673 -2.658 33.203 1.27E-05 0.000118 0 0 0.738954 0.657007 Gene5 7.605 -2.951 26.996 3.36E-05 0.000255 0 0 0.674697 0.657007 Gene35 7.499 -2.072 22.724 8.47E-05 0.0034 0 0 0.642569 0.594435 Gene67 7.462 -2.151 36.849 2.49E-06 3.32E 0 0 0.642569 0.563149
Initialy, I filtered out low count genes using
keep<-filterByExpr(y,group = group) y<-y[keep,keep.lib.size=FALSE]
But I am not sure why genes normalized count value close to zero are retained and listed as DEGs. How can I solve this? I was thinking to use min.count parameter in filterByExpr.
But, is that the right way? Suppose if I use,
keep<-filterByExpr(y,group = group, min.count=500), the genes with greater than 1 normalized counts are retained.
To be more specific, where should I apply
filterByExpr function, before normaliztion (ie on raw counts) or after normaliztion ?
group<-factor(paste(targets$Genotype,targets$Time,targets$Treatment,sep=".")) cbind(targets,Group=group) y<-DGEList(counts = raw_counts, group = group) #Filterout low count genes keep<-filterByExpr(y,group = group) y<-y[keep,keep.lib.size=FALSE] #Data Normalization y<-calcNormFactors(y) #Design matrix design<-model.matrix(~0+group) colnames(design)<-levels(group) #Dsipersion estimation y<-estimateDisp(y,design) fit<-glmQLFit(y,design)