Question: How can I change the fold change based on count per million value to 4?
0
gravatar for Keyouer
3 months ago by
Keyouer0
Japan
Keyouer0 wrote:

Hey guys, I am a freshman to learn R... Recently I am using the EdgeR package to perform DEG analysis, here is the R code I am using... table <- as.data.frame(topTags(result, n = nrow(count))) is.DEG <- as.logical(table$FDR < 0.05) DEG.names <- rownames(table)[is.DEG] png("deg.png",width = 640, height = 480) plotSmear(result, de.tags = DEG.names) dev.off()

updeg <- subset(table, FDR <0.05) updeg <- subset(updeg, logFC > 0) uc <- sum(updeg$FDR < 0.05) u <- paste(project,"up") countlist[u] <- uc updeg <- rownames(updeg) write.table(updeg, "updeg.csv", quote=F, append=T,row.names=FALSE)

This is I summarized according to the guidebook and online information, But the thing is I want to change the fold change to more stringently like 4... I would be very appreciative if someone could help me with that...

rna-seq • 211 views
ADD COMMENTlink modified 5 weeks ago • written 3 months ago by Keyouer0

Is this solved? If not please explain why. Also consider upvoting comments that helped. Please also highlight code with the code option 10101.

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by ATpoint31k

Oh, sorry, I posted new questions...

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by Keyouer0

...and you did not check any online tutorial on filtering in R the last 9 weeks?

Please do that.

Whenever you have a data.frame you can do:

df[df$logFC > 4 & df$logCPM > 2,]

Just convert your topTags object to a data frame and then start subsetting.

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by ATpoint31k
0
gravatar for ATpoint
3 months ago by
ATpoint31k
Germany
ATpoint31k wrote:

I always do:

## get topTags for all genes:
TT    <- topTags(ResultFromQLFTest, n=Inf, adjust.method="BH", sort.by="none")

## tabular output
TT    <- TT$table

## add gene names:
TT.out <- data.frame(Gene = rownames(TT), TT)

## write as TSV to disk:
write.table(sep="\t", quote = FALSE, row.names = FALSE, col.names = TRUE, file = "output.tsv", x = TT.out)

I strongly suggest you always provide as supplement the full output, not just the significant ones to give the reader the ability to scan for all genes. It might also be of interest to see if there is strong evidence against differential expression, therefore provide full topTags output.

ADD COMMENTlink modified 3 months ago • written 3 months ago by ATpoint31k

Thank you so much for your kindly reply. I will try with your code. But sometimes for beginners, like me, even a small change in the code could make me feel confused... Your explanation seems easy to understand. Appreciate...

ADD REPLYlink written 3 months ago by Keyouer0

You're welcome. Feel free to ask if you have any trouble on the way.

ADD REPLYlink written 3 months ago by ATpoint31k

Yeah, thank you. The code "ResultFromQLFTest" doesn't go well with my current R code, I suppose I may have to define this in advance? I'll post all my code here to be as your reference...

library(grid)
library(gridExtra)
library(gtable)
library(limma)
library(edgeR)
library(topGO)
library(BiocManager)
library(ALL)
data(ALL)
data(geneList)
library(package = affyLib, character.only = TRUE)
count <- read.table("DEG_all.txt", sep = "\t", header = T, row.names = 1)
count <- as.matrix(count)

countlist <- as.list(NULL)

geneID2GO <- readMappings("jgiGO.txt")
GO2geneID <- inverseList(geneID2GO)
geneID2GO <- inverseList(GO2geneID)
geneNames <- names(geneID2GO)
project <- "A_B"
compare <- c("A","B")
#template
listcsv <- paste(project,"_comp.csv")
counts <- count[,compare]
dir.create(project)
setwd(project)


write.table(counts,file=listcsv,sep=",",col.names=T,row.names=T,quote=F)


group <- factor(c("A", "B"))
d <- DGEList(counts = counts, group = group)
d <- calcNormFactors(d)
#d <- estimateGLMCommonDisp(d, method = "deviance", robust = T, subset = NULL)
bcv <- 0.1
result <- exactTest(d,dispersion=bcv^2)


#DEG
table <- as.data.frame(topTags(result, n = nrow(count)))
is.DEG <- as.logical(table$FDR < 0.05)
DEG.names <- rownames(table)[is.DEG]
png("deg.png",width = 640, height = 480)
plotSmear(result, de.tags = DEG.names)
dev.off()

updeg <- subset(table, FDR <0.05)
updeg <- subset(updeg, logFC > 0)
uc <- sum(updeg$FDR < 0.05)
u <- paste(project,"up")
countlist[u] <- uc
updeg <- rownames(updeg)
write.table(updeg, "updeg.csv", quote=F, append=T,row.names=FALSE)

downdeg <- subset(table, FDR <0.05)
downdeg <- subset(downdeg, logFC < 0)
uc <- sum(downdeg$FDR < 0.05)
u <- paste(project,"down")
countlist[u] <- uc

downdeg <- rownames(downdeg)
write.table(downdeg, "downdeg.csv", quote=F, append=T,row.names=FALSE)
ADD REPLYlink modified 3 months ago by ATpoint31k • written 3 months ago by Keyouer0

Ok, yeah ResultFromQLFTest in your case would be just result, ignore how I called it. Lets also ignore that you are not estimating but just entering a dispersion estimate and therefore will get absolutely non-reliable results. Rest of the code looks ok (though I did not check every line). Is it working?

ADD REPLYlink written 3 months ago by ATpoint31k

Yes, it worked. Thank you so much. I entered a dispersion estimate because I only have one RNA-seq sample other than several...

ADD REPLYlink written 3 months ago by Keyouer0

Be aware that this is not reliable as the true dispersion could be much larger due to either biological differences between true replicates or technical issues during library prep, just saying.

ADD REPLYlink written 3 months ago by ATpoint31k

Yes, you are right. Thank you so much...

ADD REPLYlink written 3 months ago by Keyouer0

Could you please help one more thing? I want to define DEG as the count per million value >=2, foldchange>4, as I really cannot figure out how to do this...

ADD REPLYlink written 3 months ago by Keyouer0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1083 users visited in the last hour