Differential expression analysis
3 months ago
Anupama • 0

Hi, I'm doing a differential expression analysis with DESeq2. I want to extract the top 100 upregulated and the top 100 downregulated genes. How can I identify the top 100 genes? Should the selection be based on log2 fold change or on adjusted p-value?

# get up- and downregulated genes
up_OE <- res[which(res$log2FoldChange > 0 & res$padj < .05), ]

do_OE <- res[which(res$log2FoldChange < 0 & res$padj < .05), ]

# get top 100 genes
resOrderedDF <- as.data.frame(up_OE)[seq_len(100), ]

write.csv(up_OE, file="top100.csv")

This is the code I'm using. Is it correct?


R can index using logical vectors, so you don't strictly need the which(). Do you want just the symbols of the top 100 genes, or the logFC and other details as well?
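One caveat worth illustrating: DESeq2 sets padj to NA for genes removed by independent filtering, and a logical index that contains NA behaves differently from which(). The sketch below uses a small made-up base R data frame (res_df and its values are hypothetical, not real DESeq2 output) to show the difference.

```r
# Toy stand-in for a DESeq2 results table (values are made up for illustration).
res_df <- data.frame(
  log2FoldChange = c(2.1, -1.4, 0.8, 3.0),
  padj           = c(0.01, 0.03, NA, 0.20),
  row.names      = c("geneA", "geneB", "geneC", "geneD")
)

# Logical filter; the NA in padj propagates: TRUE, FALSE, NA, FALSE
keep <- res_df$log2FoldChange > 0 & res_df$padj < 0.05

up_logical <- res_df[keep, ]                 # NA in the index yields an all-NA row
up_which   <- res_df[which(keep), ]          # which() drops the NA entry
up_safe    <- res_df[!is.na(keep) & keep, ]  # explicit NA handling without which()

print(up_logical)
print(up_which)
```

So plain logical indexing works, but only once the NA values are handled explicitly; otherwise which() is a reasonable safeguard.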

3 months ago

Hi!

In this case you could rank the genes with a single metric, similar to what is done for a GSEA analysis, that combines the sign of the LFC with the padj value. Add a new column (LFCpadj) to your results object to hold the metric:

my_res <- results(dds, lfcThreshold = lfc_threshold) # Get the results from the dds object using an LFC threshold
my_res <- as.data.frame(my_res) # Convert the object to a data frame
my_res$LFCpadj <- -log10(my_res$padj) * sign(my_res$log2FoldChange) # Add the column for the rank metric

Now sort your data by this ranking metric. Genes that are over-expressed and highly significant will appear at the top of the sorted list, while genes that are under-expressed and highly significant will appear at the bottom. Take a look at the order() function to sort your list.
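As a rough sketch of that sorting step, the example below builds a simulated table in place of as.data.frame(results(dds)), computes the signed metric, and takes the 100 genes at each end of the ranking. The simulated values, the top_n cutoff, the output file names, and the choice to drop genes with NA padj are all assumptions made for illustration.

```r
# Toy stand-in for as.data.frame(results(dds)); values are simulated, not real data.
set.seed(1)
n <- 500
my_res <- data.frame(
  log2FoldChange = rnorm(n),
  padj           = runif(n),
  row.names      = paste0("gene", seq_len(n))
)
my_res$padj[sample(n, 20)] <- NA   # mimic genes without an adjusted p-value

# Signed significance metric: -log10(padj) carrying the sign of the fold change.
my_res$LFCpadj <- -log10(my_res$padj) * sign(my_res$log2FoldChange)

# Drop genes without a metric, then sort from most up- to most downregulated.
ranked <- my_res[!is.na(my_res$LFCpadj), ]
ranked <- ranked[order(ranked$LFCpadj, decreasing = TRUE), ]

top_n    <- 100
top_up   <- head(ranked, top_n)   # largest positive metric values
top_down <- tail(ranked, top_n)   # most negative metric values

# Write each list to its own file (tempdir() used here; swap in your own directory).
out_dir <- tempdir()
write.csv(top_up,   file = file.path(out_dir, "top100_up.csv"))
write.csv(top_down, file = file.path(out_dir, "top100_down.csv"))
```

Note that this ranks by significance with the fold-change sign attached, so it does not by itself enforce padj < 0.05; you may want to filter on padj first, as in the original question. A padj of exactly 0 gives an infinite metric, which still sorts to the extremes.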

Best regards!

Rodo
