Differential expression analysis
3 months ago
Anupama • 0

Hi, I'm doing differential expression analysis with DESeq2. I want to extract the top 100 upregulated and the top 100 downregulated genes. How should I identify the top 100 genes? Should the ranking be based on log2 fold change or on adjusted p-value?

# get up- and downregulated genes
up_OE <- res[which(res$log2FoldChange > 0 & res$padj < .05), ]

do_OE <- res[which(res$log2FoldChange < 0 & res$padj < .05), ]

# get top 100 genes
resOrderedDF <- as.data.frame(up_OE)[seq_len(100), ]

write.csv(up_OE, file = "top100.csv")


This is the code I'm using. Is it correct?

DEGs • 166 views

R can index using logical vectors, so you don't strictly need the which. Do you want just the symbols of the top 100 genes, or the logFC and other details as well?
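One caveat on dropping which: DESeq2 can set padj to NA for some genes, and a logical index containing NA behaves differently from which. A minimal sketch on a made-up data frame (res_df and the gene names are hypothetical stand-ins for as.data.frame(res)):

```r
# Toy stand-in for as.data.frame(res); the values are invented for illustration
res_df <- data.frame(
  log2FoldChange = c(1.5, -2.0, 0.8),
  padj           = c(0.01, 0.001, NA),
  row.names      = c("geneA", "geneB", "geneC")
)

# Logical condition; it is NA wherever padj is NA
is_up <- res_df$log2FoldChange > 0 & res_df$padj < 0.05

# Plain logical indexing on a data.frame keeps an all-NA row for each NA in the index
up_logical <- res_df[is_up, ]

# which() silently drops the NA positions
up_which <- res_df[which(is_up), ]

# Logical indexing with the NA case handled explicitly
up_clean <- res_df[!is.na(is_up) & is_up, ]
```

So which is not required, but if you remove it the NA values in padj have to be handled some other way.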

3 months ago

Hi!

In this case you could rank the genes with a single metric, similar to what is done for a GSEA analysis, that combines the sign of the LFC with the padj value. Add a new column (LFCpadj) to your results object to hold the metric:

my_res <- results(dds, lfcThreshold = lfc_threshold) # Get the results from the dds object using an LFC threshold
my_res <- as.data.frame(my_res) # Convert the results object to a data frame
my_res$LFCpadj <- -log10(my_res$padj) * sign(my_res$log2FoldChange) # Add the column for the rank metric


Now you must sort your data by this ranking metric. Genes that are over-expressed and highly significant will appear at the top of the sorted list. In contrast, genes that are under-expressed and highly significant will appear at the bottom. Take a look at the order() function to sort your list (rank() returns the ranks themselves rather than a sorted table).
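As a rough sketch of the sorting step, here is how it might look on a small made-up table. The data frame, the gene names, n_top and the 0.05 cutoff are assumptions for illustration; on real data my_res would come from results(dds) and n_top would be 100.

```r
# Toy stand-in for as.data.frame(results(dds)); the values are invented
my_res <- data.frame(
  log2FoldChange = c(2.1, -1.8, 0.4, -3.0, 1.2, NA),
  padj           = c(1e-8, 1e-5, 0.60, 1e-12, NA, 0.01),
  row.names      = paste0("gene", 1:6)
)

# Signed significance metric: -log10(padj) carrying the sign of the fold change
my_res$LFCpadj <- -log10(my_res$padj) * sign(my_res$log2FoldChange)

# Keep genes with a defined metric that also pass the significance cutoff
keep   <- !is.na(my_res$LFCpadj) & my_res$padj < 0.05
ranked <- my_res[keep, ]

# Sort from most significantly up to most significantly down
ranked <- ranked[order(ranked$LFCpadj, decreasing = TRUE), ]

n_top    <- 2  # hypothetical; use 100 on a real result table
top_up   <- head(ranked[ranked$LFCpadj > 0, ], n_top)
top_down <- tail(ranked[ranked$LFCpadj < 0, ], n_top)
```

Note that a padj of exactly 0 gives an infinite metric, so ties among the most significant genes may need a secondary sort key such as the absolute log2FoldChange.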

Best regards!

Rodo