Question

patient-derived iPS differentiated neurons RNA-seq analysis, DESeq2, few DEGs when using q value (<0.05), could I use p (<0.05) instead to get the DEGs?

0

2.6 years ago

Lu.Lu • 0

Hi,

I'm running an RNA-seq analysis on patient- and control-derived iPS-differentiated neurons (7 control neurons, 6 FXS neurons). I loaded my data into a DESeqDataSet, removed genes with fewer than 10 total counts, and ran DESeq. I get only 8 DEGs with |log2FoldChange| > 0.5 and padj < 0.05, but 1080 DEGs with |log2FoldChange| > 0.5 and p < 0.05.

I also checked FMR1 expression (FMR1 protein was very low in the patients' neurons, which I already validated by Western blot): it differed significantly between control and FXS neurons. Does that mean my RNA-seq is right?

I also ran a GO analysis on the 1080 DEGs and got some of the enriched terms we expected.

The PCA and sample-distance analyses do not look good: the control and patient samples are intermixed instead of forming separate groups.

So my questions are as follows:

Do patient iPS studies generally fail to separate control and disease groups well because of variation in human genetic background?
Could I use the p-value to select my DEGs and do the downstream analysis?

Could somebody help me figure this out? Thanks so much!

Here is my script for DESeq2 analysis:

# build the DESeqDataSet, drop genes with fewer than 10 total counts, and run DESeq2
dds <- DESeqDataSet(gse, design = ~ type)
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]
dds$type <- factor(dds$type, levels = c("control","fxs"))
dds <- DESeq(dds)
res <- results(dds)
res <- res[!is.na(res$log2FoldChange),]

#write DEG list q value
res1 <- res[!is.na(res$padj),]
res1 <- data.frame(res1, stringsAsFactors = FALSE, check.names = FALSE)
write.csv(res1, 'all genes q.csv')
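# sort by adjusted p-value (ascending), then by log2 fold change (descending)
res1 <- res1[order(res1$padj, res1$log2FoldChange, decreasing = c(FALSE, TRUE), method = "radix"), ]
res1[which(res1$log2FoldChange >= 0.5 & res1$padj < 0.05),'sig'] <- 'up'
res1[which(res1$log2FoldChange <= -0.5 & res1$padj < 0.05),'sig'] <- 'down'
# strict "<" so that genes with |log2FoldChange| exactly 0.5 keep their up/down label
res1[which(abs(res1$log2FoldChange) < 0.5 | res1$padj >= 0.05),'sig'] <- 'none'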
res1_select <- subset(res1, sig %in% c('up', 'down'))
write.csv(res1_select, file = 'fxs to control.select q.csv')
res1_up <- subset(res1, sig == 'up')
res1_down <- subset(res1, sig == 'down')
write.csv(res1_up, file = 'fxs to control.DESeq2.up q.csv')
write.csv(res1_down, file = 'fxs to control.DESeq2.down q.csv')

#write DEG list p value
res2 <- res[!is.na(res$pvalue),]
res2 <- data.frame(res2, stringsAsFactors = FALSE, check.names = FALSE)
write.csv(res2, 'all genes p.csv')
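# sort by raw p-value (ascending), then by log2 fold change (descending)
res2 <- res2[order(res2$pvalue, res2$log2FoldChange, decreasing = c(FALSE, TRUE), method = "radix"), ]
res2[which(res2$log2FoldChange >= 0.5 & res2$pvalue < 0.05),'sig'] <- 'up'
res2[which(res2$log2FoldChange <= -0.5 & res2$pvalue < 0.05),'sig'] <- 'down'
# strict "<" so that genes with |log2FoldChange| exactly 0.5 keep their up/down label
res2[which(abs(res2$log2FoldChange) < 0.5 | res2$pvalue >= 0.05),'sig'] <- 'none'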
res2_select <- subset(res2, sig %in% c('up', 'down'))
write.csv(res2_select, file = 'fxs to control.select p.csv')
res2_up <- subset(res2, sig == 'up')
res2_down <- subset(res2, sig == 'down')
write.csv(res2_up, file = 'fxs to control.DESeq2.up p.csv')
write.csv(res2_down, file = 'fxs to control.DESeq2.down p.csv')

Best,
Lu

p-value q-value RNA-seq DEG • 1.9k views

ADD COMMENT • link updated 2.6 years ago by Ram 45k • written 2.6 years ago by Lu.Lu • 0

1

1) I guess you're studying Fragile X. FMR1 encodes FMRP, which has regulatory activity at synapses. In effect, you should see differential expression of transcripts regulated by FMRP. So you can try clustering with just the genes active in synapses to show that your RNA-seq is working. You can also compute DE genes using this gene set.

2) If you're doing multiple statistical tests, you should use FDR.
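
To see why a raw p-value cutoff inflates the list, here is a minimal sketch in base R. It assumes a hypothetical 20,000 tested genes with no true differences at all (that gene count is an illustration, not a number from this thread) and compares an unadjusted p < 0.05 cutoff with Benjamini-Hochberg adjustment via `p.adjust`, which is the adjustment DESeq2 reports as `padj` by default.

```r
set.seed(42)

# Hypothetical number of tested genes; under the null, p-values are uniform on [0, 1].
n_genes <- 20000
p_null <- runif(n_genes)

# Genes passing a raw p-value cutoff even though nothing is truly different.
n_raw <- sum(p_null < 0.05)

# Genes passing after Benjamini-Hochberg (FDR) adjustment.
n_bh <- sum(p.adjust(p_null, method = "BH") < 0.05)

cat("raw p < 0.05:", n_raw, "\n")
cat("BH padj < 0.05:", n_bh, "\n")
```

With no real signal, about 5% of genes still pass p < 0.05 by chance, which is why a list selected on the raw p-value alone is hard to interpret.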

ADD REPLY • link 2.6 years ago by barslmn ★ 2.4k

0

Hi Barslmn,

Thanks so much for your reply! Yes, I'm studying FXS using patient-derived iPS cells. Your suggestions are awesome! I will cluster using the genes active in synapses.

And yes, I understand that RNA-seq analysis involves multiple statistical tests, so FDR should be used. But I get only a few DEGs that way, and some papers also use the p-value to build their DEG lists. I also think I will get fewer DEGs, or none, if I use the synapse gene set together with FDR.
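
Whether a smaller gene set gives fewer FDR hits is not obvious, because the Benjamini-Hochberg adjustment depends on how many tests are run. Below is a toy sketch in base R using constructed p-values (not data from this experiment) and hypothetical set sizes: a 500-gene set containing 50 genes with p = 0.001, embedded in 19,500 background genes with evenly spread p-values. It compares adjusting across all genes with adjusting within the set only.

```r
# Constructed p-values for illustration only; all sizes are hypothetical.
p_set <- c(rep(0.001, 50), seq(0.06, 1, length.out = 450))  # 500-gene set
p_bg <- ppoints(19500)                                       # evenly spread background
p_all <- c(p_set, p_bg)
in_set <- seq_along(p_all) <= length(p_set)

# BH adjustment across every gene versus within the gene set only.
padj_all <- p.adjust(p_all, method = "BH")
padj_set <- p.adjust(p_set, method = "BH")

hits_genomewide <- sum(padj_all[in_set] < 0.05)
hits_set_only <- sum(padj_set < 0.05)

cat("set genes with padj < 0.05, adjusted genome-wide:", hits_genomewide, "\n")
cat("set genes with padj < 0.05, adjusted within set:", hits_set_only, "\n")
```

In this constructed case the 50 small p-values pass FDR < 0.05 only when the adjustment is done within the gene set. That is valid only if the gene set is chosen in advance, independently of the expression data being tested.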

Thanks so much!

ADD REPLY • link 2.6 years ago by Lu.Lu • 0

0

Hi Barslmn,

Another question: do you know how to extract the values for synapse-related genes from the normalized data (the vsd data from DESeq2)? Thanks so much!

ADD REPLY • link 2.6 years ago by Lu.Lu • 0

0

Unfortunately, I don't know. You could ask it as a separate question, I guess. I would like to see what others answer.

ADD REPLY • link 2.6 years ago by barslmn ★ 2.4k

0

I finally figured it out: I first got the list of synapse-related genes, extracted those genes from my normalized data, and then plotted the PCA. Unfortunately, the control and FXS groups still could not be separated. Here is my script:

```
library(ggplot2)
library(FactoMineR)
library(factoextra)
setwd(" ")  # set the working directory here

# VST-normalized expression table and the synaptic gene list
gene <- read.csv("DESeq2normalizedvsdSYMBOL.csv", stringsAsFactors = FALSE)
fmrpsy <- read.csv("syngo_genes.csv", stringsAsFactors = FALSE)
row.names(gene) <- gene$SYMBOL  # extract the column SYMBOL

# keep only the synaptic genes, then the 13 sample columns
fmrpsy2 <- gene[rownames(gene) %in% fmrpsy$hgnc_symbol, ]
fmrpsy3 <- fmrpsy2[, 2:14]
fmrpsy4 <- t(fmrpsy3)  # PCA() expects samples in rows

gene.pca <- PCA(fmrpsy4, graph = FALSE)
plot(gene.pca)

# percentage of variance explained by the first two components
pca_eig1 <- round(gene.pca$eig[1, 2], 2)
pca_eig2 <- round(gene.pca$eig[2, 2], 2)

pca_sample <- data.frame(gene.pca$ind$coord[, 1:2])
# assumes coldata.txt lists the samples in the same order as the expression
# columns and has a column named 'group'
group <- read.delim('coldata.txt', row.names = NULL, sep = '\t', check.names = FALSE)
rownames(group) <- rownames(pca_sample)
pca_sample <- cbind(pca_sample, group)
pca_sample$samples <- rownames(pca_sample)

ggplot(data = pca_sample, aes(x = Dim.1, y = Dim.2)) +
  geom_point(aes(color = group), size = 3) +
  scale_color_manual(values = c('orange', 'purple')) +
  theme(panel.grid = element_blank(),
        panel.background = element_rect(color = 'black', fill = 'transparent'),
        legend.key = element_rect(fill = 'transparent')) +
  labs(x = paste('PCA1:', pca_eig1, '%'), y = paste('PCA2:', pca_eig2, '%'), color = '')
```
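
The extraction step can also be checked on toy data before running it on the real table. This is a minimal base-R sketch, assuming a genes-by-samples matrix with gene symbols as row names; the matrix, gene names, and gene set below are made up for illustration. With a DESeq2 `vst` object, the equivalent matrix would come from `assay(vsd)`. It reports how many gene-set symbols were actually found, which is worth checking because symbol mismatches silently shrink the subset.

```r
set.seed(1)

# Toy stand-in for the VST matrix: 8 hypothetical genes (rows) x 13 samples (columns).
vst_mat <- matrix(rnorm(8 * 13), nrow = 8,
                  dimnames = list(paste0("GENE", 1:8), paste0("S", 1:13)))

# Hypothetical gene set; one symbol is deliberately absent from the matrix.
gene_set <- c("GENE2", "GENE5", "GENE7", "NOT_IN_DATA")

# Keep only the symbols present in the data and report the overlap.
matched <- intersect(gene_set, rownames(vst_mat))
cat(length(matched), "of", length(gene_set), "gene-set symbols found\n")

# Subset the matrix; drop = FALSE keeps it a matrix even for a single gene.
sub_mat <- vst_mat[matched, , drop = FALSE]

# PCA with samples in rows, plus percent variance per component for axis labels.
pca <- prcomp(t(sub_mat), center = TRUE, scale. = FALSE)
pct_var <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)
```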

ADD REPLY • link 2.6 years ago by Lu.Lu • 0

0

If the samples are patient-derived, you probably have a lot of heterogeneity, and with that, 7 vs 6 is simply underpowered. That is not unexpected with human data; you usually need many more samples to get decent DE results.
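
As a rough illustration of the power problem, here is a sketch using `power.t.test` from base R. It is a per-gene t-test approximation on the log2 scale, not DESeq2's negative binomial model, and the inputs are assumptions: a log2 fold change of 0.5 (the cutoff used in the question), a hypothetical within-group standard deviation of 0.5, and a stricter significance level of 1e-4 standing in for a multiple-testing-corrected threshold.

```r
# Hypothetical effect size and variability on the log2 scale.
delta <- 0.5
sd_log2 <- 0.5
n_per_group <- 6  # the smaller of the two groups (7 vs 6)

# Power at a nominal per-gene threshold.
power_nominal <- power.t.test(n = n_per_group, delta = delta, sd = sd_log2,
                              sig.level = 0.05)$power

# Power at a much stricter threshold, as a stand-in for multiple-testing correction.
power_strict <- power.t.test(n = n_per_group, delta = delta, sd = sd_log2,
                             sig.level = 1e-4)$power

# Samples per group needed for 80% power at the strict threshold.
n_needed <- power.t.test(delta = delta, sd = sd_log2, sig.level = 1e-4,
                         power = 0.8)$n

cat("power at 0.05:", round(power_nominal, 3), "\n")
cat("power at 1e-4:", round(power_strict, 3), "\n")
cat("n per group for 80% power at 1e-4:", ceiling(n_needed), "\n")
```

Larger between-patient variability (a bigger `sd_log2`) lowers the power further, which is the heterogeneity point made above.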

ADD REPLY • link 2.6 years ago by ATpoint 88k

0

Hi ATpoint,

Thanks so much! You are right: patient-derived samples have a lot of heterogeneity, and more samples would give better DE results. But for an iPS study like mine, 7 vs 6 is already a lot.

Thanks again!

ADD REPLY • link 2.6 years ago by Lu.Lu • 0