Hello,
I am writing to ask for your help regarding the DEG analysis of several Parkinson's cohorts.
I have already done several transcriptomics and DEG analyses with other diseases. However, when I analyse the GSE18838 cohort, I get the following:
Down 0
NotSig 22010
Up 1
And in other cohorts of the same tissue and Parkinson's vs Control patients from other experiments, I also get NS in all of them. I understand that it is a possible result but I find it strange when there is a published paper that has also done the DEG analysis with the same cohort (GSE18838) and the DEGs come out.
So I wanted to ask you if you could check the code in case you see something strange or if I have done something wrong. Here are all the steps I have done with GSE18838 cohort
**1. I analyse the metadata**
GSE18838 <- getGEO('GSE18838')
length(GSE18838)
class(GSE18838[[1]])
GSE18838 <- GSE18838[[1]]
varLabels(GSE18838) #see labels
info <- pData(GSE18838) ## print the sample information
**2. Processing raw files and associating metadata information**
getGEOSuppFiles("GSE18838")
untar("GSE18838/GSE18838_RAW.tar", exdir = 'data/')
celfilelist <- list.files(path = "data/", pattern = ".CEL", full.names = TRUE)
gse <- read.celfiles(filenames = celfilelist, phenoData=phenoData(GSE18838))
**3. Review and normalize data**
summary(exprs(gse))
boxplot(exprs(gse), outline=FALSE, ylab = "Expression level", xlab = "Samples", col = "red")
abline(h=median(exprs(gse)),col="blue")
plotDensities(exprs(gse), legend = FALSE)
hist(exprs(gse), col="blue", border="white", breaks=100, main = "Histogram of raw expression")
normalized.data <- oligo::rma(gse)
boxplot(normalized.data, outline=FALSE, ylab = "Expression level", xlab = "Samples", col = "red")
abline(h=median(exprs(normalized.data)),col="blue")
plotDensities(exprs(normalized.data), legend = FALSE)
Here you can see how the data ranges from a minimum value of 20 to a maximum of approximately 60,000. And the median does not coincide between the boxes, so we normalise by RMA. So far everything seems to make sense.
**4. DEG analysis:**
pData(normalized.data) #We are interested in making disease vs control -> we select the disease state:ch1 tag to make the layout.
experimental.design <- model.matrix( ~ 0 + (normalized.data)[['disease state:ch1']])
colnames(experimental.design) <- levels(as.factor(normalized.data[['disease state:ch1']]))
colnames(experimental.design) <- c("Healthy", "PD")
contrast_matrix <- makeContrasts(PD - Healthy, levels=experimental.design)
contrast_matrix
linear.fit <- lmFit(normalized.data,experimental.design)
contrast.linear.fit <- contrasts.fit(linear.fit,contrasts=contrast_matrix)
contrast.results <- eBayes(contrast.linear.fit)
summary(decideTests(contrast.results,lfc=0)) #lfc=0 to see all genes with p.value < 0.05
The results that come out according to summary:
PD - Healthy
Down 0
NotSig 22010
Up 1
And the volcanoplot looks like this:
And I have applied this same coding to three other full blood Parkinson's cohorts doing a disease vs. control comparison. Is it possible that I have something wrong or that something strange is going on?
According to this paper:
Xing N, Dong Z, Wu Q, Zhang Y, Kan P, Han Y, Cheng X, Wang Y, Zhang B. Identification of ferroptosis related biomarkers and immune infiltration in Parkinson's disease by integrated bioinformatic analysis. BMC Med Genomics. 2023 Mar 14;16(1):55. doi: 10.1186/s12920-023-01481-3. PMID: 36918862; PMCID: PMC10012699.
SDRs should be released.
Thank you very much for your help.
Looking at this briefly, it is small sample size and you do not correct for any potential confounders. Use PCA or MDS to explore the data. Are there outliers, batch effects, anything that needs correction? Or are there simply do DEGs? It's possible.
I have checked the quality of the arrays using PCA and Boxplot (images attached) and some of the samples may deviate a little but perhaps not too much.
PCA
Boxplot
Maybe there is no DEG and that is a possible result but I find it strange because what I get does not agree with other analyses of other fellow researchers such as the paper by Xing et al. 2023, which also analyses the same spring with a result of 399 DEGs. Is it possible that this was a mistake on the part of the researchers?
Wait, so the magnitudes of the PCs are in the thousands? That's not normal for RNASeq.
It's a microarryay experiment, not RNASeq
Would Parkinson's be expected to have an effect on blood cells? Neurons, sure, but blood cells?
This is true. However, there are a number of studies that have done bioinformatics analysis in whole blood to see if there is a possibility of finding biomarkers.It makes sense that are no DEGs. The advantage is that at the diagnostic level it is less invasive. But it's strange because these paper show several cohorts with more o less DEGs. And yet, using the same packages (limma, p-value adjusted with FDR), I don't get the same results. So, it makes me doubt if I am doing it well.