Question: What can be done to separate the healthy and patient samples in PCA plot and Distance matrix
0
gravatar for swarajgaikwad15
16 months ago by
Tezpur University, Assam, India
swarajgaikwad150 wrote:

Hello, I am doing RNA seq analysis to obtain Differential expression genes using DESeq2 for 13 patient and 6 Healthy donors. Before going for DESeq2 analysis, I am visualizing my samples by Distance matrix and PCA plot, using the following commands:

library("RColorBrewer")
sampleDistMatrix <- as.matrix(sampleDists)
rownames(sampleDistMatrix) <- paste(rld$condition, sep="-")
colnames(sampleDistMatrix) <- NULL
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
pheatmap(sampleDistMatrix,
         clustering_distance_rows=sampleDists,
         clustering_distance_cols=sampleDists,
         col=colors)

Distance matrixDIstance matrix

plotPCA(rld, intgroup="condition")

PCA plot : PCA plot

Here I am not able to visualize both the sample group separately. Am I suppose to discard the samples, which can not be separated and how one can find this exact sample? Is there any function or solution to separate the samples without discarding it? Kindly reply. Any help in this regard will be highly appreciated

    sessionInfo() R version 3.5.2 (2018-12-20) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 18.04.1 LTS

Matrix products: default BLAS: /usr/lib/x8664-linux-gnu/blas/libblas.so.3.7.1 LAPACK: /usr/lib/x8664-linux-gnu/lapack/liblapack.so.3.7.1

locale: [1] LCCTYPE=enIN.UTF-8 LCNUMERIC=C
LCTIME=enIN.UTF-8 LCCOLLATE=enIN.UTF-8 [5] LCMONETARY=enIN.UTF-8 LCMESSAGES=enIN.UTF-8
LCPAPER=enIN.UTF-8 LCNAME=C [9] LCADDRESS=C LCTELEPHONE=C
LCMEASUREMENT=enIN.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages: [1] hexbin1.27.2 vsn3.50.0
pheatmap1.0.12 RColorBrewer1.1-2 [5] DESeq21.22.1 SummarizedExperiment1.12.0 DelayedArray0.8.0 BiocParallel1.16.5 [9] matrixStats0.54.0 Biobase2.42.0
GenomicRanges1.34.0 GenomeInfoDb1.18.1 [13] IRanges2.16.0 S4Vectors0.20.1
BiocGenerics_0.28.0

loaded via a namespace (and not attached): [1] bit640.9-7
splines3.5.2 Formula1.2-3 assertthat0.2.0
affy1.60.0 [6] BiocManager1.30.4 latticeExtra0.6-28 blob1.1.1 GenomeInfoDbData1.2.0 pillar1.3.1
[11] RSQLite2.1.1 backports1.1.3 lattice0.20-38
limma3.38.3 glue1.3.0 [16] digest0.6.18
XVector0.22.0 checkmate1.9.0 colorspace1.4-0
preprocessCore1.44.0 [21] htmltools0.3.6 Matrix1.2-15
plyr1.8.4 XML3.98-1.16 pkgconfig2.0.2
[26] genefilter1.64.0 zlibbioc1.28.0 purrr0.2.5
xtable1.8-3 scales1.0.0 [31] affyio1.52.0
htmlTable1.13.1 tibble2.0.1 annotate1.60.0
ggplot23.1.0 [36] nnet7.3-12 lazyeval0.2.1
survival2.43-3 magrittr1.5 crayon1.3.4
[41] memoise1.1.0 foreign0.8-70 tools3.5.2
data.table1.12.0 stringr1.3.1 [46] locfit1.5-9.1
munsell0.5.0 cluster2.0.7-1 AnnotationDbi1.44.0
bindrcpp0.2.2 [51] compiler3.5.2 rlang0.3.1
grid3.5.2 RCurl1.95-4.11 rstudioapi0.9.0
[56] htmlwidgets1.3 labeling0.3 bitops1.0-6
base64enc0.1-3 gtable0.2.0 [61] DBI1.0.0
R62.3.0 gridExtra2.3 knitr1.21
dplyr0.7.8 [66] bit1.1-14 bindr0.1.1
Hmisc4.1-1 stringi1.2.4 Rcpp1.0.0
[71] geneplotter1.60.0 rpart4.1-13 acepack1.4.1
tidyselect0.2.5 xfun_0.4
ADD COMMENTlink written 16 months ago by swarajgaikwad150
1

Before going for DESeq2 analysis

Does this mean you are directly using raw counts for clustering and PCA? If so, your results are not unusual. Also from the heatmap scale, I really think there is some sort of problem with normalization. You can use DESeq2 rlog/vst normalized counts for PCA.

ADD REPLYlink written 16 months ago by venu6.6k

Actually, I have transformed my raw counts using rlog and then I am visualizing my data in PCA plot and heatmap so that I could obtain proper Differential expressing genes.

ADD REPLYlink written 16 months ago by swarajgaikwad150

These plots that you have shown say virtually nothing about differentially expressed genes. These plots just show [mostly] unbiased / unsupervised relationships between your samples.

It looks like you may benefit from reading through the DESeq2 Vignette

By the way, in your PCA bi-plot, there is some evidence that 2 of your samples are outliers.

ADD REPLYlink modified 16 months ago • written 16 months ago by Kevin Blighe60k

Thank you for your reply, Kevin, I have read DESeq2 vignette but it didn't find anything about solving such kind of error, however, Again i'll read and find it out what can be done. But as per your experience, can you give me any suggestions for solving this problem? Thank you

ADD REPLYlink written 16 months ago by swarajgaikwad150

I see what you mean. You want to identify the samples in the PCA bi-plot so that you can remove them?

Please take a look at this reproducible code (using random data), which will perform PCA and generate a bi-plot with lables (just for QC-ing):

x <- matrix(rexp(200, rate=.1), ncol=20)
rownames(x) <- paste("gene", 1:nrow(x))
colnames(x) <- paste("sample", 1:ncol(x))

p <- prcomp(t(x))

proportionvariances <- ((p$sdev^2) / (sum(p$sdev^2)))*100

#Plots scatter plot for PC 1 and 2
plot(p$x,
  type="n",
  main="Principal components analysis bi-plot",
  xlab=paste("PC1, ", round(proportionvariances[1], 2), "%"),
  ylab=paste("PC2, ", round(proportionvariances[2], 2), "%"))
points(p$x, col="black", pch=16, cex=1) #add points
text(p$x, rownames(p$x), col="black", cex=1) #add labels

d

Your object is called rld, with the counts in this object being accessible via assay(rld). So, your code needs to be:

p <- prcomp(t(assay(rld)))

proportionvariances <- ((p$sdev^2) / (sum(p$sdev^2)))*100

#Plots scatter plot for PC 1 and 2
plot(p$x,
  type="n",
  main="Principal components analysis bi-plot",
  xlab=paste("PC1, ", round(proportionvariances[1], 2), "%"),
  ylab=paste("PC2, ", round(proportionvariances[2], 2), "%"))
points(p$x, col="black", pch=16, cex=1) #add points
text(p$x, rownames(p$x), col="black", cex=1) #add labels

The resulting bi-plot will differ from your original. This is because DESeq2's PCA function automatically removes a large proportion of variables prior to performing PCA, whereas, my code does not do this.

ADD REPLYlink modified 16 months ago • written 16 months ago by Kevin Blighe60k

How to add images to a Biostars post

ADD REPLYlink written 16 months ago by WouterDeCoster43k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 893 users visited in the last hour