I would recommend log-transforming the TPM/RSEM dataset. PCA carries an implicit assumption of normality. It finds a new coordinate system whose axes, the orthogonal principal components, capture as much of the variance between the points as possible. For a multivariate Gaussian distribution (for example, a microarray dataset), uncorrelated components are also independent, so the leading components summarize the data well. That does not hold for datasets with Poisson or negative binomial distributions (like RNA-Seq counts, TPM, RPKM). RNA-Seq datasets are also very skewed, and because PCA is very sensitive to outliers, running PCA directly on these datasets is not recommended. Instead, log-transform the data and then plot the PCA. If you do not want to log-transform, use the cmdscale function to make MDS plots.
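As a rough sketch of the MDS route, the snippet below runs cmdscale on a simulated matrix. The simulated dat, the sample names, and the choice of 1 minus Spearman correlation as the distance are all assumptions made for illustration. With plain Euclidean distances, classical MDS gives the same layout as PCA, which is why a rank-based distance is used here.

```r
set.seed(1)
# Simulated stand-in for a TPM/RPKM matrix: 500 genes (rows) x 6 samples (columns)
dat = matrix(rnbinom(500 * 6, mu = 100, size = 0.5), nrow = 500,
             dimnames = list(paste0("gene", 1:500), paste0("sample", 1:6)))

# Distance between samples: 1 - Spearman correlation (an assumed choice,
# rank-based so it is not dominated by a few very highly expressed genes)
sample.dist = as.dist(1 - cor(dat, method = "spearman"))

# Classical MDS into two dimensions; one row of coordinates per sample
mds = cmdscale(sample.dist, k = 2)

plot(mds[, 1], mds[, 2], xlab = "MDS1", ylab = "MDS2", pch = 19)
text(mds[, 1], mds[, 2], labels = rownames(mds), pos = 3)
```

Replace the simulated dat with your own matrix; any other distance accepted by as.dist can be swapped in.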
Update: Code for PCA plots
Suppose dat is your RPKM/TPM dataset, with genes in rows and samples in columns. Make a genotype and/or condition vector with one entry per sample.
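# One label per sample, in the same order as the columns of dat
genotype = c("KO1", "KO1", "WT1", "WT1", "KO1", "WT1")
# Pseudocount of 1 keeps zero values finite after log2
logTransformed.dat = log2(dat + 1)
# prcomp expects samples in rows, hence the transpose
pcs = prcomp(t(logTransformed.dat), center = TRUE)
# Percentage of total variance explained by each principal component
percentVar = round(pcs$sdev^2 / sum(pcs$sdev^2) * 100, 2)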
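library(ggplot2)
# makeLab and title were not defined in the original code; these are assumed placeholders
makeLab = function(x, pc) paste0("PC", pc, ": ", x[pc], "% variance")
title = "PCA of log2-transformed expression"
## PCA Plot
# Put the sample labels in the data frame so aes() can find them
pca.df = data.frame(pcs$x, genotype = genotype)
ggplot(pca.df, aes(PC1, PC2)) +
  xlab(makeLab(percentVar, 1)) + ylab(makeLab(percentVar, 2)) + ggtitle(title) +
  geom_point(size = 8, aes(colour = genotype)) +
  theme(legend.text = element_text(size = 16, face = "bold"),
        legend.title = element_text(size = 16, colour = "black", face = "bold"),
        # size = 0 as in the original; raise it to make the title visible
        plot.title = element_text(size = 0, face = "bold"),
        axis.title = element_text(size = 18, face = "bold"),
        axis.text.x = element_text(size = 16, face = "bold", color = "black"),
        axis.text.y = element_text(size = 16, face = "bold", color = "black"),
        plot.margin = unit(c(0.5, 0.5, 0.5, 0.5), "cm"))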
For batch effects, check whether samples from the same group cluster together or whether they cluster by batch instead (if the batches are known). Also check what principal components 1 and 2 tell you about the dataset.
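One possible way to run these checks is sketched below on simulated data. The batch labels, the simulated dat, and the artificial batch effect on the first 50 genes are all hypothetical, there only to make the example self-contained. The idea is to compare the PC scores against each known factor and then look at which genes carry the largest loadings on PC1.

```r
set.seed(42)
n.genes = 300
genotype = c("KO1", "KO1", "WT1", "WT1", "KO1", "WT1")
batch = c("A", "B", "A", "B", "A", "B")  # hypothetical batch labels

# Simulated expression matrix standing in for dat (genes in rows, samples in columns)
dat = matrix(rnbinom(n.genes * 6, mu = 200, size = 5), nrow = n.genes,
             dimnames = list(paste0("gene", seq_len(n.genes)), paste0("s", 1:6)))
# Inflate the first 50 genes in batch B to mimic a batch effect
dat[1:50, batch == "B"] = dat[1:50, batch == "B"] * 8

pcs = prcomp(t(log2(dat + 1)), center = TRUE)
pca.df = data.frame(pcs$x[, 1:2], genotype = genotype, batch = batch)

# Compare the scores with the known factors: a component whose group means
# differ strongly by batch but not by genotype points to a batch effect
print(pca.df)
print(tapply(pca.df$PC1, pca.df$batch, mean))
print(tapply(pca.df$PC1, pca.df$genotype, mean))

# Genes with the largest absolute loadings on PC1 show what drives that component
top.genes = names(sort(abs(pcs$rotation[, "PC1"]), decreasing = TRUE))[1:10]
print(top.genes)
```

On real data, use your own dat, genotype and batch vectors; colouring the PCA plot by batch instead of genotype gives the same information visually.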