The main difference between t-SNE (or other manifold learning methods) and PCA is that t-SNE tries to deconvolute relationships between neighbors in high-dimensional data.
A classic example is the "swiss roll". To put the difference in layman's terms: t-SNE attempts to understand the underlying structure of the swiss roll. It does this by prioritizing neighboring points. PCA doesn't get what's going on - it doesn't see that the points are actually a line that's been rolled up.
This PCA sucks (it thinks yellow is close to blue when in fact they are far away):
In contrast, see how t-SNE seems to understand what's going on with this 'S'?
Just a couple of comments... Neither tSNE or PCA are clustering methods even if in practice you can use them to see if/how your data form clusters. tSNE works downstream to PCA since it first computes the first n principal components and then maps these n dimensions to a 2D space. The original paper on tSNE is relatively accessible and if I remember correctly it has some discussion on PCA vs tSNE. Also, this post on tSNE is quite good, although not really about tSNE vs PCA.