I usually first build a couple of dendrograms with hierarchical clustering using different methods e.g. complete linkage and Ward's linkage to get an idea of the structures present in the data. The problem is that k-means will give you the clusters you requested no matter whether there's structure in the data or not. Once you know there's structure you can either cut the tree or use k-means with the number of clusters found in the dendrogram. Also with 100-dimensional vectors, you should probably not use Euclidean distance if the data is noisy.
Finally, there are also a few clustering algorithms that don't require the number of clusters as input e.g. DBSCAN (dbscan package in R) although they often require setting other parameters which are not necessarily easier to estimate.
(IIRC). The $\mu$s represent cluster centroids, and the minimization forces them to be as few as possible, while minimizing the distance from the data points $x_i$ to the corresponding centroid. I can see if I can find a reference, if it's of interest.