# Plate Discipline Clustering: Part 1

Updated: Sep 2, 2020

I recently ran a cluster analysis of plate discipline metrics (from FanGraphs) for all qualified batters in 2018. The idea behind the analysis is to see which players have similar offensive approaches and similar contact abilities. In my opinion, the better we understand a player's approach and contact ability, the better we can pair that understanding with batted ball metrics to see whether he could benefit from a change in swing tendencies, or in the direction and angles of his batted balls, given his present power.

The first thing you have to do in a cluster analysis is decide how many clusters to use. I used k-means clustering, which leaves the number of clusters up to the analyst. By comparing the between-cluster sum of squares with the total sum of squares, you can see how much of the variation in the data is explained by grouping it into clusters. When plotting the ratio of between sum of squares to total sum of squares against the number of clusters, we look for the "shoulder" in the plot (often called the elbow), the point where adding another cluster stops explaining much more variation. I identified this at around 6 or 7 clusters and ultimately chose 7 because I know how much players can differ in their approach.
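To make the ratio concrete, here is a minimal sketch in plain Python. It is not the code used for this post: the `kmeans` and `between_total_ratio` helpers are hypothetical, and the data is a made-up set of 120 "hitters" drawn around three invented plate discipline profiles, not the FanGraphs numbers.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Column-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, k, n_iter=50, seed=0):
    """Basic Lloyd's algorithm with a single random start (illustrative only)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Move each center to the mean of its members (keep it if empty).
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(mean_point(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def between_total_ratio(points, labels, centers):
    """Between sum of squares divided by total sum of squares."""
    grand_mean = mean_point(points)
    total_ss = sum(squared_distance(p, grand_mean) for p in points)
    within_ss = sum(
        squared_distance(p, centers[lab]) for p, lab in zip(points, labels)
    )
    return (total_ss - within_ss) / total_ss


# Made-up data: 40 synthetic hitters around each of three invented profiles.
# Columns: O.Swing, Z.Swing, O.Contact, Z.Contact, Swing Strike %.
rng = random.Random(42)
profiles = [
    [25.0, 65.0, 70.0, 90.0, 7.0],
    [35.0, 72.0, 60.0, 84.0, 12.0],
    [30.0, 60.0, 50.0, 78.0, 15.0],
]
points = [[rng.gauss(m, 1.5) for m in prof] for prof in profiles for _ in range(40)]

ratios = {}
for k in range(1, 9):
    labels, centers = kmeans(points, k, seed=k)
    ratios[k] = between_total_ratio(points, labels, centers)
    print(k, round(ratios[k], 3))
```

With one cluster the ratio is zero, since nothing is explained, and it approaches one as the number of clusters grows. The shoulder is where the printed values start to flatten out. This sketch uses a single random start per k, so a real analysis would likely want several restarts before trusting the curve.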

Another visualization you can use is to plot every pairwise relationship between the data columns and color each data point by its cluster. This gives the analyst an "eye test" of whether the clusters are capturing visible groupings in the data. As you can see in the graphics below, with only two clusters there aren't many distinct groupings, but with seven, some clusters start to stand out.
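A pairwise plot like this can be sketched as follows. This is only an assumption about how such a figure might be built, using matplotlib. The `metric_pairs` and `plot_pairs` helpers and the output file name are hypothetical; `rows` is a list of five-number rows in the column order shown, and `labels` is one cluster number per row.

```python
from itertools import combinations

# Column names as listed in this post.
METRICS = ["O.Swing", "Z.Swing", "O.Contact", "Z.Contact", "Swing Strike %"]


def metric_pairs(names):
    """Index pairs (i, j) for every pairwise relationship between columns."""
    return list(combinations(range(len(names)), 2))


def plot_pairs(rows, labels, names=METRICS, path="cluster_pairs.png"):
    """Scatter every pair of metrics, coloring each point by its cluster."""
    # Imported here so the pairing helper works without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    pairs = metric_pairs(names)  # 10 pairs for 5 metrics
    fig, axes = plt.subplots(2, 5, figsize=(20, 8))
    for ax, (i, j) in zip(axes.flat, pairs):
        ax.scatter(
            [r[i] for r in rows],
            [r[j] for r in rows],
            c=labels,
            cmap="tab10",
            s=15,
        )
        ax.set_xlabel(names[i])
        ax.set_ylabel(names[j])
    fig.tight_layout()
    fig.savefig(path)
    return path
```

Calling `plot_pairs(rows, labels)` once with the two-cluster labels and once with the seven-cluster labels would give the side-by-side comparison described above.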

So after assigning each player to a cluster, we can begin to analyze the clusters. As stated before, clustering a hitter's plate discipline metrics gives us a better feel for his profile and for what adjustments he needs to make to maximize his performance, including whether his batted ball profile would allow him more success with a different approach to plate discipline. The posts that follow will include analysis of the cluster groups, as well as notes about certain players in them. Each cluster also lists some of the players' batted ball metrics, but please note that these data points were not included in the clustering; only the plate discipline metrics were (O.Swing, Z.Swing, O.Contact, Z.Contact, Swing Strike %). The link to the entire data set is posted below. Check it out!

https://docs.google.com/spreadsheets/d/1m0ZNEFTBMI2QhyXeOWHnzcdjiQFhLGOq0rllwNSkR8Q/edit?usp=sharing
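If you want to explore the data set yourself, one simple starting point is the average of each plate discipline metric within each cluster. The sketch below assumes you have one row of metrics and one cluster label per player. The `cluster_profiles` helper is hypothetical, and the four hitters and their numbers are made up for illustration.

```python
from collections import defaultdict


def cluster_profiles(rows, labels):
    """Average each metric within each cluster, rounded to one decimal."""
    groups = defaultdict(list)
    for row, lab in zip(rows, labels):
        groups[lab].append(row)
    return {
        lab: [round(sum(col) / len(col), 1) for col in zip(*members)]
        for lab, members in sorted(groups.items())
    }


# Made-up hitters. Columns: O.Swing, Z.Swing, O.Contact, Z.Contact, Swing Strike %.
rows = [
    [20.0, 60.0, 70.0, 90.0, 6.0],   # Hitter A (hypothetical)
    [22.0, 62.0, 72.0, 92.0, 7.0],   # Hitter B (hypothetical)
    [38.0, 75.0, 55.0, 80.0, 15.0],  # Hitter C (hypothetical)
    [40.0, 77.0, 57.0, 82.0, 16.0],  # Hitter D (hypothetical)
]
labels = [0, 0, 1, 1]

profiles = cluster_profiles(rows, labels)
for lab, means in profiles.items():
    print(lab, means)
# 0 [21.0, 61.0, 71.0, 91.0, 6.5]
# 1 [39.0, 76.0, 56.0, 81.0, 15.5]
```

In this toy example, cluster 0 reads as a patient, high-contact group and cluster 1 as an aggressive group with more swing and miss, which is the kind of summary the cluster write-ups build on.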