US soccer player positions by number
4/11/2024

We all know this by now: learning a team’s formation generally tells you very little about how they play. One reason for this is that positions are also a lie. Nani and Diego Rossi are both wingers on paper, but anyone who watches the two knows that they play very differently. We’ve made more specific terms, like inverted winger, to help describe the difference. But what if you hadn’t seen a player play yet? What if you’d like some objective way to define a player’s role beyond just their position? Wouldn’t it be nice if we had a data-driven way of determining a playstyle that we could use to give us an idea of how a winger... wings? With a clustering analysis that might be possible.

In a simple world, a clustering analysis takes a bunch of data and sorts it into groups called clusters. That makes it perfect for what we’re trying to do here: we can, in theory, take a bunch of stats for every MLS outfield player and form groups that represent different roles or playstyles. This would allow us to separate the Rossis from the Nanis. Quantifying defensive contributions with statistics is its own mountain of difficulties, so it’s easiest to start trying to do this with attacking stats.

In the real, not-so-simple world, clustering a dataset ends up being a series of ambiguous decisions. There are a lot of clustering methods and it’s often not clear which is the best for a given problem. On top of that, it’s the analyst’s job to figure out how many clusters to pull out of a dataset and what these clusters actually tell us, and neither of these tasks is always straightforward. How we go about making these decisions often depends on what we want our clusters to tell us.

Criteria for good attacking role clusters

First and foremost, our clusters need to be interpretable. We should be able to look at each of our clusters and tell what stats define the role of players in that cluster. This also means that the roles we define would ideally match our intuition for the roles that most fans and pundits see as contributing to an attack.

Second, our clusters should tell us more than just the player’s on-paper position. Again, if our clusters just tell us ‘yup, that’s a winger’, they’re not helpful. That said, it would also be great if our clusters line up roughly with positions. If we have a cluster that is half CBs and half CAMs, the role of the cluster will likely be hard to interpret, as it probably won’t match our intuition for how players contribute to attacks.

Finally, we want our clusters to be relatively stable. If most players end up in a different role from year to year, the labels probably aren’t going to be helpful moving forward. To test stability, we can examine two seasons of MLS data and calculate what percent of players who were in the league both years end up in the same role. However, we don’t want this percentage to be too high: if the roles never change, they aren’t sensitive enough to improvements, declines, position/tactical changes, etc. The Goldilocks number that’s “just right” here will be somewhat subjective.

It’s generally a good idea to normalize counting stats to per 90s here; it’ll help us avoid clustering players into starters vs. non-starters. I also went with 1200 minutes in a given season as a cutoff for inclusion in the analysis, to avoid any weird patterns resulting from infrequent data for any important stats.
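To make the data preparation and the stability check from this post a bit more concrete, here is a minimal Python sketch. It is not the code behind the analysis. The names `PlayerSeason`, `per_90`, and `role_stability`, the stat names, the role labels, and the players are all made up for illustration, and treating the cutoff as "at least 1200 minutes" is an assumption about how the threshold is applied.

```python
from dataclasses import dataclass

MIN_MINUTES = 1200  # season cutoff from the post; ">=" is an assumption


@dataclass
class PlayerSeason:
    player: str
    minutes: int
    counts: dict  # raw season totals, e.g. {"shots": 40}


def per_90(rows, min_minutes=MIN_MINUTES):
    """Drop low-minute seasons, then rescale each counting stat to a per-90 rate."""
    rates = {}
    for row in rows:
        if row.minutes < min_minutes:
            continue  # too little playing time to trust the rates
        rates[row.player] = {
            stat: total * 90 / row.minutes for stat, total in row.counts.items()
        }
    return rates


def role_stability(roles_year_1, roles_year_2):
    """Share of players present in both seasons who keep the same role label."""
    both = roles_year_1.keys() & roles_year_2.keys()
    if not both:
        raise ValueError("no players appear in both seasons")
    same = sum(roles_year_1[p] == roles_year_2[p] for p in both)
    return same / len(both)


# Made-up example data, purely for illustration.
rows = [
    PlayerSeason("Player A", 1800, {"shots": 40, "key_passes": 30}),
    PlayerSeason("Player B", 900, {"shots": 25, "key_passes": 5}),
]
rates = per_90(rows)  # Player B falls under the cutoff and is dropped

year_1 = {"Player A": "role 1", "Player C": "role 2", "Player D": "role 2"}
year_2 = {"Player A": "role 1", "Player C": "role 3", "Player E": "role 1"}
share = role_stability(year_1, year_2)  # only A and C played both years
```

The per-90 rates are what would be fed into whichever clustering method is chosen. The stability share is the "Goldilocks" number: a value close to 0 suggests the labels are noise, and a value close to 1 suggests the roles are not sensitive to real changes in how a player is used.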