Tuesday, November 20, 2012

The Sabermetric WARs

In 1977, Bill James sparked a revolution in baseball thinking that challenged what people had always believed about the game.  He gained a cult following among readers of his annual Baseball Abstracts, to the point where his work now affects the way every single team in baseball is run.  In the process, it has sparked a major debate between the old school and the new school that has been a central point in many baseball discussions, coming to a head in the Mike Trout/Miguel Cabrera debate.
I know I've already mentioned this in my previous post, but baseball writers are idiots.  Not because they voted for Cabrera; he was just as deserving as Trout, and whichever of them won, it would not have been a travesty, a joke, or even wrong.  It was their reasoning, rooted in troglodyte old-school thinking, that was a complete joke and showed their stupidity.
However, after reading the complaints about Trout not getting MVP, it is clear many of the people who rely on Sabermetrics aren't any smarter than those who rely on antiquated and misleading stats such as batting average, RBIs, wins, etc.
There has long been a misconception about the true purpose of sabermetrics on both sides of the debate.  Statistics have been deeply rooted in the game of baseball since its inception.  What Bill James (and the many who have come after him) did was simply figure out which statistics are most reliable, and how to compile various data into new statistics that are useful for evaluating players.
The problem is, there are people within the sabermetric community who simply don't have the mental capacity to take everything, statistical or otherwise, into consideration, so they rely on what they perceive to be a be-all/end-all statistic, something that can end all arguments.  At the moment, that statistic is wins above replacement, or WAR.
The original statistic was total player rating (TPR) back in the 1980s, in which each event by a hitter, baserunner, or fielder was assigned a value based on how often that event led to a run or decreased the chance of a run.  However, TPR was so heavily flawed that Clay Davenport came along with equivalent average (EqA) in the mid '90s, a stat that combined walks, steals, total bases, and sacrifices into a percentage stat scaled so that the league average looked like a batting average.  As sabermetrics became more mainstream in the late '90s and early 2000s, it was simplified for newcomers by combining the two most valuable traditional statistics (OBP and SLG) into one statistic, on-base plus slugging (OPS), which later lost favor to adjusted OPS (OPS+), which takes into account ballpark factors and league averages.  Since OPS and OPS+ are more or less arbitrary statistics (OPS is simply two separate statistics added together), people went looking for something less arbitrary and came up with runs created per 27 outs (RC27), which uses the same stats as EqA but converts them into a number similar to ERA so that hitters and pitchers could be more easily compared.
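Several of the formulas in that paragraph are simple enough to write out.  The sketch below uses the standard definitions of OBP, SLG and OPS, a simplified OPS+ with the park adjustment left out, and Bill James's basic Runs Created formula scaled to 27 outs.  Counting outs as at-bats minus hits is a simplifying assumption (fuller versions also count things like sacrifices, caught stealing and double plays), and the stat line in the usage example is hypothetical.  Writing OPS out makes the complaint concrete: it adds two rates that don't even share a denominator.

```python
def on_base_pct(h, bb, hbp, ab, sf):
    # Times on base divided by the plate appearances that count toward OBP
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def slugging_pct(total_bases, ab):
    # Total bases per at-bat
    return total_bases / ab


def ops(obp, slg):
    # OPS is just the sum of two rates with different denominators
    return obp + slg


def ops_plus(obp, slg, lg_obp, lg_slg):
    # Simplified OPS+: each rate relative to league average, scaled so
    # league average = 100. Park adjustment is omitted here (assumption).
    return 100 * (obp / lg_obp + slg / lg_slg - 1)


def runs_created_basic(h, bb, total_bases, ab):
    # Bill James's basic Runs Created: (H + BB) * TB / (AB + BB)
    return (h + bb) * total_bases / (ab + bb)


def rc27(runs_created, outs):
    # Scale runs created to a 27-out "game" so it reads like an ERA
    return 27 * runs_created / outs


# Hypothetical season line, for illustration only
h, bb, hbp, ab, sf, tb = 180, 60, 5, 550, 5, 320
player_obp = on_base_pct(h, bb, hbp, ab, sf)
player_slg = slugging_pct(tb, ab)
print(round(ops(player_obp, player_slg), 3))
print(round(ops_plus(player_obp, player_slg, lg_obp=0.320, lg_slg=0.410)))
print(round(rc27(runs_created_basic(h, bb, tb, ab), outs=ab - h), 2))
```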
When the trend shifted toward measuring value, win shares, in which each team's players divide up a set number of points (three for every team win), became the most popular statistic.  Because that was too heavily reliant on team performance, people turned to value over replacement player (VORP), the number of runs a player added to his team compared to some scrub who can easily be found via free agency or the minor leagues.  VORP eventually gave way to wins above replacement (WAR), which is where we are now.
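The two ideas in that paragraph can be sketched roughly as follows.  Real Win Shares involves a long series of offensive, defensive and pitching calculations; the allocation below is a hypothetical toy version that keeps only the "three shares per team win" rule and splits them in proportion to made-up contribution scores.  The replacement-level rate and the roughly-ten-runs-per-win conversion are likewise illustrative assumptions, not the constants any particular VORP or WAR implementation uses.  Even the toy version shows the team dependence: identical contributions earn fewer shares on a team that wins fewer games.

```python
def allocate_win_shares(team_wins, contributions):
    # Hypothetical toy allocation: 3 shares per team win, split in
    # proportion to made-up contribution scores (real Win Shares is far
    # more involved).
    total_shares = 3 * team_wins
    total_contribution = sum(contributions.values())
    return {
        player: total_shares * score / total_contribution
        for player, score in contributions.items()
    }


def vorp_runs(player_runs, replacement_runs_per_pa, pa):
    # Runs above what a replacement-level player would produce
    # in the same number of plate appearances
    return player_runs - replacement_runs_per_pa * pa


def runs_to_wins(runs, runs_per_win=10.0):
    # Rule-of-thumb conversion (about 10 runs per win); an assumption
    return runs / runs_per_win


# Same two players, same contributions, different team records
print(allocate_win_shares(90, {"Player A": 1, "Player B": 2}))
print(allocate_win_shares(70, {"Player A": 1, "Player B": 2}))
print(runs_to_wins(vorp_runs(player_runs=100, replacement_runs_per_pa=0.1, pa=600)))
```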
WAR, like many other sabermetric statistics, is inherently flawed, mainly because it attempts to statistically evaluate things that can't be put into numbers and can only be evaluated subjectively.  Among them:
*Accurately determining how many games a player won for his team.  There are way too many intangibles to evaluate that.
*Combining offense, defense, and pitching numbers into the same statistic.  These are completely different aspects of the game that cannot be quantified within the same stat.  You might as well create a WAR formula for basketball so you can compare LeBron James to Trout or Cabrera.
*Statistically adjusting a player's offensive value relative to others at his position.  Obviously, a middle infielder's bat is more valuable than a first baseman's with similar numbers, but again, there's no true way of knowing how much more.
*Even though they're not part of WAR, ballpark factors are taken into account in other notable formulas (such as OPS+ and ERA+).  For starters, every ballpark affects every ballplayer differently.  Second, ballpark factors can fluctuate randomly from year to year, so it's not uncommon for a hitter to have a better year than the year before but do worse in park-adjusted stats, simply because the other hitters on his team did better at home while the pitchers did better on the road than the previous year.  Third, even if every ballpark affected every player the same way and the numbers didn't fluctuate, there would still be no way to accurately calculate how much better someone like Buster Posey would have done outside of AT&T Park.
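The basic park-factor calculation behind that last point can be sketched as below.  This is the simple one-season form (runs scored plus runs allowed per game at home, divided by the same rate on the road, scaled to 100); published versions typically add further adjustments, and the numbers here are hypothetical.  It also illustrates the second point: the factor comes from how the whole team hits and pitches at home versus on the road, not from the individual player being adjusted.

```python
def park_factor(home_rs, home_ra, home_games, road_rs, road_ra, road_games):
    # Runs per game (scored + allowed) in home games vs. road games,
    # scaled so a neutral park = 100. Simple one-season form only.
    home_rate = (home_rs + home_ra) / home_games
    road_rate = (road_rs + road_ra) / road_games
    return 100 * home_rate / road_rate


# Hypothetical seasons for the same team: the second year, teammates
# score more at home and the pitchers allow fewer runs on the road.
year_one = park_factor(360, 350, 81, 360, 350, 81)
year_two = park_factor(400, 350, 81, 360, 320, 81)
print(round(year_one, 1), round(year_two, 1))
```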
According to WAR, Robinson Cano was better than Miguel Cabrera this year.  Considering both players had exactly the same number of games played and plate appearances, it's pretty easy to compare them.  Cabrera beat Cano pretty easily in nearly every category; the only exceptions were two more strikeouts and eight fewer doubles (although Cabrera had 11 more homers).  Yet because Cano was a second baseman, he had a higher offensive WAR (oWAR) by nearly a full point, and a better overall WAR by over a point.  And Mike Trout may have been the better player from May through July, but you can't tell me that he came close to Cabrera the last two months of the year, especially in September.  Yet WAR will try to tell you exactly that.  Any stat that has Mike Trout's mediocre September (1.8 WAR) being better than Miguel Cabrera's (1.5 WAR) cannot be taken seriously.
And if you go by WAR, do you know who has had the highest total among position players in the American League since 2009, leading the league twice?  Ben Zobrist.  Not Miguel Cabrera.  Not Robinson Cano.  Not Josh Hamilton.  Ben Zobrist.  This offensive line sure looks like a superstar's to me.
And don't even get me started on win probability added (WPA).  WPA, which has been around in various incarnations since long before Bill James came along but has received more and more attention lately, calculates the difference between the team's likelihood of winning before and after each of the player's at-bats.  All I have to say is this: if a player hits a home run in a game and his team wins by one run, it doesn't matter what the score was, what inning it was, or how many outs there were at the time.  At the end of the game, that home run counted the same and won the team the game.
Trying to say that Trout was a better hitter than Cabrera, citing oWAR and WPA as the reasons why, is just as dumb as saying Cabrera deserved MVP because he led the league in three arbitrary statistics, or because the Tigers made the playoffs with one fewer win than the Angels.
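The WPA calculation described above is simple arithmetic: for each plate appearance, take the batting team's win probability after the play minus its win probability before it, then add those changes up.  A minimal sketch follows; the win probabilities are made-up numbers for illustration, not values from any real win-expectancy table.  It shows exactly what the paragraph objects to: two solo home runs in games decided by one run get very different credit depending on when they were hit.

```python
def win_probability_added(plate_appearances):
    """Sum of (win prob after - win prob before) over a batter's plate appearances.

    Each item is a (before, after) pair of the batting team's win probability.
    """
    return sum(after - before for before, after in plate_appearances)


# Hypothetical win probabilities, for illustration only.
early_homer = [(0.50, 0.60)]  # solo shot, first inning, tie game
late_homer = [(0.45, 0.85)]   # solo shot, ninth inning, tie game

print(round(win_probability_added(early_homer), 2))
print(round(win_probability_added(late_homer), 2))
```

Both home runs produced the same single run; only the before/after probabilities differ, and that is the entire source of the gap in credit.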
Overall, the debate has pretty much devolved into two groups of idiots.  On one hand, you have the old-school people who are too stubborn to admit that what they've been taught and grown up believing is wrong.  They still rely on batting average, which basically means they believe that a single, double, triple, and home run are all worth the same and walks don't mean anything.  They rely on stats like runs, RBIs, and win-loss record, which are heavily dependent on how a player's team performs.  They ignore factors that show whether or not a pitcher is as good as his ERA indicates.
Then you have those who embrace the newer stats but lack the mental ability to look at a player's entire stat line and form their own conclusions, so they rely on a formula they probably don't even understand and treat it as the definitive stat, the ultimate answer to end all arguments, be it TPR, EqA, OPS, win shares, OPS+/ERA+, RC27, VORP, or WAR.  You can't even debate with these people, as they are convinced that everything they think is fact because WAR (or whatever else they use) says so.
All these newer stats are good for baseball, as long as you know how to use them.  There will never be a definitive formula, no matter how hard people try to come up with one.  Instead of trying to combine them all into one formula, look at each one individually.  Look at all the numbers on the stat line, and yes, include everything from the antiquated statistics to the pointlessly convoluted formulas.  Take into account the intangibles that cannot be statistically evaluated, be it park factors, how a player compares to others at his position, etc.  Then come up with your own conclusion.  That's what Bill James has always done with his evaluations and predictions.
