September 2, 2014

10 Baseball Writing Resources I Can’t Do Without

I am interested in the work habits of people who work in sports journalism. I want to know their process. If you are a sports journalist and ever agree to have lunch with me, I will probably bother you with questions in that vein.

So I figure I ought to share my process for writing about baseball, or at least the tools I use to make my incoherent points.

Bookmarks

Baseball-Reference — The gold standard for career baseball statistics. The site is incredibly deep yet still loads quickly. The ability to click to total statistics from multiple games and seasons is a Godsend. And the Play Index is a joy, from simply messing around to incisive database searching. I love it all. So do you.

FanGraphs — The silver standard. I use it largely for WAR and wRC+, two great metrics, plus the projections they provide. FanGraphs is sneaky deep and almost comparable to Baseball-Reference if you know how to use it correctly. And if I asked you which player has the highest walk rate over the last 8 seasons, would you believe Jack Cust?

Baseball Prospectus — BP offers one of the subscriptions that is absolutely worth it for baseball writers. You quickly figure out why so many BP writers end up being hired by Major League teams: you will be instantly impressed by the research and presentation of data within the articles. My favorite feature: the injury history on player cards, available even to non-subscribers.

MLB Depth Charts — Jason Martinez provides a top-notch source of rosters, transactions and depth charts for all 30 teams. It is certainly useful for fantasy owners, and indispensable for quickly seeing the lineups and rotations of unfamiliar teams.

Brooks Baseball — If you want to know a pitcher, you have to see what he throws. Short of actually watching the pitcher, the best thing you can do is look at his Brooks Baseball page. You will see his pitch types, velocity, pitch outcomes and so much more. It’s amazing this is all out there for free. Thank you, Dan Brooks.

Flickr Creative Commons — If you run a non-profit site, Creative Commons photos are indispensable for giving the site some visual pop. It’s amazing how many kind people (looking at you, Keith Allison) take great sports photos and allow them to be used for free. Or, in my case, mediocre photos.

Tools

Google Chrome — My browser of choice makes it even easier to do searches on players and teams. Chrome allows you to start typing a website’s name, hit Tab, then search on that website. [Update: As Shotgun Spratling pointed out, this is not set up automatically. Go to Chrome’s Settings and “Manage search engines…” Scroll to the bottom and add Baseball-Reference.com, plus other sites like weather.com, fangraphs.com, espn.com, etc.] I don’t need to go to the Baseball-Reference main site to pull up Adam Wainwright’s page. Just hit Command-T-B-A-Tab-W-A-I-N-W-R-I-G-H-T-Enter. Once you get in the habit, you’ll never go back.

Microsoft Excel or Google Docs Spreadsheet — If you’re not paying for Microsoft Office, Google Docs allows you to do most of the formulas Excel is known for. Professional statisticians may need more than Docs, but it’s perfect for me to make a list or tabulate an OPS.

Magic Recs — If you use Twitter, you should follow Magic Recs. The account direct messages you when it notices multiple people you follow talking about a certain topic or retweeting a certain tweet. It’s not perfect, but Magic Recs has a high hit rate for notifying me about important Pirates and MLB news, or just a cool nugget I should know about.

Buffer — Once you write something great, it deserves to be shared! Buffer allows you to schedule sharing of your content on Twitter, Facebook, LinkedIn and Google Plus. If you don’t want to set a specific time, just throw the link into Buffer, and the site will put in a time to send it to the world. Easy and free — my two favorite words.

Tagged Advanced Metrics, Baseball, BlogForAYear, Stats

August 18, 2014

Want to Give a Team Stat More Punch? Rank It

I don’t mean to pick on people, but I do need a recent example of kinda-useless-stat deployment.

So, Gene Collier. You’re a very good writer and you’ve always been kind to me. Your writing is funny, and as anyone who reads my work can attest, it is very difficult to be funny in print.

I’m afraid you must be a mark in this case.

#Pirates 42-19 when they score first, 22-41 when they don't.

— Gene Collier (@genecollier) August 18, 2014

This stat received wide exposure in being tweeted to Collier’s 8,600 followers and retweeted to the Pittsburgh Post-Gazette’s 54,000-plus followers.

At first, this seems like a statistic of note. Look how big a difference striking first (or being struck against first) can make! Over a full season, it’s the difference between the Pirates being a 112-win team and being a 57-win team, between being an all-time great and being hide-the-children horrible.
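The full-season figures come from scaling each split record to a whole schedule. A minimal Python sketch of that arithmetic, assuming a 162-game season and rounding to whole wins (the function name `win_pace` is hypothetical, chosen only for illustration):

```python
SEASON_GAMES = 162  # assumed length of an MLB regular season


def win_pace(wins: int, losses: int, season_games: int = SEASON_GAMES) -> int:
    """Scale a win-loss record to a full season and round to whole wins."""
    win_pct = wins / (wins + losses)
    return round(win_pct * season_games)


pace_scoring_first = win_pace(42, 19)   # Pirates' record when they score first
pace_trailing_first = win_pace(22, 41)  # Pirates' record when they don't
print(pace_scoring_first, pace_trailing_first)  # prints "112 57"
```

The same function works for any split record, which is part of the point: the gap looks dramatic for every team, not just this one.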

The Pirates lose a lot when they trail. So what? (RJ Schmidt/Creative Commons)

However, you must realize: of course this is the case! This is not something unique to the Pirates. If MLB mandated that every Pirates opponent start the game up 1-0 with a runner on base (this would be the scenario for most non-homer opening leads), the Bucs would only expect to win 36 percent of their games. So naturally, they’re bad when they fall behind in a contest.

Now, I may give Collier some benefit of the doubt. Perhaps he was only trying to illustrate this point — in baseball, if you fall behind, you lose very often.

I have seen these kinds of stats too many times, though. You know the ones: the Mudville 9 lose a lot when they trail after seven innings. The Charlestown Chiefs win a lot when Reggie Dunlop scores a goal. The Pittsburgh Pisces lose a lot when they get out-rebounded. You get the idea.

Give a Ranking

Nothing is particularly wrong with these kinds of stats. They’re interesting trivia. One may make a viewer say “Oh wow. The Globetrotters are 340-0 when they convert five straight dunks. That’s impressive.” Not a bad thing.

However, we can do better. It is very easy to give your team-based stat more impact by adding a simple piece: including a league-wide ranking. Here are some real numbers that carry a little more ‘oomph’ when they have a ranking (all true as of the beginning of Monday):

The Seattle Mariners are 39-28 against teams above .500 this season… the best mark in the Majors.
The Washington Redskins’ average starting field position is at their own 25-yard line… the worst mark in the NFL.
The Colorado Avalanche are 28-4-8 in one-goal games, best in the NHL.
The Miami Heat are converting 56 percent of their two-point field goals, by far the best mark in the league.

For most of these odd team-based stats, the average fan may not know if the number is really that good. Yes, we know a .400 batting average, four-goal game and out-rebounding your opponent are all positives.

But some stats can be confusing without a little added context. And the easiest context to give these kinds of stats is a 1-to-30 (or 1-to-32 in the NFL) ranking of where the team stands in that area. If you or the team communications manager took enough time to research this frivolous little number, the least you can do is put in two more minutes of work and better serve the fan.
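Attaching a rank takes very little code once the same stat is on hand for every team. A rough Python sketch, assuming the stat is a win-loss record: only Seattle’s 39-28 comes from the examples above, while the other teams, their records and the function name `rank_of` are made-up placeholders.

```python
def rank_of(team, records):
    """Return (rank, total) for `team`, ranked by winning percentage, best first."""
    def pct(name):
        wins, losses = records[name]
        return wins / (wins + losses)

    ordered = sorted(records, key=pct, reverse=True)
    return ordered.index(team) + 1, len(ordered)


# Records against teams above .500. Only Seattle's 39-28 is from the post;
# "Team B" through "Team D" are hypothetical placeholders.
records_vs_winning_teams = {
    "Seattle": (39, 28),
    "Team B": (35, 30),
    "Team C": (30, 36),
    "Team D": (25, 40),
}

rank, total = rank_of("Seattle", records_vs_winning_teams)
print(f"Seattle is 39-28 against winning teams, No. {rank} of {total}")
```

With all 30 real records in the dictionary, the same call would produce the 1-to-30 ranking described above.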

Personally, a total of 99.4 percent of my readers love this blog, best in all of WordPress.

Tagged BlogForAYear, Context, Silly Stats, Sports Analytics, Sports Trivia, Stats

August 13, 2014

Knowing Advanced Metrics: The Four Kinds of Player Stats

Not all stats are equal. Some of the more ignorant opponents of the sabermetrics or fancy-stats revolutions tend to characterize advanced stats as being like the obscure numbers Twins broadcaster Wally Holland pulls out in the movie Little Big League: “Lou, by the way, has hit .416 lifetime versus Hanley in the month of September in even years, so that certainly bodes well for this at-bat!”

That’s a stat, sure. But it doesn’t bode well for the at-bat, nor is it useful whatsoever beyond illustrating that variance is cuh-raaaaazy.

Proper understanding of sports statistics and analytics means understanding that there are different categories of stats, and media members often mislead you if you’re not paying attention.

So what are these categories? In Day 2 of the 365-day BlogForAYear project, I try to parse it out.

1. Trivia

Think about Little Big League or the kind of weird player facts you see on baseball video boards. Real example from last night at PNC Park: “Travis Snider has gone 8-for-21 (.381) with two home runs in his 10 games played on Mondays this season.” That stat is obviously trivial; you will never see Pirates manager Clint Hurdle explain his lineup card for next Monday’s game by saying, “Well, we trust Snider to put up great numbers on Mondays. The guy never gets bummed out that the weekend is over.”

When it’s useful:

It is fine to toss out notes of trivia, especially during television and radio broadcasts of games. Sports are entertainment. They are many other things, but they are entertainment, and finding little nuggets within the numbers adds to the fun of it. Jayson Stark traffics in the strange-but-awesome stats that pop up in baseball, and it’s a very fun way to look at the game.

When it’s harmful:

Evgeni Nabokov, author of “Lolita.” (clydeorama/Creative Commons)

TV broadcasts very often present trivia stats as if they were evaluative or trend indicators. For example, you might hear during an NHL game this season: “Evgeni Nabokov has great career numbers against Columbus: 20-5-3, .932 save percentage, 1.79 goals against average, better than he has against any other team. He really seems to have the Blue Jackets’ number, doesn’t he?”

(Note: Fake example in that I’ve never heard this said, but the numbers are real.)

The issue here is not one of small sample size per se. That’s 29 games of NHL action to contend with, and lord knows we draw judgments on goalies around Christmas when they are about 29 starts into their season.

Instead, consider the context of the sample: most of these games come from (A) when Nabokov was a better goalie and a Vezina candidate, (B) when the Columbus Blue Jackets were largely locked in the Western Conference basement with no sunlight and everyone put up good numbers against them, and most importantly (C) when the Blue Jackets players and Nabokov’s teammates were completely different individuals than we see today.

These are the trouble spots: the stats that sound like they are indicative of what we will see in tonight’s Lightning-Jackets game, but are really just frivolous or nothing more than “a neat little fact.” Now, I’m not opposed to frivolity; I have more than 52,000 tweets. But fans, and especially sports gamblers, must be wary of broadcasters presenting trivia that could be interpreted as a more substantive stat.

2. Story Stats

These are the box score stats. They show up in the newspaper or the online game recap to tell you how the game was won.

“[Geno] Smith, responsible for 11 turnovers over the first four games, played mistake-free and threw three touchdown passes while completing 16-of-20 passes for 199 yards in the first road victory of his young career.” — Jets-Falcons recap from Monday, Oct. 7

When it’s useful:

There is absolutely a story in that game recap. Geno Smith put up poor stats in the previous games but played better to lead the Jets to victory. Beautiful! Perfect for a game recap. As long as you realize the stats represent “this is how Geno Smith led the Jets to a win” and not “this is why Geno Smith is a good quarterback who is turning things around,” you’re doing it right.

Scoring two goals in a game, going 8-for-13 in a series with 6 RBI, averaging 28 points per game during this postseason… all examples of stats that tell the story of a player having success and being a part of his team’s wins. The numbers construct their own little narrative, and that’s useful.

How did the Lakers win last night? “Oh, Nick Young just went off. 41 points, 14-of-23 from the field, 6-for-11 from beyond the arc. He was insane!” Cool, got it.

Robots aren’t taking your job, sports recap writers, but they’ll try. Robots never sleep.

When it’s harmful:

It took me only until Day 2 of 365 to use the xkcd comic.

The problem comes in the post-game shows and the newspaper columns — TV analysts and writers take a one-game performance or stat line and use it to judge a player.

Worst of these are the narratives of “clutch,” and these seem to pop up in every sport. Make a couple late mid-range buckets? Clutch shooter! A pair of game-winning singles? Clutch hitter! Lead a few 4th-quarter comebacks? Clutch quarterback! We as a nation had an honest-to-God national conversation about Tim Tebow because of the flimsy narrative device of “clutch.” Derek Jeter’s brand is built on being “Captain Clutch.” It’s why he has this list of gorgeous ladies notched into his bedpost and you do not.

For years, the line in baseball was that there is no such thing as clutch. Nate Silver wrote in 2008 that “clutch hitting ability exists,” but admitted the data proving it may be better defined as “smart situational hitting” than some sort of mental strength. I haven’t looked too far into the arguments in other sports, but there’s a reason NBA savant Zach Lowe writes about “clutch” in quotation marks.

Yet there is a reason that “clutch” and other story-stats-as-narrative-tools propagate.

“There is a strain of journalism as hero worship, a strain that asks us to believe that sports are tests of character, that those who come through at key moments of the game have reached down deep inside themselves and found the strength and courage to succeed. I don’t want to get into that.” — Bill James, The Hardball Times Annual 2008

The upshot of James’ look at whether a clutch hitter exists or not? “We don’t know.” You should use the same kind of skepticism when a media member presents a story stat as a referendum on a player’s ability in crunch time.

3. Evaluative Stats

When they’re analyzed the right way, advanced metrics can be proper evaluations of a player’s skill level. In the absence of a scouting report, these numbers can indicate that a player is great, above-average, average, below-average or poor. This is analytics.

When it’s useful:

I have this photo from a Nate Silver lecture saved in my phone. It comes from his must-read book The Signal and the Noise.

Break it down. Advanced metrics are strong evaluation tools when they have quantity. The concept of “puck luck” in hockey stems from the idea that a player scoring a goal (or being denied one) is defined largely by unexpected bounces and turns of the puck. It’s not all about skill.

The effects of puck luck can be smoothed out with a large enough sample. Take Jarome Iginla’s stats from this five-year sample.

Year     Goals/Game   Points/Game
05-06       .43           .82
06-07       .56          1.34
07-08       .61          1.20
08-09       .43          1.09
09-10       .43           .92

Iginla’s true talent in that five-year period is not .82 points per game and it’s not .61 goals per game. But when you pull it all together, you have a player you can expect to score about 1.05 points and .48 goals per game. And wouldn’t you know it, in the 2010-11 season, Iginla averaged 1.05 points and .52 goals per game. Take a large sample and your data almost always becomes more reliable.
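A quick way to see the smoothing is to average the five seasons in the table. The Python sketch below takes a plain unweighted mean of the per-game rates, which is an assumption on my part: it lands at roughly .49 goals and 1.07 points per game, close to but not exactly the .48 and 1.05 quoted above, presumably because those figures weight each season by games played, which the table does not list.

```python
from statistics import mean

# Per-game rates copied from the table above (2005-06 through 2009-10).
goals_per_game = [0.43, 0.56, 0.61, 0.43, 0.43]
points_per_game = [0.82, 1.34, 1.20, 1.09, 0.92]

# Unweighted means across seasons (an assumption; weighting by games
# played would need data the table does not include).
pooled_goals = mean(goals_per_game)
pooled_points = mean(points_per_game)

# How far apart the best and worst single seasons are.
spread_points = max(points_per_game) - min(points_per_game)

print(f"goals/game:  {pooled_goals:.2f}")
print(f"points/game: {pooled_points:.2f}")
print(f"single-season swing in points/game: {spread_points:.2f}")
```

Any one season can sit half a point per game away from another, while the five-year average stays near what Iginla did the following year.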

For quality and variety, you want to make sure the player’s stats are being put up:

against both good and bad opponents (strength of schedule metrics are quite common these days)
in offensive-friendly and defensive-friendly venues (this mostly applies to baseball and football)
with different groups of teammates if possible (especially in basketball, hockey and soccer, where the ball and puck flow through many players).

Helpfully, you don’t need a degree in applied mathematics to synthesize all these factors. Guys and gals who do possess such degrees have dumped the numbers into a science machine to spit out a wonderful invention: projections!

I include projections in the evaluative stats category because they are based entirely on the evaluative stats and factors mentioned above. Biff the Sabermetrician doesn’t have a Grays Sports Almanac; all he has is a database of what has happened in the past and some algorithms.

The NFL has KUBIAK projections. MLB has PECOTA and ZiPS and a bunch of others. The NHL has VUKOTA. The NBA has SCHOENE, the folks on Twitter tell me. They aren’t just for forecasts; use these projections as part of your evaluation of a player.

When it’s harmful:

Never! Advanced metrics are the best!

Well, mostly, players don’t want to hear about it. They don’t really care about their WAR or their Corsi or their DVOA. And in most cases, they don’t need to care. The players themselves are inadvertent data collectors in most cases. Yasiel Puig’s job is to hit the ball hard, not to worry about his BABIP. But his general manager should care very much about BABIP and all the other metrics when considering the value of a contract extension.

If you’re a baseball fan, you don’t need to understand or even subscribe to sabermetrics. You can totally enjoy the game without it, and people have been doing so for a century. It’s fine! But fans need to understand that general managers and baseball operations staff do subscribe and use advanced metrics to make decisions. If you want to criticize their moves, start reading up on the evaluative stats or I will chastise you on Twitter. And I’m very good at it. That shirt looks stupid on you.

4. Trends

This last group of stats doesn’t fit too neatly into any of the other three categories. Anyone who has ever played pickup basketball knows the feeling of being “in the zone” like you can’t miss, or on the other side, feeling totally out of sorts. Therefore: trends!

When it’s useful:

A goalie maintaining a 140-minute shutout streak is kind of trivia and kind of a story, but it also indicates that he could be in a groove of goaltending, however much you want to put stock into how long the streak is likely to continue.

Pedro Alvarez presents an example of trends being useful. (Keith Allison/Creative Commons)

An opposite baseball example: third baseman Pedro Alvarez has committed 23 throwing errors this season (or one throwing error every four games), and no sane person watching his throws would regress those numbers or draw on a larger sample size and expect those error numbers to go down. He simply looks like a player who can’t make a throw from third base.

Just as we recognize slumps, we can see when a player looks better than he usually does. We now theorize that the “hot hand” in basketball really does exist, per a study by three Harvard graduates. When the smarties controlled for the increasingly difficult shots taken by the “hot hand” player (you can read why in the study), a hot shooter sees “from 1.2 to 2.4 percentage points in increased likelihood of making a shot.”

It’s not much, but it’s not nothing.

When it’s harmful:

During my first draft of this post, I included only three kinds of player stats but eventually felt trends were just barely worthy of getting their own category.

However, we must be careful not to overrate the effects of a hot hand or a hot bat. The Thunder wouldn’t give the last shot to Jeremy Lamb over Kevin Durant just because Lamb made his three previous shots. A hot hand is not an unstoppable hand.

That fact doesn’t stop writers and broadcasters from using too many small-sample-size stats to draw large conclusions. Always be on the lookout for numbers that have arbitrary endpoints like “in the last 63 games” or “since May 5.” Chances are the media member is cutting off at the perfect spot on a game log in order to make his or her point. Those aren’t trends, they’re cherry-picking.
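To see how much an endpoint can flatter a player, try every possible “in the last N games” cutoff and keep the best one. A small Python sketch, assuming a hitter’s game log stored as (hits, at-bats) pairs; the log itself and the function name `average_over_last` are invented for illustration, not real player data.

```python
# Hypothetical game log, oldest game first: (hits, at_bats) for each game.
game_log = [(0, 4), (1, 4), (0, 3), (2, 4), (3, 5), (1, 4), (2, 3), (0, 4)]


def average_over_last(n, log):
    """Batting average over the most recent n games in the log."""
    recent = log[-n:]
    hits = sum(h for h, _ in recent)
    at_bats = sum(ab for _, ab in recent)
    return hits / at_bats


season_avg = average_over_last(len(game_log), game_log)

# Check every "in the last N games" cutoff and keep the most flattering one.
best_n = max(
    range(1, len(game_log) + 1),
    key=lambda n: average_over_last(n, game_log),
)

print(f"full log: {season_avg:.3f}")
print(f"last {best_n} games: {average_over_last(best_n, game_log):.3f}")
```

In this made-up log the hitter is at .290 overall, but “in his last five games” he is a .400 hitter. Same player, same log, different endpoint.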

A final note on trends: they are usually not as good a signal of future performance as projections are. Mitchel Lichtman studied the reliability of season stats compared to projections, and found that using projections can fight our recency bias. “Until we get into the last month or two of the season, season-to-date stats provide virtually no useful information once we have a credible projection for a player.”

Billy Butler’s having a rough year? He’ll probably come back from it. Nelson Cruz is hitting over his head? He’ll probably come back down to earth. You don’t know much about advanced metrics? Keep reading my blog, I’ll try to help.

Tagged Advanced Metrics, BlogForAYear, Sabrmetrics, Sports Analytics, Stats

James Santelli

Journalist
