
Lies, Damned Lies & Statistics

Introduction

In recent years many golf tours, led by the PGA Tour, have been producing player performance statistics, in ever-growing volumes and (some might say arcane) breadth.

These stats may well be helpful to players and coaches but are now devolving into an industry based around investment in golf outcomes via Fantasy sport or traditional betting. In my opinion, that’s insane! Well, perhaps insane is an inappropriate word; let’s just say silly.

Me, Annoyed

Statistics, its perpetrators and its proponents intensely annoy me at times; I’d like to explain why and it’s not just because statistics can be lies, as rightly categorised in the famous quote popularised by Mark Twain: “There are three kinds of lies: lies, damned lies, and statistics”.

It’s also because those glib (sometimes even paid) providers and quoters of stats (to justify their conclusions / tips) are clearly existing in some type of alternative universe where fiction has become fact!

Time Matters

I’m a management consultant and have been since the early 1990s – the dawn of the internet age. My biggest problem in those pre-internet years was getting information; books, newspapers, wire services, television, radio, magazines, personal networks and so on needed to be continuously mined for information to help me stay on top of what was happening in the world, in my country, in my clients’ areas of business and so on.

Getting data was key, as it enabled analysis and the identification of problems, solutions, opportunities & threats.

Now, a short couple of decades later, guess what my biggest problem is? It’s too much information! Don’t get me wrong, I love the way the internet has enabled info availability and I could not function without it. However, my biggest challenge as a busy executive is how to best allocate and consume my time!

Time for: meetings, holidays, business analysis, personal relationships and family, strategising, staff, critical suppliers & partners, sleep, recreation and so forth. Each day only delivers 24 hours; that never changes. How those hours are used has become more challenging in the internet age and I attribute this to the sheer volume of information available. There is simply too much of it and filtering it is a daily challenge if one wishes to be successful while retaining sanity and leading some sort of balanced existence.

Background

This leads me back to golfer performance statistics. When I started seriously betting on golf, 20 or so years ago, performance stats were minimal and I necessarily used only some of the following as my guides:

·         Form. Players’ placings (won, T10, MC, etc);

·         Course & Conditions. Which types of course players played well or badly on (links / parkland, long / short, windy / calm, etc) and;

·         Grass. Whether players putted better or worse on bermuda, bent or poa greens, etc.

Even compiling such basic data points was challenging and occupied many hundreds of hours of research. However, my process was: I accumulated my data, let it feed into my personal price assessments of golfers, worked out whether I thought bookmakers had priced a golfer wrongly and, if so, I bet on that golfer. By the way, I made money; every year.
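The last step of that process, comparing a personal price assessment with the bookmaker's price, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the odds are decimal odds, the bet is win-only, the bookmaker's margin is ignored, and the function names and numbers are hypothetical rather than anything taken from actual betting records.

```python
def implied_probability(decimal_odds: float) -> float:
    """Win probability implied by decimal odds (bookmaker margin ignored)."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1.0")
    return 1.0 / decimal_odds


def expected_value(my_probability: float, decimal_odds: float, stake: float = 1.0) -> float:
    """Expected profit of a win-only bet, given a personal probability estimate."""
    win_profit = stake * (decimal_odds - 1.0)
    return my_probability * win_profit - (1.0 - my_probability) * stake


def is_value_bet(my_probability: float, decimal_odds: float) -> bool:
    """True when the personal estimate exceeds the probability implied by the odds."""
    return my_probability > implied_probability(decimal_odds)


# Hypothetical example: the bookmaker offers 51.0 (50/1) on a golfer
# whose chance of winning is personally assessed at 3%.
odds = 51.0
my_estimate = 0.03
print("implied probability:", implied_probability(odds))
print("expected value per unit staked:", expected_value(my_estimate, odds))
print("bet?", is_value_bet(my_estimate, odds))
```

The sketch only says whether a price looks wrong relative to one's own assessment; it says nothing about how good that assessment is, which is the harder part.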

Nowadays, so much more data are readily available and this should make golf analysis more accurate. But therein lies the crux of the problem! There is simply too much information available and I’m not qualified to interpret it; hardly anybody is.

A Parable

Let me illustrate my point with a parable from a modern golf bettor:

“I’d been consistently losing money and I decided that throwing darts at a field, backing the top 4 favourites in the market or backing the 3 players I like best were all dumb strategies. So I decided to adopt a scientific approach:

I research the internet and establish that this week the weather forecast is for near perfect conditions and that the course:

·         Is relatively long at 7,500yds;

·         Has fairly fast bent grass putting greens;

·         Offers up generously wide fairways and;

·         Feeds inaccurate approaches into tightly mown runoff areas.

Research from past tournament previews and reviews suggests I should be looking for a long driver who putts well on bent grass greens and has good scrambling / around the greens ability.

By the way, have you noticed all those betting previews say players need to putt well? Duh!

In other words, I’m getting a wee bit stressed and already considering a return to my normal betting approach cos I don’t have time to invest in a lot of statistical analysis after having already spent time googling those previews. However, I persevere and, to assist my deliberations, I sign-up with a stats supplier that allows me to examine all stats for the players in this week’s field. Cool. Science. Focus.

Driving comes first as I pretty much know I need to back a bomber off the tee. Easy, right? Just look-up driving distance. The only thing I don’t know is if I should use driving distances on this course from past tournaments, but I can’t find those numbers so I just use generic driving distance stats; seems simple enough.

I don’t know that the data I’m reading do not cover each player’s performance on all Tours, or that the numbers are derived from the measurement of just a few drives, and only on this Tour. I also don’t know if the measurement is how far the drive travels in the air or how far it travels, including rollout; I expect it’s the latter. This causes me a bit more concern as it seems to me that measuring distance only, without taking account of how soft or hard the fairways were, could be misleading. Anyway, I figure there must be enough data points in the stats, so I can rely on them; stats never lie, right? I shortlist the 20 longest drivers in the field. Progress.

Next comes getting those long drives onto the green; there are twelve Par 4s so I reckon I know what I need. I examine the Par 4 yardages and find my 20 bombers should be approaching the greens from 100 to 150 yds on average, so I use some wonderful-looking stats that tell me how successful these guys have been when approaching from 125-150yds & 100-125yds. I’m slightly dubious that the stats were most likely derived from performance on all courses except this week’s course and on various fairway grasses with varying degrees of hardness, but never mind, push on. I rank my 20 guys in order of ‘approach ability’ and move on to ‘around the green’ cos I know this is also important.

Referring to the stats, I start to get hot under the collar. There’s a stat called SG:ATG, ‘strokes gained around the green’ and I thought that was perfect. But then I saw ‘Scrambling’ which sounded like the same thing, but the 20 players I’m looking at all have different rankings in these two seemingly identical categories. Confusing.

I pause and think a bit deeper and start wondering if these stats include bunkers, cos one of the previews I read said the course was heavily bunkered, so I figure sand might be an issue. I google and find an aerial overview of the course where I see a lot of bunkers on Par 3s but only a few around the greens on the longer holes. So, I stop worrying about bunker play and hit the stats again but, fuck me, in addition to ‘scrambling’ there’s ‘scrambling from the fringe’, ‘scrambling from 10-20yds’, ‘scrambling from sand’, ‘scrambling from the rough’ and more! So, I decide to just stick to ‘scrambling’ and rank my 20 guys for that.

Finally, on to putting. I know from my research that the greens are bent grass, so I go to the PGA Tour site and my stats supplier to get that stat; dammit, it doesn’t exist. That seems stupid in the extreme! Instead I find maybe 75 categories of putting statistics including some I think are just silly, like ‘average putting distance – GIR 3+ putts’. Anyway, another subscription service has putting average by grass type for the PGA Tour, perfect. I sign-up & rank my 20 target guys by their bentgrass putting ability. 

Now, I finally have everything sorted stats-wise and can focus on which of the 20 guys to bet on; the fun part. So, I start to review my findings. I have an uneasy feeling it shouldn’t be this easy but, what the hell, I’m using science instead of gut feel so my conclusions will be superior. Right?

Firstly, however, I re-read those tournament betting previews I’d previously researched and compare them with the outcome of the tournament last year. Shit, the guy who won only ranked 34th in driving distance! Gotta be a statistical anomaly, I figure. I check some earlier years and see some longer drivers who won, so with confidence at least partially restored, I push on.

My immediate problem is I have 3 sets of theoretically important rankings for my 20 long drivers; how much weight do I give to each? I don’t know. So, I decide to give them points in every category, based on their rankings and to stratify them that way. Simple math, easy. I’m happy; it took a long time to get there but I now have three selections to back and six for my Fantasy team. I feel better about my investments and next week will be much faster.

But ….

In reviewing my three top selections, however, I recall reading that the wife of one of them gave birth a few weeks ago. I check to see when that baby was born and I find that my guy hasn’t played since. I wonder if the baby has changed his form dynamic; maybe he’s been missing out on sleep? Would he be anxious leaving the new family for the first time? Is his wife ok post-partum? Is he feeling on top of the world as a new father and going to play better?

I check the hundreds of stats categories for ‘performance of new fathers’ or ‘scoring after birth of first child’ or something, anything, but there is nothing to help me. Anyway, I figure, the guy’s a pro; he can block out babies and home life and just get back to work, right? No worries.

But then I start thinking about all the other players and what’s going on in their lives, which I know nothing about. Hell, a guy might have caught his girlfriend naked in the jacuzzi with a neighbour last week or had an argument with his partner about money or slept on a bad mattress in his motel room and woken with a stiff back. Though uneasy, I again comfort myself with those stats; they never lie and all those data points must combine to eliminate variance!

Even More (Human) Data!

As part of my new statistical / factual approach to betting I’d started following about 230 players & caddies on Twitter, just in case there was something significant I’d otherwise miss. However, I soon started unfollowing them all cos my timeline got filled up with their crap every day and I couldn’t read it all; not enough time!

However, I noticed a player, one of my six selections, who recently withdrew from a tournament with a wrist? injury. I tried to research it properly but there was nothing much in the media except he pulled out after 12 holes. Anyway, I figured, he’s a pro, he won’t be playing this week if he’s not 100%. Right? I couldn’t find any stats on ‘player recovery from wrist injury’ and I didn’t know if there were other factors at play like: quotas of tournaments that must be played per season or the intricacies of medical exemptions; I haven’t got time to research such things, anyway!”

More Stress

Right, enough storytelling, back to my original discourse. My thoughts often dwell on the variegated factors at play in golf and how much they differ from other sports and thus how stats must have a lesser role to play in predicting golf performance.

For example, baseball is rife with stats but they’re more meaningful because the ball is thrown through the air by a pitcher, always standing the same distance from the batter, every time. No comparison with a golfer hitting off the ground, with 14 different ‘bats’, on a different course every week, aiming at a different target with every shot and in all sorts of weather conditions.

Or in football or tennis, where the ‘fields’ are all the same dimensions and there is little environmental impact on players from their surroundings as compared with a golfer who has to learn a new ‘field’ every week, in all types of weather conditions.

Further, all of the above stat focus presumes that past performance is a predictor of future performance, which is patently incorrect. We’re talking about humans here, not machines, and if past performance were an accurate predictor, there would be no place for Fantasy pools or bookmaker bets because the outcomes would be too predictable.

It’s the very unpredictability of sport that is part of its compelling interest to viewers / gamblers, and golf is perhaps the least predictable of all, with the largest number of possible winners, for many of the reasons touched on above.

In Conclusion

I believe performance statistics can be a contributing element to the process of making golfer fantasy or betting selections, but that the total picture cannot be glibly summarised by stats because:

·         All the necessary stats do not exist;

·         Humans experience emotional & physical ups-and-downs and;

·         Past performance is not a precise predictor of future performance.

So, to all the persons out there who make confident selections, predictions, tips or line-ups that are derived solely from statistical analysis, I say this:

1. You’re not using all the relevant data and you’ll never have it all because it’ll never be publicly available, but that doesn’t matter because;

2. Your efforts will become meaningless as they’ll be able to be duplicated by a thousand computer models and will thus have no value-add.

Why do I say this? Because, as Aaron Kay so perfectly put it: “Believing the world distributes success & failure haphazardly provokes anxiety”.

Whether they know it or not, this anxiety about haphazardness leads gamblers to seek comfort before investing and the ‘comfort-du-jour’ is performance statistics!

By all means, use stats as your prescription to help control your anxiety; it’s cheaper and less addictive than bourbon, Xanax or Valium, but don’t expect the investment outcome to be greatly improved!

If you’re a vendor in the market and you’re going to try to sell me on your stats model, I want value to be added. It might be who to fade or dodge, and why. It might be calculations of player upside relative to price (draft or bookmaker) or external factors like what’s going on in the player’s life off the course, etc.

But I want value for money so if you’re gonna glibly claim that three selected statistical categories are all it takes to determine how I’ll invest my money, then piss off and try to sucker-punch somebody else!

Finally, consider this. If you do have an edge, whatever your edge is, it is exceptionally valuable (to you) and should not be publicised or shared because that will automatically cause it to cease to exist and render it valueless!

© Copyright Mike J Miller: 8 November 2017