73.6% of all Statistics are Made Up

Posted on Feb 14, 2010 | 140 comments


How to Interpret Analyst Reports


The headlines in the media are filled with the latest stats.  Stats sell.  The stats are often quoted from the latest reports.  People then parrot them around like they’re fact when most of them are complete bullsh*t.  People throw them around at cocktail parties.  Often when they do I throw out my favorite statistic:  73.6% of all statistics are made up.  I say it deadpan.  Often I’ll get some people looking at me like, “Really?”  “It’s true. Nielsen just released the number last month.”

No.  It’s irony.

Or, as the quote Mark Twain popularized (and which is most often attributed to the British Prime Minister Benjamin Disraeli) goes: “there are three kinds of lies: lies, damned lies and statistics.”  The quote is meant to highlight the deceptive but persuasive power of numbers.

So, where is this all coming from, Mark?  What are you on about?  Anyone with a great deal of experience dealing with numbers knows to be careful about their seductive power.  I’m writing this post to make sure you’re all on that same playing field.

Here’s how I learned my lesson:

I started my life as a consultant.  Fortunately I was mostly a technology consultant, which meant that I wrote code, designed databases and planned system integration projects.  OK, yes.  It was originally COBOL and DB2 – so what? ;-) But for my sins I got an MBA and did “strategy” consulting.  One of our core tasks was “market analysis,” which consisted of market sizing, market forecasts, competitive analysis and then advising clients on which direction to take.

It’s strange to me that clients with years of experience would ever listen to twenty-something smarties with MBAs from great schools who had never worked in their industry before – but that’s a different story.  Numbers are important.  I’d rather make decisions with uncertain numbers than no numbers.  But you have to understand how to interpret your numbers.

In 1999 I was in Japan doing a strategy project for the board of directors of Sony.  We were looking at all sorts of strategic decisions that Sony was considering, which required analysis and data on broadband networks, Internet portals and mobile handsets/networks.  I was leading the analysis with a team of 14 people: 12 Japanese, 1 German and 1 Turk.  I was the only one whose Japanese was limited to just a sushi menu.

I was in the midst of sizing the mobile handset markets in 3 regions: US, Europe and Asia.  I had reports from Gartner Group, Yankee Group, IDC, Goldman Sachs, Morgan Stanley and a couple of others.  I had to read each report, synthesize it and then come up with our best estimate of the markets going forward.  In data analysis you want to look for “primary” research, meaning research done by the person or firm that originally gathered the data.

But all of the projections were so different that I decided to call some of the research companies and ask how they derived their data.  I got the analyst who wrote one of the reports on the phone and asked how he got his projections.  He must have been about 24.  He said, literally, I sh*t you not, “well, my report was due and I didn’t have much time.  My boss told me to look at the average growth rate over the past 3 years and increase it by 2% because mobile penetration is increasing.”  There you go.  As scientific as that.

I called another agency.  They were more scientific.  They had interviewed telecom operators, handset manufacturers and corporate buyers.  They had come up with a CAGR (compound annual growth rate) that was 3% higher than the other report’s, which in a few years makes a huge difference.  I grilled the analyst a bit. I said, “So you interviewed the people to get a plausible story line and then just did a simple estimation of the numbers going forward?”

“Yes. Pretty much.”

Me, sarcastically, “And you had to show higher growth because nobody buys reports that just show that next year the same thing is going to happen that happened last year?”  Her, “um, basically.”

“For real?” “Well, yeah, we know it’s going to grow faster but nobody can be sure by how much.”  Me, “And I suppose you don’t have a degree in econometrics or statistics?”  Her, “No.”

I know it sounds like I’m making this sh*t up but I’m not.  I told this story to every consultant I knew at the time.  Nobody was surprised.  I wish it ended there.
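
To see why those two “methodologies” matter, here is a minimal sketch in Python of the naive projection (average the last three years’ growth, add 2 points) and of how a 3-point difference in CAGR compounds.  All of the shipment figures below are invented for illustration; they are not numbers from Gartner, Yankee Group or anyone else.

```python
# Illustrative only: the shipment figures below are invented,
# not numbers from Gartner, Yankee Group or anyone else.

def naive_forecast(history, bump=0.02, years=5):
    """Analyst method #1: average the last three years' growth, then add 2 points."""
    growth = [history[i] / history[i - 1] - 1 for i in range(1, len(history))]
    rate = sum(growth[-3:]) / 3 + bump
    forecast, latest = [], history[-1]
    for _ in range(years):
        latest *= 1 + rate
        forecast.append(round(latest, 1))
    return rate, forecast

def compound(base, cagr, years=5):
    """Project a single base number forward at a constant CAGR."""
    return [round(base * (1 + cagr) ** y, 1) for y in range(1, years + 1)]

history = [100.0, 112.0, 127.0, 145.0]   # hypothetical units shipped (millions)
rate, forecast = naive_forecast(history)
print(f"naive rate: {rate:.1%}, five-year forecast: {forecast}")

# Two CAGRs that differ by 3 points look close in year one but not in year five:
low, high = compound(145.0, 0.12), compound(145.0, 0.15)
print(f"12% CAGR, year 5: {low[-1]}   15% CAGR, year 5: {high[-1]}")
```

With these made-up inputs the two year-five numbers come out roughly 14% apart, which is exactly the kind of gap that quietly reshapes a market-sizing deck even though both forecasts started from the same data.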

The problem of amplification:

The problem got worse as the data flowed out to the “bulge bracket” investment banks.  They, too, were staffed with super-smart twenty-somethings.  But these people went to slightly better schools (Harvard, Stanford, Wharton, University of Chicago) and got slightly better grades.  They took the data from the analysts.  So did the super-bright consultants at McKinsey, Bain and BCG.  We all took that data as the basis for our reports.

Then the data got amplified.  The bankers and consultants weren’t paid to do too much primary research.  So they took 3 reports, read them, put them into their own spreadsheet, made fancier graphs, had professional PowerPoint departments make killer pages and then at the bottom of the graph they typed, “Research Company Data and Consulting Company Analysis” (fill in brand names) or some derivative.  But you couldn’t just publish exactly what Gartner Group had said so these reports ended up slightly amplified in message.

Even more so with journalists.  I’m not picking on them.  They were as hoodwinked as everybody else.  They got the data feed either from the research company or from the investment bank.  And if anybody can’t publish a story saying “just in: next year looks like a repeat of last year,” it’s a newspaper.  So you end up with superlative amplification.  “Mobile penetration set to double next year, reaching all-time highs,” “venture capital market set to implode next year – more than 70% of firms may disappear” or “drug use in California growing at an alarming rate.”  We buy headlines.  Unless it’s a major publication there’s no time to fact-check data in a report.  And even then …

The problem of skewing results:

Amplification is one thing.  It’s taking flawed data and making it more extreme.  But what worries me much more is skewed data.  It is very common for firms (from small ones to prestigious ones) to take data and use it conveniently to make the point they want to make.  I have seen this so many times I consider it routine, which is why I question ALL data that I read.

How is it skewed?  There are so many ways to present data to tell the story you want that I can’t even list every way data is skewed.  Here are some examples:

- You ask a sample set so small that the data isn’t statistically significant.  This is often naiveté rather than malice (see the rough sketch after this list).

- You ask a group that is biased.  For example, you ask a group of prisoners what they think of the penal system, you ask college students what they think about the drinking age or you ask a group of your existing customers what they think about your product rather than the people who cancelled their subscriptions.  This type of statistical error is known as “selection bias.”

- Also common: you look at a large data set of questions asked about consumer preferences.  You pick out the answers that support your findings and leave the ones that don’t out of your report.  This is an “error of omission.”

- You change the specific wording of the survey questions in a way that subtly shifts their meaning for the person reading your conclusions.  Even small changes in wording can totally change how the reader interprets the results.

- Also common is a survey that asks questions in a way that leads the respondent to a specific answer.

- There is maliciously negative data, as on Yelp, where a competitor might type in bad reviews to bring you down, and maliciously positive data, as on the Salesforce.com AppExchange, where you get your friends to rate your app 5 out of 5 to drive your score up.
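
On the first point, a quick way to sanity-check a “finding” from a small sample is to compute its margin of error.  Here is a minimal sketch in Python; the 60% figure and the sample sizes are hypothetical, not taken from any survey mentioned here.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a survey proportion p with n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical finding: "60% of users say they will upgrade their handset next year."
for n in (30, 100, 1000):
    moe = margin_of_error(0.60, n)
    print(f"n={n:4d}: 60% +/- {moe:.1%}  (roughly {0.60 - moe:.0%} to {0.60 + moe:.0%})")

# With n=30 the honest answer is "somewhere between ~42% and ~78%" --
# far too wide to support a headline, even before any wording or sampling bias.
```

With 30 respondents the honest finding is “somewhere between roughly 42% and 78%,” which is not a headline.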

That doesn’t happen? “I’m shocked, shocked to find that gambling is going on here.”  We all know it happens.  As my MBA statistics professor used to say, “seek disconfirming evidence.”  That always stuck with me.

Believing your own hype:

And this data subtly sinks into the psyche of your company.  It becomes folklore.  13% of GDP is construction – the largest industry.  40% of costs are labor, 40% are materials and 20% are overheads.  23% of all costs are inefficient.  18% of all errors come from people using the wrong documents. 0.8 hours are spent every day by workers searching for documents.

It’s important to quantify the value of your product or service.  I encourage it.

You’ll do your best to market the benefits ethically while still emphasizing your strong points.  Every investment banker I know is “number 1” in something.  They just define their category tightly enough that they win it.  And then they market the F out of that result.  That’s OK.  With no numbers as proof points few people will buy your products.

Obviously try to derive data that is as accurate as possible.  And be careful that you don’t spin the numbers for so long and so hard that you can’t separate out marketing estimates from reality.  Continually seek the truth in the form of better customer surveys, more insightful market analyses and more accurate ROI calculations.  And be careful not to believe your own hype.  It can happen.  Being the number one investment bank in a greatly reduced data set shouldn’t stop you from wanting to broaden the definition of “number 1” next year.

Here’s how to interpret data:

In the end make sure you’re suspicious of all data.  Ask yourself the obvious questions:

- who did the primary research on this analysis?

- who paid them? Nobody does this stuff for free.  The research is either paid for up front (“sponsored research”) or paid for on the back end by the clients who buy the reports.

- what motives might these people have had?

- who was in the sample set? how big was it? was it inclusive enough?

- and the important thing about data for me … I ingest it religiously.  I use it as one source of figuring out my version of the truth.  And then I triangulate.  I look for more sources if I want a truer picture.  I always try to think to myself, “what would the opposing side of this data analysis use to argue its weaknesses?”
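
To make that triangulation concrete, here is a minimal sketch in Python.  The estimates are invented placeholders standing in for different reports; the point is the spread between sources, not any particular number.

```python
from statistics import median

# Hypothetical market-size estimates (in $B) -- invented placeholders,
# not figures from Gartner, IDC, Yankee Group or anyone else.
estimates = {
    "Research firm A": 42.0,
    "Research firm B": 55.0,
    "Bank C": 61.0,
    "Our own bottom-up model": 38.0,
}

values = sorted(estimates.values())
spread = (max(values) - min(values)) / min(values)
print(f"range: {values[0]}-{values[-1]} $B, median: {median(values)} $B, "
      f"spread: {spread:.0%} of the lowest estimate")

# A spread that wide tells you the "market size" is a negotiated story, not a fact.
# That is when the questions above matter: who gathered the data, who paid for it,
# and what would the opposing side say about its weaknesses?
```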

Statistics aren’t evil.  They’re just a bit like the weather – hard to really predict.

And as they say about economists and weathermen – they’re the only two jobs you can keep while being wrong nearly 100% of the time ;-)

  • http://twitter.com/phil_hendrix Phil Hendrix

    Mark – I agree w/ your observations in “73.6% of all Statistics are Made Up.” Analysts appear to do very little primary research and when they do it's pretty crude. I'm often shocked at how uncritically accepting clients and consultants are of analysts “conclusions.” You might enjoy some of the lessons we've learned in investigating new products. See “New-to-Market Products – Estimating Potential, Sizing Markets,” http://bit.ly/aJQ49H
    Dr. Phil Hendrix, immr

  • http://www.ryanborn.net ryanborn

    You should go to 60 Minutes with this one Mark – Scott Pelley would love this…

  • http://stickyslides.blogspot.com Jan Schultink

    I am still slightly more optimistic than you are. Yes, I agree that I saw many cases where it was convenient to close the branch of the issue tree with a “tick” as soon as a piece of data was available that confirmed the intuitive hypothesis.

    This often happens in the early phases of a project. Still, you often came full circle towards the end of the project, when multiple streams of analysis come together. It is at these moments when you start questioning the market research.

    There was a golden rule in my time at McKinsey:

    Rule 1: “If it looks wrong, it is probably wrong”
    Rule 2: “Rule #1 applies 90% of the time; in the other 10% of cases you’ve arrived at a major piece of insight”

    It takes courage to admit at the end of the project that your data assumptions might have been wrong.

  • http://stickyslides.blogspot.com Jan Schultink

    Personally, I would not admit to data manipulation in my days as a consultant. But you are right that big consulting firms hardly do any primary research themselves.

  • http://sigma-hk.com Mark Westling

    I'll suggest another book: “Statistics As Principled Argument”, by Robert Abelson (http://www.amazon.com/Statistics-Principled-Arg…). It's pretty technical with a strong academic slant but it's the only book I've seen that likens statistics to rhetoric: it helps you argue a point. If you don't understand how to use it, your point may be easily refuted, or you may miss the opportunity to make a stronger point. You've got to love a stats book with chapters “Styles of Rhetoric” and “On Suspecting Fishiness”.

    Little of this will help you analyze the news headline that product X will be a $Y billion market in Z years, but it will help you persuade the smart, skeptical board member that you're thinking the way you ought to think.

  • http://www.davidblerner.com davidblerner

    I can't wait for a similar post by you on bias and B.S. in the daily news we ingest…

  • http://www.nabbr.com MattMinoff

    Too many I-Bankers and consultants start with the answer instead of the question.

  • http://giffconstable.com giffc

    thanks for publicly talking about this, Mark. My response is too long for here, so wrote it on my blog.

    It's a game that has been going on for a long time, but I'm not sure what can be done about it. Humans have a need for false certainty, and there will always be self-proclaimed gurus ready to fulfill that demand.

  • http://bothsidesofthetable.com msuster

    Worse still – Goldman Sachs was an investor in my company. You can imagine how glowing the report on my company was. Those Chinese walls have had more scrutiny post the first dot com bust, but still ….

    re: Mary Meeker – I have to admit her reports are some of the ones I look forward to the most.

  • http://bothsidesofthetable.com msuster

    Sorry. I wasn't asking you to admit to manipulating data. I was pointing out that at a big, prestigious consultancy you must have seen some case managers selectively choose data. Not the same as “manipulation” and not implying that you personally did it. Well, I can speak for me. I saw people from my firm selectively choose data ALL THE TIME! I also saw people do it that used to work at your firm ;-)

  • http://bothsidesofthetable.com msuster

    Thanks for the reco.

  • http://bothsidesofthetable.com msuster

    Oh, believe you me it's coming one day!

  • http://bothsidesofthetable.com msuster

    Giff, great post. Anyone still reading these comments should click through and read Giff's blog post. You went one step further whereas I was saving that for a separate post –> the problems of how the press machine works. The majority of the press print whatever you spoon feed them as long as you're credible and give them data. There are exceptions, of course. But they are the exception and not the rule. But I'll cover that in a separate post.

    I also love that you talked about Payola with analysts. I also plan to cover that in a future post.

    Thank you.

  • davidkpark

    Instead of “73.6% of all Statistics are Made Up” how about “73.6% are Idiots and Incapable of Using Statistics.” I would like to separate the field of statistics from the idiots who use them.

  • Ann

    You're right, Mark: why do large companies listen to young, inexperienced people? I never have. Maybe it's wrong of me, but I pay little attention to advice or suggestion from someone unless they have years of direct experience in the subject field. There are a huge amount of people out there who talk from mere imagination and concepts, not from real experience.

  • http://twitter.com/markmusolino Mark Musolino

    Good discussion. Only thing I'll add, and I'll do so at the risk of stating the obvious, is that biased interpretation and reporting of data is certainly not unique to business. It's a problem in academia too. In my early grad years (PhD in Bioengineering) I gave offenders the benefit of the doubt, but eventually realized that despite the best of intentions (think “knowledge for the sake of knowledge”), many folks had become so attached to their own ideas that it was very very difficult to “seek disconfirming evidence”. Intense pressure to generate positive research outcomes to secure new grants didn't help either. I've resigned myself to hating the game not the player, and hope that discussions like these can spotlight the issue and move us in the right direction.

  • http://twitter.com/aumg Gregg Borodaty

    Maybe this fits under amplification, but one of the bigger issues I've always seen with statistics is extrapolation of early data, like adoption rates, or in the health sector, disease discovery. People like to take the data from the first 30 days, 3 months, etc., develop a statistical equation based off the early data points, and then quote how 70% of the world's population will adopt this technology in 5 years, or be sick in 5 years. It's statistically valid, but realistic – no.

    One other point that you hit on in the post – one of my better MBA professors told us to always get our data, and especially the analysis, from multiple sources, as even hard data can be manipulated to make it say what you want. I don't know how many times that one small lesson has saved me the last 15 years.

  • WeslyM3000

    “Statistics aren’t evil. They’re just a bit like the weather – hard to really predict.” that's a classic.
    I have to use that. More-so you can almost always find a statistic to prove a bias.

    Wesly

  • Simon Gornick

    Many thanks for your reply, Mark.

    Our biz model is very different to most if not all of the 70 or so OVPs out there, so I prefer to rely on an intuitive belief in our core offering + our simplicity + of course, a very low price point. It helps that we're the “little guy OVP”. Rather than seeking enterprise clients, we're looking to self starting end users (affiliate marketers + SMBs + Bloggers) looking to add revenue streams at very limited cost. To us that fits with the new economic reality.

    As for presentations, I totally agree. Investors are more likely to respond to realism and responsibility than a pitch driven by dubious numbers and forecasts.

    Our aim is to seek investors when we have 'real' numbers, rather than a wishlist. Building a startup from zero to a respectable revenue flow with almost no money at the front end would seem to be the best proving ground for instilling medium and long term investor confidence.

    But we shall see!

    Best

    Simon

  • http://twitter.com/BBillingsley Brian Billingsley

    Mark – Great post…you took the huge “elephant in the room” head on. I forwarded the post to our entire team – as it is a great reminder to be both suspicious of all data, and to also cite & manipulate data with integrity.

  • fauxfauxpas

    Good Stuff! Don't know who said it, but… Statistics are like a bikini – what they reveal is exciting but what they conceal is vital…

  • Aviah Laor

    :D Excellent post. Problem people judge analysis by the beauty of the charts, and Excel really pushed the envelope here.
    Here is one of the funniest takes on the issue (from “yes minister”):
    http://www.youtube.com/watch?v=2yhN1IDLQjo

  • http://iterativepath.wordpress.com/ Rags Srinivasan

    Data collection and analysis suffer from all kinds of biases (sampling errors, selection bias, recency bias, etc.) but that is a people problem rather than a tool problem. Statistics, the tool, has all the rigor built into it to help the analyst and decision maker evaluate the quality of the data and the conclusions. Even with clean data and all the checks anyone can make broad predictions that are seemingly plausible (e.g., WSJ ads to parents claim students who read the WSJ are 76% more likely to have a higher GPA).

    It is not the statistics that are unpredictable. In the end it all comes down to the individual writing the survey questionnaire or building the Excel model, and that person is unpredictable.

    (I do want to point out that you used one sample to make a call on all analyst reports.)

  • http://bothsidesofthetable.com msuster

    Yeah, I thought a lot about that when I was writing the post but steered clear of it because a) it seemed too political and b) it wasn't an experience I could tell first hand. But I was thinking a lot about the Caufield study on the link between MMR and Autism. I have been meaning to pick up the book “Denialism” and read it.

  • http://bothsidesofthetable.com msuster

    Yeah, I think that second sentence is the thesis of my post. People have biases and use data to support them. Period.

  • http://bothsidesofthetable.com msuster

    Thanks, Brian. Appreciate the kind feedback.

  • http://bothsidesofthetable.com msuster

    ;-)

  • http://bothsidesofthetable.com msuster

    That's a fair comment. I have a lot of respect for statisticians and that didn't come through in my post. My point was actually that many tech analysts that I've encountered don't have proper statistics backgrounds. Maybe that's changed in the last few years. Doubt it.

  • http://bothsidesofthetable.com msuster

    Agree on both points. On multiple sources – always. Triangulation helps you understand the market sizing and begins to break down your understanding of why the numbers aren't “true.”

  • http://bothsidesofthetable.com msuster

    Um …

  • http://bothsidesofthetable.com msuster

    Hey Rags. Listen, that is exactly my point. Many of the analysts I spoke with and the consultants I have encountered are not expert in field surveys, statistical analysis or data interpretation. Yet they present results as authoritative.

    I appreciate your willingness to “call me out” on my conclusions not being “statistically significant;” however, I would point out that my post (like all my posts) is my subjective point-of-view rather than quantitative fact. I have never represented my opinions as anything other than that so I'm sorry if I gave you or other readers the impression otherwise. As it happens, my story only talks about one instance but – DUDE – it has happened so many times I cringe. Ask around in the industry. I think you'll find it is pretty pervasive.

  • http://twitter.com/markmusolino Mark Musolino

    Your mention of “political” made me realize that, for the record, I should make it clear that I'm not saying that folks are purposely, or even knowingly, biasing data — rather, it's just natural to develop a preferential bias for a point of view in which you are heavily vested.

  • davidkpark

    The world would be a better place if everyone had an MA in statistics with at least a course on causal inference from a Rubin-Rosenbaum perspective.

  • http://bothsidesofthetable.com msuster

    Oh, yeah. I know. I think the same is true in tech. People often do it inadvertently. I just think that when you talk about biology, disease, food, weather patterns and similar subjects things get a bit too political so I avoided that in this post. Thanks for clarifying.

  • http://bothsidesofthetable.com msuster

    LOL

  • plus8star

    … and 61.4% of statistics are actually made up *on the spot* !

    One good reference on how to twist data right and left is the never-out-of-date “How to Lie With Statistics” by Darrell Huff (first published 1954)

    Free examples:
    http://www.stats.ox.ac.uk/~konis/talks/HtLwS.pdf

    Amazon:
    http://www.amazon.com/How-Lie-Statistics-Darrel

  • http://giangbiscan.com Giang Biscan

    David, thanks for sharing. Wouldn't you agree though that the output is just another data point for triangulation? The quality of it depends on the inputs, i.e. market cap/valuation, discount rate, market share – which themselves are just reported data – i.e. 73.6% made up :).

    I have to agree with Mark that there is definitely value in doing the analysis though, it gives you a better judgement.

    This was a spot on post. Anyone who ever pulls reports from several reputable sources for any particular industry will run into the same problem that Mark described here. Then two things come into play: “technical” judgement (smart enough to examine the data?) & “moral” judgement (honest enough to not massage it to fit?).
