73.6% of all Statistics are Made Up

Posted on Feb 14, 2010 | 140 comments


How to Interpret Analyst Reports

The headlines in the media are filled with the latest stats.  Stats sell.  The stats are often quoted from the latest reports.  People then parrot them around like they’re fact when most of them are complete bullsh*t.  People throw them around at cocktail parties.  Often when they do I throw out my favorite statistic:  73.6% of all statistics are made up.  I say it deadpan.  Often people look at me like, “really?”  “It’s true. Nielsen just released the number last month.”

No.  It’s irony.

Or, as Mark Twain popularized the quote most often attributed to British Prime Minister Benjamin Disraeli, “there are three kinds of lies: lies, damned lies and statistics.”  The quote highlights the deceptive but persuasive power of numbers.

So, where is this all coming from, Mark?  What are you on about?  Anyone with a great deal of experience in dealing with numbers knows to be careful about their seduction.  I’m writing this post to make sure you’re all on that same playing field.

Here’s how I learned my lesson:

I started my life as a consultant.  Fortunately I was mostly a technology consultant, which meant that I wrote code, designed databases and planned system integration projects.  OK, yes.  It was originally COBOL and DB2 – so what? ;-) But for my sins I got an MBA and did “strategy” consulting.  One of our core tasks was “market analysis,” which consisted of: market sizing, market forecasts, competitive analysis and then instructing customers on which direction to take.

It’s strange to me that customers with years of experience would ever listen to twenty-something smarties from great MBA programs who had never worked in their industry before – but that’s a different story.  Numbers are important.  I’d rather make decisions with uncertain numbers than no numbers.  But you have to understand how to interpret your numbers.

In 1999 I was in Japan doing a strategy project for the board of directors of Sony.  We were looking at all sorts of strategic decisions that Sony was considering, which required analysis and data on broadband networks, Internet portals and mobile handsets/networks.  I was leading the analysis with a team of 14 people: 12 Japanese, 1 German and 1 Turk.  I was the only one whose Japanese was limited to a sushi menu.

I was in the midst of sizing the mobile handset markets in 3 regions: the US, Europe and Asia.  I had reports from Gartner Group, Yankee Group, IDC, Goldman Sachs, Morgan Stanley and a couple of others.  I had to read each report, synthesize it and then come up with our best estimate of the markets going forward.  In data analysis you want to look for “primary” research, which means data from whoever initially gathered it.

But all of the projections were so different that I decided to call some of the research companies and ask how they derived their data.  I got the analyst who wrote one of the reports on the phone and asked how he got his projections.  He must have been about 24.  He said, literally, I sh*t you not, “well, my report was due and I didn’t have much time.  My boss told me to look at the average growth rate over the past 3 years and increase it by 2% because mobile penetration is increasing.”  There you go.  As scientific as that.

I called another agency.  They were more scientific.  They had interviewed telecom operators, handset manufacturers and corporate buyers.  They had come up with a CAGR (compounded annual growth rate) that was 3% higher than the other report’s, which in a few years makes a huge difference.  I grilled the analyst a bit. I said, “So you interviewed the people to get a plausible story line and then just did a simple estimation of the numbers going forward?”

“Yes.  Pretty much.”

Me, sarcastically, “And you had to show higher growth because nobody buys reports that just show that next year the same thing is going to happen that happened last year?”  Her, “um, basically.”

“For real?” “Well, yeah, we know it’s going to grow faster but nobody can be sure by how much.”  Me, “And I suppose you don’t have a degree in econometrics or statistics?”  Her, “No.”

I know it sounds like I’m making this sh*t up but I’m not.  I told this story to every consultant I knew at the time.  Nobody was surprised.  I wish it ended there.
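That 3% gap isn’t a rounding error once it compounds. A quick sketch of how two CAGRs a few points apart diverge over time (the numbers below are hypothetical, not from any of the reports mentioned):

```python
def project(base, cagr, years):
    """Project a market size forward at a compounded annual growth rate."""
    return base * (1 + cagr) ** years

base = 100.0            # hypothetical: 100M handsets shipped today
low, high = 0.10, 0.13  # two analysts' CAGRs, 3 points apart

for years in (1, 5, 10):
    gap = project(base, high, years) - project(base, low, years)
    print(f"year {years}: forecasts differ by {gap:.1f}M units")
```

After 10 years the two “scientific” forecasts disagree by roughly 80M units on a 100M base – which is why where a CAGR came from matters far more than how precisely it is stated.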

The problem of amplification:

The problem got worse as the data flowed out to the “bulge bracket” investment banks.  They, too, were staffed with super-smart twenty-somethings.  But these people went to slightly better schools (Harvard, Stanford, Wharton, University of Chicago) and got slightly better grades.  They took the data from the analysts.  So did the super-bright consultants at McKinsey, Bain and BCG.  We all took that data as the basis for our reports.

Then the data got amplified.  The bankers and consultants weren’t paid to do much primary research.  So they took 3 reports, read them, put them into their own spreadsheets, made fancier graphs, had professional PowerPoint departments make killer pages and then at the bottom of the graph they typed, “Research Company Data and Consulting Company Analysis” (fill in brand names) or some derivative.  But you couldn’t just publish exactly what Gartner Group had said, so these reports ended up slightly amplified in message.

Even more so with journalists.  I’m not picking on them.  They were as hoodwinked as everybody else.  They got the data feed either from the research company or from the investment bank.  And if anybody can’t publish something saying “just in: next year looks like a repeat of last year,” it’s a newspaper.  So you end up with superlative amplification.  “Mobile penetration set to double next year, reaching all-time highs,” “venture capital market set to implode next year – more than 70% of firms may disappear” or “drug use in California growing at an alarming rate.”  We buy headlines.  Unless it’s a major publication there’s no time to fact-check the data in a report.  And even then …

The problem of skewing results:

Amplification is one thing.  It’s taking flawed data and making it more extreme.  But what worries me much more is skewed data.  It is very common for firms (from small ones to prestigious ones) to take data and use it conveniently to make the point that they want to make.  I have seen this so many times I consider it routine, which is why I question ALL data that I read.

How is it skewed?  There are so many ways to present data to tell the story you want that I can’t even list every way data is skewed.  Here are some examples:

– You ask a sample set so small that the data isn’t statistically significant.  This is often naivete rather than malice.

– You ask a group that is not unbiased.  For example, you ask a group of prisoners what they think of the penal system, you ask college students what they think about the drinking age or you ask a group of your existing customers what they think about your product rather than the people who cancelled their subscriptions.  This type of statistical error is known as “selection bias.”

– Also common, you look at a large data set of questions asked about consumer preferences.  You pick out the answers that support your findings and leave out the ones that don’t support it from your report.  This is an “error of omission.”

– You change the specific wording of the survey questions; subtle changes in words can completely change how the person reading your conclusions interprets the results.

– Also common is that the survey itself asks questions in a way that leads the responder to a specific answer.

– There is maliciously negative data, as on Yelp, where a competitor might type in bad reviews to bring you down, or maliciously positive data, as on the Salesforce.com AppExchange, where you get your friends to rate your app 5 out of 5 to drive your score up.
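The first bullet – a sample too small to mean anything – is easy to quantify with the standard margin of error for a proportion. A minimal sketch, assuming a simple random sample and a 95% confidence level:

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% margin of error for an observed proportion p from n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

# A headline claiming "60% of users prefer X" means very different things
# depending on whether 25 or 1,000 people were actually asked:
print(round(margin_of_error(0.60, 25), 2))    # roughly +/- 0.19
print(round(margin_of_error(0.60, 1000), 2))  # roughly +/- 0.03
```

At n = 25 the “finding” could plausibly be anywhere from the low 40s to nearly 80% – and even a tight interval is worthless if the sample was biased in the first place.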

That doesn’t happen? “I’m shocked, shocked to find that gambling is going on here.”  We all know it happens.  As my MBA statistics professor used to say, “seek disconfirming evidence.”  That always stuck with me.

Believing your own hype:

And this data subtly sinks into the psyche of your company.  It becomes folklore.  13% of GDP is construction – the largest industry.  40% of costs are labor, 40% are materials and 20% are overheads.  23% of all costs are inefficient.  18% of all errors come from people using the wrong documents. 0.8 hours are spent every day by workers searching for documents.

It’s important to quantify the value of your product or service.  I encourage it.

You’ll do your best to market the benefits ethically while still emphasizing your strong points.  Every investment banker I know is “number 1” in something.  They just define their category tightly enough that they win it.  And then they market the F out of that result.  That’s OK.  With no numbers as proof points few people will buy your products.

Obviously try to derive data that is as accurate as possible.  And be careful that you don’t spin the numbers for so long and so hard that you can’t separate your marketing estimates from reality.  Continually seek the truth in the form of better customer surveys, more insightful market analyses and more accurate ROI calculations.  And be careful not to believe your own hype.  It can happen.  Being the number one investment bank in a greatly reduced data set shouldn’t stop you from wanting to broaden the definition of “number 1” next year.

Here’s how to interpret data:

In the end make sure you’re suspicious of all data.  Ask yourself the obvious questions:

– who did the primary research on this analysis?

– who paid them? Nobody does this stuff for free.  Research is either paid up front (“sponsored research”) or on the back end, by clients buying the reports.

– what motives might these people have had?

– who was in the sample set? how big was it? was it inclusive enough?

– and the important thing about data for me … I ingest it religiously.  I use it as one source of figuring out my version of the truth.  And then I triangulate.  I look for more sources if I want a truer picture.  I always try to think to myself, “what would the opposing side of this data analysis use to argue its weaknesses?”
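Mechanically, triangulation can be as simple as treating each report as one noisy estimate and looking at the range and the disagreement rather than trusting any single number. A toy sketch with made-up forecasts:

```python
def triangulate(estimates):
    """Summarize independent estimates as (low, midpoint, high, spread)."""
    lo, hi = min(estimates), max(estimates)
    mid = sum(estimates) / len(estimates)
    spread = (hi - lo) / mid  # disagreement as a fraction of the midpoint
    return lo, mid, hi, spread

# Hypothetical handset forecasts (millions of units) from three firms:
lo, mid, hi, spread = triangulate([410.0, 460.0, 520.0])
print(f"range {lo:.0f}-{hi:.0f}M, midpoint {mid:.0f}M, spread {spread:.0%}")
```

A 24% spread across “authoritative” sources is itself information: it tells you how much of any one forecast is story rather than measurement.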

Statistics aren’t evil.  They’re just a bit like the weather – hard to really predict.

And as they say about economists and weathermen – they’re the only two jobs you can keep while being wrong nearly 100% of the time ;-)

  • http://bothsidesofthetable.com msuster

    Yeah, I think that second sentence is the thesis of my post. People have biases and use data to support them. Period.

  • http://bothsidesofthetable.com msuster

    Thanks, Brian. Appreciate the kind feedback.

  • http://bothsidesofthetable.com msuster

    ;-)

  • http://bothsidesofthetable.com msuster

    That's a fair comment. I have a lot of respect for statisticians and that didn't come through in my post. My point was actually that many tech analysts that I've encountered don't have proper statistics backgrounds. Maybe that's changed in the last few years. Doubt it.

  • http://bothsidesofthetable.com msuster

    Agree on both points. On multiple sources – always. Triangulation helps you understand the market sizing and begins to break down your understanding of why the numbers aren't “true.”

  • http://bothsidesofthetable.com msuster

    Um …

  • http://bothsidesofthetable.com msuster

    Hey Rags. Listen, that is exactly my point. Many of the analysts I spoke with and the consultants I have encountered are not experts in field surveys, statistical analysis or data interpretation. Yet they present results as authoritative.

    I appreciate your willingness to “call me out” on my conclusions not being “statistically significant;” however, I would point out that my post (like all my posts) is my subjective point-of-view rather than quantitative fact. I have never represented my opinions as anything other than that so I'm sorry if I gave you or other readers the impression otherwise. As it happens, my story only talks about one instance but – DUDE – it has happened so many times I cringe. Ask around in the industry. I think you'll find it is pretty pervasive.

  • http://twitter.com/markmusolino Mark Musolino

    Your mention of “political” made me realize that, for the record, I should make it clear that I'm not saying that folks are purposely, or even knowingly, biasing data — rather, it's just natural to develop a preferential bias for a point of view in which you are heavily vested.

  • davidkpark

    The world would be a better place if everyone had an MA in statistics with at least a course on causal inference from a Rubin-Rosenbaum perspective.

  • http://bothsidesofthetable.com msuster

    Oh, yeah. I know. I think the same is true in tech. People often do it inadvertently. I just think that when you talk about biology, disease, food, weather patterns and similar subjects things get a bit too political so I avoided that in this post. Thanks for clarifying.

  • http://bothsidesofthetable.com msuster

    LOL

  • plus8star

    … and 61.4% of statistics are actually made up *on the spot* !

    One good reference on how to twist data right and left is the never-out-of-date “How to Lie With Statistics” by Darrell Huff (1954)

    Free examples:
    http://www.stats.ox.ac.uk/~konis/talks/HtLwS.pdf

    Amazon:
    http://www.amazon.com/How-Lie-Statistics-Darrel

  • http://asable.com/ Giang Biscan

    David, thanks for sharing. Wouldn't you agree though that the output is just another data point for triangulation? The quality of it depends on the inputs, i.e. market cap/valuation, discount rate, market share – which themselves are just reported data – i.e. 73.6% made up :).

    I have to agree with Mark that there is definitely value in doing the analysis though, it gives you a better judgement.

    This was a spot on post. Anyone who ever pulls reports from several reputable sources for any particular industry will run into the same problem that Mark described here. Then two things come into play: “technical” judgement (smart enough to examine the data?) & “moral” judgement (honest enough to not massage it to fit?).

  • Pingback: 73.6% of all Statistics are Made Up | CloudAve

  • http://markgslater.wordpress.com markslater

    i would not say it's an excellent report at all. There have been some amazing research analysts over the course of the last 15 years – all unfortunately tarred with the blodget / meeker brush.

    If you are a banker and you are building a book or a valuation defense – then yes – a lot of this 'filler' might pass muster in a pretty parade – but if you are a start-up and you are trying to ascertain, as precisely as you can, the foundational drivers that you are going to build your business on – this is useless.

    If you don't already know some of the regurgitated constructs (like – geolocation is going to be big – or bollox about aug reality) in that report – then you probably should stop what you are doing.

    that report like so many others is a thinly veiled attempt at securing banking business all the way up and down the value chain by creating the illusion that they are “domain experts”.

  • http://markgslater.wordpress.com markslater

    i totally agree – we are pondering the “supporting stats approach” for investor meetings and i am seriously considering going extremely light.

    We proxy consumers with small merchants (hospitality and retail virtual assistant), we have built a texting platform (SMS) and we are in the “realtime” space. If the investor does not implicitly know the potential (and pitfalls) of building a business in this cross section – and wants to validate the investment thesis using stats – we don't want to be talking to them IMO.

  • Simon Gornick

    Mark,

    The key to your comment is 'investors implicitly knowing potentials or pitfalls' in a given sector or vertical.

    There's a tendency, in my view, highlighted by (the other) Mark's original post, to use stats as a smokescreen. But the irony of that is that it's essentially an attempt to hoodwink and massage investors. I'd rather treat them like adults.

    Investors like to say “do your homework” before you come to us. I think the same should be true of them. They need to do their research too. And part of that is knowing when stats in a given sector are real or fairy tales.

    Best

    Simon

  • Simon Gornick

    One way to really deal with the 'false certainty' issue is to create new measurements that can gain broad acceptance. Asserting the importance of a 'happiness index' as a corollary to the absurd fluff of nominal GDP is a good example.

  • Simon Gornick

    We're living in the age of the 23 year old. They're cheap, they're tireless, they're ambitious, and they're compliant.

  • http://markgslater.wordpress.com markslater

    yes – both parties can easily be found guilty. For instance (and its very relevant for me right now) –
    we are prepping a financing and are deciding on which groups to approach – our decision criteria is as follows:
    - you have to know how (functionally) SMS can be a command line interface into a web app, and have a keen understanding of the value of tiny data instances in realtime.
    - you have to understand the difference between a curated search instance and a live response.
    - you have to get network effects, social graphs, and other overused 2.0 phrases – so that you can, in your own mind, implicitly see the value of what we are doing without it being explained or researched.

    THEN – we can talk and you can make an assessment of the team, and the other truly legitimate investment markers you have to qualify against.

    But if you need to research (ala wall street type stuff or techcrunch) a space to make a bet – you should not IMO be in the venture game.

  • Simon Gornick

    there's been a tendency to see VC as one way traffic. We approach them and kiss the ring. But actually it serves all involved far better if they're more proactive.

    good luck w your financing.

    best,
    simon @spotcher

  • Holger

    Great post, Mark, although I definitely heard this before in person! ;-) And it brings back memories of long days and weekends…

  • http://lmframework.com/blog/about David Semeria

    Of course it's just another data point :)

    The basic rule in this kind of thing is that the more you put in, the more you get out.

    I remember one time I used the method I described above to check out a whole bunch of asset managers (solving for their projected inflows). It transpired the market was considerably less bullish than the AMs' own internal predictions (the market turned out to be right).

    It frequently amazed me how much information can be contained in just one data point (a share price) when this is used as the final value of a complex financial model.

    A few notes: it only works when there is one 'overriding' variable (eg the price of crude for oil stocks) and you must always remember that the market itself is frequently wrong.

    That said, a stock price for me is the ultimate crowd-sourced metric.

  • Pingback: Finance Geek » 73.6% Of All Statistics Are Made Up

  • Pingback: BIEB Mobile Sources

  • http://crisismaven.wordpress.com/ CrisisMaven

    Great post, to the point!!! I have put one of the most comprehensive link lists for hundreds of thousands of statistical sources and indicators on my blog: Statistics Reference List (http://crisismaven.wordpress.com/references/). And what I find most fascinating is how data can be visualised nowadays with the graphical computing power of modern PCs, as in many of the dozens of examples in these Data Visualisation References (http://crisismaven.wordpress.com/references/ref…). If you miss anything that I might be able to find for you or you yourself want to share a resource, please leave a comment.

  • http://richineverysense.blogspot.com/ scheng1

    That's why the jobless data and other statistics are so conflicting. Those are just estimates and not very good estimates. The only way to give an accurate figure is to ask everyone whether they are working or not.

  • Pingback: Entrepreneurship: Nature vs. Nurture? A Religious Debate | CloudAve

  • mkolaszewski

    It's true they are made up, but they're 90% correct ;-)

  • Pingback: BIEB Volume Sources

  • http://giangbiscan.com/ Giang Biscan

    Thanks for sharing, David.

    I partially agree about stock price, but we all saw what happened late 2008, and early 2000, and many other times before… But yes you're right, as long as we don't take a number blindly. Like you said, the more you put in, the more you get out.

  • Pingback: BIEB News Sources

  • Pingback: RebelFa1con™ | Bilimsel Cevaplar

  • http://www.hotelcomentarios.com Hoteles

    I have a pejorative for this that I developed during my one year on the VC dark side: “he's got false comfort from soft numbers.” It refers to someone who looks to cover their ass in analyst reports and financial models rather than developing a fundamental understanding of the market problem and product/service solution. Same can apply to entrepreneurs who snow job themselves (and others).

  • http://www.7one.com Wes Zummerman

    True as true can be. My Dad said that figures don't lie but liars can figure. This was a common saying among Minnesota farmers in the 1930s and it is completely true. That may be why we were taught in high school to always get a second source on any claim and then ask ourselves if it was logical. Most of the time it was not. Note that the error of omission is also known as a half-truth.

    Our current society does not know truth and honesty when it smacks them in the face because it is so rare.

    Wes Zimmerman