Today Factual announced that it raised $25 million from Andreessen Horowitz & Index Ventures. I believe this marks a major new area of growth & innovation for the Internet as Cloud Services start to form deeper & richer layers. Let me explain.
For decades the “layering” of technology has allowed us to develop IT systems and networks in a specialized way. It lets best-of-breed technology solutions emerge at each layer of the stack, and it lets people with different skill sets specialize in key areas without needing competence in every technology arena.
One obvious example is the OSI model, which has seven layers. They range from the physical layer at the bottom of the stack (e.g. dealing with how digital or analog signals are actually transmitted from point A to point B), through the network layer in the middle that deals with routing packets of information, to the presentation and application layers at the top.
You can even think of your PC as a stack in which hardware manufacturers handled the physical layer, Microsoft handled the OS layer and application companies built higher up in the stack.
I mention this because I believe the layered metaphor for technology development has served our industry well and even if you aren’t technical it’s an important concept for you to grasp.
Over the past 5 years the Internet Cloud has started to form into layers, and this is a great thing for innovation. I’m not covering the actual layers of the Internet (under the OSI model) but rather the Cloud Services layer. When I mention companies at a given layer, please don’t assume I’m suggesting there aren’t other players in that category. I’m just listing the ones I perceive as the market leaders.
When I started my first company in 1999 we spent more than $2 million on technology infrastructure including Sun servers & the Solaris operating system, Oracle databases, EMC storage, load balancers, app servers, back-up devices, disk mirrors and on and on. That’s before writing a single line of code or paying any salaries. No wonder people had to raise $5 million just to get started back then. We raised $16.5 million in our A round. Hardware ate just over 10% of the round.
We put all of this infrastructure in an Exodus web hosting facility and had to pay for rack space, bandwidth and some management services (for example, when a disk failed).
When I started my second company in 2005 we decided to do everything differently. By then the open-source movement had really developed. We were able to use an open-source database (Postgres), open-source search (Lucene) and a host of other free components including Apache Tomcat and JBoss. We still bought our own physical infrastructure: horizontally scalable application servers, load balancers, etc., so I still had to lay out $50-80k for hardware. As a result we only had to raise $500,000 to get going, and again hardware ate just over 10% of the round.
And then came the debate about storage. Our chief architect, Ryan Lissack, wanted to use S3, Amazon’s then-new storage product, which let us keep all our data in their facility and pay by the MB uploaded / downloaded. I was dead set against it. I had been selling large content management systems and storing documents for industrial-scale customers. Many of the biggest customers wanted to be able to physically walk through our data center. How could I give up something so strategic? Especially to a company that sells books!
Ryan is both smart & persuasive. I trusted his judgment. He convinced me that the storage infrastructure was stable, reliable & secure. We had a data redundancy plan and the ability to bring it in-house if it wasn’t working. Work it did. To this day I’m astounded that IBM, Google, Sun, Microsoft and others didn’t offer this service and Amazon did. I guess the “stack ‘em high and sell ‘em cheap” mentality convinced this retailer that they could do the same with cloud services.
It performed incredibly well. It allowed us to grow our costs incrementally as our business grew and to massively reduce our overall storage costs: it’s a shared infrastructure in the way that electricity or water is. If you’re not knowledgeable on the topic of this big IT migration to the cloud, I’d suggest reading Nicholas Carr’s book, “The Big Switch.”
I used to recommend that companies keep only their non-core data on S3; now I recommend it wholeheartedly, even for mission-critical applications. I have seen some compelling arguments for and against, but mostly on cost. From a reliability & performance perspective, for most applications it will perform beautifully.
During my second startup we never considered using cloud computing for our real-time processes, but we did run some batch processes there. At the time we viewed Amazon’s offering, EC2, as too nascent. How on Earth could I rely on Amazon to guarantee performance so that I didn’t risk slow response times for my customers?
But sure enough, over time Amazon proved that it could reliably meet performance targets, and many startups bet their whole infrastructure on Amazon. Think about it. Imagine that you can develop software on your local computer but the entire service is delivered virtually through a partner, in the same way people consume energy, with all of the scale benefits that go with that. They deal with energy management, security, physical device failures, etc.
This has allowed people to get started for $50,000 and spend just $5,000 on hardware – again around 10%.
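To make the arithmetic behind those three startups explicit, here is a small Python sketch that uses only the figures quoted above. The 1999 hardware figure was “more than $2 million,” so its percentage is a lower bound, and the 2005 company’s $50-80k range is shown at both ends.

```python
# Hardware spend as a share of the money raised, using the figures quoted above.
# The 2005 company gave a $50-80k range, so both ends are listed.
rounds = {
    "1999 company (A round)": (2_000_000, 16_500_000),
    "2005 company, low end": (50_000, 500_000),
    "2005 company, high end": (80_000, 500_000),
    "cloud-era startup": (5_000, 50_000),
}

def hardware_share(hardware: float, raised: float) -> float:
    """Return hardware spend as a percentage of the amount raised."""
    return 100.0 * hardware / raised

for label, (hw, raised) in rounds.items():
    print(f"{label}: {hardware_share(hw, raised):.1f}%")
```

The share stays in a similar band (roughly 10-16%) while the absolute hardware bill falls from millions of dollars to a few thousand.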
As companies (startups or business units of bigger companies) started betting their businesses on the stability of cloud services, a host of other issues started to arise. First, how do you handle things like unexpected surges in traffic to your site (let’s say after a major press release) and then the subsequent flattening of traffic as the crowd subsides?
In the on-premise world you just had to have extra compute capacity and the ability to expand your bandwidth, even if you exceeded the contractual limits with your telecom provider. But if cloud were to be more economical than this, there had to be a better solution.
And in stepped new entrants at a layer above storage & processing that I would call “management services.” An early star in this category has been RightScale. They built a feature called “auto scaling” that monitored for traffic spikes, automatically provisioned new servers on demand and decommissioned them when your traffic surge subsided.
Another key feature of RightScale was letting you manage services across multiple clouds, abstracting your management away from any one provider. Unfortunately for all of us, there aren’t robust competitors to the core AWS offering.
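For readers who want to see what “auto scaling” means mechanically, here is a minimal Python sketch of a threshold-based policy. It is not RightScale’s implementation: the ScalingPolicy name, the thresholds and the one-server-at-a-time step are all hypothetical, and the code only computes a target server count rather than calling any real cloud API.

```python
from dataclasses import dataclass

@dataclass
class ScalingPolicy:
    # Hypothetical thresholds, chosen only for illustration.
    high_load_per_server: float = 800.0  # requests/sec per server that triggers scale-out
    low_load_per_server: float = 300.0   # requests/sec per server that triggers scale-in
    min_servers: int = 2
    max_servers: int = 20

def desired_server_count(current: int, requests_per_sec: float, policy: ScalingPolicy) -> int:
    """Return how many servers to run for the observed traffic.

    Adds one server when per-server load exceeds the high-water mark,
    removes one when it drops below the low-water mark, then clamps the
    result to the policy's minimum and maximum fleet size.
    """
    load = requests_per_sec / max(current, 1)  # guard against an empty fleet
    if load > policy.high_load_per_server:
        target = current + 1
    elif load < policy.low_load_per_server:
        target = current - 1
    else:
        target = current
    return max(policy.min_servers, min(policy.max_servers, target))

# Simulate a made-up traffic curve: quiet, a press-release spike, then the crowd subsides.
policy = ScalingPolicy()
servers = policy.min_servers
for rps in [500, 1500, 4000, 4000, 4000, 4000, 1200, 600, 600]:
    servers = desired_server_count(servers, rps, policy)
    print(f"{rps:>5} req/s -> {servers} servers")
```

The gap between the high and low thresholds keeps the fleet from flapping up and down on small traffic changes. A production system would typically also add a cooldown period and scale by more than one server per step.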
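The multi-cloud idea is essentially an abstraction layer: management logic is written once against a common interface, and each cloud sits behind its own adapter. Here is a minimal Python sketch, assuming a hypothetical CloudProvider interface and an in-memory stand-in rather than any real provider’s SDK.

```python
import itertools
from abc import ABC, abstractmethod

class CloudProvider(ABC):
    """Hypothetical minimal interface a management layer might code against."""

    @abstractmethod
    def launch_server(self) -> str: ...

    @abstractmethod
    def terminate_server(self, server_id: str) -> None: ...

    @abstractmethod
    def running_servers(self) -> list: ...

class InMemoryProvider(CloudProvider):
    """Stand-in for a real cloud; it only tracks server IDs in memory."""

    def __init__(self, name: str):
        self.name = name
        self._ids = itertools.count(1)
        self._running = []

    def launch_server(self) -> str:
        server_id = f"{self.name}-{next(self._ids)}"
        self._running.append(server_id)
        return server_id

    def terminate_server(self, server_id: str) -> None:
        self._running.remove(server_id)

    def running_servers(self) -> list:
        return list(self._running)

def scale_to(provider: CloudProvider, target: int) -> None:
    """Management logic written once, against the interface, not a specific cloud."""
    while len(provider.running_servers()) < target:
        provider.launch_server()
    while len(provider.running_servers()) > target:
        provider.terminate_server(provider.running_servers()[-1])

# The same management code drives two interchangeable (hypothetical) clouds.
for cloud in (InMemoryProvider("cloud-a"), InMemoryProvider("cloud-b")):
    scale_to(cloud, 3)
    scale_to(cloud, 1)
    print(cloud.name, cloud.running_servers())
```

Swapping providers then means writing a new adapter, not rewriting the deployment and scaling logic.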
While Amazon continues to move “up the stack” and offer some of these services on their own, RightScale continues to innovate by creating better tools for deployment, monitoring and other functions.
Another big innovator in helping manage cloud implementations is Okta, founded by Todd McKinnon, the former VP of Engineering at Salesforce.com (who knows a thing or two about cloud services), and Freddy Kerrest, who was senior in biz dev & sales at Salesforce from 2002-07. They realized that as enterprises increasingly used many different cloud-based applications, they lacked good cross-platform tools for deployment, monitoring and decommissioning. Okta solves this and more.
And of course there are tons of other companies one could include in this area, taking services that today are managed mostly in-house and moving them to the cloud, which allows cost reductions and standardization of non-standard management technologies. One example is Mashery, which created cloud-based API services. In a world where most technology products are launched as web services, having an API layer in the cloud seems an obvious trend.
So far in the stack we’ve only talked about infrastructure. But the cloud-based services we use every day as consumers (websites, Twitter, Facebook, Zynga) or as businesses (Dropbox, Gmail, Yammer, GoToMeeting) all rely on business logic created by application companies. Look at any graduating class of Y Combinator and you’ll find it filled with application companies launching new, experimental services that change the way we work and live.
This is where the rubber hits the road for us as users. It’s the input screens where we enter data or search requests. It’s the screens that show us restaurant locations, calculate our exercise output or display our bank balances. This is the top layer in the Cloud Stack.
But here’s the problem. As you can see from the depiction above, there is still too big a gap between our business logic and our underlying infrastructure. It means we either pay a huge cost to license a proprietary database or accept a huge time lag to build one on our own.
Let’s take some examples. Let’s say you wanted to launch Yelp today. You’d need to start with a list of all of the restaurants, hotels and other businesses in the country (not to mention internationally). FourSquare faced the same issue when it launched. Remember the early days when we as trailblazing users had to enter a bunch of restaurants ourselves?
If you remember using DailyBurn (which monitors calorie consumption and exercise output) in the early days, it had a core set of data from the USDA for standard foods, but the rest of us had to help build out its database to say how many calories were in a Pinkberry yogurt or a grande latte at Starbucks. Each of these types of businesses has scores of related companies trying to launch, each either licensing or creating the exact same data sets.
I see the same thing in the entertainment industry. We all know about IMDB. But everybody is trying to get access to data on stars, movies, release dates, box office figures, etc. It’s needed for Fandango, RottenTomatoes, Movies.com, Lunch.com and scores of other companies.
Same with university data. Healthcare data. Drug data. What about financial services information? Public stock market trades by senior executives of corporations. Annual accounting statements by companies. What about court records? Weather data. Criminal records, credit scores, locations of cell towers. And on and on.
Most great application businesses are built on data or create data. And historically this data has been very expensive to buy or create, in the same way that servers and storage once were.
Cloud Data Platform
Enter the world of “data as a service,” where businesses can consume data in the same way that they now consume Amazon’s storage or processing services. This is what Factual provides, and they just raised a whopping $25 million (disclosure: my firm, GRP, is a shareholder. As this was such a large round we didn’t lead it, so my commentary in this post is 90% as an excited industry observer and only 10% as a proud investor). Factual was created in 2007 by Gil Elbaz, the founder of Applied Semantics. In case you don’t know Applied Semantics, it’s the technology behind Google AdSense. Google bought Gil’s company in 2003 (pre-IPO) for $100+ million, and this business now represents about 30% of all of Google’s revenue. Wow!
What I love about Factual is that it democratizes data and makes it an order of magnitude cheaper, more available and higher quality than the historical approach. I’ve had this conversation so many times over the past year that I know it’s not immediately intuitive.
Let me say it this way. Imagine the world before Wikipedia. It was heresy to suggest that crowd-sourced information could beat Encarta, let alone the Encyclopedia Britannica. Yet now the idea that they could beat Wikipedia is laughable. Physically printed books or CDs with editors, reviewers and a centralized system are inherently slower and in many cases not even more accurate. And Wikipedia is deflationary, meaning it takes the cost of production to almost zero.
The story of the Internet has been deflationary from Amazon to Craigslist to iTunes. And so too will be Factual. They have built algorithms that automatically crawl the web for the world’s best structured data and use heuristic techniques to ensure the quality of the data. They have built tools to store the data but also to allow 3rd-party developers to rapidly consume or even write data to their tables.
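Factual’s actual algorithms aren’t described here, so the following is only a generic illustration of one common data-quality heuristic: when several crawled sources disagree about the same record, normalize the values and take a trust-weighted vote. The reconcile function, the trust weights and the sample restaurant record are all hypothetical.

```python
from collections import defaultdict

def normalize(value: str) -> str:
    """Crude normalization: lowercase and collapse runs of whitespace."""
    return " ".join(value.lower().split())

def reconcile(observations):
    """Pick a consensus value for each attribute of a single record.

    observations: iterable of (attribute, value, source_trust) triples,
    e.g. a restaurant's phone number as reported by several web pages.
    Returns {attribute: (winning_value, share_of_total_trust)}.
    """
    votes = defaultdict(lambda: defaultdict(float))
    for attribute, value, trust in observations:
        votes[attribute][normalize(value)] += trust
    result = {}
    for attribute, tally in votes.items():
        winner, score = max(tally.items(), key=lambda kv: kv[1])
        result[attribute] = (winner, score / sum(tally.values()))
    return result

# Hypothetical observations of one restaurant from several crawled sources.
observations = [
    ("name", "Joe's Pizza", 1.0),
    ("name", "joe's  pizza", 0.5),
    ("name", "Joes Pizza", 0.5),
    ("phone", "212-555-0100", 1.0),
    ("phone", "212-555-0199", 0.3),
]
for attribute, (value, confidence) in reconcile(observations).items():
    print(f"{attribute}: {value!r} (confidence {confidence:.2f})")
```

Returning the winner’s share of the total weight gives a rough confidence score, so fields with low agreement could be flagged for review or for corrections from the developers writing back to the tables.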
Imagine if Factual had existed 3 years ago: would FourSquare, GoWalla, Booyah and every other application that relies on location data have needed to build or license their own? Imagine if all of their resources could have been focused on the user experience and not on underlying data that is mostly a commodity. It will take time for companies to understand that much (not all) of the world’s data is a commodity, in the same way it took years for us to migrate to the cloud. But when they all do, imagine the importance of Cloud Data.
And what really excites me, and what is such a win for startups, is the potential to massively speed up innovation and make it cheaper. What if every Y Combinator and TechStars company had access to the Factual dataset and could build their concepts on a large corpus of data? What if we could publish large pools of drug data and allow hackers to create databases of drug interactions that reduce problems with prescriptions? Imagine if you could have developers building financial services apps that created more transparency around trades.
I predict that data over time will become the next major layer of the Internet supporting both consumer and business applications.
I have talked in the past about other layers that are emerging, particularly in social networking and mobile applications. An obvious one is the mapping layer, where SimpleGeo has a great start. Many mobile applications being built today incorporate LBS (location-based services) into the user experience, which often means plotting results onto a map.
And what about our social graphs? Wouldn’t it be nice if that could be managed as a Cloud Layer and then let services be created that incorporate not only our personal relationships but those of two or three degrees of separation?
I can’t dream up all the new layers that may be created in the next 10 years. But I’m pretty convinced that horizontal specialization will be a big win for many companies and for the tech ecosystem in general.