I got a variety of reactions on Twitter following my GigaOM piece on how Data Scientists work at automating themselves. One that I want to discuss today concerns building businesses on top of or around Prediction APIs such as Google's or BigML's (a.k.a. machine learning APIs). Anthony Nyström, who is Principal for Artificial Intelligence at Mashable, raises a very interesting point: is this even possible, and if you do build your business around these APIs, how are you going to differentiate yourself from others who could use the same APIs?
Here's one such business
The first example that comes to mind is Pondera Solutions, a company founded in November 2011 that provides Fraud Detection as a Service. Their product is built on top of the Google Prediction API. To me, the value of Pondera's business resides in their domain knowledge and in the connections that enable them to bring solutions of that sort to government agencies. Even though I know about the Google Prediction API, there’s no way I would be able to replicate what they’ve done, get the data they have, and set up a business that competes with them.
Ways to differentiate yourself
Now, let's imagine that your business has a direct competitor, looking to tackle the same problem as you with the same Prediction API. I see at least two ways you can differentiate yourself:
In the way you prepare data to send to the API. Here, there are two things to consider:
- The number of data points (i.e. the number of rows). Generally, more data beats better algorithms. If each data point is tied to a customer action, you can get more data by having more customers or by finding tricks to collect more data from each of them. An example of such a trick is Facebook asking me to rate movies in the right-hand sidebar and making the experience fun and engaging.
- The features used (i.e. the columns in the data). Imagine that the aim is to predict the value of a house, as Zillow's "zestimates" do. For this, you collect example data consisting of house characteristics and the price at which each house sold. Let's say that the houses represented in your data and in your competitor's are the same. If your competitor represents houses by their number of rooms, surface area, type and year of construction, while you are smart enough to also extract and take into account information such as proximity to public transport, schools and amenities, then you will most likely come up with better estimates.
In the way you use predictions. One of my favorite examples to illustrate the importance of this is churn detection, which consists of detecting customers who are likely to cancel a subscription to a service. Assume that you could detect such customers with perfect accuracy. What would you do then to make them stay? Give them a special offer, or contact each one of them? Which offer would that be, or what would you tell them? How much would it cost to do that, and what proportion of customers would actually end up staying? Depending on the answers to these questions, your return on investment can be very high or as low as zero, even though your churn detection system has perfect accuracy.
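To make the churn arithmetic concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function name `retention_roi`, the cost model (one fixed-cost offer sent to every flagged customer) and the numbers are hypothetical and do not come from any real campaign or from any Prediction API.

```python
def retention_roi(n_flagged, offer_cost, acceptance_rate, value_per_retained):
    """ROI of a retention campaign, assuming a perfect churn detector.

    Hypothetical cost model: every flagged customer receives one offer
    costing `offer_cost`; a fraction `acceptance_rate` of them stays, and
    each customer who stays is worth `value_per_retained` in saved revenue.
    """
    if n_flagged <= 0 or offer_cost <= 0:
        raise ValueError("n_flagged and offer_cost must be positive")
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be between 0 and 1")

    total_cost = n_flagged * offer_cost
    revenue_saved = n_flagged * acceptance_rate * value_per_retained
    # ROI = net gain divided by what the campaign cost.
    return (revenue_saved - total_cost) / total_cost


if __name__ == "__main__":
    # Illustrative numbers only: 1,000 customers flagged, a $20 offer each,
    # and $200 of revenue saved per customer who stays.
    for rate in (0.5, 0.1):
        roi = retention_roi(1000, 20.0, rate, 200.0)
        print(f"acceptance rate {rate:.0%}: ROI = {roi:.2f}")
```

Under these assumed numbers, the detector is identical in both runs and only the response to the offer changes. The campaign breaks even when the acceptance rate equals the offer cost divided by the value of a retained customer (here 20 / 200 = 10%), which is why a perfect detector can still yield an ROI of zero.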
Value of technology
I would have liked to give more examples of businesses like Pondera that have been built around Prediction APIs. Trust me, there are more. But for now it turns out that they are quite shy about saying so publicly. As I understand it, they’re afraid that this would lower their perceived value to investors. I think that the value of those businesses is not in the machine learning technology they use but in the way they apply it to their domain. This is the kind of thing you may be able to patent: as I understand it, in many countries it's not the technology itself that is patentable but applications of the technology to solve one specific problem in a specific domain.
One thing I want to bring to your attention is the novelty that is yet to come. Prediction services and APIs are going to enable applications of ML that we weren't able to think of. These tools are used by domain experts and they are the ones who will develop new ideas. What happens when people understand the possibilities of Machine Learning and combine that with their domain knowledge? I’m eager to find out.
Nyström said that you should "hire [Data Scientists] and [Machine Learning] experts if you want to move and shake at the dance". But there are also new dances to create. Initially, your competition is limited, but then, once your "dance" gets validation and traction, you'll be able to hire those experts if you ever want to. Prediction APIs will have done their job: removing machine learning's barrier to entry.