Using online data and machine learning to increase trial-product effectiveness

Introduction

Because raw online data was readily available, we were able to employ machine-learning techniques to predict the success of trial products for one of our clients. This allowed them to drastically increase the effectiveness of their marketing strategy.

Harvest specializes in the collection and distribution of online data. We enable clients to understand, utilize, and improve all of their online communication channels by monitoring how their customers interact with those channels. In other words, one of the things we do for our clients is what is usually called “web analytics”.

Raw data

What is less usual, when it comes to online data, is having this data available in its raw form: the separate “events” that tell the story of every interaction a user has with a website or app. This kind of data can be very valuable for many companies. We make this data available through our product “Harvest Store”, which is built around Splunk. There, the data is stored in a form that can easily be queried and analyzed in large volumes.
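
To make this concrete, here is a minimal sketch (in Python) of what a single raw interaction event might look like. The field names are hypothetical and only meant to illustrate the idea; the actual Harvest Store schema differs.

    from datetime import datetime, timezone

    # A minimal sketch of a single raw interaction event. All field names
    # are hypothetical; the actual Harvest Store schema differs.
    event = {
        "timestamp": datetime(2017, 3, 14, 9, 21, 5, tzinfo=timezone.utc),
        "user_id": "u-48821",
        "session_id": "s-90417",
        "event_type": "product_customized",  # e.g. page_view, click, order
        "page": "/samples/configure",
        "referrer": "google_search",
        "device": "mobile",
    }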

This setup allows for many detailed analyses and makes it possible to answer important questions, some of which might arise only months after the data collection was implemented. All kinds of DevOps and Operational/Business Intelligence issues can be addressed, either in a quick, ad-hoc fashion or in a more permanent way involving dashboards, monitoring, and alerts. This takes online tracking to the next level, allowing it to become a mature and reliable part of the business.

Apart from the typical analytics applications, the raw data also allows us to employ advanced data-science and machine-learning techniques. A telling example is a project we did for one of our clients, where an issue arose in the context of trial products. This client was offering samples of its product online and used Google AdWords campaigns to promote them. However, the success of these campaigns could not be measured quickly enough: the total revenue only became apparent after customers bought the regular product that followed the sample. So the question became: can we predict whether customers will buy a follow-up product based on their interaction with the sample product? We were able to tackle this problem using machine learning and the large amount of raw data that was already available.
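
Framed as a prediction problem, each sample order needs a label that says whether it eventually led to a regular purchase. The sketch below shows one way such a label could be derived from raw events; the 90-day window and the event names are illustrative assumptions, not the client's actual definition of success.

    from datetime import timedelta

    # Hypothetical: a sample "succeeds" if the same user places a regular
    # order within a chosen follow-up window after ordering the sample.
    # (Assumes event timestamps are datetime objects, as sketched above.)
    FOLLOW_UP_WINDOW = timedelta(days=90)  # illustrative choice

    def label_sample(sample_order, user_events):
        """Return 1 if a regular purchase followed the sample, else 0."""
        deadline = sample_order["timestamp"] + FOLLOW_UP_WINDOW
        return int(any(
            e["event_type"] == "regular_order"
            and sample_order["timestamp"] < e["timestamp"] <= deadline
            for e in user_events
        ))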

Machine Learning

Machine learning is about training an algorithm so that it can make decisions later on. This training is done using a large dataset that contains information about the thing we want to predict. In this case, we want to predict whether a person will buy the follow-up product, based on their interactions with the sample product. During training, the algorithm “optimizes” its parameters so that its predictions match the training data as well as possible. If done properly, the algorithm will then also make good predictions for data it has not seen before.
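
In code, this train-then-generalize loop looks roughly as follows. This is a minimal sketch using scikit-learn on synthetic stand-in data (the real features and labels came from Harvest Store), and the logistic-regression model is just a placeholder for “an algorithm”.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    # Synthetic stand-in data: one feature row per sample product, and a
    # 0/1 label saying whether a follow-up purchase happened.
    rng = np.random.default_rng(0)
    X = rng.random((1000, 5))
    y = (X[:, 0] + X[:, 1] > 1).astype(int)

    # Hold out part of the data to check generalization to unseen examples.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    model = LogisticRegression()
    model.fit(X_train, y_train)  # the "training" step: optimize parameters
    print(accuracy_score(y_test, model.predict(X_test)))  # unseen data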

One of the hardest parts of constructing a good machine-learning model is “feature engineering”. It comes down to determining which factors are relevant for the prediction you want to make: what information does the algorithm need in order to give the correct answer? This is often hard to say up front, and determining the right combination of factors involves a lot of trial and error. This means that collecting as much information as you can in the early phases of a project is a good idea; you never know exactly what data might be relevant to the algorithm. Luckily, with this client we were already collecting a broad spectrum of user interactions and associated metadata in Harvest Store (and of course using it for all kinds of other purposes). This enabled us to start testing algorithms right away and have the first results within a few weeks. This phase would have taken several months to a year if we had had to collect all the necessary data first.
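
As an illustration, the sketch below turns a user's raw events into a fixed-length feature vector. The specific events and features are hypothetical examples, not the actual features we used.

    from collections import defaultdict

    def build_features(user_events):
        """Turn a user's raw events into a fixed-length feature vector."""
        counts = defaultdict(int)
        for e in user_events:
            counts[e["event_type"]] += 1
        return [
            counts["page_view"],                          # overall activity
            counts["product_customized"],                 # engaged with the sample?
            len({e["session_id"] for e in user_events}),  # number of visits
            int(any(e.get("referrer") == "google_search"  # arrived via Google?
                    for e in user_events)),
        ]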

The algorithm we ended up using was a combination of a random forest and a neural network. Without going into too much detail: a random forest is an ensemble of decision trees. A decision tree is basically a set of questions that splits the data in several steps. Each answer to such a question implies a direction (a “branch”) in the tree, until you finally end up at a specific “leaf”, which implies an answer to the main question. The main question corresponds to the desired prediction; in this case it was: will this sample product turn into a success? Examples of branching questions are: How many visits did the user make before acquiring the sample product? Did the user customize the sample product? Did the user arrive at the website through Facebook, Google search, or via other sources? Etcetera. Which of these questions should be asked, about which features of the data, and in what order, is determined automatically in the training phase of the algorithm, based on measures from information theory. The resulting decision tree, however, can be represented as a flowchart and is easy to understand.
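
Here is a minimal sketch of training a random forest and printing one of its trees as such a flowchart, again on synthetic stand-in data with made-up feature names:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import export_text

    # Synthetic stand-in for the engineered features (names are made up).
    rng = np.random.default_rng(0)
    feature_names = ["visits", "customized", "via_google"]
    X = rng.random((500, 3))
    y = (X[:, 0] > 0.5).astype(int)

    # A random forest trains many decision trees and lets them vote.
    forest = RandomForestClassifier(n_estimators=100, max_depth=3, random_state=0)
    forest.fit(X, y)

    # Any single tree can be printed as a flowchart of branching questions.
    print(export_text(forest.estimators_[0], feature_names=feature_names))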

The other machine-learning model we used, a neural network, is much less easily understood. A neural network consists of several input “nodes”, which accept as input the aspects of the data that we consider important. The network then calculates many combinations of these inputs, and each of these combinations forms a node in the next “layer” of the calculation. Subsequently, combinations of these combinations are calculated to form the next layer, etcetera. The final layer is called the output layer. In our case this final layer had a single node, whose value represents whether the sample product is going to be a success or not (so the value is either zero or one). During the training phase, the algorithm determines which combinations need to be calculated to get the correct answer. What the intermediate nodes mean (what the combinations represent, and whether this corresponds to something “real”) is not at all clear in most cases. From a pragmatic standpoint, however, we don't care about that: we only care that the algorithm predicts the outcome as well as possible.
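
Below is a minimal sketch of such a network, together with one simple way to combine it with the random forest. Note that the exact combination scheme we used is not spelled out here; averaging the two models' predicted probabilities is just one common option, shown for illustration.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.neural_network import MLPClassifier

    # Synthetic stand-in data again.
    rng = np.random.default_rng(0)
    X = rng.random((500, 3))
    y = (X[:, 0] + X[:, 2] > 1).astype(int)

    # Two hidden layers of intermediate "combination" nodes, one output node.
    net = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0)
    net.fit(X, y)

    forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    # One simple combination: average the predicted success probabilities.
    p = (net.predict_proba(X)[:, 1] + forest.predict_proba(X)[:, 1]) / 2
    prediction = (p > 0.5).astype(int)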

And this is in fact what we accomplished. Our algorithms were able to predict the success or failure of a sample product with an accuracy of 81%. This allowed our client to make decisions about their marketing campaigns at a much earlier stage, which saved large amounts of money on unsuccessful campaigns and increased the overall effectiveness of their online marketing strategy.

Although it was not relevant in the case just described, it is easy to imagine the output of the algorithm being applied in other ways. For example, when a call center or other channel is used to follow up on a trial product, knowing the chance of success for a particular customer can drastically increase the effectiveness of the follow-up strategy.
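
A hypothetical sketch of what that could look like: rank customers by the model's predicted success probability, so the best prospects are contacted first. The data layout is assumed for illustration.

    def prioritize(customers, model, build_features):
        """Rank customers by predicted success probability, highest first."""
        scored = [
            (model.predict_proba([build_features(c["events"])])[0, 1], c["id"])
            for c in customers
        ]
        return sorted(scored, reverse=True)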

Takeaway

The main takeaway here is that having raw online data readily available is valuable in many ways. It gives you access to important business information that would otherwise not be available, it makes your data setup actionable, and it allows for fast results. On top of that, you can quickly test out new ideas, which stimulates experimentation and innovation.