In the previous article, we introduced the types of uncertainty and several methods for modeling them. Now we will discuss how to use them in practice.
In this article, we present the exploration-exploitation problem and show how modeling uncertainty can help solve it. We focus on exploration in recommendation systems, but the same idea applies to many reinforcement learning applications, such as self-driving cars and robots.
Problem overview
The goal of a recommendation system is to recommend content that users are likely to be interested in. On our website, we measure user preferences by clicks: we display a widget containing content recommendations, and if a user wants to see a piece of content, they click on it.
The probability that a user clicks on a given piece of content is called the click-through rate (CTR). If we knew the CTR of every piece of content, choosing what to recommend would be easy: recommend the content with the highest CTR.
The problem is that we do not know the true CTR. We have a model that estimates it, but its estimates are far from perfect. The reason is the various types of uncertainty in a recommendation system, which we summarized in the previous article.
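As a minimal sketch of this easy case, assuming CTR is estimated empirically as clicks divided by impressions, the following Python ranks items by that estimate. The item names and counts are hypothetical, and the ranking is only as good as the estimates, which is exactly the difficulty discussed next.

```python
def empirical_ctr(clicks: int, impressions: int) -> float:
    """Empirical click-through rate: clicks divided by impressions."""
    if impressions <= 0:
        raise ValueError("CTR is undefined for an item with no impressions")
    return clicks / impressions


def rank_by_ctr(stats: dict[str, tuple[int, int]]) -> list[str]:
    """Return item ids sorted by empirical CTR, highest first.

    stats maps item id -> (clicks, impressions).
    """
    return sorted(stats, key=lambda item: empirical_ctr(*stats[item]), reverse=True)


# Hypothetical counts, for illustration only.
stats = {"a": (30, 1000), "b": (12, 200), "c": (5, 500)}
print(rank_by_ctr(stats))  # ['b', 'a', 'c']
```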
Explore vs Exploit
Suppose you walk into an ice cream shop and have to choose one of more than 30 flavors. You can choose the best flavor you have ever tasted, or explore a flavor you have never tried before, which might turn out to be a pleasant surprise.
These two strategies are exploitation and exploration. In a recommender, we can exploit items known to have a high CTR, or explore other, new items. Adding exploration to the recommendation strategy is essential; otherwise, new content would never be exposed to users.
Exploration methods
The simplest exploration-exploitation method is the ϵ-greedy algorithm: with probability ϵ we recommend a randomly selected item (exploration), and with the remaining probability 1 - ϵ we recommend the item with the highest estimated CTR (exploitation).
Although not optimal, ϵ-greedy is very easy to understand, and it serves as a baseline for more complex methods. But how can we find high-quality content more efficiently?
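A minimal ϵ-greedy selector might look like the sketch below. The flavor names and CTR estimates are hypothetical; the function only implements the selection rule described above, not how the estimates are produced.

```python
import random


def epsilon_greedy(ctr_estimates: dict[str, float], epsilon: float, rng: random.Random) -> str:
    """Choose one item to recommend.

    With probability epsilon, pick uniformly at random (explore);
    otherwise pick the item with the highest estimated CTR (exploit).
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    items = list(ctr_estimates)
    if rng.random() < epsilon:
        return rng.choice(items)  # explore: any item, including the current best
    return max(items, key=ctr_estimates.__getitem__)  # exploit


# Hypothetical CTR estimates, for illustration only.
estimates = {"chocolate": 0.08, "vanilla": 0.05, "new_flavor": 0.0}
rng = random.Random(42)
picks = [epsilon_greedy(estimates, epsilon=0.1, rng=rng) for _ in range(10_000)]
# In expectation, chocolate is picked 0.9 + 0.1 / 3 of the time (about 93%).
print(picks.count("chocolate") / len(picks))
```

In this sketch the random branch can also land on the current best item; some variants exclude it, a design choice the text above leaves open.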
A more advanced method, Upper Confidence Bound (UCB), takes advantage of uncertainty. Each item is associated with its expected CTR and a confidence interval around that CTR; the width of the interval reflects how uncertain we are about the item's CTR. The basic UCB algorithm estimates both from empirical data: we track the empirical CTR of each item and compute its upper confidence bound by assuming a binomial distribution.
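The text does not say which binomial confidence bound is used. As one common choice, the sketch below computes the upper end of the Wilson score interval for a binomial proportion; the default `z = 1.96` (roughly 95% confidence) is an assumption. An item with no impressions gets an upper bound of 1.0, the largest possible CTR.

```python
import math


def wilson_upper_bound(clicks: int, impressions: int, z: float = 1.96) -> float:
    """Upper end of the Wilson score interval for a binomial CTR estimate."""
    if impressions == 0:
        return 1.0  # no data: the CTR could be anything up to 1
    n = impressions
    p = clicks / n
    denom = 1.0 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return min(1.0, (centre + margin) / denom)  # clamp tiny float overshoot


# Same empirical CTR (8%), increasing evidence (hypothetical counts):
# the bound moves down toward 0.08 as impressions grow.
for clicks, shows in [(8, 100), (80, 1000), (800, 10000)]:
    print(shows, round(wilson_upper_bound(clicks, shows), 4))
```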
Back to the ice cream shop: suppose you always order chocolate ice cream, and you rate it 8 out of 10. Today the shop launched a new flavor. You have never tasted it (no empirical information), so its score could be anywhere between 1 and 10. Its upper confidence bound is therefore 10, higher than chocolate's 8, so if you want to explore, you should taste it: it might be a 10.
This is the principle of UCB: at each step, choose the item with the highest UCB value, which in our case is the item with the highest upper confidence bound on its CTR. Over time, each item's estimated CTR converges to its true CTR, and the confidence interval around it shrinks toward zero. Given enough time, every item gets explored.
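The selection loop can be sketched as a small simulation. To keep it short, this sketch swaps in the classic UCB1 exploration bonus, sqrt(2 ln N / n), instead of a binomial confidence bound; N is the total number of impressions so far and n is the item's own impressions. The true CTRs are made-up values used only to simulate clicks.

```python
import math
import random


def ucb1_score(clicks: int, shows: int, total_shows: int) -> float:
    """Empirical CTR plus an exploration bonus that shrinks as the item is shown.

    Items never shown get +inf, so each item is tried at least once.
    """
    if shows == 0:
        return math.inf
    return clicks / shows + math.sqrt(2.0 * math.log(total_shows) / shows)


def run_ucb(true_ctrs: dict[str, float], rounds: int, seed: int = 0) -> dict[str, int]:
    """Simulate UCB1 against Bernoulli clicks; return impressions per item."""
    rng = random.Random(seed)
    clicks = {item: 0 for item in true_ctrs}
    shows = {item: 0 for item in true_ctrs}
    for _ in range(rounds):
        total = sum(shows.values())
        # Pick the item with the highest upper bound.
        item = max(true_ctrs, key=lambda i: ucb1_score(clicks[i], shows[i], total))
        shows[item] += 1
        if rng.random() < true_ctrs[item]:  # simulated user click
            clicks[item] += 1
    return shows


# Hypothetical true CTRs, unknown to the algorithm.
shows = run_ucb({"a": 0.10, "b": 0.05, "c": 0.02}, rounds=20_000)
print(shows)  # the best item, "a", should receive the most impressions
```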
Another popular method is Thompson Sampling. Instead of a confidence bound, this method maintains an estimated distribution over each item's CTR. For each item, we sample a CTR from its distribution and then recommend the item with the highest sampled value.
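A minimal Beta-Bernoulli version is sketched below, assuming each impression is a Bernoulli trial (click or no click) and a uniform Beta(1, 1) prior on every item's CTR; the text does not specify which distribution is used. The true CTRs are again hypothetical.

```python
import random


def thompson_pick(clicks: dict[str, int], shows: dict[str, int], rng: random.Random,
                  prior_a: float = 1.0, prior_b: float = 1.0) -> str:
    """Sample a CTR for every item from its Beta posterior and pick the largest.

    Posterior for an item: Beta(prior_a + clicks, prior_b + non-clicks).
    """
    samples = {
        item: rng.betavariate(prior_a + clicks[item], prior_b + shows[item] - clicks[item])
        for item in shows
    }
    return max(samples, key=samples.__getitem__)


def run_thompson(true_ctrs: dict[str, float], rounds: int, seed: int = 0) -> dict[str, int]:
    """Simulate Thompson Sampling against Bernoulli clicks; return impressions per item."""
    rng = random.Random(seed)
    clicks = {item: 0 for item in true_ctrs}
    shows = {item: 0 for item in true_ctrs}
    for _ in range(rounds):
        item = thompson_pick(clicks, shows, rng)
        shows[item] += 1
        if rng.random() < true_ctrs[item]:  # simulated user click
            clicks[item] += 1
    return shows


# Hypothetical true CTRs, unknown to the algorithm.
shows = run_thompson({"a": 0.10, "b": 0.05, "c": 0.02}, rounds=5_000)
print(shows)
```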
These methods can work well for a fixed set of items, but unfortunately, thousands of new items enter the Taboola system every day. By the time we have collected enough data to estimate an item's confidence bound reliably, the item may already have left the system.
So we need a way to estimate the CTR of new items that we have never seen before.
Suppose a new chocolate-flavored ice cream appears. Because you have always liked chocolate ice cream, you expect this one to be good too. The basic UCB method cannot reach this conclusion, because it relies only on empirical information about each individual item.
In the next article, we will explain in detail how we use neural networks to estimate the CTR of new items and how we account for the level of uncertainty. With these uncertainty estimates, we can apply UCB to explore new items.
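Once a model returns both a CTR estimate and an uncertainty for each item, including items it has never shown, UCB can rank items by mean + k * std. The sketch below assumes a hypothetical model output (`CtrPrediction`) with made-up numbers; the neural network that would produce them is not shown here, and the exploration weight `k` is an assumed tuning parameter.

```python
from dataclasses import dataclass


@dataclass
class CtrPrediction:
    """Hypothetical model output for one item: predicted CTR and its uncertainty."""
    mean: float
    std: float


def ucb_rank(predictions: dict[str, CtrPrediction], k: float = 1.0) -> list[str]:
    """Rank items by mean + k * std; a larger k favours uncertain (less explored) items."""
    return sorted(
        predictions,
        key=lambda item: predictions[item].mean + k * predictions[item].std,
        reverse=True,
    )


# Made-up predictions: the new chocolate flavor has a slightly lower predicted CTR
# than the familiar one, but much higher uncertainty.
predictions = {
    "chocolate": CtrPrediction(mean=0.080, std=0.005),
    "new_chocolate": CtrPrediction(mean=0.070, std=0.030),
    "vanilla": CtrPrediction(mean=0.050, std=0.010),
}
print(ucb_rank(predictions, k=0.0))  # ['chocolate', 'new_chocolate', 'vanilla']
print(ucb_rank(predictions, k=1.0))  # ['new_chocolate', 'chocolate', 'vanilla']
```

With `k=0` this reduces to pure exploitation; increasing `k` shifts recommendations toward items the model is unsure about.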
Online metrics and results
So how do we know whether our exploration of new items is working? We need a metric to evaluate the results of exploration. At Taboola, we use A/B testing.
Back to the ice cream example: suppose you bring friends along to help you explore new flavors. A friend who picks flavors at random will certainly find out which ones are good, but that is not the smartest way. Another friend who only orders flavors that others have already found delicious learns nothing new, so his tasting adds nothing to the exploration.
At Taboola, we measure the results of exploration as follows: an item that has been displayed many times, in multiple different contexts, is considered to have passed the exploration stage. We then analyze which model was responsible for bringing items through this stage; to be credited with an item, a model must have shown it multiple times.
With this method, each model's output is the number of items it brought through the exploration stage.
By this metric, displaying items at random would score best, while a model that does not use UCB shows good items but brings few new items through exploration, so it scores poorly. We believe our UCB model strikes a balance between exploring new items and showing good ones, and in the long run this trade-off is worthwhile.
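The counting step of such a metric can be sketched as follows, under assumptions the text does not spell out: the impression log is a list of hypothetical `(model, item, context)` tuples, an item counts as explored once its total impressions and distinct contexts reach two thresholds, and a model is credited only if it showed that item at least a minimum number of times. All thresholds and log entries are illustrative.

```python
from collections import Counter, defaultdict


def explored_items_per_model(log, min_impressions, min_contexts, min_model_impressions):
    """Count, per model, the explored items that the model showed often enough.

    log: iterable of (model, item, context) tuples, one per impression.
    """
    total = Counter()              # item -> impressions across all models
    contexts = defaultdict(set)    # item -> distinct contexts it appeared in
    per_model = Counter()          # (model, item) -> impressions by that model
    for model, item, context in log:
        total[item] += 1
        contexts[item].add(context)
        per_model[(model, item)] += 1

    explored = {
        item for item in total
        if total[item] >= min_impressions and len(contexts[item]) >= min_contexts
    }
    credit = Counter()
    for (model, item), n in per_model.items():
        if item in explored and n >= min_model_impressions:
            credit[model] += 1
    return dict(credit)


# Hypothetical impression log.
log = [
    ("ucb", "x", "home"), ("ucb", "x", "sports"), ("random", "x", "news"),
    ("ucb", "y", "home"),
    ("random", "z", "home"), ("random", "z", "news"), ("random", "z", "sports"),
]
print(explored_items_per_model(log, min_impressions=3, min_contexts=2, min_model_impressions=2))
# {'ucb': 1, 'random': 1}
```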
Conclusion
The exploration-exploitation problem is an important challenge for the recommendation systems of many companies. We hope this article helps practitioners facing it. In the next article, we will explain in detail the model for estimating CTR and its uncertainty, so stay tuned!