Netflix's mission is to entertain the world, and their personalization algorithms play a crucial role in recommending the right content to each user. The goal is to create an experience that brings lasting enjoyment to members, not just short-term engagement.
While better input features, architectures, or more data can improve recommendation models, this post focuses on refining the objective (reward function) to better reflect long-term user satisfaction.
r(user, item)
as a function of user interaction with the recommended item.p(final feedback | observed feedbacks)
used for computing proxy rewards in bandit policy trainingAsk anything...