Summary of Improve Your Next Experiment by Learning Better Proxy Metrics From Past Experiments

  • netflixtechblog.com

    Introduction

    The article addresses a problem common to technology companies and researchers: how to relate short-term proxy metrics, which are statistically sensitive, to long-term business outcomes (north star metrics), which are statistically insensitive.

    • Proxy metrics (S) like click-through rate are easier to measure but may not reflect true impact on long-term outcomes (Y) like user retention.
    • Naive approaches to understanding the S-Y relationship have pitfalls:
      • User-level correlations between S and Y do not imply a causal relationship due to confounding factors.
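      • Correlations between estimated treatment effects on S and Y across experiments suffer from correlated measurement error: both effects are estimated from the same users, so their sampling errors are correlated, which biases the estimate of the true relationship.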
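
    The second pitfall can be illustrated with a small simulation. This is a minimal sketch, not the article's code: the function names, the noise covariance, and the effect sizes are all assumptions chosen for illustration. The true effect of S on Y is set to zero, yet the naive regression of estimated effects on Y against estimated effects on S finds a clearly positive slope.

```python
import numpy as np

def simulate_estimated_effects(num_experiments, beta, effect_sd, noise_cov, rng):
    """Return noisy estimated treatment effects on S and Y, one pair per experiment."""
    # True treatment effects on the proxy S vary across experiments.
    true_s = rng.normal(0.0, effect_sd, size=num_experiments)
    # Assumption: the true effect on Y is exactly beta times the true effect on S.
    true_y = beta * true_s
    # Measurement error is correlated because S and Y are measured on the same users.
    noise = rng.multivariate_normal(np.zeros(2), noise_cov, size=num_experiments)
    return true_s + noise[:, 0], true_y + noise[:, 1]

def naive_slope(est_s, est_y):
    """OLS slope of estimated Y effects on estimated S effects."""
    cov = np.cov(est_s, est_y)
    return cov[0, 1] / cov[0, 0]

rng = np.random.default_rng(0)
# Hypothetical measurement error covariance (variances 1.0, covariance 0.8).
noise_cov = np.array([[1.0, 0.8], [0.8, 1.0]])
est_s, est_y = simulate_estimated_effects(
    50_000, beta=0.0, effect_sd=1.0, noise_cov=noise_cov, rng=rng
)
slope = naive_slope(est_s, est_y)
print(f"naive slope when the true slope is 0: {slope:.2f}")
```

    Under these assumed values the naive slope converges to 0.8 / (1.0 + 1.0) = 0.4 rather than 0, so the bias comes entirely from the correlated noise.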

    Overcoming Correlated Measurement Error

    The authors propose better ways to leverage historical experiments, inspired by techniques from the literature on weak instrumental variables:

    • Total Covariance (TC) estimator: Estimates the slope that OLS would recover from the true (noise-free) treatment effects by subtracting a scaled measurement error covariance from the covariance of the estimated treatment effects. It assumes the measurement error covariance is homogeneous across experiments.
    • Jackknife Instrumental Variables Estimation (JIVE): Similar to TC but does not require the homogeneous covariances assumption.
    • Limited Information Maximum Likelihood (LIML): Statistically efficient if S fully mediates all treatment effects on Y, but sensitive to this assumption.
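
    The TC idea can be sketched in a few lines. This is an illustrative sketch under strong assumptions, not the authors' implementation: it treats the measurement error covariance as known and identical across experiments, whereas in practice it would have to be estimated from unit-level data. All names and numbers below are hypothetical.

```python
import numpy as np

def naive_slope(est_s, est_y):
    """OLS slope of estimated Y effects on estimated S effects."""
    cov = np.cov(est_s, est_y)
    return cov[0, 1] / cov[0, 0]

def total_covariance_slope(est_s, est_y, noise_cov):
    """TC-style slope: remove the measurement error covariance before taking the ratio."""
    total_cov = np.cov(est_s, est_y)
    # Assumption: noise_cov is known and the same for every experiment.
    corrected = total_cov - noise_cov
    return corrected[0, 1] / corrected[0, 0]

rng = np.random.default_rng(1)
num_experiments = 100_000
true_beta = 0.5  # hypothetical true slope linking effects on S to effects on Y
noise_cov = np.array([[1.0, 0.8], [0.8, 1.0]])  # hypothetical error covariance

# Simulate true effects, then add correlated measurement error.
true_s = rng.normal(0.0, 1.0, size=num_experiments)
true_y = true_beta * true_s
noise = rng.multivariate_normal(np.zeros(2), noise_cov, size=num_experiments)
est_s, est_y = true_s + noise[:, 0], true_y + noise[:, 1]

naive = naive_slope(est_s, est_y)
tc = total_covariance_slope(est_s, est_y, noise_cov)
print(f"naive: {naive:.2f}, TC-style: {tc:.2f}, truth: {true_beta}")
```

    With these assumed values the naive slope converges to (0.5 + 0.8) / (1.0 + 1.0) = 0.65, while the corrected slope recovers a value close to 0.5. The correction becomes unstable when the true effects vary little relative to the noise, because the corrected denominator approaches zero.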

    Practical Applications at Netflix

    These methods yield interpretable linear structural models of treatment effects, well-suited for Netflix's decentralized experimentation practice:

    • Managing metric tradeoffs: Understand the relative impact of different proxy metrics on the north star metric.
    • Informing metrics innovation: Evaluate whether a new proxy metric is correlated with the north star after accounting for existing proxy metrics.
    • Enabling independent team work: Simple, fast models for teams to iterate on their own proxy metrics.
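
    To make the tradeoff use case concrete, the sketch below shows how a fitted linear model might be applied to a single experiment. The coefficients, the metric names, and the effect sizes are hypothetical values invented for illustration; none of them come from the article.

```python
# Hypothetical coefficients of a linear model relating treatment effects on
# two proxy metrics to the treatment effect on the north star metric.
coefficients = {"click_through_rate": 0.5, "watch_time": 0.25}

def predicted_north_star_effect(proxy_effects, coefficients):
    """Combine estimated proxy effects into a predicted north star effect."""
    return sum(coefficients[name] * effect for name, effect in proxy_effects.items())

# Hypothetical experiment that raises one proxy and lowers the other.
experiment_effects = {"click_through_rate": 0.02, "watch_time": -0.01}
predicted = predicted_north_star_effect(experiment_effects, coefficients)

# Implied exchange rate: units of watch_time effect worth one unit of
# click_through_rate effect under the assumed coefficients.
exchange_rate = coefficients["click_through_rate"] / coefficients["watch_time"]
print(f"predicted north star effect: {predicted:.4f}")
print(f"exchange rate: {exchange_rate}")
```

    Because the model is linear, the ratio of two coefficients acts as an exchange rate between proxies, which is what lets a team weigh a gain in one metric against a loss in another.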

    Challenges and Future Work

    While the authors are excited about the research and implementation at Netflix, some challenges remain:

    • Developing a more flexible data architecture to streamline the application of these methods.
    • Continuing to strive for "great and always better" per Netflix's culture.

    Conclusion

    The article presents novel methods to overcome correlated measurement error and estimate the true relationship between proxy metrics and north star metrics, enabling better decision-making and metric development in A/B testing and experimentation practices.
