A classic 2016 paper from YouTube that covers both DNN model structures and feature engineering tricks.
Problem Definition
- Problem definition -> increase watch time
Previous Methods & New Methods
- Previous:
- two-stage model
- Candidate generation: matrix factorization trained with rank loss
- Ranking model: linear or tree-based watch time prediction (regression)
- New: Classic two-stage structure
- candidate generation
- Method: U2I (user-item similarity)
- video embeddings are learned jointly with the network, in the spirit of word2vec (CBOW)
- Metrics: MAP (Mean Average Precision)
- ranking:
- Method: DNN (similar architecture to candidate generation) + weighted logistic regression (LR) ??
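- Objective: a function of expected watch time per impression (watch time captures engagement better than click-through rate)
- Metrics: watch-time-weighted pairwise loss
- the cross-entropy loss is weighted: clicked impressions are weighted by their observed watch time, unclicked impressions get unit weight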
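The weighted LR idea in the ranking notes can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the function names, the bias-only model, the toy data, and the gradient-descent settings are all assumptions. It shows that weighting clicked impressions by watch time makes the learned odds, `exp(logit)`, track watch time, which is why the odds can serve as the ranking score.

```python
import math

def weighted_log_loss(logits, clicked, watch_time):
    """Mean cross-entropy with clicked impressions weighted by watch time.

    Unclicked impressions get unit weight. All names are illustrative.
    """
    total = 0.0
    for z, y, t in zip(logits, clicked, watch_time):
        weight = t if y == 1 else 1.0
        # Numerically stable log(1 + exp(z)).
        softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
        total += weight * (softplus - y * z)
    return total / len(logits)

def fit_bias(clicked, watch_time, steps=5000, lr=0.05):
    """Fit a bias-only model by gradient descent on the weighted loss."""
    bias = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + math.exp(-bias))
        grad = sum((t if y == 1 else 1.0) * (p - y)
                   for y, t in zip(clicked, watch_time)) / len(clicked)
        bias -= lr * grad
    return bias

# Made-up toy data: 10 impressions, 2 clicked and watched for 30 s and 50 s.
clicked = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
watch_time = [30.0, 50.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

bias = fit_bias(clicked, watch_time)
serving_score = math.exp(bias)  # odds = exp(logit), used as the ranking score
```

On this toy set the fitted odds converge to total watch time divided by the number of unclicked impressions (80 / 8 = 10), which is close to the mean watch time per impression (8) and gets closer as clicks become rarer.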
Impacts
- The paper reports no numbers for the improvement
TODO & Questions & Further Reading
- ranking model -> predict watch time??
- example age as input??
- model is biased on historical data, while users like new content
- The age of the training example is included as a feature. At serving time, this feature is set to zero (or slightly negative) to reflect that the model is making predictions at the very end of the training window.
- matrix factorization trained with rank loss
- weighted LR
- feature engineering of ranking
- time since last watch -> how to normalize
- A continuous feature x with distribution f is transformed to $\tilde{x}$ by scaling the values such that the feature is equally distributed in [0, 1), using the cumulative distribution: $\tilde{x} = \int_{-\infty}^{x} df$
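The normalization question for "time since last watch" can be explored with a small sketch of CDF scaling, approximated by linear interpolation between quantiles of the feature. The helper names `fit_quantiles` and `cdf_normalize`, the number of bins, and the sample values are assumptions for illustration only. Values outside the observed range are clipped, so the output here lies in [0, 1] rather than strictly [0, 1).

```python
from bisect import bisect_right

def fit_quantiles(values, num_bins=4):
    """Return num_bins + 1 quantile knots of the feature (computed before training)."""
    ordered = sorted(values)
    n = len(ordered)
    knots = []
    for k in range(num_bins + 1):
        # Linearly interpolated quantile at level k / num_bins.
        pos = k / num_bins * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        knots.append(ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo]))
    return knots

def cdf_normalize(x, knots):
    """Approximate the CDF at x by interpolating between quantile knots."""
    if x <= knots[0]:
        return 0.0
    if x >= knots[-1]:
        return 1.0
    i = bisect_right(knots, x) - 1  # knots[i] <= x < knots[i + 1]
    frac = (x - knots[i]) / (knots[i + 1] - knots[i])
    return (i + frac) / (len(knots) - 1)

# Made-up, heavily skewed "time since last watch" values in seconds.
seconds_since_last_watch = [10, 60, 3600, 86400, 604800]
knots = fit_quantiles(seconds_since_last_watch, num_bins=4)

one_hour = cdf_normalize(3600, knots)      # median of the sample
half_minute = cdf_normalize(35, knots)     # between the first two knots
```

Because the mapping follows the empirical distribution, a heavy-tailed feature such as elapsed seconds ends up spread evenly over the unit interval instead of being dominated by its largest values.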