2016-Deep Neural Networks for YouTube Recommendations


A classic 2016 paper from YouTube that covers both DNN model architectures and feature-engineering tricks.

Problem Definition

Previous:
1. Two-stage model
  1. Candidate generation: matrix factorization trained with rank loss
  2. Ranking model: linear or tree-based watch time prediction (regression)
New: the same classic two-stage structure, with a DNN in each stage
1. candidate generation
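  1. Method: U2I (user-item similarity): a user vector is matched against video vectors by dot product
    1. video embeddings are learned in a word2vec (CBOW)-inspired way, jointly with the other network parameters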
  2. Metrics: MAP (Mean Average Precision)
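2. ranking
  1. Method: DNN (similar architecture to the candidate-generation network) with a weighted logistic regression output layer
  2. Objective: a function of expected watch time per impression (watch time captures engagement better than click-through rate)
  3. Metric: watch-time-weighted pairwise loss
    1. the training cross-entropy loss is weighted accordingly: positive (clicked) impressions by their watch time, negative impressions by unit weight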
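A minimal sketch of the watch-time-weighted pairwise metric, following our reading of the paper: within each page shown to a user, every (clicked, unclicked) pair is compared, and when the unclicked impression scores higher, the clicked impression's watch time counts as mispredicted. The loss is mispredicted watch time divided by total watch time over all such pairs. The `Impression` record and `weighted_pairwise_loss` name are hypothetical, and grouping by `(user_id, page_id)` is an assumption about how pages are identified.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Impression:
    user_id: str
    page_id: str        # impressions shown together on one page are compared
    score: float        # ranking-model score
    clicked: bool
    watch_time: float   # watch time of a clicked impression (ignored if unclicked)


def weighted_pairwise_loss(impressions: List[Impression]) -> float:
    """Mispredicted watch time as a fraction of total watch time over (positive, negative) pairs."""
    pages: Dict[Tuple[str, str], List[Impression]] = {}
    for imp in impressions:
        pages.setdefault((imp.user_id, imp.page_id), []).append(imp)

    mispredicted = 0.0
    total = 0.0
    for group in pages.values():
        positives = [imp for imp in group if imp.clicked]
        negatives = [imp for imp in group if not imp.clicked]
        for pos in positives:
            for neg in negatives:
                total += pos.watch_time
                # A negative that outscores a positive "loses" the positive's watch time.
                if neg.score > pos.score:
                    mispredicted += pos.watch_time
    return mispredicted / total if total > 0 else 0.0


page = [
    Impression("u1", "p1", score=0.9, clicked=True, watch_time=100.0),
    Impression("u1", "p1", score=0.3, clicked=True, watch_time=50.0),
    Impression("u1", "p1", score=0.5, clicked=False, watch_time=0.0),
]
print(weighted_pairwise_loss(page))  # prints 0.3333333333333333 (50 of 150 units mispredicted)
```

Weighting by watch time means misordering a long watch costs more than misordering a short one, which matches the watch-time objective rather than plain click accuracy.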

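Ranking model -> how does it predict watch time? With weighted LR, the learned odds approximate the expected watch time, so serving uses e^(Wx+b) as the estimate.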
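The odds argument behind this, written out as a worked equation. Assume N training impressions, k of them clicked with watch times T_i, positives weighted by T_i and negatives by 1. For a calibrated (or bias-only) weighted logistic regression, the odds equal weighted positives over weighted negatives. Writing P = k/N for the click rate and E[T] for expected watch time per impression (unclicked impressions count as zero):

```latex
\begin{aligned}
\text{odds} &= \frac{\sum_{i=1}^{k} T_i}{N - k}
  = \frac{\frac{1}{N}\sum_{i=1}^{k} T_i}{1 - \frac{k}{N}}
  = \frac{\mathbb{E}[T]}{1 - P}
  \approx \mathbb{E}[T]\,(1 + P)
  \approx \mathbb{E}[T],
\qquad P = \frac{k}{N},\quad \mathbb{E}[T] = \frac{1}{N}\sum_{i=1}^{k} T_i \\
\hat{T}(x) &= e^{Wx + b}
\end{aligned}
```

The first approximation is the first-order expansion 1/(1 - P) ≈ 1 + P; the second holds because the click rate P is small, so the odds e^(Wx+b) can be read directly as a watch-time estimate.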
Example age as an input feature: why?
1. The model is biased toward the past because it is trained on historical data, while users prefer fresh content.
2. The age of each training example is included as a feature. At serving time, this feature is set to zero (or slightly negative) to reflect that the model is making predictions at the very end of the training window.
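A minimal sketch of the example-age feature, assuming age is measured backward from the end of the training window (which is what makes zero mean "now") and expressed in hours. The function name, the hour unit, and `serving_offset_hours` are hypothetical choices for illustration.

```python
from datetime import datetime
from typing import Optional


def example_age_feature(window_end: datetime,
                        event_time: Optional[datetime] = None,
                        serving_offset_hours: float = 0.0) -> float:
    """Example age in hours (hypothetical unit).

    Training: hours between the example's event time and the end of the training window.
    Serving: no event time is passed; return 0 by default, or a slightly negative value
    when serving_offset_hours > 0.
    """
    if event_time is None:
        return -serving_offset_hours
    return (window_end - event_time).total_seconds() / 3600.0


window_end = datetime(2016, 6, 30)
train_age = example_age_feature(window_end, datetime(2016, 6, 29, 12))  # 12.0
serve_age = example_age_feature(window_end)                              # 0.0
```

During training the network can learn how popularity depends on age; pinning the feature to zero at serving time asks it for the freshest point it has seen.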
matrix factorization trained with rank loss
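The notes do not say which rank loss the earlier matrix factorization used. Purely as an illustration, the sketch below swaps in BPR (Bayesian Personalized Ranking), one common pairwise rank loss for matrix factorization: for a user u, a watched video i and an unwatched video j, SGD maximizes ln σ(s_ui - s_uj) minus an L2 penalty, where s_ui is the dot product of user and video vectors. The class, function, and hyperparameter values are hypothetical.

```python
import math
import random
from typing import Dict, List


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class BPRMatrixFactorization:
    """User/video embeddings trained with a pairwise (BPR) rank loss via SGD."""

    def __init__(self, n_users: int, n_items: int, dim: int = 8,
                 lr: float = 0.05, reg: float = 0.01, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.U = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_users)]
        self.V = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_items)]
        self.lr = lr
        self.reg = reg

    def score(self, u: int, i: int) -> float:
        return sum(a * b for a, b in zip(self.U[u], self.V[i]))

    def step(self, u: int, pos: int, neg: int) -> None:
        # Gradient ascent on ln sigmoid(s_u,pos - s_u,neg) minus an L2 penalty.
        g = _sigmoid(-(self.score(u, pos) - self.score(u, neg)))
        uu, vp, vn = self.U[u], self.V[pos], self.V[neg]
        for f in range(len(uu)):
            u_f, p_f, n_f = uu[f], vp[f], vn[f]
            uu[f] += self.lr * (g * (p_f - n_f) - self.reg * u_f)
            vp[f] += self.lr * (g * u_f - self.reg * p_f)
            vn[f] += self.lr * (-g * u_f - self.reg * n_f)


def train(model: BPRMatrixFactorization, watched: Dict[int, List[int]],
          n_items: int, steps: int, seed: int = 0) -> None:
    """Sample (user, watched video, unwatched video) triples uniformly and take SGD steps."""
    rng = random.Random(seed)
    users = list(watched)
    for _ in range(steps):
        u = rng.choice(users)
        pos = rng.choice(watched[u])
        neg = rng.choice([j for j in range(n_items) if j not in watched[u]])
        model.step(u, pos, neg)


model = BPRMatrixFactorization(n_users=2, n_items=4)
train(model, {0: [0, 1], 1: [2, 3]}, n_items=4, steps=5000)
```

A rank loss only cares about the order of watched versus unwatched videos for each user, not about reconstructing exact ratings, which suits implicit feedback such as watches.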
weighted LR
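A minimal sketch of weighted LR as described for the ranking stage: positive (clicked) impressions are weighted by watch time, negatives by 1, and serving returns the odds e^(w·x+b). To stay short it assumes a plain linear model on dense features trained with batch gradient descent; in the paper this weighting sits on top of a DNN. `train_weighted_lr` and `predict_watch_time` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_weighted_lr(
    X: Sequence[Sequence[float]],
    clicked: Sequence[bool],
    watch_time: Sequence[float],
    lr: float = 0.5,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Batch gradient descent on watch-time-weighted cross-entropy."""
    n_feat = len(X[0])
    w = [0.0] * n_feat
    b = 0.0
    # Positives weighted by their watch time, negatives by unit weight.
    weights = [t if c else 1.0 for c, t in zip(clicked, watch_time)]
    total_weight = sum(weights)
    for _ in range(epochs):
        grad_w = [0.0] * n_feat
        grad_b = 0.0
        for x, y, s in zip(X, clicked, weights):
            p = _sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
            err = s * (p - (1.0 if y else 0.0))  # gradient of weighted CE w.r.t. the logit
            for j in range(n_feat):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / total_weight for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / total_weight
    return w, b


def predict_watch_time(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    # Serving: the odds e^(w.x + b) are used as the expected watch time.
    return math.exp(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

As a sanity check, with a feature that is always zero only the bias is learned, and exp(b) converges to the weighted positives over the weighted negatives: two clicks watched 3 and 5 units plus six unclicked impressions give 8/6 ≈ 1.33.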
feature engineering for ranking
1. time since last watch -> how to normalize continuous features like this one
  1. A continuous feature $x$ with distribution $f$ is transformed to $\tilde{x}$ by scaling its values so that the feature is uniformly distributed in $[0, 1)$, using the cumulative distribution: $\tilde{x} = \int_{-\infty}^{x} df$.
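A minimal sketch of CDF normalization: the empirical CDF is approximated by linear interpolation between quantiles of the training values, which is how we understand the paper to do it. This sketch simply sorts a sample to get the quantile knots; a production pipeline would more likely use a streaming quantile estimator. The paper also feeds the powers x̃² and √x̃ alongside x̃. `CdfNormalizer`, `n_quantiles`, and `expand_powers` are hypothetical names.

```python
import bisect
import math
from typing import List, Sequence


class CdfNormalizer:
    """Maps x to an approximate CDF value in [0, 1] by interpolating between quantile knots."""

    def __init__(self, values: Sequence[float], n_quantiles: int = 100) -> None:
        data = sorted(values)
        n = len(data)
        # Knot q is the value at fraction q / n_quantiles of the sorted sample.
        self.knots = [data[min(n - 1, int(round(q * (n - 1) / n_quantiles)))]
                      for q in range(n_quantiles + 1)]
        self.n_quantiles = n_quantiles

    def transform(self, x: float) -> float:
        k = self.knots
        if x <= k[0]:
            return 0.0
        if x >= k[-1]:
            return 1.0
        j = bisect.bisect_right(k, x) - 1  # k[j] <= x < k[j + 1], so k[j + 1] > k[j]
        frac = (x - k[j]) / (k[j + 1] - k[j])
        return (j + frac) / self.n_quantiles


def expand_powers(x_tilde: float) -> List[float]:
    # Normalized value plus its square and square root (super- and sub-linear variants).
    return [x_tilde, x_tilde ** 2, math.sqrt(x_tilde)]


norm = CdfNormalizer([float(v) for v in range(101)], n_quantiles=100)
features = expand_powers(norm.transform(25.0))  # [0.25, 0.0625, 0.5]
```

Mapping through the CDF makes heavy-tailed features such as "time since last watch" roughly uniform, so a few extreme values no longer dominate the input scale.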