[Study Notes] Machine Learning: Recommender Systems

When we shop online, many sites automatically recommend products we might be interested in. This is an application of recommender systems, which generate recommendations from patterns of activity between users and products.



Take predicting users' movie ratings as an example. As in the table above, known ratings are marked 0-5 and unrated entries are marked ?; we want to predict what ratings the users would give to the unrated entries.
\( n_u \): number of users

\( n_m \): number of movies

\( r(i, j)\): 1 if user \(j\) has rated movie \(i\)

\( y^{(i, j)}\): rating given by user \(j\) to movie \(i\) (defined only if \(r(i, j) = 1\))

Goal: predict the undefined \( y^{(i, j)}\)

Content-based recommendations


Problem formulation
\( r(i, j)\): 1 if user \(j\) has rated movie \(i\) (0 otherwise)

\( y^{(i, j)}\): rating given by user \(j\) to movie \(i\) (defined only if \(r(i, j) = 1\))

\( \theta^{(j)} \): parameter vector for user \(j\)

\( x^{(i)} \): feature vector for movie \(i\)

For user \(j\), movie \(i\), predicted rating: \( (\theta^{(j)})^T x^{(i)} \)

\(m^{(j)}\): number of movies rated by user \(j\)
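As a concrete sketch of the prediction \( (\theta^{(j)})^T x^{(i)} \), using made-up illustrative feature and parameter values (with \(x_0 = 1\) as the intercept term):

```python
import numpy as np

# Hypothetical feature vector for one movie: [x_0 (intercept), romance, action]
x_i = np.array([1.0, 0.9, 0.1])

# Hypothetical parameter vector for one user who strongly prefers romance
theta_j = np.array([0.0, 5.0, 0.0])

# Predicted rating: (theta^(j))^T x^(i)
predicted = theta_j @ x_i
print(predicted)  # 4.5
```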

Suppose we know the features \( x^{(i)} \) of every movie; we want to learn each user's \( \theta^{(j)} \).
Optimization algorithm:
To learn \(\theta^{(j)}\) (parameter for user \(j\)):
minimize \(\frac{1}{2}\sum_{i:r(i,j)=1}((\theta^{(j)})^T x^{(i)}-y^{(i,j)})^2 + \frac{\lambda}{2}\sum_{k=1}^{n}(\theta_k^{(j)})^2 \)
To learn \(\theta^{(1)}, \theta^{(2)}, ..., \theta^{(n_u)}\):
minimize \(\frac{1}{2}\sum_{j=1}^{n_u}\sum_{i:r(i,j)=1}((\theta^{(j)})^T x^{(i)}-y^{(i,j)})^2 + \frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^{n}(\theta_k^{(j)})^2 \)
Gradient descent update:
\(\theta_k^{(j)} := \theta_k^{(j)} - \alpha \sum_{i:r(i,j)=1} ((\theta^{(j)})^T x^{(i)} - y ^{(i,j)}) x_k^{(i)} \quad (\text{for } k = 0) \)

\(\theta_k^{(j)} := \theta_k^{(j)} - \alpha (\sum_{i:r(i,j)=1} ((\theta^{(j)})^T x^{(i)} - y ^{(i,j)}) x_k^{(i)} + \lambda \theta_k^{(j)}) \quad (\text{for } k \neq 0) \)
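A minimal sketch of the gradient descent updates above for a single user (the toy features, ratings, and hyperparameters below are illustrative assumptions; note that \(\theta_0\) is not regularized):

```python
import numpy as np

def learn_theta(X, y, rated, lam=1.0, alpha=0.01, iters=1000):
    """Learn one user's parameter vector theta^(j) by gradient descent.

    X     : (n_m, n+1) movie feature matrix, column 0 is the intercept x_0 = 1
    y     : (n_m,) ratings by this user (entries for unrated movies ignored)
    rated : (n_m,) boolean mask, True where r(i, j) = 1
    """
    theta = np.zeros(X.shape[1])
    Xr, yr = X[rated], y[rated]      # keep only movies the user has rated
    for _ in range(iters):
        err = Xr @ theta - yr        # prediction errors on rated movies
        grad = Xr.T @ err            # summed gradient, all components k
        grad[1:] += lam * theta[1:]  # regularize every theta_k except k = 0
        theta -= alpha * grad
    return theta

# Hypothetical toy data: 3 movies, features = [x_0, romance]
X = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [1.0, 0.5]])
y = np.array([5.0, 1.0, 3.0])        # this user's ratings
rated = np.array([True, True, True])
theta = learn_theta(X, y, rated, lam=0.0, alpha=0.1, iters=5000)
```

With `lam=0.0` this reduces to ordinary linear regression per user; the regularized gradient above matches the `k != 0` update rule.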

Collaborative filtering


Content-based recommendations assume we know the features \( x^{(i)} \) of every movie and learn each user's \( \theta^{(j)} \) from them.

But we may not know each movie's features \( x^{(i)} \). An alternative approach: suppose instead that we know each user's \( \theta^{(j)} \), and learn each movie's features \( x^{(i)} \) from them.

Optimization algorithm:
Given \(\theta^{(1)}, \theta^{(2)}, ..., \theta^{(n_u)} \), to learn \( x^{(i)} \):
minimize \(\frac{1}{2} \sum_{j:r(i,j)=1} ((\theta^{(j)})^T x^{(i)} - y ^ {(i,j)})^2 + \frac{\lambda}{2}\sum_{k=1}^{n} (x_k^{(i)})^2 \)
Given \(\theta^{(1)}, \theta^{(2)}, ..., \theta^{(n_u)} \), to learn \( x^{(1)}, x^{(2)}, ..., x^{(n_m)} \):
minimize \(\frac{1}{2} \sum_{i=1}^{n_m} \sum_{j:r(i,j)=1} ((\theta^{(j)})^T x^{(i)} - y ^ {(i,j)})^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} (x_k^{(i)})^2 \)

Collaborative filtering:
Combining the two approaches:

Given \(x^{(1)}, x^{(2)}, ..., x^{(n_m)}\), can estimate \(\theta^{(1)}, \theta^{(2)}, ..., \theta^{(n_u)}\)

Given \(\theta^{(1)}, \theta^{(2)}, ..., \theta^{(n_u)}\), can estimate \(x^{(1)}, x^{(2)}, ..., x^{(n_m)}\)

First guess a set of \(\theta\), use it to estimate a set of \(x\), then use that \(x\) to re-estimate \(\theta\), and so on, gradually converging.

Collaborative filtering algorithm
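A minimal sketch of the combined algorithm, which minimizes the two objectives above jointly by gradient descent over both \(X\) and \(\Theta\) at once rather than alternating (the toy ratings matrix, feature count, and hyperparameters are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ratings: rows = movies (n_m = 5), cols = users (n_u = 4),
# NaN marks an unrated entry, i.e. r(i, j) = 0.
Y = np.array([[5.0,    5.0, 0.0, np.nan],
              [5.0, np.nan, 0.0, 0.0],
              [np.nan, 4.0, 0.0, 0.0],
              [0.0,    0.0, 5.0, 4.0],
              [0.0,    0.0, 5.0, np.nan]])

def collaborative_filtering(Y, n_features=2, lam=0.02, alpha=0.01, iters=8000):
    n_m, n_u = Y.shape
    R = ~np.isnan(Y)                 # r(i, j) mask
    Yf = np.nan_to_num(Y)            # zero out unrated entries (masked anyway)
    # Small random initialization breaks symmetry between features
    X = 0.1 * rng.standard_normal((n_m, n_features))      # movie features
    Theta = 0.1 * rng.standard_normal((n_u, n_features))  # user parameters
    for _ in range(iters):
        E = (X @ Theta.T - Yf) * R   # errors only where r(i, j) = 1
        X_new = X - alpha * (E @ Theta + lam * X)
        Theta = Theta - alpha * (E.T @ X + lam * Theta)
        X = X_new
    return X, Theta

X, Theta = collaborative_filtering(Y)
# Predicted ratings, including the previously undefined entries:
Y_pred = X @ Theta.T
```

Here no intercept feature \(x_0 = 1\) is used; in the combined formulation both \(x\) and \(\theta\) are learned, so all components are regularized.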




Low rank factorization

The learned features are not necessarily human-interpretable.
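One way to see the low-rank view: stacking the movie features into \(X\) and the user parameters into \(\Theta\) gives the full predicted-rating matrix \(X\Theta^T\), and movies with small \(\lVert x^{(i)} - x^{(k)} \rVert\) can be treated as related. A sketch with made-up learned factors:

```python
import numpy as np

# Hypothetical learned factors: n_m = 4 movies, n_u = 3 users, n = 2 features.
# The feature axes have no guaranteed human meaning.
X = np.array([[1.0, 0.1],
              [0.9, 0.0],
              [0.1, 1.0],
              [0.0, 0.9]])       # movie features
Theta = np.array([[5.0, 0.0],
                  [4.0, 1.0],
                  [0.0, 5.0]])   # user parameters

# Full predicted-rating matrix: Y_pred[i, j] = (theta^(j))^T x^(i).
# Its rank is at most n = 2, hence "low rank".
Y_pred = X @ Theta.T

# Movies related to movie 0: smallest distance ||x^(i) - x^(0)||
dists = np.linalg.norm(X - X[0], axis=1)
most_related = np.argsort(dists)[1]   # index 0 is movie 0 itself
print(most_related)  # 1
```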