### [Study Notes] Machine Learning: Dimensionality Reduction

Dimensionality reduction is an unsupervised learning problem.

Dimensionality reduction can be applied to data compression, e.g. compressing each 2-dimensional example $$x^{(i)}$$ into a single number $$z^{(i)}$$:

$$x^{(1)} \in \mathbb{R}^2 \rightarrow z^{(1)} \in \mathbb{R}$$

$$x^{(2)} \in \mathbb{R}^2 \rightarrow z^{(2)} \in \mathbb{R}$$

...

$$x^{(m)} \in \mathbb{R}^2 \rightarrow z^{(m)} \in \mathbb{R}$$
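
A minimal Octave sketch of this kind of compression, using hypothetical synthetic data that lies close to a line:

```octave
% Hypothetical synthetic data: m points in R^2 lying close to a line.
m = 50;
x1 = linspace(0, 10, m)';
x2 = 2 * x1 + 0.1 * randn(m, 1);
X = [x1 x2];                  % m x 2 matrix, one example x(i) per row

% Unit vector along the direction we project onto.
u = [1; 2] / norm([1; 2]);

% Each 2-D example is compressed to one number: its position along u.
z = X * u;                    % m x 1 vector of z(1), ..., z(m)
```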

Dimensionality reduction can also be applied to data visualization.
Data visualization helps us understand the data better, and therefore helps us choose an effective learning algorithm for it.

### Principal Component Analysis

Problem formulation:

Reduce from 2 dimensions to 1 dimension: find a vector $$u^{(1)} \in \mathbb{R}^2$$ onto which to project the data so as to minimize the projection error.

Reduce from $$n$$ dimensions to $$k$$ dimensions: find $$k$$ vectors $$u^{(1)}, u^{(2)}, ..., u^{(k)}$$ onto which to project the data so as to minimize the projection error.

PCA is not linear regression.

Linear regression: predict $$y$$ from $$x^{(i)} \in \mathbb{R}^n$$; it minimizes the vertical distance (the prediction error in $$y$$) between each point and the fitted line.

PCA: reduce the number of features from $$n$$ to $$k$$; it minimizes the orthogonal projection error, and there is no $$y$$ to predict.

### Principal Component Analysis algorithm

Goal: reduce the data from $$n$$ dimensions to $$k$$ dimensions.

Data preprocessing:
Training set: $$x^{(1)}, x^{(2)}, ..., x^{(m)}$$

Preprocessing (feature scaling/mean normalization):
$$\mu_j$$: mean of all the values for feature j
$$\mu_j = \frac{1}{m}\sum_{i=1}^{m}x_j^{(i)}$$
$$s_j$$: standard deviation or (max - min) of all the values for feature j

Replace each $$x_j^{(i)}$$ with $$\frac{x_j^{(i)}-\mu_j}{s_j}$$.
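
A minimal Octave sketch of this preprocessing, assuming the training examples are stored as the rows of an m x n matrix `X`:

```octave
% X: m x n matrix, one training example per row.
mu = mean(X);                 % 1 x n row vector of per-feature means
s  = std(X);                  % 1 x n row vector of per-feature std devs
X_norm = (X - mu) ./ s;       % broadcasting subtracts/divides per column
```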
After preprocessing, compute the covariance matrix:
$$\Sigma = \frac{1}{m}\sum_{i=1}^{m}(x^{(i)})(x^{(i)})^T$$
$$x^{(i)}$$: $$n \times 1$$ vector

$$(x^{(i)})^T$$: $$1 \times n$$ vector

$$\Sigma$$: $$n \times n$$ matrix

Compute the "eigenvectors" $$u^{(1)}, u^{(2)}, ..., u^{(n)}$$ of matrix $$\Sigma$$:
Octave implementation: `[U, S, V] = svd(Sigma);`
$$U = [u^{(1)} \: u^{(2)} \: ... \: u^{(n)}] \in \mathbb{R}^{n \times n}$$
$$U_{reduce}$$ = first $$k$$ "eigenvectors":
Octave implementation: `Ureduce = U(:,1:k);`
$$U_{reduce} = [u^{(1)} \: u^{(2)} \: ... \: u^{(k)}] \in \mathbb{R}^{n \times k}$$
Project the data from $$n$$ dimensions to $$k$$ dimensions:
Octave implementation: `z = Ureduce'*x;`
$$z^{(i)} = [u^{(1)} \: u^{(2)} \: ... \: u^{(k)}]^T x^{(i)} = \begin{bmatrix} (u^{(1)})^T \\(u^{(2)})^T \\... \\(u^{(k)})^T \end{bmatrix} x^{(i)}$$

$$[u^{(1)} \: u^{(2)} \: ... \: u^{(k)}]$$: $$n \times k$$ matrix

$$[u^{(1)} \: u^{(2)} \: ... \: u^{(k)}]^T$$: $$k \times n$$ matrix

$$x^{(i)}$$: $$n \times 1$$ vector

$$z^{(i)}$$: $$k \times 1$$ vector

Note: $$z_j = (u^{(j)})^T x$$
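
Putting the steps together, a sketch of the whole algorithm in vectorized Octave, assuming `X` is the m x n matrix of mean-normalized examples (so $$\frac{1}{m}\sum_{i=1}^{m}(x^{(i)})(x^{(i)})^T$$ becomes `X' * X / m`):

```octave
% X: m x n matrix of mean-normalized training examples (one per row).
[m, n] = size(X);

Sigma = (X' * X) / m;         % n x n covariance matrix
[U, S, V] = svd(Sigma);       % columns of U are u(1), ..., u(n)

k = 2;                        % target dimension (hypothetical choice)
Ureduce = U(:, 1:k);          % n x k matrix of the first k eigenvectors
Z = X * Ureduce;              % m x k; row i equals (Ureduce' * x(i))'
```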

### Reconstruction from compressed representation

PCA (project from $$n$$ dimensions to $$k$$ dimensions): $$z^{(i)} = U_{reduce}^T x^{(i)}$$

Reconstruction (from $$k$$ dimensions back to $$n$$ dimensions): $$x_{approx}^{(i)} = U_{reduce} z^{(i)}$$

$$U_{reduce}$$: $$n \times k$$ matrix

$$z^{(i)}$$: $$k \times 1$$ vector

$$x_{approx}^{(i)}$$: $$n \times 1$$ vector
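
Continuing the Octave sketch above, all examples can be reconstructed at once:

```octave
% Z: m x k compressed data; Ureduce: n x k.
X_approx = Z * Ureduce';      % m x n; row i equals (Ureduce * z(i))'
```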

### Choosing the number of principal components

Average squared projection error: $$\frac{1}{m}\sum_{i=1}^{m}||x^{(i)}-x_{approx}^{(i)}||^2$$

Total variation in the data: $$\frac{1}{m}\sum_{i=1}^{m}||x^{(i)}||^2$$

Octave implementation: `[U, S, V] = svd(Sigma);`

Choose the smallest value of $$k$$ for which

$$1 - \frac{\sum_{i=1}^{k}S_{ii}}{\sum_{i=1}^{n}S_{ii}} = \frac{\frac{1}{m}\sum_{i=1}^{m}||x^{(i)}-x_{approx}^{(i)}||^2}{\frac{1}{m}\sum_{i=1}^{m}||x^{(i)}||^2} \le 0.01$$

i.e. 99% of the variance is retained; the $$S_{ii}$$ are the diagonal entries of the matrix $$S$$ returned by svd.
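
A sketch of this selection procedure in Octave, reading the singular values off the diagonal of `S`:

```octave
[U, S, V] = svd(Sigma);
s = diag(S);                  % s(i) = S(i,i)

% Smallest k retaining at least 99% of the variance.
for k = 1:length(s)
  if 1 - sum(s(1:k)) / sum(s) <= 0.01
    break;
  end
end
```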

### Advice for applying PCA

Training set: $$(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), ..., (x^{(m)}, y^{(m)})$$

Extract inputs:
Apply PCA to the unlabeled dataset: $$x^{(1)}, x^{(2)}, ..., x^{(m)} \in \mathbb{R}^{10000} \rightarrow z^{(1)}, z^{(2)}, ..., z^{(m)} \in \mathbb{R}^{1000}$$
New training set: $$(z^{(1)}, y^{(1)}), (z^{(2)}, y^{(2)}), ..., (z^{(m)}, y^{(m)})$$

Note: the mapping $$x^{(i)} \rightarrow z^{(i)}$$ should be defined by running PCA only on the training set (not on the cross-validation or test sets). The same mapping can then be applied to the examples $$x_{cv}^{(i)}$$ and $$x_{test}^{(i)}$$ in the cross-validation and test sets.
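
A minimal Octave sketch of this workflow (the variable names `Xtrain`, `Xcv`, `Xtest` and the value of `k` are hypothetical):

```octave
% Xtrain, Xcv, Xtest: hypothetical data matrices, one example per row.
% Fit the normalization and the PCA mapping on the training set only.
mu = mean(Xtrain);
s  = std(Xtrain);
Xn = (Xtrain - mu) ./ s;
[U, S, V] = svd((Xn' * Xn) / size(Xn, 1));
Ureduce = U(:, 1:k);          % k chosen as in the previous section

% Apply the same mu, s and Ureduce to all three sets.
Ztrain = Xn * Ureduce;
Zcv    = ((Xcv   - mu) ./ s) * Ureduce;
Ztest  = ((Xtest - mu) ./ s) * Ureduce;
```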

# Further Reading

[Coursera] Machine Learning: Dimensionality Reduction, by Andrew Ng, Stanford University