Classifier
a machine learning classifier learns from a training set of examples and their labels.
nearest neighbor classifier
let \(x^{n}\) be the \(n\)-th training example, \(y^{n}\) be the label of \(x^{n}\).
for example, \(x^{1}\) might be an image of the digit 1, with label \(y^{1} = 1\).
suppose image \(x\) is \(N\times N\) pixels; then \(x\) can be represented as a vector in the data space \(\mathbb{R}^{N^{2}}\).
for example, if \(N=28\), then \(x\) is a vector in \(\mathbb{R}^{784}\).
in summary, data space is \(\mathbb{X} = \mathbb{R}^{N^{2}}\), label space is \(\mathbb{Y} = \{0,1,2,3,4,5,6,7,8,9\}\).
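A minimal sketch of this representation in NumPy (the pixel values here are random stand-ins, not real digit data):

```python
import numpy as np

# Hypothetical 28x28 grayscale image; a real pipeline would load an
# actual digit image here.
N = 28
image = np.random.randint(0, 256, size=(N, N))

# Flatten the N x N pixel grid into a single vector in R^(N^2).
x = image.reshape(-1)
print(x.shape)  # (784,)
```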
distance metric
a good distance metric should have:
- non-negativity: \(d(x,y) \geq 0\)
- identity: \(d(x,y) = 0\) iff \(x = y\)
- symmetry: \(d(x,y) = d(y,x)\)
- triangle inequality: \(d(x,z) \leq d(x,y) + d(y,z)\)
common distance metrics:
- Euclidean distance
- edit distance (for strings)
- Hamming distance (for binary vectors)
- cosine distance (for text data)
- Jaccard distance (for sets)
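Minimal sketches of a few of these distances (helper names are my own; NumPy for the vector-based ones):

```python
import numpy as np

def euclidean(x, y):
    # L2 norm of the difference vector.
    return float(np.linalg.norm(x - y))

def hamming(x, y):
    # Number of positions where two binary vectors differ.
    return int(np.sum(x != y))

def cosine_distance(x, y):
    # 1 - cosine similarity; near 0 when vectors point the same way.
    return 1.0 - float(np.dot(x, y)) / (np.linalg.norm(x) * np.linalg.norm(y))

def jaccard_distance(a, b):
    # 1 - |A ∩ B| / |A ∪ B| for finite sets.
    return 1.0 - len(a & b) / len(a | b)

print(euclidean(np.array([0., 0.]), np.array([3., 4.])))  # 5.0
print(hamming(np.array([0, 1, 1]), np.array([1, 1, 0])))  # 2
print(jaccard_distance({1, 2, 3}, {2, 3, 4}))             # 0.5
```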
non-metric distances:
relative entropy (KL divergence):
\[D_{KL}(P||Q) = \sum_{i} P(i) \log_2 \frac{P(i)}{Q(i)}\]
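A quick numerical check that KL divergence is not symmetric, which is one reason it fails to be a distance metric (toy distributions, base-2 log as in the formula above):

```python
import numpy as np

def kl_divergence(p, q):
    # D_KL(P || Q) = sum_i P(i) * log2(P(i) / Q(i));
    # assumes Q(i) > 0 wherever P(i) > 0, and skips terms with P(i) = 0.
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

p = [0.5, 0.5]
q = [0.9, 0.1]
# D_KL(P || Q) != D_KL(Q || P), so symmetry fails.
print(kl_divergence(p, q), kl_divergence(q, p))
```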
how to classify a test image (i.e., identify the digit it represents) using Euclidean distance?
- compute the Euclidean distance between the test image \(x\) and every training image \(x^{n}\), i.e. \(d(x,x^{n}) = \|x - x^{n}\|_{2}\).
- find the nearest neighbor \(x^{m}\) such that \(d(x,x^{m}) = \min_{n} d(x,x^{n})\).
- predict the label \(y^{m}\) of that nearest neighbor.
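The procedure above can be sketched as follows (toy 2-D points standing in for flattened images):

```python
import numpy as np

def nearest_neighbor_classify(x, X_train, y_train):
    # d(x, x^n) = ||x - x^n||_2 for every training example, then argmin.
    dists = np.linalg.norm(X_train - x, axis=1)
    m = int(np.argmin(dists))
    return int(y_train[m])  # predicted label y^m

# Toy training set (hypothetical, not real image data).
X_train = np.array([[0., 0.], [10., 10.], [0., 1.]])
y_train = np.array([0, 1, 0])
print(nearest_neighbor_classify(np.array([0.2, 0.2]), X_train, y_train))  # 0
```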
test error: fraction of test points incorrectly classified.
- a random-guess classifier (10 equally likely classes) has a test error of 90%.
- the nearest neighbor classifier achieves a test error of ~3.09%.
- on the training points themselves, the training error is 0% (each training point is its own nearest neighbor).
sources of error that plain Euclidean distance does not account for:
- translations and rotations of images
- natural deformations (shape context methods address these)
k-nearest neighbors (k-NN): instead of using only the single nearest neighbor, find the \(k\) nearest neighbors and take a majority vote over their labels.
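A minimal k-NN sketch with majority voting (toy data; `k=3` is an arbitrary choice here):

```python
import numpy as np
from collections import Counter

def knn_classify(x, X_train, y_train, k=3):
    # Sort training points by distance, keep the k nearest,
    # then take a majority vote over their labels.
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(int(y_train[i]) for i in nearest)
    return votes.most_common(1)[0][0]

X_train = np.array([[0., 0.], [0., 1.], [1., 0.], [10., 10.]])
y_train = np.array([0, 0, 1, 1])
print(knn_classify(np.array([0.1, 0.1]), X_train, y_train, k=3))  # 0
```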
discrete / continuous labels
- discrete \(y\): classification
- binary classification
- multi-class classification
- continuous \(y\): regression