Classifier

a machine learning classifier is learned from a training set together with its labels.

nearest neighbor classifier

let \(x^{n}\) be the \(n\)-th training example, \(y^{n}\) be the label of \(x^{n}\).

for example, \(x^{1}\) is the image of the number 1, \(y^{1} = 1\).

suppose the image \(x\) is \(N\times N\) pixels; then \(x\) can be represented as a vector in the data space \(\mathbb{R}^{N^{2}}\).

for example, if \(N=28\), then \(x\) is a vector in \(\mathbb{R}^{784}\).

in summary, data space is \(\mathbb{X} = \mathbb{R}^{N^{2}}\), label space is \(\mathbb{Y} = \{0,1,2,3,4,5,6,7,8,9\}\).
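as a minimal NumPy sketch (using a random array in place of a real MNIST digit), a \(28\times 28\) image becomes a point in \(\mathbb{R}^{784}\) by flattening:

```python
import numpy as np

# a 28x28 grayscale image (random here, standing in for an MNIST digit)
N = 28
image = np.random.rand(N, N)

# flatten it into a point in the data space R^(N^2) = R^784
x = image.reshape(-1)
assert x.shape == (N * N,)  # 784-dimensional vector
```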

distance metric

  • a good distance metric should have:

    • non-negativity: \(d(x,y) \geq 0\)
    • identity: \(d(x,y) = 0\) iff \(x = y\)
    • symmetry: \(d(x,y) = d(y,x)\)
    • triangle inequality: \(d(x,z) \leq d(x,y) + d(y,z)\)
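the four axioms can be spot-checked numerically for Euclidean distance on a few random points (a sanity check, not a proof):

```python
import numpy as np

def d(x, y):
    # Euclidean distance between two vectors
    return np.linalg.norm(x - y)

rng = np.random.default_rng(0)
x, y, z = rng.random((3, 5))  # three random points in R^5

assert d(x, y) >= 0                  # non-negativity
assert d(x, x) == 0                  # identity: d(x,x) = 0
assert np.isclose(d(x, y), d(y, x))  # symmetry
assert d(x, z) <= d(x, y) + d(y, z)  # triangle inequality
```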
  • common distance metrics:

    • Euclidean distance
    • edit distance (for strings)
    • Hamming distance (for binary vectors)
    • cosine distance (for text data)
    • Jaccard distance (for sets)
  • non-metric distances:

    • relative entropy (KL divergence):

      \[D_{KL}(P||Q) = \sum_{i} P(i) \log_2 \frac{P(i)}{Q(i)}\]
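a short NumPy sketch of a few of the distances above (toy vectors chosen for illustration); it also shows why KL divergence is not a metric, since it fails symmetry:

```python
import numpy as np

u = np.array([1.0, 0.0, 1.0])
v = np.array([0.0, 1.0, 1.0])

euclidean = np.linalg.norm(u - v)          # L2 distance
hamming = np.sum(u != v)                   # number of differing coordinates
cosine = 1 - u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# KL divergence with log base 2, matching the formula above
P = np.array([0.5, 0.25, 0.25])
Q = np.array([0.7, 0.2, 0.1])
kl_pq = np.sum(P * np.log2(P / Q))
kl_qp = np.sum(Q * np.log2(Q / P))

# KL is not symmetric, which is one reason it is not a metric
assert not np.isclose(kl_pq, kl_qp)
```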

how to classify an image, i.e. find the digit it represents, using Euclidean distance?

  • compute the Euclidean distance between the test image \(x\) and every training image \(x^{n}\), i.e. \(d(x,x^{n}) = ||x - x^{n}||_{2}\), where
\[||x-z||_2 = \sqrt{\sum_{i=1}^{N^{2}} (x_{i} - z_{i})^{2}}\]
  • find the nearest neighbor \(x^{m}\) such that \(d(x,x^{m}) = \min_{n} d(x,x^{n})\), and output its label \(y^{m}\).
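the two steps above can be sketched in NumPy as follows (the toy 4-pixel "images" and labels are illustrative, not real data):

```python
import numpy as np

def nearest_neighbor_predict(x, X_train, y_train):
    """Classify x by the label of its nearest training example (Euclidean)."""
    dists = np.linalg.norm(X_train - x, axis=1)  # d(x, x^n) for every n
    m = np.argmin(dists)                         # index of nearest neighbor x^m
    return y_train[m]                            # output its label y^m

# toy data: three 4-pixel "images" with labels 0, 1, 2
X_train = np.array([[0., 0., 0., 0.],
                    [1., 1., 1., 1.],
                    [0., 1., 0., 1.]])
y_train = np.array([0, 1, 2])

# a test point close to the all-ones image is labeled 1
assert nearest_neighbor_predict(np.array([0.9, 1.0, 0.9, 1.0]), X_train, y_train) == 1
```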

test error: the fraction of test points incorrectly classified. a random classifier has a test error of about 90% (10 equally likely classes). for the nearest neighbor classifier, the test error is ~3.09%. on the training points, the training error is 0%, since each training point is its own nearest neighbor.

variations that Euclidean distance handles poorly:

  • translations and rotations of images
  • natural deformations (cf. shape context)
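to see the translation problem concretely, here is a small sketch (random image, illustrative only): shifting an image by a single pixel already produces a large Euclidean distance, even though the content is the "same".

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((28, 28))

shifted = np.roll(img, 1, axis=1)  # translate one pixel to the right

# a one-pixel shift yields a large Euclidean distance,
# while the distance of the image to itself is exactly 0
d_shift = np.linalg.norm(img - shifted)
d_self = np.linalg.norm(img - img)
assert d_shift > 1.0 and d_self == 0.0
```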

instead of using only the nearest neighbor, find the \(k\) nearest neighbors and take a majority vote over their labels.
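a minimal sketch of the majority-vote rule (toy one-dimensional data, illustrative only):

```python
import numpy as np
from collections import Counter

def knn_predict(x, X_train, y_train, k=3):
    """k-nearest-neighbor classification with a majority vote."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]                # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)   # count labels among them
    return votes.most_common(1)[0][0]              # most frequent label wins

X_train = np.array([[0.0], [0.1], [0.2], [1.0], [1.1]])
y_train = np.array([0, 0, 0, 1, 1])

# the 3 nearest neighbors of 0.15 all have label 0
assert knn_predict(np.array([0.15]), X_train, y_train, k=3) == 0
```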

discrete / continuous labels

- discrete y
    - binary classification
    - multi-class classification
- continuous y
    - regression