HW2

since we use 4 fold cross validation, need to divide the dataset into 4 parts, with each size of 2500. therefore, the size of training set is 2500 *3 = 7500

1-NN:
- for the first +, the closest neighbor is +, so it is classified as +
- for the second +, the closest neighbor is + , so it is classified as +
- for the third - , the closest neighbor is +, so it is classified as +
- for the fourth +, the closest neighbor is - , so it is classified as -
- therefore, the test error is 2/4 = 0.5
3-NN:
- for the first +, the closest 3 neighbors are +, +, - , so it is classified as +
- for the second +, the closest 3 neighbors are +, +, - , so it is classified as +
- for the third - , the closest 3 neighbors are +, +, + , so it is classified as +
- for the fourth +, the closest 3 neighbors are +, -, + , so it is classified as +
- therefore, the test error is 1/4 = 0.25

so there are n points, and each point is a vector of d components, and if we use the LOOCV method to evaluate, then we need to evaluate n times, and each time we need to calculate the distance between the test point and all other n-1 points, and each distance calculation needs to calculate d components, so the total time complexity is \(O(n*n*d) = O(n^2 d)\)


a. \(||x||_1 = |1| + |2| + |3| + |4| = 10\)
b. \(||x||_2 = \sqrt{1^2 + 2^2 + 3^2 + 4^2} = \sqrt{30}\)
c. \(||x||_{\infty} = \max(|1|, |2|, |3|, |4|) = 4\)

a. if the \(||x||_\infty = 1\), that means the max component of x is 1. suppose \(x \in R^d\), then the largest value of l1 form is when all components are 1, so the largest value is \(l1 = |1| + |1| + ... + |1| = d\). the largest value of l2 form is when all components are 1, so the largest value is \(l2 = \sqrt{1^2 + 1^2 + ... + 1^2} = \sqrt{d}\)
b. if the \(||x||_2 = 1\) for all points, that means \(\sum x_i^2 = 1\). suppose \(x \in R^d\), and given that the formula for \(l1 = |x_1| + |x_2| + ... + |x_d|\), \(l2 = \sqrt{x_1^2 + x_2^2 + ... + x_d^2} = 1\), and \(l_\infty = \max(|x_1|, |x_2|, ..., |x_d|)\), then the largest value of l1 form is when all components are equal. suppose there are d components, and each component is c, we have \(\sum c^2 = dc^2 = 1\), so \(c = 1/\sqrt{d}\), therefore, the largest value of l1 form is \(l1 = d * (1/\sqrt{d}) = \sqrt{d}\). the largest value of \(l_\infty\) form is when all one component is 1 and other components are 0, so the largest value is \(l_\infty = 1\)

this is not metric:
- non-negativity: all distance values are non-negative
- identity: distance is 0 iff the two points are the same
- symmetry: distance from A to B is the same as distance from B to A
- triangle inequality: one exception here: the distance from A -> B = 2, C -> A = 1, but B -> C = 4, so 4 > 2 + 1, therefore, it is not a metric

a. the distance function \(d(x,y) = x-y\) is not metric, because it does not satisfy non-negativity (for example, if x=1, y=2, then d(1,2) = 1-2 = -1)
b. the distance function d(x,y) = # positions on which x and y are different based on hamming distance is metric. this is metric distance
c. the squared euclidean distance \(d(x,y) = \sum_{i=1}^m (x_i - y_i)^2\) is not metric. when m = 1, then, d(x,y) = (x-y)^2. suppose when x=1, y=2, z=4, then d(1,2) = 1, d(2,4) = 4, but d(1,4) = 9, so 9 > 1 + 4. therefore, it does not satisfy triangle inequality



a. it should be classification problem, because output is discrete states
b. it should be regression problem, because output is continuous speed
c. it should be a regression problem, because output is continuous GPA
d. it should be classification problem, because output is discrete states

a. \(\sigma = \sqrt{\100} = 10\)
b. with the random variable X = \({0, 10, 20}\), mean = 10, variance = 100 we know that:
solving the equations, we get:

a.
\(\(Cov(X, Y) = E[XY] - E[X]E[Y]\)\)