Bias–variance tradeoff
The bias–variance dilemma or bias–variance problem is the conflict in trying to simultaneously minimize these two sources of error, which prevent supervised learning algorithms from generalizing beyond their training set:

- The bias error is an error from erroneous assumptions in the learning algorithm. High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting).
- The variance is an error from sensitivity to small fluctuations in the training set. High variance may result from an algorithm modeling the random noise in the training data (overfitting).
The bias–variance decomposition is a way of analyzing a learning algorithm’s expected generalization error with respect to a particular problem as a sum of three terms, the bias, variance, and a quantity called the irreducible error, resulting from noise in the problem itself.
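To make the two failure modes concrete, here is a minimal simulation sketch; the true function `np.sin`, the noise level, the sample size, and the polynomial degrees are all illustrative assumptions, not from the original text. It repeatedly refits a model on fresh training sets and looks at the spread of predictions at one test point:

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.sin          # assumed true function
x0 = 1.0            # fixed test point

def predict_on_fresh_sample(degree):
    """Fit a polynomial of the given degree to a fresh noisy training set,
    then predict at the fixed test point x0."""
    x = rng.uniform(0, 2 * np.pi, 20)
    y = f(x) + rng.normal(0, 0.3, size=20)   # y = f(x) + noise
    return np.polyval(np.polyfit(x, y, degree), x0)

for degree in (1, 9):
    preds = np.array([predict_on_fresh_sample(degree) for _ in range(2000)])
    print(f"degree {degree}: bias {preds.mean() - f(x0):+.3f}, "
          f"variance {preds.var():.3f}")
```

A straight line cannot represent the sine curve, so its predictions are systematically off (large bias, small variance), while the degree-9 fit chases the noise in each particular sample (small bias, large variance).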
Bias–variance decomposition of mean squared error
For a dataset $D = \{(x_1,y_1),\dots,(x_n,y_n)\}$, assume the labels are generated as $y = f(x) + \epsilon$, where $\epsilon$ is noise with mean $0$ and variance $\sigma^2$. We want the estimator $\hat{f}(x;D)$, trained on $D$, to approximate $f(x)$ as closely as possible.
$$E_{D,\epsilon}\left[(y-\hat{f}(x;D))^2\right] = \left(\text{Bias}_D[\hat{f}(x;D)]\right)^2 + \text{Var}_D[\hat{f}(x;D)] + \sigma^2$$
where $$\text{Bias}_D[\hat{f}(x;D)] = E_D[\hat{f}(x;D)] - f(x)$$
$$\text{Var}_D[\hat{f}(x;D)] = E_D\left[\left(E_D[\hat{f}(x;D)] - \hat{f}(x;D)\right)^2\right]$$
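The decomposition can be checked numerically by Monte Carlo: resample the training set $D$ many times, refit, and compare the empirical expected squared error at a fixed point against $\text{bias}^2 + \text{variance} + \sigma^2$. The setup below (true function `np.cos`, $\sigma = 0.5$, a degree-3 polynomial fit) is an assumed sketch, not part of the original derivation:

```python
import numpy as np

rng = np.random.default_rng(1)
f = np.cos           # assumed true function
sigma = 0.5          # assumed noise standard deviation
x0 = 2.0             # fixed test point

preds, sq_errors = [], []
for _ in range(20000):
    # Draw a fresh training set D and fit \hat{f}(.; D) (a cubic here).
    x = rng.uniform(0, 4, 30)
    y = f(x) + rng.normal(0, sigma, size=30)
    fhat = np.polyval(np.polyfit(x, y, 3), x0)   # \hat{f}(x0; D)
    y0 = f(x0) + rng.normal(0, sigma)            # fresh noisy label at x0
    preds.append(fhat)
    sq_errors.append((y0 - fhat) ** 2)

preds = np.array(preds)
bias2 = (preds.mean() - f(x0)) ** 2
variance = preds.var()
print(f"E[(y - fhat)^2]              = {np.mean(sq_errors):.4f}")
print(f"bias^2 + variance + sigma^2  = {bias2 + variance + sigma**2:.4f}")
```

The two printed values should agree up to Monte Carlo error.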
Bias–variance decomposition of Kullback–Leibler divergence
Following Heskes (1998), let $q$ be the true distribution and $\hat{p}$ the estimated distribution, with $E$ denoting expectation over training sets, and measure the discrepancy by the Kullback–Leibler divergence
$$K(q, \hat{p}) = \int \mathrm{d}y\, q(y)\log\left[\dfrac{q(y)}{\hat{p}(y)}\right]$$
$$\text{variance} = \min_{a:\, \int \mathrm{d}y\, a(y)=1} EK(a,\hat{p}) = EK(\overline{p}, \hat{p}), \qquad \overline{p}(y) = \dfrac{1}{Z}\exp\left[E\log\hat{p}(y)\right]$$
where $Z$ is the constant that normalizes $\overline{p}$.
$$\text{bias} = K(q,\overline{p}) = EK(q,\hat{p}) + \log Z$$
$$\text{error} = EK(q,\hat{p}) = K(q,\overline{p}) + EK(\overline{p}, \hat{p})$$
Equivalently, for the log-loss at a single point $t$,
$$-E\log\hat{p}(t) = -\log\overline{p}(t) + EK(\overline{p}, \hat{p})$$
and integrating against $q(t)$ decomposes the expected log-loss into the entropy of $q$, the bias, and the variance:
$$\text{error} = -E\left[\int \mathrm{d}t\, q(t)\log\hat{p}(t)\right] = -\int \mathrm{d}t\, q(t)\log q(t) + K(q,\overline{p}) + EK(\overline{p}, \hat{p})$$
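As with the squared-error case, the identity $EK(q,\hat{p}) = K(q,\overline{p}) + EK(\overline{p},\hat{p})$ can be checked numerically for discrete distributions, where $\overline{p}$ is the normalized geometric mean of the ensemble of estimates. The setup below (a fixed categorical $q$ and Dirichlet-perturbed estimates $\hat{p}$) is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(2)

def kl(p, q):
    """Kullback-Leibler divergence K(p, q) for discrete distributions."""
    return np.sum(p * np.log(p / q))

# Assumed setup: q is a fixed "true" categorical distribution; each training
# set yields an estimate p_hat (simulated here as Dirichlet perturbations of q).
q = np.array([0.5, 0.3, 0.2])
p_hats = rng.dirichlet(50 * q, size=5000)    # ensemble of estimators \hat{p}

# \overline{p}(y) = exp(E log p_hat(y)) / Z  -- the normalized geometric mean.
p_bar = np.exp(np.mean(np.log(p_hats), axis=0))
Z = p_bar.sum()
p_bar /= Z

error = np.mean([kl(q, p) for p in p_hats])          # E K(q, \hat{p})
bias = kl(q, p_bar)                                  # K(q, \overline{p})
variance = np.mean([kl(p_bar, p) for p in p_hats])   # E K(\overline{p}, \hat{p})
print(f"error    {error:.5f}")
print(f"bias+var {bias + variance:.5f}")
print(f"-log Z   {-np.log(Z):.5f}")
```

The last line illustrates that the variance equals $-\log Z$, which follows from combining the bias and error identities above.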
References

- Fortmann-Roe, Scott (2012). "Understanding the Bias–Variance Tradeoff".
- Heskes, Tom (1998). "Bias/Variance Decompositions for Likelihood-Based Estimators". Neural Computation, 10(6): 1425–1433.