In machine learning, a learning curve (or training curve) plots the optimal value of a model's loss function on a training set against the same loss function evaluated on a validation data set, using the same parameters that produced the optimal function.[1] Synonyms include error curve, experience curve, improvement curve and generalization curve.[2]
More abstractly, the learning curve is a curve of learning effort versus predictive performance, where learning effort usually means the number of training samples and predictive performance usually means accuracy on test samples.
The machine learning curve is useful for many purposes, including comparing different algorithms,[3] choosing model parameters during design,[4] adjusting optimization to improve convergence, and determining the amount of data needed for training.[5]
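For example, scikit-learn's learning_curve utility computes training and validation scores over increasing training-set sizes. The sketch below plots such a curve; the digits dataset and the SVC estimator are illustrative choices, not prescribed by the sources above.

import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.model_selection import learning_curve
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# For each training-set size, learning_curve fits the estimator and
# returns accuracy on the training subset and on held-out CV folds.
train_sizes, train_scores, val_scores = learning_curve(
    SVC(kernel="rbf", gamma=0.001), X, y, cv=5,
    train_sizes=[0.1, 0.3, 0.55, 0.78, 1.0],
)

plt.plot(train_sizes, train_scores.mean(axis=1), label="training score")
plt.plot(train_sizes, val_scores.mean(axis=1), label="validation score")
plt.xlabel("number of training samples")
plt.ylabel("accuracy")
plt.legend()
plt.show()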
One model of machine learning is producing a function, f(x), which given some information, x, predicts some variable, y, from training data X_{train} and Y_{train}. It is distinct from mathematical optimization because f should predict well for x outside of X_{train}.
We often constrain the possible functions to a parameterized family of functions, \{f_\theta(x) : \theta \in \Theta\}, so that our function is more generalizable[6] or so that the function has certain properties that make finding a good f easier.
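As a concrete illustration, the parameterized family could be the affine functions f_\theta(x) = \theta_0 + \theta_1 x. The minimal Python sketch below fixes this notation for the later sketches; the choice of family is an assumption made purely for illustration.

import numpy as np

# One member of the parameterized family {f_theta(x) : theta in Theta}:
# theta = (theta_0, theta_1) parameterizes the affine functions
# f_theta(x) = theta_0 + theta_1 * x. Illustrative choice only.
def f(theta, x):
    return theta[0] + theta[1] * np.asarray(x)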
Given that it is not possible to produce a function that perfectly fits our data, it is then necessary to produce a loss function L(f_\theta(X), Y') to measure how good our prediction is. We then define an optimization process which finds a \theta that minimizes L(f_\theta(X), Y), referred to as \theta^*(X, Y).
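Continuing the sketch above, one possible loss is the mean squared error, with scipy.optimize.minimize standing in for the optimization process that produces \theta^*(X, Y); both are illustrative assumptions, not the only choices.

from scipy.optimize import minimize

# Loss function L(f_theta(X), Y): mean squared error between the
# predictions f_theta(X) and the targets Y.
def L(predictions, targets):
    return np.mean((np.asarray(predictions) - np.asarray(targets)) ** 2)

# Optimization process returning theta*(X, Y), a theta minimizing
# L(f_theta(X), Y) over the parameter space.
def theta_star(X, Y):
    result = minimize(lambda theta: L(f(theta, X), Y), x0=np.zeros(2))
    return result.x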
Then if our training data is \{x_1, x_2, \ldots, x_n\}, \{y_1, y_2, \ldots, y_n\} and our validation data is \{x_1', x_2', \ldots, x_m'\}, \{y_1', y_2', \ldots, y_m'\}, a learning curve is the plot of the two curves

i \mapsto L(f_{\theta^*(X_i, Y_i)}(X_i), Y_i)
i \mapsto L(f_{\theta^*(X_i, Y_i)}(X_i'), Y_i')

where X_i = \{x_1, x_2, \ldots, x_i\}
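A direct, if inefficient, way to trace these two curves is to retrain on each prefix X_i and evaluate both losses, continuing the sketch above. The synthetic data here is a placeholder, and for simplicity the full validation set stands in for X_i', Y_i'.

rng = np.random.default_rng(0)
x_train = rng.uniform(-1, 1, 100)                  # {x_1, ..., x_n}
y_train = 2 * x_train + rng.normal(0, 0.1, 100)    # {y_1, ..., y_n}
x_val = rng.uniform(-1, 1, 30)                     # {x_1', ..., x_m'}
y_val = 2 * x_val + rng.normal(0, 0.1, 30)         # {y_1', ..., y_m'}

train_curve, val_curve = [], []
for i in range(2, len(x_train) + 1):
    theta = theta_star(x_train[:i], y_train[:i])               # theta*(X_i, Y_i)
    train_curve.append(L(f(theta, x_train[:i]), y_train[:i]))  # i -> L(f(X_i), Y_i)
    val_curve.append(L(f(theta, x_val), y_val))                # validation loss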
Many optimization processes are iterative, repeating the same step until the process converges to an optimal value. Gradient descent is one such algorithm. If you define \theta_i^* as the approximation of the optimal \theta after i steps, a learning curve is the plot of

i \mapsto L(f_{\theta_i^*(X, Y)}(X), Y)
i \mapsto L(f_{\theta_i^*(X, Y)}(X'), Y')
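For the iteration-based curve, a sketch of plain gradient descent on the same affine family records both losses after every step; the step size, iteration count, and hand-derived MSE gradient are illustrative assumptions.

theta = np.zeros(2)       # theta*_0: initial parameter guess
step_size = 0.1
train_iters, val_iters = [], []
for i in range(200):
    residual = f(theta, x_train) - y_train
    # Gradient of the mean squared error with respect to (theta_0, theta_1).
    grad = np.array([2 * residual.mean(), 2 * (residual * x_train).mean()])
    theta = theta - step_size * grad                   # one step: theta*_{i+1}
    train_iters.append(L(f(theta, x_train), y_train))  # i -> L(f(X), Y)
    val_iters.append(L(f(theta, x_val), y_val))        # i -> L(f(X'), Y')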
It is a tool to find out how much a machine learning model benefits from adding more training data and whether the estimator suffers more from a variance error or a bias error. If both the validation score and the training score converge to a value that is too low as the size of the training set increases, the model will not benefit much from more training data.[7]
In the machine learning domain, there are two common variants of learning curves, differing in the x-axis: the experience of the model is graphed either as the number of training examples used for learning or as the number of iterations used in training the model.[8]