model-evaluation

Splitting datasets

From "Pattern Recognition and Neural Networks":

– Training set: A set of examples used for learning, that is to fit the parameters of the classifier.

– Validation set: A set of examples used to tune the parameters of a classifier, for example to choose the number of hidden units in a neural network.

– Test set: A set of examples used only to assess the performance of a fully-specified classifier.

So generally speaking, if no hyperparameter tuning is performed, the dataset used to evaluate generalization is the test set. There is a semantic—albeit nuanced—difference between the two that depends on what exactly you're doing.