Early stopping of Stochastic Gradient Descent

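Stochastic Gradient Descent is an optimization technique that minimizes a loss function in a stochastic fashion, performing a gradient descent step sample by sample. In particular, it is a very efficient method for fitting linear models.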

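Because the method is stochastic, the loss function does not necessarily decrease at each iteration, and convergence is only guaranteed in expectation. For this reason, monitoring convergence on the loss function can be difficult.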
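The non-monotone behaviour described above can be seen in a small sketch. It is not part of the original example: the synthetic least-squares problem, the learning rate, and the names `full_loss` and `n_increases` are all assumptions chosen for illustration. The loss over the whole dataset is recomputed after every single-sample update so that individual steps can be compared.

```python
import numpy as np

# Synthetic least-squares problem (illustrative assumption, not the MNIST data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.5 * rng.normal(size=200)


def full_loss(w):
    """Mean squared-error loss over the whole dataset."""
    return 0.5 * np.mean((X @ w - y) ** 2)


w = np.zeros(5)
learning_rate = 0.05  # hypothetical constant step size
losses = [full_loss(w)]

# One pass over the data, updating the weights from one sample at a time.
for i in rng.permutation(len(X)):
    grad_i = (X[i] @ w - y[i]) * X[i]  # gradient of the loss on sample i only
    w -= learning_rate * grad_i
    losses.append(full_loss(w))

losses = np.asarray(losses)
n_increases = int(np.sum(np.diff(losses) > 0))
print(f"initial loss {losses[0]:.3f}, final loss {losses[-1]:.3f}")
print(f"steps where the full loss went up: {n_increases} of {len(losses) - 1}")
```

A single-sample step follows the gradient of one term of the loss only, so it can move the weights in a direction that raises the loss over the full dataset even while the overall trend is downward.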

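An alternative is to monitor convergence on a validation score. In this case, the input data is split into a training set and a validation set. The model is then fitted on the training set, and the stopping criterion is based on the prediction score computed on the validation set. This lets us find the smallest number of iterations sufficient to build a model that generalizes well to unseen data, and it reduces the chance of overfitting the training data.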
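The procedure can be sketched by hand with `partial_fit`, which performs one pass over the data per call. This is a simplified illustration, not scikit-learn's internal implementation: the synthetic dataset and the names `patience`, `max_epochs`, and `epochs_without_gain` are assumptions, and unlike `fit`, repeated `partial_fit` calls do not reshuffle the data between passes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary problem (illustrative assumption).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Hold out 20% of the data purely for the stopping criterion.
X_fit, X_val, y_fit, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0)

clf = SGDClassifier(random_state=0)
classes = np.unique(y)

patience, tol, max_epochs = 3, 1e-4, 100  # hypothetical settings
best_score, epochs_without_gain, n_epochs = -np.inf, 0, 0

for epoch in range(max_epochs):
    clf.partial_fit(X_fit, y_fit, classes=classes)  # one pass over the data
    n_epochs = epoch + 1
    score = clf.score(X_val, y_val)  # accuracy on the held-out set

    if score > best_score + tol:
        best_score, epochs_without_gain = score, 0
    else:
        epochs_without_gain += 1

    # Stop once the validation score has stalled for `patience` epochs.
    if epochs_without_gain >= patience:
        break

print(f"stopped after {n_epochs} epochs, "
      f"best validation accuracy {best_score:.3f}")
```

The held-out samples are never used to update the weights, which is why a model trained this way sees less training data than one fitted on the full input.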

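The early stopping strategy is activated when early_stopping=True; otherwise, the stopping criterion uses only the training loss on the entire input data. To better control the early stopping strategy, we can specify the parameter validation_fraction, which sets the fraction of the input dataset that is set aside to compute the validation score. The optimization continues until the validation score has failed to improve by at least tol during the last n_iter_no_change iterations. The actual number of iterations is available in the attribute n_iter_.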
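A minimal usage sketch of these parameters follows. The synthetic dataset and the specific values are assumptions chosen for illustration; the full example on this page uses MNIST instead.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Synthetic binary problem (illustrative assumption).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

clf = SGDClassifier(
    early_stopping=True,      # stop on the validation score
    validation_fraction=0.2,  # share of the input data held out for validation
    n_iter_no_change=3,       # epochs without improvement tolerated
    tol=1e-4,                 # minimum improvement that counts
    max_iter=1000,            # upper bound on the number of epochs
    random_state=0,
)
clf.fit(X, y)

# n_iter_ reports how many epochs were actually run before stopping.
print("epochs actually run:", clf.n_iter_)
```

Comparing `n_iter_` with `max_iter` shows whether the stopping criterion triggered or the iteration budget was exhausted.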

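This example illustrates how early stopping can be used in the sklearn.linear_model.SGDClassifier model to achieve almost the same accuracy as a model built without early stopping, which can significantly reduce training time. Note that the scores differ between the stopping criteria even from the earliest iterations, because some of the training data is held out when the validation stopping criterion is used.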

No stopping criterion: .................................................
Training loss: .................................................
Validation score: .................................................

# Authors: Tom Dupre la Tour
#
# License: BSD 3 clause
import time
import sys

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn import linear_model
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.utils._testing import ignore_warnings
from sklearn.exceptions import ConvergenceWarning
from sklearn.utils import shuffle

print(__doc__)


def load_mnist(n_samples=None, class_0='0', class_1='8'):
    """Load MNIST, select two classes, shuffle and return only n_samples."""
    # Load data from http://openml.org/d/554
    mnist = fetch_openml('mnist_784', version=1)

    # take only two classes for binary classification
    mask = np.logical_or(mnist.target == class_0, mnist.target == class_1)

    X, y = shuffle(mnist.data[mask], mnist.target[mask], random_state=42)
    if n_samples is not None:
        X, y = X[:n_samples], y[:n_samples]
    return X, y


@ignore_warnings(category=ConvergenceWarning)
def fit_and_score(estimator, max_iter, X_train, X_test, y_train, y_test):
    """Fit the estimator on the train set and score it on both sets"""
    estimator.set_params(max_iter=max_iter)
    estimator.set_params(random_state=0)

    start = time.time()
    estimator.fit(X_train, y_train)

    fit_time = time.time() - start
    n_iter = estimator.n_iter_
    train_score = estimator.score(X_train, y_train)
    test_score = estimator.score(X_test, y_test)

    return fit_time, n_iter, train_score, test_score


# Define the estimators to compare
estimator_dict = {
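    # Note: tol is left at its default here, so recent scikit-learn releases
    # (default tol=1e-3) still apply a training-loss check; tol=None turns
    # it off entirely.
    'No stopping criterion':
    linear_model.SGDClassifier(n_iter_no_change=3),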
    'Training loss':
    linear_model.SGDClassifier(early_stopping=False, n_iter_no_change=3,
                               tol=0.1),
    'Validation score':
    linear_model.SGDClassifier(early_stopping=True, n_iter_no_change=3,
                               tol=0.0001, validation_fraction=0.2)
}

# Load the dataset
X, y = load_mnist(n_samples=10000)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5,
                                                    random_state=0)

results = []
for estimator_name, estimator in estimator_dict.items():
    print(estimator_name + ': ', end='')
    for max_iter in range(1, 50):
        print('.', end='')
        sys.stdout.flush()

        fit_time, n_iter, train_score, test_score = fit_and_score(
            estimator, max_iter, X_train, X_test, y_train, y_test)

        results.append((estimator_name, max_iter, fit_time, n_iter,
                        train_score, test_score))
    print('')

# Transform the results into a pandas dataframe for easy plotting
columns = [
    'Stopping criterion', 'max_iter', 'Fit time (sec)', 'n_iter_',
    'Train score', 'Test score'
]
results_df = pd.DataFrame(results, columns=columns)

# Define what to plot (x_axis, y_axis)
lines = 'Stopping criterion'
plot_list = [
    ('max_iter', 'Train score'),
    ('max_iter', 'Test score'),
    ('max_iter', 'n_iter_'),
    ('max_iter', 'Fit time (sec)'),
]

nrows = 2
ncols = int(np.ceil(len(plot_list) / 2.))
fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(6 * ncols,
                                                            4 * nrows))
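# Share the y-axis between the train-score and test-score panels.
# Axes.sharey replaces get_shared_y_axes().join(), which newer matplotlib
# releases no longer support.
axes[0, 0].sharey(axes[0, 1])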

for ax, (x_axis, y_axis) in zip(axes.ravel(), plot_list):
    for criterion, group_df in results_df.groupby(lines):
        group_df.plot(x=x_axis, y=y_axis, label=criterion, ax=ax)
    ax.set_title(y_axis)
    ax.legend(title=lines)

fig.tight_layout()
plt.show()

Total running time of the script: (0 minutes 43.797 seconds)
