python - XGBoost reports that eval-auc has been declining while train-auc has been rising: is this result normal?


I want to use XGBoost's early_stopping_rounds to train without overfitting. I use the following code:

    import xgboost as xgb
    from sklearn.model_selection import train_test_split

    # scale_pos_weight, train_feature and train_label are defined elsewhere
    parameters = {'nthread': 4, 'objective': 'binary:logistic', 'learning_rate': 0.06,
                  'max_depth': 6, 'min_child_weight': 3, 'silent': 0, 'gamma': 0,
                  'subsample': 0.7, 'colsample_bytree': 0.5, 'n_estimators': 5,
                  'missing': -999, 'scale_pos_weight': scale_pos_weight, 'seed': 4789,
                  'eval_metric': 'auc', 'early_stopping_rounds': 100}
    x_train, x_test, y_train, y_test = train_test_split(
        train_feature, train_label, test_size=0.3, random_state=4789)
    dtrain = xgb.DMatrix(x_train, label=y_train)
    dtest = xgb.DMatrix(x_test, label=y_test)
    evallist = [(dtest, 'eval'), (dtrain, 'train')]
    bst = xgb.train(parameters, dtrain, num_boost_round=1500, evals=evallist)

When I print the intermediate results, the log looks like this:

    [1469]  eval-auc:0.912417   train-auc:0.986104
    [16:04:23] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 110 nodes, 0 pruned nodes, max_depth=6
    [1470]  eval-auc:0.912412   train-auc:0.986118
    [16:04:27] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 102 nodes, 0 pruned nodes, max_depth=6
    [1471]  eval-auc:0.912405   train-auc:0.986129
    [16:04:30] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 116 nodes, 0 pruned nodes, max_depth=6
    [1472]  eval-auc:0.912383   train-auc:0.986143
    [16:04:34] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 116 nodes, 0 pruned nodes, max_depth=6
    [1473]  eval-auc:0.912375   train-auc:0.986159

Now I am wondering whether this training result is right. How can I detect whether the model is overfitting, and how many rounds should I choose?

As @Stepan Novikov said, the result you see is right: the model is starting to overfit.

Regarding your second question, the early_stopping_rounds parameter works by stopping training after n rounds have passed without an improvement in eval-auc (n being early_stopping_rounds). Note that the eval-auc value may decrease in between, but as long as there is an absolute improvement within the last n rounds, training will continue.
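To make the stopping rule concrete, here is a minimal sketch in plain Python. It is not XGBoost's actual implementation; the function name `early_stopping_round` and the sample scores are made up for illustration, and it assumes a metric where higher is better, such as AUC.

```python
def early_stopping_round(scores, patience):
    """Return (stop_round, best_round) for a metric that should be maximized.

    scores[i] is the validation metric after boosting round i.
    stop_round is None if training never runs out of patience.
    """
    best_round = 0
    best_score = scores[0]
    for i, score in enumerate(scores):
        if score > best_score:
            # A new best resets the patience window.
            best_round, best_score = i, score
        elif i - best_round >= patience:
            # `patience` rounds have passed since the best score.
            return i, best_round
    return None, best_round


# Hypothetical eval-auc values: round 4 beats round 3 but not the best
# so far (round 2), so it does not reset the window.
scores = [0.90, 0.91, 0.912, 0.911, 0.9115, 0.9105, 0.910]
print(early_stopping_round(scores, patience=3))   # (5, 2)
print(early_stopping_round(scores, patience=10))  # (None, 2)
```

With a patience of 3, the best score is at round 2 and training stops at round 5, three rounds later. With a patience of 10 the window never runs out, so all rounds are trained.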

In your example, round [1469] has the maximum value for eval-auc, so training will not stop until round [1569] (100 rounds later, as configured).

Finally, the optimum number of rounds reached should be stored in the bst variable of your example.

