
Monotone Constraints in XGBoost: When Business Logic Must Override the Data

In demand estimation, there is a fundamental economic prior: higher prices should produce lower demand, all else equal. This seems obvious, but gradient boosting models don't know economics. Without explicit constraints, XGBoost will happily learn that higher prices increase demand if the data is noisy enough.

This is where monotone constraints become essential. They encode domain knowledge directly into the model structure, preventing the algorithm from learning relationships that violate economic logic.

Working on promotional optimization at AB InBev, I've seen firsthand what happens when demand models lack monotone constraints: the optimization layer produces discount recommendations that make no economic sense, because the underlying demand surface has spurious positive price-demand relationships in some regions of the feature space.

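How monotone constraints work in XGBoost. During tree construction, candidate splits on a constrained feature that would violate the specified monotone relationship are rejected. Suppose you constrain a feature to be negatively monotone (a value of -1 in XGBoost's monotone_constraints parameter). Then every accepted split on that feature must give the child covering higher feature values a leaf weight no greater than its sibling's. XGBoost also bounds the leaf values deeper in each branch so that later splits cannot reverse that ordering. Each tree is therefore monotone in the feature, holding the other features fixed, and so is the sum of trees that forms the final prediction.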
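The following is a simplified sketch of the acceptance check for a single candidate split. It is written from scratch in plain Python and is not taken from XGBoost's source. It makes three assumptions: leaf weights follow the usual second-order rule -G/(H + lambda), the left child holds the lower feature values, and accepted splits pass a midpoint bound down to their children. The midpoint rule is modeled on how XGBoost is understood to propagate bounds, so treat its details as an assumption. The function names are hypothetical.

```python
from typing import Optional, Tuple

Bounds = Tuple[float, float]


def leaf_weight(grad_sum: float, hess_sum: float, reg_lambda: float, bounds: Bounds) -> float:
    """Newton-step leaf weight -G / (H + lambda), clipped to the node's inherited bounds."""
    lower, upper = bounds
    w = -grad_sum / (hess_sum + reg_lambda)
    return min(max(w, lower), upper)


def check_split(
    left: Tuple[float, float],    # (G_L, H_L): gradient/hessian sums, lower feature values
    right: Tuple[float, float],   # (G_R, H_R): gradient/hessian sums, higher feature values
    constraint: int,              # +1 increasing, -1 decreasing, 0 unconstrained
    bounds: Bounds = (float("-inf"), float("inf")),
    reg_lambda: float = 1.0,
) -> Optional[Tuple[Bounds, Bounds]]:
    """Return the bounds handed to the (left, right) children, or None if the split is rejected."""
    w_left = leaf_weight(*left, reg_lambda, bounds)
    w_right = leaf_weight(*right, reg_lambda, bounds)
    if constraint == 0:
        # Unconstrained feature: children simply inherit the parent's bounds.
        return bounds, bounds
    if constraint * (w_right - w_left) < 0:
        # The split would reverse the required direction, so it is discarded.
        return None
    mid = (w_left + w_right) / 2.0
    lower, upper = bounds
    if constraint > 0:
        # Increasing: everything under the left child stays <= everything under the right.
        return (lower, mid), (mid, upper)
    # Decreasing: everything under the left child stays >= everything under the right.
    return (mid, upper), (lower, mid)


# The left child (lower prices) has the larger weight, so a decreasing split is accepted...
print(check_split((-4.0, 1.0), (-1.0, 1.0), constraint=-1))  # ((1.25, inf), (-inf, 1.25))
# ...while the same split is rejected under an increasing constraint.
print(check_split((-4.0, 1.0), (-1.0, 1.0), constraint=1))   # None
```

Returning child bounds, rather than a bare accept/reject, is what makes the constraint hold across the whole tree. Every leaf under the left child of a decreasing split is clipped to stay at or above every leaf under the right child.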

Practical considerations:

Target transformation matters. We model log-transformed volume because volume is roughly log-normally distributed, so the monotone constraint applies in log-space. Because exp() is strictly increasing, a prediction that is non-increasing in price in log-space stays non-increasing after back-transformation to volume. The economic relationship is therefore preserved.
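Here is a quick check of the back-transformation argument, written as a sketch with hypothetical log-volume values rather than real model output. Exponentiating a non-increasing sequence leaves it non-increasing. The same holds after multiplying by a positive constant, which here stands in for any multiplicative bias correction you might apply.

```python
import math
from typing import List, Sequence


def is_non_increasing(values: Sequence[float]) -> bool:
    """True if each value is <= the one before it."""
    return all(b <= a for a, b in zip(values, values[1:]))


def back_transform(log_preds: Sequence[float], scale: float = 1.0) -> List[float]:
    """Map log-volume predictions back to volume as scale * exp(log_pred).

    `scale` is a hypothetical positive correction factor. Because exp() is
    strictly increasing and scale > 0, the ordering of predictions is preserved.
    """
    if scale <= 0:
        raise ValueError("scale must be positive to preserve monotonicity")
    return [scale * math.exp(z) for z in log_preds]


# Hypothetical log-volume predictions along an increasing price grid,
# already non-increasing because price carries a -1 constraint.
log_volume = [6.2, 6.1, 6.1, 5.8, 5.3]
volume = back_transform(log_volume, scale=1.05)
print(is_non_increasing(log_volume), is_non_increasing(volume))  # True True
```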

Not all features need constraints. Apply monotone constraints only to features where the directional relationship is economically unambiguous: price (constrained to be decreasing) and discount percentage (constrained to be increasing). Don't constrain features like weather or day-of-week, where the relationship is empirical.
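XGBoost takes the directions through its `monotone_constraints` parameter, with one entry per feature in column order: -1 for decreasing, 1 for increasing, 0 for unconstrained. The sketch below builds that vector from named priors, so only the economically unambiguous features get a nonzero entry. The feature names and the helper are hypothetical.

```python
from typing import Dict, Sequence

# Direction per feature: -1 = non-increasing, +1 = non-decreasing.
# Feature names are hypothetical placeholders for a demand model.
ECONOMIC_PRIORS: Dict[str, int] = {
    "price": -1,         # higher price -> lower demand
    "discount_pct": +1,  # deeper discount -> higher demand
}


def monotone_constraint_string(feature_names: Sequence[str], priors: Dict[str, int]) -> str:
    """Build a tuple-style constraint string such as "(-1,1,0)" in column order.

    Features without an economic prior get 0 (unconstrained), which leaves
    weather, day-of-week and similar features for the data to decide.
    """
    unknown = set(priors) - set(feature_names)
    if unknown:
        raise KeyError(f"priors reference missing features: {sorted(unknown)}")
    for name, direction in priors.items():
        if direction not in (-1, 0, 1):
            raise ValueError(f"{name}: direction must be -1, 0 or 1")
    return "(" + ",".join(str(priors.get(name, 0)) for name in feature_names) + ")"


features = ["price", "discount_pct", "temperature", "day_of_week"]
print(monotone_constraint_string(features, ECONOMIC_PRIORS))  # (-1,1,0,0)
```

You can then pass the returned string as `monotone_constraints`, for example `xgboost.XGBRegressor(monotone_constraints=...)`. Deriving it from column names, instead of hand-typing a tuple, keeps a constraint from silently landing on the wrong feature when the column order changes.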

Constraints reduce overfitting. By restricting the hypothesis space, monotone constraints act as a form of regularization. In our experience, constrained models generalize better on temporal holdouts. They trade a small amount of in-sample fit for substantially better out-of-sample stability.
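A temporal holdout trains on earlier periods and scores on later ones. The sketch below shows the split, assuming each row carries a date; the function and the toy weekly data are hypothetical. To run the comparison described above, fit a constrained and an unconstrained model on the training side and compare their errors on the holdout.

```python
from datetime import date
from typing import List, Sequence, Tuple, TypeVar

Row = TypeVar("Row")


def temporal_holdout(rows: Sequence[Row], dates: Sequence[date], cutoff: date) -> Tuple[List[Row], List[Row]]:
    """Train on rows dated strictly before `cutoff`; hold out the rest.

    Unlike a random split, no later periods leak into training, so the holdout
    score reflects behavior on weeks the model has not seen.
    """
    if len(rows) != len(dates):
        raise ValueError("rows and dates must have the same length")
    train = [r for r, d in zip(rows, dates) if d < cutoff]
    holdout = [r for r, d in zip(rows, dates) if d >= cutoff]
    return train, holdout


weeks = [date(2023, 1, 2), date(2023, 1, 9), date(2023, 1, 16), date(2023, 1, 23)]
rows = ["w1", "w2", "w3", "w4"]
train, holdout = temporal_holdout(rows, weeks, cutoff=date(2023, 1, 16))
print(train, holdout)  # ['w1', 'w2'] ['w3', 'w4']
```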

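Validate that constraints bind. After training, check partial dependence plots to confirm the monotone relationship holds. On the constrained model, this mainly catches a constraint attached to the wrong column. To see whether a constraint actually binds, compare against an unconstrained model trained on the same data. If the constraint never binds, the unconstrained model already learned the correct direction, but you still want the constraint as a safety net.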
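A partial dependence plot averages over rows, so it can look monotone even when individual rows are not. The sketch below swaps in a stricter per-row check based on individual conditional expectation (ICE) curves. It sweeps the constrained feature over a grid with everything else fixed and counts the rows whose curve moves the wrong way. It assumes only a hypothetical `predict` callable that scores one feature row. The two toy models are illustrative functions, not fitted models.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: scores a single feature row.
Predict = Callable[[Sequence[float]], float]


def ice_curve(predict: Predict, row: Sequence[float], feature: int, grid: Sequence[float]) -> List[float]:
    """Predictions for one row as `feature` sweeps over `grid`, other features held fixed."""
    curve = []
    for value in grid:
        x = list(row)
        x[feature] = value
        curve.append(predict(x))
    return curve


def violation_rate(predict: Predict, rows: Sequence[Sequence[float]], feature: int,
                   grid: Sequence[float], direction: int, tol: float = 1e-12) -> float:
    """Fraction of rows whose ICE curve moves against `direction` (+1 or -1) anywhere on the grid."""
    if not rows:
        raise ValueError("need at least one row")
    sorted_grid = sorted(grid)
    bad = 0
    for row in rows:
        curve = ice_curve(predict, row, feature, sorted_grid)
        steps = [b - a for a, b in zip(curve, curve[1:])]
        if any(direction * s < -tol for s in steps):
            bad += 1
    return bad / len(rows)


# Toy stand-ins: feature 0 is price, feature 1 is temperature.
def constrained(x: Sequence[float]) -> float:
    return 10.0 - 2.0 * x[0] + 0.5 * x[1]


def unconstrained(x: Sequence[float]) -> float:
    # Spurious upward price effect on hot days, the kind of region a constraint rules out.
    return 10.0 - 2.0 * x[0] + (3.0 * x[0] if x[1] > 25 else 0.0)


rows = [[1.0, 20.0], [1.0, 30.0]]
grid = [0.5, 1.0, 1.5, 2.0]
print(violation_rate(constrained, rows, 0, grid, direction=-1))    # 0.0
print(violation_rate(unconstrained, rows, 0, grid, direction=-1))  # 0.5
```

On the constrained model the rate should be zero, which also confirms the constraint is attached to the intended column. On an unconstrained fit, a nonzero rate gives a rough measure of how often the constraint binds.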

Monotone constraints are a clear example of a broader principle: in applied ML, domain knowledge encoded into the model structure is almost always worth more than additional data or hyperparameter tuning.