4.9. SVM with L1 Regularization
An \(\ell_1\)-regularized support vector machine is a large-margin classifier that also encourages sparsity in the fitted coefficients.
The optimization model is

\[
\min_{\beta,\, v} \;\; \frac{1}{m} \sum_{i=1}^{m} \max\bigl(0,\; 1 - y_i(x_i^\top \beta - v)\bigr) \;+\; \lambda \|\beta\|_1.
\]
Here \(x_i \in \mathbb{R}^n\) is the feature vector for observation \(i\), \(y_i \in \{-1,1\}\) is its class label, \(\beta \in \mathbb{R}^n\) is the coefficient vector, and \(v\) is the intercept.
This objective again has two pieces:
The hinge loss \(\max(0, 1 - y_i(x_i^\top \beta - v))\) enforces a margin-based classification rule.
The \(\ell_1\) term \(\lambda \|\beta\|_1\) promotes a sparse classifier.
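Before building the model, it can help to see the objective written directly in NumPy. The sketch below computes the same quantity the model minimizes; it is an illustration of the formula, not part of the admm API:

```python
import numpy as np

def l1_svm_objective(beta, v, X, y, lam):
    """Average hinge loss plus an L1 penalty on the coefficients."""
    margins = y * (X @ beta - v)             # signed margins y_i (x_i^T beta - v)
    hinge = np.maximum(0.0, 1.0 - margins)   # zero when the margin is at least 1
    return hinge.mean() + lam * np.abs(beta).sum()
```

For example, with one point classified with margin 2 and one misclassified with margin -2, the hinge losses are 0 and 3, so the loss term averages to 1.5 and the penalty is added on top.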
Step 1: Generate a sparse linear classification problem
We first create a hidden sparse vector beta_true and then use it to generate labels.
The line beta_true[10:] = 0 makes the last 15 entries zero, so the synthetic problem
really does have only a subset of informative features.
import numpy as np
import admm
np.random.seed(1)
m = 120
n = 25
beta_true = np.random.randn(n)
beta_true[10:] = 0
X = np.random.randn(m, n)
y = np.sign(X @ beta_true + 0.5 * np.random.randn(m))
y[y == 0] = 1
lam = 0.1
Step 2: Create the model, increase the iteration budget, and define variables
The fitted classifier uses a coefficient vector beta and an intercept v. We also
raise admm_max_iteration to 10000: the default iteration cap is too low for this
dataset, and the larger budget lets the solver reach its usual successful
termination status instead of stopping early at the cap.
model = admm.Model()
model.setOption(admm.Options.admm_max_iteration, 10000)
beta = admm.Var("beta", n)
v = admm.Var("v")
Step 3: Write the hinge-loss term and the L1 penalty
The signed margin in this example is y * (X @ beta - v). If that signed margin is at
least 1, the hinge loss is zero. Otherwise the expression pays a linear penalty for
being inside the margin or on the wrong side of the separating hyperplane.
margin_loss = admm.sum(admm.maximum(1 - y * (X @ beta - v), 0))
model.setObjective(margin_loss / m + lam * admm.norm(beta, ord=1))
The first term averages the hinge losses over the training set. The second term is the sparsity regularizer that can remove weak features by driving some coefficients to zero.
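The hinge behavior described above can be checked numerically with plain NumPy, independently of the admm model. Margins of at least 1 incur zero loss; smaller margins are penalized linearly:

```python
import numpy as np

margins = np.array([2.0, 1.0, 0.5, -1.0])  # signed margins y_i (x_i^T beta - v)
losses = np.maximum(0.0, 1.0 - margins)    # hinge loss per observation
print(losses)  # [0.  0.  0.5 2. ]
```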
Step 4: Add constraints
This SVM example has no explicit constraints, so there are no model.addConstr(...)
calls. The entire model is encoded through the objective.
Step 5: Solve and inspect the result
Now we solve the model and print the standard solver outputs.
model.optimize()
print(" * model.ObjVal: ", model.ObjVal) # Expected: 0.5810343957323443
print(" * model.StatusString: ", model.StatusString) # Expected: SOLVE_OPT_SUCCESS
Complete runnable example:
import numpy as np
import admm
np.random.seed(1)
m = 120
n = 25
beta_true = np.random.randn(n)
beta_true[10:] = 0
X = np.random.randn(m, n)
y = np.sign(X @ beta_true + 0.5 * np.random.randn(m))
y[y == 0] = 1
lam = 0.1
model = admm.Model()
model.setOption(admm.Options.admm_max_iteration, 10000)
beta = admm.Var("beta", n)
v = admm.Var("v")
margin_loss = admm.sum(admm.maximum(1 - y * (X @ beta - v), 0))
model.setObjective(margin_loss / m + lam * admm.norm(beta, ord=1))
model.optimize()
print(" * model.ObjVal: ", model.ObjVal) # Expected: 0.5810343957323443
print(" * model.StatusString: ", model.StatusString) # Expected: SOLVE_OPT_SUCCESS
This example is available as a standalone script in the examples/ folder of the ADMM repository:
python examples/svm_with_l1.py
The hinge loss penalizes margin violations directly, and the \(\ell_1\) term performs feature selection by zeroing weak coefficients — the same sparsity mechanism as in Lasso, applied here to a classification model.
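To make that sparsity mechanism concrete, the following is a minimal pure-NumPy sketch that solves the same problem by proximal subgradient descent: a gradient step on the averaged hinge loss followed by soft-thresholding for the \(\ell_1\) term. It is an illustration only and does not reproduce the admm solver's algorithm or its reported objective value.

```python
import numpy as np

# Same synthetic data as in the example above.
np.random.seed(1)
m, n, lam = 120, 25, 0.1
beta_true = np.random.randn(n)
beta_true[10:] = 0
X = np.random.randn(m, n)
y = np.sign(X @ beta_true + 0.5 * np.random.randn(m))
y[y == 0] = 1

beta, v = np.zeros(n), 0.0
step = 0.01
for _ in range(5000):
    margins = y * (X @ beta - v)
    active = margins < 1                     # observations violating the margin
    # Subgradient of the averaged hinge loss over the active set.
    g_beta = -(y[active, None] * X[active]).sum(axis=0) / m
    g_v = y[active].sum() / m
    beta -= step * g_beta
    v -= step * g_v
    # Soft-thresholding: the proximal step for the L1 penalty, which
    # zeroes coefficients whose magnitude falls below step * lam.
    beta = np.sign(beta) * np.maximum(np.abs(beta) - step * lam, 0.0)

print("nonzero coefficients:", np.count_nonzero(beta), "of", n)
```

The soft-thresholding step is what drives weak coefficients exactly to zero, which is the feature-selection effect the \(\ell_1\) penalty provides.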