4.20. L1/2 Quasi-Norm
This example shows how to model the classical nonconvex \(L_{1/2}\) sparse penalty with a UDF. The full problem is

\[
\min_{x} \; \frac{1}{2}\lVert x - y \rVert_2^2 + \lambda \sum_{i} \sqrt{|x_i|}.
\]
The \(L_{1/2}\) penalty promotes sparsity more aggressively than \(L_1\), but the price is nonconvexity: the solver acts as a practical local method, and the result should be interpreted as a locally optimal solution or stationary point.
The value returned by UDFBase.eval() is simply the penalty itself, i.e. eval sums \(\sqrt{|x_i|}\) over all coordinates:
def eval(self, arglist):
    x = np.asarray(arglist[0], dtype=float)
    return float(np.sum(np.sqrt(np.abs(x))))
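As a quick standalone sanity check (independent of the UDF class; the helper name `lhalf_penalty` is just for illustration), the penalty value of a small vector can be computed directly:

```python
import numpy as np

def lhalf_penalty(x):
    # L1/2 penalty: sum_i sqrt(|x_i|)
    x = np.asarray(x, dtype=float)
    return float(np.sum(np.sqrt(np.abs(x))))

# sqrt(0.2) + sqrt(1.0) + sqrt(2.0) ≈ 2.8614
print(lhalf_penalty([0.2, 1.0, 2.0]))
```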
The proximal operator returned by UDFBase.argmin(),

\[
\operatorname{prox}_{\lambda f}(v) = \operatorname*{argmin}_{x} \; \lambda f(x) + \tfrac{1}{2}\lVert x - v\rVert_2^2,
\]

is separable, so it acts coordinatewise. Each coordinate follows the half-thresholding rule

\[
\operatorname{prox}_{\lambda f}(v)_i =
\begin{cases}
\dfrac{2 v_i}{3}\left(1 + \cos\!\left(\dfrac{2\pi}{3} - \dfrac{2}{3}\,\varphi_\lambda(v_i)\right)\right), & |v_i| > \dfrac{3}{2}\,\lambda^{2/3},\\[1ex]
0, & \text{otherwise},
\end{cases}
\]

with

\[
\varphi_\lambda(v_i) = \arccos\!\left(\frac{3\sqrt{3}\,\lambda}{4\,|v_i|^{3/2}}\right).
\]
The code below implements that closed-form rule:
def argmin(self, lamb, arglist):
    v = np.asarray(arglist[0], dtype=float)
    abs_v = np.abs(v)
    threshold = 1.5 * (lamb ** (2.0 / 3.0))
    prox = np.zeros_like(v)
    active = abs_v > threshold
    if np.any(active):
        phi = np.arccos(
            np.clip((3.0 * np.sqrt(3.0) * lamb) / (4.0 * np.power(abs_v[active], 1.5)), -1.0, 1.0)
        )
        prox_abs = (2.0 * abs_v[active] / 3.0) * (
            1.0 + np.cos((2.0 * np.pi / 3.0) - (2.0 * phi / 3.0))
        )
        prox[active] = np.sign(v[active]) * prox_abs
    return [prox.tolist()]
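The closed-form rule is easy to get wrong, so it is worth sanity-checking against a brute-force one-dimensional grid search over \(\lambda\sqrt{|x|} + \tfrac{1}{2}(x - v)^2\). The sketch below is standalone (the scalar helpers `prox_lhalf` and `prox_bruteforce` are hypothetical names, mirroring the code above):

```python
import numpy as np

def prox_lhalf(v, lamb):
    """Closed-form minimizer of lamb*sqrt(|x|) + 0.5*(x - v)**2 (half-thresholding)."""
    if abs(v) <= 1.5 * lamb ** (2.0 / 3.0):
        return 0.0
    phi = np.arccos((3.0 * np.sqrt(3.0) * lamb) / (4.0 * abs(v) ** 1.5))
    return (2.0 * v / 3.0) * (1.0 + np.cos(2.0 * np.pi / 3.0 - 2.0 * phi / 3.0))

def prox_bruteforce(v, lamb):
    """Grid-search minimizer of the same one-dimensional objective."""
    xs = np.linspace(-abs(v) - 1.0, abs(v) + 1.0, 400001)
    obj = lamb * np.sqrt(np.abs(xs)) + 0.5 * (xs - v) ** 2
    return xs[np.argmin(obj)]

for v in [0.2, 1.0, 2.0, -1.5]:
    # closed form and brute force should agree to roughly the grid resolution
    print(v, prox_lhalf(v, 0.5), prox_bruteforce(v, 0.5))
```

Note that \(v = 0.2\) lies below the threshold \(\tfrac{3}{2}(0.5)^{2/3}\), so both implementations return (approximately) zero there.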
For any proper function \(f\) and scalar \(\alpha > 0\),

\[
\operatorname{prox}_{\lambda(\alpha f)}(v) = \operatorname{prox}_{(\alpha\lambda) f}(v),
\]

so a scalar coefficient on a UDF simply rescales the proximal parameter. The mathematically standard modeling form is therefore to keep the UDF as the base function and write the objective coefficient outside, for example lam * LHalfNorm(x).
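Assuming the solver folds the outer coefficient \(\alpha\) into the lamb argument it passes to argmin, the folding can be checked numerically: minimizing \(\alpha\lambda\sqrt{|x|} + \tfrac{1}{2}(x - v)^2\) by brute force should land on the closed form evaluated at the folded parameter \(\alpha\lambda\) (standalone sketch; `prox_half` is a hypothetical scalar helper):

```python
import numpy as np

def prox_half(v, lamb):
    """Closed-form minimizer of lamb*sqrt(|x|) + 0.5*(x - v)**2."""
    if abs(v) <= 1.5 * lamb ** (2.0 / 3.0):
        return 0.0
    phi = np.arccos((3.0 * np.sqrt(3.0) * lamb) / (4.0 * abs(v) ** 1.5))
    return (2.0 * v / 3.0) * (1.0 + np.cos(2.0 * np.pi / 3.0 - 2.0 * phi / 3.0))

v, lam, alpha = 2.0, 0.25, 2.0
# Brute-force minimize the scaled objective alpha*lam*sqrt(|x|) + 0.5*(x - v)**2 ...
xs = np.linspace(-3.0, 3.0, 600001)
obj = alpha * lam * np.sqrt(np.abs(xs)) + 0.5 * (xs - v) ** 2
# ... and compare with the closed form at the folded parameter alpha*lam
print(xs[np.argmin(obj)], prox_half(v, alpha * lam))
```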
The UDFBase.arguments() method again just tells ADMM which symbolic vector is the input to this UDF:
def arguments(self):
    return [self.arg]
Complete runnable example:
import numpy as np
import admm
class LHalfNorm(admm.UDFBase):
    def __init__(self, arg):
        self.arg = arg

    def arguments(self):
        return [self.arg]

    def eval(self, arglist):
        x = np.asarray(arglist[0], dtype=float)
        return float(np.sum(np.sqrt(np.abs(x))))

    def argmin(self, lamb, arglist):
        v = np.asarray(arglist[0], dtype=float)
        abs_v = np.abs(v)
        threshold = 1.5 * (lamb ** (2.0 / 3.0))
        prox = np.zeros_like(v)
        active = abs_v > threshold
        if np.any(active):
            phi = np.arccos(
                np.clip((3.0 * np.sqrt(3.0) * lamb) / (4.0 * np.power(abs_v[active], 1.5)), -1.0, 1.0)
            )
            prox_abs = (2.0 * abs_v[active] / 3.0) * (
                1.0 + np.cos((2.0 * np.pi / 3.0) - (2.0 * phi / 3.0))
            )
            prox[active] = np.sign(v[active]) * prox_abs
        return [prox.tolist()]
y = np.array([0.2, 1.0, 2.0])
lam = 0.5
model = admm.Model()
x = admm.Var("x", len(y))
model.setObjective(0.5 * admm.sum(admm.square(x - y)) + lam * LHalfNorm(x))
model.optimize()
print(" * x: ", np.asarray(x.X)) # Expected: ≈ [0, 0.70, 1.81]
print(" * model.ObjVal: ", model.ObjVal) # Expected: ≈ 1.17405
This example is available as a standalone script in the examples/ folder of the ADMM repository:
python examples/udf_lhalf.py
In this concrete example, \(\lambda = 0.5\), so the activation threshold is

\[
\frac{3}{2}\,\lambda^{2/3} = \frac{3}{2}\,(0.5)^{2/3} \approx 0.94494.
\]

Therefore \(|0.2| < 0.94494\), so the first coordinate is driven to zero, while \(|1.0|\) and \(|2.0|\) lie above the threshold and are shrunk by the closed-form proximal map. The returned point is approximately

\[
x^\star \approx (0,\; 0.7015,\; 1.8144).
\]
The smallest coordinate is eliminated; the larger coordinates are shrunk (not kept unchanged) by the \(L_{1/2}\) proximal map.
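Because the data-fit term is \(\tfrac{1}{2}\lVert x - y\rVert_2^2\), the solution is exactly the proximal point of \(\lambda\sum_i\sqrt{|x_i|}\) at \(y\), so every nonzero coordinate \(x_i > 0\) must satisfy the first-order condition \(x_i - y_i + \lambda/(2\sqrt{x_i}) = 0\). This standalone check confirms it for the nonzero coordinates reported above (the values 0.7015 and 1.8144 are the rounded solver outputs, plugged in by hand):

```python
import numpy as np

lam = 0.5
y = np.array([1.0, 2.0])          # the two coordinates above the threshold
x = np.array([0.7015, 1.8144])    # rounded solver output for those coordinates

# First-order condition of 0.5*(x - y)**2 + lam*sqrt(x) for x > 0
residual = x - y + lam / (2.0 * np.sqrt(x))
print(residual)  # both entries should be close to zero
```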