4.29. Log-Cosh Robust Regression

This example shows how to package the log-cosh loss — a smooth, robust alternative to least squares — as a gradient-based UDF. The full model is

\[\begin{array}{ll} \min\limits_x & \displaystyle\sum_{i=1}^{n} \log\bigl(\cosh(a_i^\top x - b_i)\bigr) \;+\; \tfrac{\lambda}{2}\|x\|_2^2. \end{array}\]

Here \(\log(\cosh(r))\) is applied elementwise to the residual vector \(r = Ax - b\). The log-cosh function is a smooth approximation to the absolute value, so the objective trades off a robust fit to the data against an L2 regularizer. Because the loss is everywhere smooth, we can use the grad UDF path — no proximal formula needed.

The function value returned by UDFBase.eval() is the sum of log-cosh over all residuals:

\[f(r) = \sum_i \log(\cosh(r_i)).\]

The log-cosh function interpolates between L2 and L1:

\[\begin{split}\log(\cosh(r)) \;\approx\; \begin{cases} \tfrac{1}{2} r^2, & |r| \ll 1 \quad (\text{like L2 — quadratic for small residuals}), \\[4pt] |r| - \log 2, & |r| \gg 1 \quad (\text{like L1 — linear for large residuals}). \end{cases}\end{split}\]
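Both regimes are easy to verify numerically with a few lines of NumPy (a standalone sanity check, independent of the ADMM library):

```python
import numpy as np

# Small residuals: log(cosh(r)) is approximately quadratic, r^2 / 2.
r_small = 0.01
print(np.log(np.cosh(r_small)), r_small**2 / 2)   # nearly identical

# Large residuals: log(cosh(r)) is approximately linear, |r| - log(2).
r_large = 20.0
print(np.log(np.cosh(r_large)), abs(r_large) - np.log(2))   # nearly identical
```

The approximation error is \(O(r^4)\) in the small-residual regime and \(O(e^{-2|r|})\) in the large-residual regime, so both pairs agree to many digits.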

So eval sums up the elementwise log-cosh values:

def eval(self, arglist):
    r = np.asarray(arglist[0], dtype=float)
    return float(np.sum(np.log(np.cosh(r))))
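
One caveat: `np.cosh(r)` overflows for \(|r| \gtrsim 710\) in double precision. That is harmless for the moderate residuals in this example, but if very large residuals are possible, the identity \(\log(\cosh(r)) = |r| + \log(1 + e^{-2|r|}) - \log 2\) gives an equivalent, overflow-free evaluation (a sketch, not part of the library):

```python
import numpy as np

def log_cosh_stable(r):
    # log(cosh(r)) = |r| + log1p(exp(-2|r|)) - log(2); exp(-2|r|) <= 1,
    # so nothing overflows even for very large residuals.
    a = np.abs(r)
    return a + np.log1p(np.exp(-2.0 * a)) - np.log(2.0)

r = np.array([0.0, 1.0, -3.0, 1000.0])
print(log_cosh_stable(r))   # finite everywhere; naive np.log(np.cosh(1000.0)) overflows
```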

The gradient returned by UDFBase.grad() is

\[\nabla f(r)_i = \tanh(r_i),\]

which is bounded in \([-1, 1]\). This is the key to robustness: large residuals (outliers) contribute a gradient of at most \(\pm 1\), while in ordinary least squares the gradient \(2r_i\) grows without bound. The implementation is a single line:

def grad(self, arglist):
    r = np.asarray(arglist[0], dtype=float)
    return [np.tanh(r)]
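
Since \(\tfrac{d}{dr}\log(\cosh(r)) = \tanh(r)\), the gradient can be confirmed with a standard central-difference check (a standalone sketch; any UDF's `grad` can be validated the same way before handing it to the solver):

```python
import numpy as np

def f(r):
    return np.sum(np.log(np.cosh(r)))

rng = np.random.default_rng(0)
r = rng.standard_normal(6)
g_analytic = np.tanh(r)

# Central finite differences, one coordinate at a time.
eps = 1e-6
g_fd = np.empty_like(r)
for i in range(r.size):
    e = np.zeros_like(r)
    e[i] = eps
    g_fd[i] = (f(r + e) - f(r - e)) / (2 * eps)

print(np.max(np.abs(g_fd - g_analytic)))   # should be tiny
```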

The UDFBase.arguments() method tells ADMM which symbolic object this UDF depends on. In this case the custom loss is a function of one expression (the residual vector), so arguments returns a one-element list:

def arguments(self):
    return [self.arg]

Complete runnable example:

import admm
import numpy as np

class LogCoshLoss(admm.UDFBase):
    """Log-cosh loss: f(r) = sum(log(cosh(r_i))).

    A smooth approximation to L1:
        log(cosh(r)) ≈ r^2/2 for small |r|
        log(cosh(r)) ≈ |r| - log(2) for large |r|

    Gradient: tanh(r), bounded in [-1, 1].
    """
    def __init__(self, arg):
        self.arg = arg

    def arguments(self):
        return [self.arg]

    def eval(self, arglist):
        r = np.asarray(arglist[0], dtype=float)
        return float(np.sum(np.log(np.cosh(r))))

    def grad(self, arglist):
        r = np.asarray(arglist[0], dtype=float)
        return [np.tanh(r)]

# Data with 20% outliers
np.random.seed(42)
n, p = 50, 5
A = np.random.randn(n, p)
x_true = np.array([1.0, -2.0, 0.5, 0.0, 1.5])
b = A @ x_true + 0.1 * np.random.randn(n)
outlier_idx = np.random.choice(n, size=10, replace=False)
b[outlier_idx] += np.random.choice([-1, 1], size=10) * np.random.uniform(8, 15, size=10)

lam = 0.1
model = admm.Model()
x = admm.Var("x", p)
residual = A @ x - b
model.setObjective(LogCoshLoss(residual) + (lam / 2) * admm.sum(admm.square(x)))
model.optimize()

print(" * status:", model.StatusString)       # Expected: SOLVE_OPT_SUCCESS
print(" * x:", np.asarray(x.X))               # Expected: ≈ [1, -2, 0.5, 0, 1.5]
print(" * ||x - x_true||:", np.linalg.norm(np.asarray(x.X) - x_true))  # Expected: ≈ 0.26

This example is available as a standalone script in the examples/ folder of the ADMM repository:

python examples/udf_grad_log_cosh.py

In this concrete example, 20% of the measurements are corrupted by outliers with magnitude 8–15. Ordinary least squares gives \(\|x_{\text{OLS}} - x_{\text{true}}\|_2 \approx 2.07\) because the unbounded gradient \(2r_i\) lets outliers dominate the fit. Log-cosh regression recovers the true coefficients to \(\|x_{\text{log-cosh}} - x_{\text{true}}\|_2 \approx 0.26\) — an 8x improvement — because the bounded gradient \(\tanh(r_i)\) limits each outlier’s influence to at most \(\pm 1\).
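
The ordinary least squares baseline quoted above can be reproduced with NumPy alone, using the same data-generation code as the complete example (a sketch; the exact error norm depends on the random seed):

```python
import numpy as np

# Same corrupted data as the complete example above.
np.random.seed(42)
n, p = 50, 5
A = np.random.randn(n, p)
x_true = np.array([1.0, -2.0, 0.5, 0.0, 1.5])
b = A @ x_true + 0.1 * np.random.randn(n)
outlier_idx = np.random.choice(n, size=10, replace=False)
b[outlier_idx] += np.random.choice([-1, 1], size=10) * np.random.uniform(8, 15, size=10)

# Unregularized least squares fit; the text above reports ≈ 2.07 for this seed.
x_ols, *_ = np.linalg.lstsq(A, b, rcond=None)
print("||x_OLS - x_true||:", np.linalg.norm(x_ols - x_true))
```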