minimize

xitorch.optimize.minimize(fcn: Callable[[...], torch.Tensor], y0: torch.Tensor, params: Sequence[Any] = [], bck_options: Mapping[str, Any] = {}, method: Union[str, Callable] = None, **fwd_options) → torch.Tensor  [source]

Solve the unbounded minimization problem

\[\mathbf{y^*} = \arg\min_\mathbf{y} f(\mathbf{y}, \theta)\]

to find the best \(\mathbf{y}\) that minimizes the output of the function \(f\).
- Parameters
  - fcn (callable) – The function to be minimized; it must output a tensor with a single element.
  - y0 (torch.Tensor) – Initial guess of the solution with shape (*ny).
  - params (list) – Sequence of any other parameters to be passed to fcn.
  - bck_options (dict) – Method-specific options for the backward solve (see xitorch.linalg.solve()).
  - method (str or callable or None) – Minimization method. If None, it defaults to "broyden1".
  - **fwd_options – Method-specific options (see the method sections below).
- Returns
  The solution of the minimization with shape (*ny).
- Return type
  torch.Tensor
Example

>>> def func1(y, A):  # example function
...     return torch.sum((A @ y)**2 + y / 2.0)
>>> A = torch.tensor([[1.1, 0.4], [0.3, 0.8]]).requires_grad_()
>>> y0 = torch.zeros((2, 1))  # zeros as the initial guess
>>> ymin = minimize(func1, y0, params=(A,))
>>> print(ymin)
tensor([[-0.0519],
        [-0.2684]], grad_fn=<_RootFinderBackward>)
method="broyden1"

minimize(..., method="broyden1", *, alpha=None, uv0=None, max_rank=None, maxiter=None, f_tol=None, f_rtol=None, x_tol=None, x_rtol=None, line_search=True, verbose=False, custom_terminator=None)

Solve the root-finding or linear-equation problem using Broyden's first method [1]. It can be used for minimization by finding the root of the function's gradient.
References

[1] B.A. van der Rotten, PhD thesis, "A limited memory Broyden method to solve high-dimensional systems of nonlinear equations". Mathematisch Instituut, Universiteit Leiden, The Netherlands (2003). https://web.archive.org/web/20161022015821/http://www.math.leidenuniv.nl/scripties/Rotten.pdf
- Keyword Arguments
  - alpha (float or None) – The initial guess of the inverse Jacobian is - alpha * I + u v^T.
  - uv0 (tuple of tensors or str or None) – The initial guess of the inverse Jacobian is - alpha * I + u v^T. If "svd", a rank-1 SVD is used to obtain u and v. If None, u and v are zeros.
  - max_rank (int or None) – The maximum rank of the inverse Jacobian approximation. If None, it is inf.
  - maxiter (int or None) – Maximum number of iterations, or inf if set to None.
  - f_tol (float or None) – The absolute tolerance of the norm of the output f.
  - f_rtol (float or None) – The relative tolerance of the norm of the output f.
  - x_tol (float or None) – The absolute tolerance of the norm of the input x.
  - x_rtol (float or None) – The relative tolerance of the norm of the input x.
  - line_search (bool or str) – Option to perform line search. If True, it is set to "armijo".
  - verbose (bool) – Option for verbosity.
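To make the idea concrete, here is a rough pure-Python sketch of Broyden's first ("good") method used as a minimizer, i.e. finding a root of the gradient. The identity initial inverse Jacobian, the quadratic test function, and its hand-coded gradient are illustrative choices, not xitorch's internals (xitorch uses a low-rank - alpha * I + u v^T approximation and line search).

```python
def broyden1_minimize(grad, x0, maxiter=50, f_tol=1e-10):
    """Find a root of grad (the gradient of the objective) via Broyden's first method."""
    n = len(x0)
    # Illustrative initial guess of the inverse Jacobian: the identity matrix.
    Jinv = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    x = list(x0)
    g = grad(x)
    for _ in range(maxiter):
        # Quasi-Newton step: dx = -Jinv @ g
        dx = [-sum(Jinv[i][j] * g[j] for j in range(n)) for i in range(n)]
        x = [x[i] + dx[i] for i in range(n)]
        g_new = grad(x)
        # Absolute tolerance on the norm of the output (f_tol)
        if sum(gi * gi for gi in g_new) ** 0.5 < f_tol:
            return x
        df = [g_new[i] - g[i] for i in range(n)]
        # Good Broyden rank-1 update of the inverse Jacobian (Sherman-Morrison form):
        # Jinv += (dx - Jinv df) (dx^T Jinv) / (dx^T Jinv df)
        Jdf = [sum(Jinv[i][j] * df[j] for j in range(n)) for i in range(n)]
        dxJ = [sum(dx[i] * Jinv[i][j] for i in range(n)) for j in range(n)]
        denom = sum(dx[i] * Jdf[i] for i in range(n))
        for i in range(n):
            for j in range(n):
                Jinv[i][j] += (dx[i] - Jdf[i]) * dxJ[j] / denom
        g = g_new
    return x

# Minimize f(x, y) = (x - 1)^2 + (y - 2)^2 by rooting its gradient [2(x-1), 2(y-2)].
sol = broyden1_minimize(lambda p: [2 * (p[0] - 1), 2 * (p[1] - 2)], [0.0, 0.0])
```

Because the gradient here is linear, the rank-1 updates recover the exact inverse Jacobian along the explored directions and the iteration lands on (1, 2) after two steps.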
method="broyden2"

minimize(..., method="broyden2", *, alpha=None, uv0=None, max_rank=None, maxiter=None, f_tol=None, f_rtol=None, x_tol=None, x_rtol=None, line_search=True, verbose=False, custom_terminator=None)

Solve the root-finding or linear-equation problem using Broyden's second method [2]. It can be used for minimization by finding the root of the function's gradient.
References

[2] B.A. van der Rotten, PhD thesis, "A limited memory Broyden method to solve high-dimensional systems of nonlinear equations". Mathematisch Instituut, Universiteit Leiden, The Netherlands (2003). https://web.archive.org/web/20161022015821/http://www.math.leidenuniv.nl/scripties/Rotten.pdf
- Keyword Arguments
alpha (float or None) – The initial guess of inverse Jacobian is
- alpha * I + u v^T
.uv0 (tuple of tensors or str or None) – The initial guess of inverse Jacobian is
- alpha * I + u v^T
. If"svd"
, then it uses 1-rank svd to obtainu
andv
. If None, thenu
andv
are zeros.max_rank (int or None) – The maximum rank of inverse Jacobian approximation. If
None
, it isinf
.maxiter (int or None) – Maximum number of iterations, or inf if it is set to None.
f_tol (float or None) – The absolute tolerance of the norm of the output
f
.f_rtol (float or None) – The relative tolerance of the norm of the output
f
.x_tol (float or None) – The absolute tolerance of the norm of the input
x
.x_rtol (float or None) – The relative tolerance of the norm of the input
x
.line_search (bool or str) – Options to perform line search. If
True
, it is set to"armijo"
.verbose (bool) – Options for verbosity
method="linearmixing"

minimize(..., method="linearmixing", *, alpha=None, maxiter=None, f_tol=None, f_rtol=None, x_tol=None, x_rtol=None, line_search=True, verbose=False)

Solve the root-finding problem by approximating the inverse Jacobian with a constant scalar.
- Keyword Arguments
  - alpha (float or None) – The initial guess of the inverse Jacobian is -alpha * I.
  - maxiter (int or None) – Maximum number of iterations, or inf if set to None.
  - f_tol (float or None) – The absolute tolerance of the norm of the output f.
  - f_rtol (float or None) – The relative tolerance of the norm of the output f.
  - x_tol (float or None) – The absolute tolerance of the norm of the input x.
  - x_rtol (float or None) – The relative tolerance of the norm of the input x.
  - line_search (bool or str) – Option to perform line search. If True, it is set to "armijo".
  - verbose (bool) – Option for verbosity.
method="gd"

minimize(..., method="gd", *, step=0.001, gamma=0.9, maxiter=1000, f_tol=0.0, f_rtol=1e-08, x_tol=0.0, x_rtol=1e-08, verbose=False)

Vanilla gradient descent with momentum. The stopping conditions are combined with OR. The update step follows the equations below.

\[\begin{split}\mathbf{v}_{t+1} &= \gamma \mathbf{v}_t - \eta \nabla_{\mathbf{x}} f(\mathbf{x}_t) \\ \mathbf{x}_{t+1} &= \mathbf{x}_t + \mathbf{v}_{t+1}\end{split}\]

- Keyword Arguments
  - step (float) – The step size towards the steepest-descent direction, i.e. \(\eta\) in the equations above.
  - gamma (float) – The momentum factor, i.e. \(\gamma\) in the equations above.
  - maxiter (int) – Maximum number of iterations.
  - f_tol (float or None) – The absolute tolerance of the output f.
  - f_rtol (float or None) – The relative tolerance of the output f.
  - x_tol (float or None) – The absolute tolerance of the norm of the input x.
  - x_rtol (float or None) – The relative tolerance of the norm of the input x.
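The two update equations above can be sketched in plain Python for a scalar variable. The quadratic test function, its hand-coded gradient, and the fixed iteration count are illustrative choices; xitorch additionally applies the f/x tolerance stopping criteria, omitted here for brevity.

```python
def gd_minimize(grad, x0, step=1e-3, gamma=0.9, maxiter=1000):
    """Gradient descent with momentum on a scalar variable."""
    x, v = x0, 0.0
    for _ in range(maxiter):
        # v_{t+1} = gamma * v_t - eta * grad f(x_t)
        v = gamma * v - step * grad(x)
        # x_{t+1} = x_t + v_{t+1}
        x = x + v
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3)
xmin = gd_minimize(lambda x: 2.0 * (x - 3.0), 0.0, step=0.01)
```

With step=0.01 and the default momentum, the iterate converges to the minimum at x = 3.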
method="adam"

minimize(..., method="adam", *, step=0.001, beta1=0.9, beta2=0.999, eps=1e-08, maxiter=1000, f_tol=0.0, f_rtol=1e-08, x_tol=0.0, x_rtol=1e-08, verbose=False)

Adam optimizer by Kingma & Ba (2015). The stopping conditions are combined with OR. The update step follows the equations below.

\[\begin{split}\mathbf{g}_t &= \nabla_{\mathbf{x}} f(\mathbf{x}_{t-1}) \\ \mathbf{m}_t &= \beta_1 \mathbf{m}_{t-1} + (1 - \beta_1) \mathbf{g}_t \\ \mathbf{v}_t &= \beta_2 \mathbf{v}_{t-1} + (1 - \beta_2) \mathbf{g}_t^2 \\ \hat{\mathbf{m}}_t &= \mathbf{m}_t / (1 - \beta_1^t) \\ \hat{\mathbf{v}}_t &= \mathbf{v}_t / (1 - \beta_2^t) \\ \mathbf{x}_t &= \mathbf{x}_{t-1} - \alpha \hat{\mathbf{m}}_t / (\sqrt{\hat{\mathbf{v}}_t} + \epsilon)\end{split}\]

- Keyword Arguments
  - step (float) – The step size towards the descent direction, i.e. \(\alpha\) in the equations above.
  - beta1 (float) – Exponential decay rate for the first moment estimate.
  - beta2 (float) – Exponential decay rate for the second moment estimate.
  - eps (float) – Small number to prevent division by zero.
  - maxiter (int) – Maximum number of iterations.
  - f_tol (float or None) – The absolute tolerance of the output f.
  - f_rtol (float or None) – The relative tolerance of the output f.
  - x_tol (float or None) – The absolute tolerance of the norm of the input x.
  - x_rtol (float or None) – The relative tolerance of the norm of the input x.
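The Adam update equations above map line-for-line onto a plain-Python sketch for a scalar variable. The quadratic test function, its hand-coded gradient, and the step/iteration settings are illustrative choices; xitorch additionally applies the f/x tolerance stopping criteria, omitted here for brevity.

```python
import math

def adam_minimize(grad, x0, step=0.01, beta1=0.9, beta2=0.999,
                  eps=1e-8, maxiter=2000):
    """Adam update on a scalar variable, following the equations above."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, maxiter + 1):
        g = grad(x)                          # g_t
        m = beta1 * m + (1 - beta1) * g      # first moment estimate m_t
        v = beta2 * v + (1 - beta2) * g * g  # second moment estimate v_t
        mhat = m / (1 - beta1 ** t)          # bias-corrected m_t
        vhat = v / (1 - beta2 ** t)          # bias-corrected v_t
        x = x - step * mhat / (math.sqrt(vhat) + eps)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3)
xadam = adam_minimize(lambda x: 2.0 * (x - 3.0), 0.0)
```

The bias-correction terms matter early on: at small t, m and v are biased towards their zero initialization, and dividing by \(1 - \beta^t\) compensates for it.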