minimize

xitorch.optimize.minimize(fcn: Callable[..., torch.Tensor], y0: torch.Tensor, params: Sequence[Any] = [], bck_options: Mapping[str, Any] = {}, method: Union[str, Callable, None] = None, **fwd_options) → torch.Tensor

Solve the unbounded minimization problem:

\[\mathbf{y^*} = \arg\min_\mathbf{y} f(\mathbf{y}, \theta)\]

to find the best \(\mathbf{y}\) that minimizes the output of the function \(f\).

Parameters
  • fcn (callable) – The function to be minimized, whose output is a tensor with 1 element.

  • y0 (torch.tensor) – Initial guess of the solution with shape (*ny)

  • params (list) – Sequence of any other parameters to be passed to fcn

  • bck_options (dict) – Method-specific options for the backward solve (see xitorch.linalg.solve())

  • method (str or callable or None) – Minimization method. If None, it will choose "broyden1".

  • **fwd_options – Method-specific options (see method section)

Returns

The solution of the minimization with shape (*ny)

Return type

torch.tensor

Example

>>> def func1(y, A):  # example function
...     return torch.sum((A @ y)**2 + y / 2.0)
>>> A = torch.tensor([[1.1, 0.4], [0.3, 0.8]]).requires_grad_()
>>> y0 = torch.zeros((2,1))  # zeros as the initial guess
>>> ymin = minimize(func1, y0, params=(A,))
>>> print(ymin)
tensor([[-0.0519],
        [-0.2684]], grad_fn=<_RootFinderBackward>)
method="broyden1"
minimize(..., method="broyden1", *, alpha=None, uv0=None, max_rank=None, maxiter=None, f_tol=None, f_rtol=None, x_tol=None, x_rtol=None, line_search=True, verbose=False, custom_terminator=None)

Solve the root-finding or linear-equation problem using Broyden's first method [1]. It can be used to solve minimization by finding the root of the function’s gradient.

References

[1] B.A. van der Rotten, PhD thesis, “A limited memory Broyden method to solve high-dimensional systems of nonlinear equations”. Mathematisch Instituut, Universiteit Leiden, The Netherlands (2003). https://web.archive.org/web/20161022015821/http://www.math.leidenuniv.nl/scripties/Rotten.pdf

Keyword Arguments
  • alpha (float or None) – The initial guess of the inverse Jacobian is -alpha * I + u v^T.

  • uv0 (tuple of tensors or str or None) – The initial guess of inverse Jacobian is - alpha * I + u v^T. If "svd", then it uses 1-rank svd to obtain u and v. If None, then u and v are zeros.

  • max_rank (int or None) – The maximum rank of inverse Jacobian approximation. If None, it is inf.

  • maxiter (int or None) – Maximum number of iterations, or inf if it is set to None.

  • f_tol (float or None) – The absolute tolerance of the norm of the output f.

  • f_rtol (float or None) – The relative tolerance of the norm of the output f.

  • x_tol (float or None) – The absolute tolerance of the norm of the input x.

  • x_rtol (float or None) – The relative tolerance of the norm of the input x.

  • line_search (bool or str) – Options to perform line search. If True, it is set to "armijo".

  • verbose (bool) – Whether to print extra information during the iterations.
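In one dimension, Broyden's rank-1 update of the inverse Jacobian reduces to the secant method, which makes the idea easy to sketch. Below is a minimal, pure-Python illustration of minimizing \(f(y) = (y - 2)^2\) by root-finding on its gradient; the helper name and two-point initialization are illustrative only, not part of the xitorch API.

```python
def minimize_secant(grad, x0, x1, maxiter=100, f_tol=1e-12):
    """Minimize a scalar function by secant root-finding on its gradient.
    In 1-D, Broyden's rank-1 Jacobian update is exactly the secant slope."""
    g0, g1 = grad(x0), grad(x1)
    for _ in range(maxiter):
        if abs(g1) < f_tol or g1 == g0:
            break
        # secant step: Jacobian approximated by (g1 - g0) / (x1 - x0)
        x0, x1 = x1, x1 - g1 * (x1 - x0) / (g1 - g0)
        g0, g1 = g1, grad(x1)
    return x1

# gradient of f(y) = (y - 2)^2 is 2*(y - 2); the root (minimum) is y = 2
xmin = minimize_secant(lambda y: 2.0 * (y - 2.0), x0=0.0, x1=1.0)
```

Because the gradient here is linear, the secant step lands on the root in a single iteration; the higher-dimensional method maintains a low-rank inverse-Jacobian approximation instead of a single slope.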

method="broyden2"
minimize(..., method="broyden2", *, alpha=None, uv0=None, max_rank=None, maxiter=None, f_tol=None, f_rtol=None, x_tol=None, x_rtol=None, line_search=True, verbose=False, custom_terminator=None)

Solve the root-finding or linear-equation problem using Broyden's second method [2]. It can be used to solve minimization by finding the root of the function’s gradient.

References

[2] B.A. van der Rotten, PhD thesis, “A limited memory Broyden method to solve high-dimensional systems of nonlinear equations”. Mathematisch Instituut, Universiteit Leiden, The Netherlands (2003). https://web.archive.org/web/20161022015821/http://www.math.leidenuniv.nl/scripties/Rotten.pdf

Keyword Arguments
  • alpha (float or None) – The initial guess of the inverse Jacobian is -alpha * I + u v^T.

  • uv0 (tuple of tensors or str or None) – The initial guess of inverse Jacobian is - alpha * I + u v^T. If "svd", then it uses 1-rank svd to obtain u and v. If None, then u and v are zeros.

  • max_rank (int or None) – The maximum rank of inverse Jacobian approximation. If None, it is inf.

  • maxiter (int or None) – Maximum number of iterations, or inf if it is set to None.

  • f_tol (float or None) – The absolute tolerance of the norm of the output f.

  • f_rtol (float or None) – The relative tolerance of the norm of the output f.

  • x_tol (float or None) – The absolute tolerance of the norm of the input x.

  • x_rtol (float or None) – The relative tolerance of the norm of the input x.

  • line_search (bool or str) – Options to perform line search. If True, it is set to "armijo".

  • verbose (bool) – Whether to print extra information during the iterations.

method="linearmixing"
minimize(..., method="linearmixing", *, alpha=None, maxiter=None, f_tol=None, f_rtol=None, x_tol=None, x_rtol=None, line_search=True, verbose=False)

Solve the root-finding problem by approximating the inverse Jacobian with a constant scalar. It can be used to solve minimization by finding the root of the function’s gradient.

Keyword Arguments
  • alpha (float or None) – The initial guess of inverse Jacobian is -alpha * I.

  • maxiter (int or None) – Maximum number of iterations, or inf if it is set to None.

  • f_tol (float or None) – The absolute tolerance of the norm of the output f.

  • f_rtol (float or None) – The relative tolerance of the norm of the output f.

  • x_tol (float or None) – The absolute tolerance of the norm of the input x.

  • x_rtol (float or None) – The relative tolerance of the norm of the input x.

  • line_search (bool or str) – Options to perform line search. If True, it is set to "armijo".

  • verbose (bool) – Whether to print extra information during the iterations.
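With the inverse Jacobian fixed at -alpha * I, the quasi-Newton step x - J^{-1} f(x) becomes the simple mixing update x + alpha * f(x). A minimal pure-Python sketch of that idea on a classic fixed-point problem (the function name is illustrative, not the xitorch implementation):

```python
import math

def linear_mixing(f, x0, alpha=0.5, maxiter=200, f_tol=1e-10):
    """Root-finding with the inverse Jacobian approximated by the
    constant scalar -alpha, i.e. the step x <- x + alpha * f(x)."""
    x = x0
    for _ in range(maxiter):
        fx = f(x)
        if abs(fx) < f_tol:
            break
        x = x + alpha * fx  # mix a fraction alpha of the residual into x
    return x

# solve cos(x) - x = 0; the root is the Dottie number, ~0.7390851
root = linear_mixing(lambda x: math.cos(x) - x, x0=0.0)
```

The scheme converges only when the mixing damps the iteration (here the Jacobian of cos(x) - x is negative, matching the -alpha * I sign convention); for stiffer problems the Broyden methods above are usually more robust.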

method="gd"
minimize(..., method="gd", *, step=0.001, gamma=0.9, maxiter=1000, f_tol=0.0, f_rtol=1e-08, x_tol=0.0, x_rtol=1e-08, verbose=False)

Vanilla gradient descent with momentum. The stopping conditions are combined with OR criteria. The update step follows the equations below.

\[\begin{split}\mathbf{v}_{t+1} &= \gamma \mathbf{v}_t - \eta \nabla_{\mathbf{x}} f(\mathbf{x}_t) \\ \mathbf{x}_{t+1} &= \mathbf{x}_t + \mathbf{v}_{t+1}\end{split}\]
Keyword Arguments
  • step (float) – The step size towards the steepest descent direction, i.e. \(\eta\) in the equations above.

  • gamma (float) – The momentum factor, i.e. \(\gamma\) in the equations above.

  • maxiter (int) – Maximum number of iterations.

  • f_tol (float or None) – The absolute tolerance of the output f.

  • f_rtol (float or None) – The relative tolerance of the output f.

  • x_tol (float or None) – The absolute tolerance of the norm of the input x.

  • x_rtol (float or None) – The relative tolerance of the norm of the input x.
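The two update equations above can be sketched in a few lines of plain Python on a toy quadratic (illustrative only; the actual implementation operates on tensors and applies the stopping criteria listed above):

```python
def gd_momentum(grad, x0, step=0.1, gamma=0.9, maxiter=500):
    """Gradient descent with momentum on a scalar variable."""
    x, v = x0, 0.0
    for _ in range(maxiter):
        v = gamma * v - step * grad(x)  # v_{t+1} = gamma v_t - eta grad f(x_t)
        x = x + v                       # x_{t+1} = x_t + v_{t+1}
    return x

# minimize f(x) = (x - 3)^2, whose gradient is 2*(x - 3); the minimum is x = 3
xmin = gd_momentum(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

The momentum term \(\gamma \mathbf{v}_t\) lets the iterate keep moving along directions where the gradient is consistent, which speeds up convergence on ill-conditioned problems at the cost of some oscillation.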

method="adam"
minimize(..., method="adam", *, step=0.001, beta1=0.9, beta2=0.999, eps=1e-08, maxiter=1000, f_tol=0.0, f_rtol=1e-08, x_tol=0.0, x_rtol=1e-08, verbose=False)

Adam optimizer by Kingma & Ba (2015). The stopping conditions are combined with OR criteria. The update step follows the equations below.

\[\begin{split}\mathbf{g}_t &= \nabla_{\mathbf{x}} f(\mathbf{x}_{t-1}) \\ \mathbf{m}_t &= \beta_1 \mathbf{m}_{t-1} + (1 - \beta_1) \mathbf{g}_t \\ \mathbf{v}_t &= \beta_2 \mathbf{v}_{t-1} + (1 - \beta_2) \mathbf{g}_t^2 \\ \hat{\mathbf{m}}_t &= \mathbf{m}_t / (1 - \beta_1^t) \\ \hat{\mathbf{v}}_t &= \mathbf{v}_t / (1 - \beta_2^t) \\ \mathbf{x}_t &= \mathbf{x}_{t-1} - \alpha \hat{\mathbf{m}}_t / (\sqrt{\hat{\mathbf{v}}_t} + \epsilon)\end{split}\]
Keyword Arguments
  • step (float) – The step size towards the descent direction, i.e. \(\alpha\) in the equations above.

  • beta1 (float) – Exponential decay rate for the first moment estimate.

  • beta2 (float) – Exponential decay rate for the second moment estimate.

  • eps (float) – Small number to prevent division by 0.

  • maxiter (int) – Maximum number of iterations.

  • f_tol (float or None) – The absolute tolerance of the output f.

  • f_rtol (float or None) – The relative tolerance of the output f.

  • x_tol (float or None) – The absolute tolerance of the norm of the input x.

  • x_rtol (float or None) – The relative tolerance of the norm of the input x.
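The update equations above translate almost line-for-line into plain Python; the sketch below runs Adam on a toy quadratic (illustrative only; a larger step than the 0.001 default is used here so the short run converges):

```python
import math

def adam(grad, x0, step=0.1, beta1=0.9, beta2=0.999, eps=1e-8, maxiter=5000):
    """Adam on a scalar variable, following the update equations above."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, maxiter + 1):
        g = grad(x)                          # g_t
        m = beta1 * m + (1 - beta1) * g      # first moment estimate m_t
        v = beta2 * v + (1 - beta2) * g * g  # second moment estimate v_t
        m_hat = m / (1 - beta1 ** t)         # bias-corrected m_t
        v_hat = v / (1 - beta2 ** t)         # bias-corrected v_t
        x = x - step * m_hat / (math.sqrt(v_hat) + eps)
    return x

# minimize f(x) = (x - 3)^2 via its gradient 2*(x - 3); the minimum is x = 3
xmin = adam(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

Dividing the bias-corrected momentum by \(\sqrt{\hat{\mathbf{v}}_t}\) normalizes the step per coordinate, so the effective step size is roughly `step` regardless of the gradient's scale.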