minimize

xitorch.optimize.minimize(fcn: Callable[[...], torch.Tensor], y0: torch.Tensor, params: Sequence[Any] = [], bck_options: Mapping[str, Any] = {}, method: Union[str, Callable] = None, **fwd_options) → torch.Tensor  [source]

Solve the unbounded minimization problem

\[\mathbf{y^*} = \arg\min_\mathbf{y} f(\mathbf{y}, \theta)\]

to find the best \(\mathbf{y}\) that minimizes the output of the function \(f\).
- Parameters
  - fcn (callable) – The function to be minimized; it must output a tensor with a single element.
  - y0 (torch.Tensor) – Initial guess of the solution with shape (*ny).
  - params (list) – Sequence of any other parameters to be passed to fcn.
  - bck_options (dict) – Method-specific options for the backward solve (see xitorch.linalg.solve()).
  - method (str or callable or None) – Minimization method. If None, it defaults to "broyden1".
  - **fwd_options – Method-specific options (see the method sections below).
- Returns
  The solution of the minimization with shape (*ny).
- Return type
  torch.Tensor
Example

>>> def func1(y, A):  # example function
...     return torch.sum((A @ y)**2 + y / 2.0)
>>> A = torch.tensor([[1.1, 0.4], [0.3, 0.8]]).requires_grad_()
>>> y0 = torch.zeros((2, 1))  # zeros as the initial guess
>>> ymin = minimize(func1, y0, params=(A,))
>>> print(ymin)
tensor([[-0.0519],
        [-0.2684]], grad_fn=<_RootFinderBackward>)
method="broyden1"

minimize(..., method="broyden1", *, alpha=None, uv0=None, max_rank=None, maxiter=None, f_tol=None, f_rtol=None, x_tol=None, x_rtol=None, line_search=True, verbose=False, custom_terminator=None)

Solve the root-finding or linear-equation problem using Broyden's first method [1]. It can be used for minimization by finding the root of the function's gradient.
References

[1] B.A. van der Rotten, PhD thesis, "A limited memory Broyden method to solve high-dimensional systems of nonlinear equations". Mathematisch Instituut, Universiteit Leiden, The Netherlands (2003). https://web.archive.org/web/20161022015821/http://www.math.leidenuniv.nl/scripties/Rotten.pdf
- Keyword Arguments
  - alpha (float or None) – The initial guess of the inverse Jacobian is - alpha * I + u v^T.
  - uv0 (tuple of tensors or str or None) – The initial guess of the inverse Jacobian is - alpha * I + u v^T. If "svd", a rank-1 SVD is used to obtain u and v. If None, u and v are zeros.
  - max_rank (int or None) – The maximum rank of the inverse Jacobian approximation. If None, it is inf.
  - maxiter (int or None) – Maximum number of iterations, or inf if set to None.
  - f_tol (float or None) – The absolute tolerance of the norm of the output f.
  - f_rtol (float or None) – The relative tolerance of the norm of the output f.
  - x_tol (float or None) – The absolute tolerance of the norm of the input x.
  - x_rtol (float or None) – The relative tolerance of the norm of the input x.
  - line_search (bool or str) – Option to perform line search. If True, it is set to "armijo".
  - verbose (bool) – Option for verbosity.
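To make the idea concrete, here is a rough pure-Python sketch of Broyden's first ("good") method used as a minimizer, i.e. finding a root of the gradient. The identity initial inverse Jacobian, the quadratic test function, and its hand-coded gradient are illustrative choices, not xitorch's internals (xitorch uses a low-rank - alpha * I + u v^T approximation and line search).

```python
def broyden1_minimize(grad, x0, maxiter=50, f_tol=1e-10):
    """Find a root of grad (the gradient of the objective) via Broyden's first method."""
    n = len(x0)
    # Illustrative initial guess of the inverse Jacobian: the identity matrix.
    Jinv = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    x = list(x0)
    g = grad(x)
    for _ in range(maxiter):
        # Quasi-Newton step: dx = -Jinv @ g
        dx = [-sum(Jinv[i][j] * g[j] for j in range(n)) for i in range(n)]
        x = [x[i] + dx[i] for i in range(n)]
        g_new = grad(x)
        # Absolute tolerance on the norm of the output (f_tol)
        if sum(gi * gi for gi in g_new) ** 0.5 < f_tol:
            return x
        df = [g_new[i] - g[i] for i in range(n)]
        # Good Broyden rank-1 update of the inverse Jacobian (Sherman-Morrison form):
        # Jinv += (dx - Jinv df) (dx^T Jinv) / (dx^T Jinv df)
        Jdf = [sum(Jinv[i][j] * df[j] for j in range(n)) for i in range(n)]
        dxJ = [sum(dx[i] * Jinv[i][j] for i in range(n)) for j in range(n)]
        denom = sum(dx[i] * Jdf[i] for i in range(n))
        for i in range(n):
            for j in range(n):
                Jinv[i][j] += (dx[i] - Jdf[i]) * dxJ[j] / denom
        g = g_new
    return x

# Minimize f(x, y) = (x - 1)^2 + (y - 2)^2 by rooting its gradient [2(x-1), 2(y-2)].
sol = broyden1_minimize(lambda p: [2 * (p[0] - 1), 2 * (p[1] - 2)], [0.0, 0.0])
```

Because the gradient here is linear, the rank-1 updates recover the exact inverse Jacobian along the explored directions and the iteration lands on (1, 2) after two steps.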
method="broyden2"

minimize(..., method="broyden2", *, alpha=None, uv0=None, max_rank=None, maxiter=None, f_tol=None, f_rtol=None, x_tol=None, x_rtol=None, line_search=True, verbose=False, custom_terminator=None)

Solve the root-finding or linear-equation problem using Broyden's second method [2]. It can be used for minimization by finding the root of the function's gradient.
References

[2] B.A. van der Rotten, PhD thesis, "A limited memory Broyden method to solve high-dimensional systems of nonlinear equations". Mathematisch Instituut, Universiteit Leiden, The Netherlands (2003). https://web.archive.org/web/20161022015821/http://www.math.leidenuniv.nl/scripties/Rotten.pdf
- Keyword Arguments
alpha (float or None) – The initial guess of inverse Jacobian is
- alpha * I + u v^T
.uv0 (tuple of tensors or str or None) – The initial guess of inverse Jacobian is
- alpha * I + u v^T
. If"svd"
, then it uses 1-rank svd to obtainu
andv
. If None, thenu
andv
are zeros.max_rank (int or None) – The maximum rank of inverse Jacobian approximation. If
None
, it isinf
.maxiter (int or None) – Maximum number of iterations, or inf if it is set to None.
f_tol (float or None) – The absolute tolerance of the norm of the output
f
.f_rtol (float or None) – The relative tolerance of the norm of the output
f
.x_tol (float or None) – The absolute tolerance of the norm of the input
x
.x_rtol (float or None) – The relative tolerance of the norm of the input
x
.line_search (bool or str) – Options to perform line search. If
True
, it is set to"armijo"
.verbose (bool) – Options for verbosity
method="linearmixing"

minimize(..., method="linearmixing", *, alpha=None, maxiter=None, f_tol=None, f_rtol=None, x_tol=None, x_rtol=None, line_search=True, verbose=False)

Solve the root-finding problem by approximating the inverse Jacobian with a constant scalar.
- Keyword Arguments
  - alpha (float or None) – The initial guess of the inverse Jacobian is -alpha * I.
  - maxiter (int or None) – Maximum number of iterations, or inf if set to None.
  - f_tol (float or None) – The absolute tolerance of the norm of the output f.
  - f_rtol (float or None) – The relative tolerance of the norm of the output f.
  - x_tol (float or None) – The absolute tolerance of the norm of the input x.
  - x_rtol (float or None) – The relative tolerance of the norm of the input x.
  - line_search (bool or str) – Option to perform line search. If True, it is set to "armijo".
  - verbose (bool) – Option for verbosity.
method="gd"

minimize(..., method="gd", *, step=0.001, gamma=0.9, maxiter=1000, f_tol=0.0, f_rtol=1e-08, x_tol=0.0, x_rtol=1e-08, verbose=False)

Vanilla gradient descent with momentum. The stopping conditions are combined with OR. The update step follows the equations below.

\[\begin{split}\mathbf{v}_{t+1} &= \gamma \mathbf{v}_t - \eta \nabla_{\mathbf{x}} f(\mathbf{x}_t) \\ \mathbf{x}_{t+1} &= \mathbf{x}_t + \mathbf{v}_{t+1}\end{split}\]

- Keyword Arguments
  - step (float) – The step size towards the steepest-descent direction, i.e. \(\eta\) in the equations above.
  - gamma (float) – The momentum factor, i.e. \(\gamma\) in the equations above.
  - maxiter (int) – Maximum number of iterations.
  - f_tol (float or None) – The absolute tolerance of the output f.
  - f_rtol (float or None) – The relative tolerance of the output f.
  - x_tol (float or None) – The absolute tolerance of the norm of the input x.
  - x_rtol (float or None) – The relative tolerance of the norm of the input x.
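The two update equations above can be sketched in plain Python for a scalar variable. The quadratic test function, its hand-coded gradient, and the fixed iteration count are illustrative choices; xitorch additionally applies the f/x tolerance stopping criteria, omitted here for brevity.

```python
def gd_minimize(grad, x0, step=1e-3, gamma=0.9, maxiter=1000):
    """Gradient descent with momentum on a scalar variable."""
    x, v = x0, 0.0
    for _ in range(maxiter):
        # v_{t+1} = gamma * v_t - eta * grad f(x_t)
        v = gamma * v - step * grad(x)
        # x_{t+1} = x_t + v_{t+1}
        x = x + v
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3)
xmin = gd_minimize(lambda x: 2.0 * (x - 3.0), 0.0, step=0.01)
```

With step=0.01 and the default momentum, the iterate converges to the minimum at x = 3.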
method="adam"

minimize(..., method="adam", *, step=0.001, beta1=0.9, beta2=0.999, eps=1e-08, maxiter=1000, f_tol=0.0, f_rtol=1e-08, x_tol=0.0, x_rtol=1e-08, verbose=False)

Adam optimizer by Kingma & Ba (2015). The stopping conditions are combined with OR. The update step follows the equations below.

\[\begin{split}\mathbf{g}_t &= \nabla_{\mathbf{x}} f(\mathbf{x}_{t-1}) \\ \mathbf{m}_t &= \beta_1 \mathbf{m}_{t-1} + (1 - \beta_1) \mathbf{g}_t \\ \mathbf{v}_t &= \beta_2 \mathbf{v}_{t-1} + (1 - \beta_2) \mathbf{g}_t^2 \\ \hat{\mathbf{m}}_t &= \mathbf{m}_t / (1 - \beta_1^t) \\ \hat{\mathbf{v}}_t &= \mathbf{v}_t / (1 - \beta_2^t) \\ \mathbf{x}_t &= \mathbf{x}_{t-1} - \alpha \hat{\mathbf{m}}_t / (\sqrt{\hat{\mathbf{v}}_t} + \epsilon)\end{split}\]

- Keyword Arguments
  - step (float) – The step size towards the descent direction, i.e. \(\alpha\) in the equations above.
  - beta1 (float) – Exponential decay rate for the first moment estimate.
  - beta2 (float) – Exponential decay rate for the second moment estimate.
  - eps (float) – Small number to prevent division by zero.
  - maxiter (int) – Maximum number of iterations.
  - f_tol (float or None) – The absolute tolerance of the output f.
  - f_rtol (float or None) – The relative tolerance of the output f.
  - x_tol (float or None) – The absolute tolerance of the norm of the input x.
  - x_rtol (float or None) – The relative tolerance of the norm of the input x.
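The Adam update equations above map line-for-line onto a plain-Python sketch for a scalar variable. The quadratic test function, its hand-coded gradient, and the step/iteration settings are illustrative choices; xitorch additionally applies the f/x tolerance stopping criteria, omitted here for brevity.

```python
import math

def adam_minimize(grad, x0, step=0.01, beta1=0.9, beta2=0.999,
                  eps=1e-8, maxiter=2000):
    """Adam update on a scalar variable, following the equations above."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, maxiter + 1):
        g = grad(x)                          # g_t
        m = beta1 * m + (1 - beta1) * g      # first moment estimate m_t
        v = beta2 * v + (1 - beta2) * g * g  # second moment estimate v_t
        mhat = m / (1 - beta1 ** t)          # bias-corrected m_t
        vhat = v / (1 - beta2 ** t)          # bias-corrected v_t
        x = x - step * mhat / (math.sqrt(vhat) + eps)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3)
xadam = adam_minimize(lambda x: 2.0 * (x - 3.0), 0.0)
```

The bias-correction terms matter early on: at small t, m and v are biased towards their zero initialization, and dividing by \(1 - \beta^t\) compensates for it.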