pybop.optimisers._adamw

Classes

AdamWImpl

AdamW optimiser (adaptive moment estimation with weight decay), as described in [1].

Module Contents

class pybop.optimisers._adamw.AdamWImpl(x0: numpy.ndarray, sigma0: list[float] | None, boundaries: pints.Boundaries | None)

Bases: pints.Optimiser

AdamW optimiser (adaptive moment estimation with weight decay), as described in [1].

This method is an extension of the Adam optimiser that introduces weight decay, which helps to regularise the weights and prevent overfitting.

This class reimplements the PINTS Adam optimiser and adds the weight decay described above. Original creation and credit is attributed to PINTS.

Pseudo-code is given below. Here the value of the j-th parameter at iteration i is given as p_j[i] and the corresponding derivative is denoted g_j[i]:

m_j[i] = beta1 * m_j[i - 1] + (1 - beta1) * g_j[i]
v_j[i] = beta2 * v_j[i - 1] + (1 - beta2) * g_j[i]**2

m_j' = m_j[i] / (1 - beta1**(1 + i))
v_j' = v_j[i] / (1 - beta2**(1 + i))

p_j[i] = p_j[i - 1] - alpha * (m_j' / (sqrt(v_j') + eps) + lam * p_j[i - 1])

The moments are initialised as m_j[0] = v_j[0] = 0, after which they decay with rates beta1 and beta2. The default values are beta1 = 0.9 and beta2 = 0.999.

The terms m_j' and v_j' are “initialisation bias corrected” versions of m_j and v_j (see section 2 of the paper).
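The effect of the correction is easiest to see with a constant gradient. The following is a minimal sketch, not taken from the implementation: the gradient value g = 2.0 and the five-iteration loop are assumptions chosen for illustration, and the counter i is assumed to start at zero.

```python
beta1 = 0.9
g = 2.0  # constant gradient, chosen only for illustration

m = 0.0  # first moment starts at zero, as in the pseudo-code
raw, corrected = [], []
for i in range(5):
    # Exponential moving average of the gradient
    m = beta1 * m + (1 - beta1) * g
    raw.append(m)
    # Bias-corrected estimate: divide by 1 - beta1**(1 + i)
    corrected.append(m / (1 - beta1 ** (1 + i)))

# raw starts near 0.2 and only slowly approaches g, because m was
# initialised at zero; corrected stays at g (up to rounding) throughout.
```

Without the correction, the first few steps would be scaled down by the zero initialisation rather than by anything in the data.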

The parameter alpha is a step size, which is set as min(sigma0) in this implementation.

The parameter lam is the weight decay rate, which is set to 0.01 by default in this implementation.

Finally, eps is a small constant used to avoid division by zero, set to eps = np.finfo(float).eps in this implementation.

This is an unbounded method: any boundaries will be ignored.
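The pseudo-code above can be sketched in plain Python as follows. This is an illustrative transcription under stated assumptions, not the PyBOP source: the function name adamw_step, the quadratic test problem, the step size alpha = 0.1 (standing in for min(sigma0)) and the zero-based iteration counter are all hypothetical choices made for the example.

```python
import math
import sys

# sys.float_info.epsilon has the same value as np.finfo(float).eps
EPS = sys.float_info.epsilon


def adamw_step(p, g, m, v, i, alpha, beta1=0.9, beta2=0.999, lam=0.01, eps=EPS):
    """Apply one AdamW update to every parameter (i is zero-based)."""
    new_p, new_m, new_v = [], [], []
    for p_j, g_j, m_j, v_j in zip(p, g, m, v):
        # Update the first and second moment estimates
        m_j = beta1 * m_j + (1 - beta1) * g_j
        v_j = beta2 * v_j + (1 - beta2) * g_j**2
        # Initialisation bias correction
        m_hat = m_j / (1 - beta1 ** (1 + i))
        v_hat = v_j / (1 - beta2 ** (1 + i))
        # Adam step plus decoupled weight decay on the previous value
        p_j = p_j - alpha * (m_hat / (math.sqrt(v_hat) + eps) + lam * p_j)
        new_p.append(p_j)
        new_m.append(m_j)
        new_v.append(v_j)
    return new_p, new_m, new_v


# Illustrative problem: f(p) = sum((p_j - target_j)**2)
target = [1.0, -2.0]
p = [5.0, 5.0]
m = [0.0, 0.0]
v = [0.0, 0.0]
alpha = 0.1  # hypothetical stand-in for min(sigma0)

initial_cost = sum((p_j - t_j) ** 2 for p_j, t_j in zip(p, target))
for i in range(500):
    g = [2 * (p_j - t_j) for p_j, t_j in zip(p, target)]
    p, m, v = adamw_step(p, g, m, v, i, alpha)
final_cost = sum((p_j - t_j) ** 2 for p_j, t_j in zip(p, target))
```

When the gradient and both moments are zero, the update reduces to p_j[i] = (1 - alpha * lam) * p_j[i - 1], which is the weight decay term acting on its own.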

References

ask(): Returns a list of the next points in parameter space for the optimiser to evaluate.

f_best(): Returns the best score found so far.

f_guessed(): Returns the score of the last guessed point.

n_hyper_parameters(): Returns the number of hyper-parameters used by this optimiser.

name(): Returns the name of the optimiser.

needs_sensitivities(): Returns True if this optimiser requires gradients, and False otherwise.

running(): Returns True if the optimisation is in progress.

tell(reply): Receives a list of cost function values for the points previously returned by self.ask(), and updates the optimiser state accordingly.

x_best(): Returns the best parameter values found so far.

x_guessed(): Returns the last guessed parameter values.
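The ask/tell methods above are typically driven in a loop. The sketch below uses a self-contained stand-in class rather than AdamWImpl itself, so it runs without PyBOP or PINTS installed. The class name ToyAskTellAdamW, the helper cost_and_gradient, and the assumption that each reply entry is a (value, gradient) pair are all hypothetical. The attribute names mirror those listed on this page, but how the real class uses them is a guess.

```python
import math
import sys


class ToyAskTellAdamW:
    """Hypothetical stand-in with an ask/tell interface; not the PyBOP class."""

    def __init__(self, x0, sigma0, lam=0.01, b1=0.9, b2=0.999):
        self._proposed = list(x0)
        self._alpha = min(sigma0)      # step size, as described above
        self._lam, self._b1, self._b2 = lam, b1, b2
        self._b1t = self._b2t = 1.0    # running powers of b1 and b2
        self._m = [0.0] * len(x0)
        self._v = [0.0] * len(x0)
        self._eps = sys.float_info.epsilon
        self._x_best, self._f_best = list(x0), math.inf
        self._ready_for_tell = False

    def ask(self):
        """Return the next point(s) to evaluate."""
        self._ready_for_tell = True
        return [list(self._proposed)]

    def tell(self, reply):
        """Accept [(value, gradient)] for the point returned by ask()."""
        if not self._ready_for_tell:
            raise RuntimeError("ask() must be called before tell()")
        self._ready_for_tell = False
        fx, dfx = reply[0]
        x = self._proposed
        if fx < self._f_best:
            self._f_best, self._x_best = fx, list(x)
        self._b1t *= self._b1
        self._b2t *= self._b2
        proposed = []
        for j, (x_j, g_j) in enumerate(zip(x, dfx)):
            self._m[j] = self._b1 * self._m[j] + (1 - self._b1) * g_j
            self._v[j] = self._b2 * self._v[j] + (1 - self._b2) * g_j**2
            m_hat = self._m[j] / (1 - self._b1t)
            v_hat = self._v[j] / (1 - self._b2t)
            step = m_hat / (math.sqrt(v_hat) + self._eps) + self._lam * x_j
            proposed.append(x_j - self._alpha * step)
        self._proposed = proposed

    def x_best(self):
        return list(self._x_best)

    def f_best(self):
        return self._f_best


def cost_and_gradient(x):
    """Illustrative cost f(x) = sum(x_j**2) together with its gradient."""
    return sum(x_j**2 for x_j in x), [2 * x_j for x_j in x]


opt = ToyAskTellAdamW(x0=[3.0, -4.0], sigma0=[0.1, 0.2])
for _ in range(200):
    xs = opt.ask()
    opt.tell([cost_and_gradient(x) for x in xs])
```

Separating ask() from tell() leaves the cost evaluation to the caller, so the same loop works whether the cost comes from a cheap formula or an expensive simulation.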

_alpha

_b1 = 0.9

_b1t = 1

_b2 = 0.999

_b2t = 1

_current

_current_df = None

_current_f

_eps

_f_best

_lam = 0.01

_m

_proposed

_ready_for_tell = False

_running = False

_v

_x_best

property b1

property b2

boundaries = None

property lam