pybop.optimisers._adamw

Classes

AdamWImpl

AdamW optimiser (adaptive moment estimation with weight decay), as described in [1].

Module Contents

class pybop.optimisers._adamw.AdamWImpl(x0, sigma0=0.015, boundaries=None)

Bases: pints.Optimiser

AdamW optimiser (adaptive moment estimation with weight decay), as described in [1].

This method is an extension of the Adam optimiser that introduces weight decay, which helps to regularise the weights and prevent overfitting.

This class reimplements the Pints Adam optimiser, adding the weight decay functionality described above. Credit for the original implementation goes to Pints.

Pseudo-code is given below. Here the value of the j-th parameter at iteration i is given as p_j[i] and the corresponding derivative is denoted g_j[i]:

m_j[i] = beta1 * m_j[i - 1] + (1 - beta1) * g_j[i]
v_j[i] = beta2 * v_j[i - 1] + (1 - beta2) * g_j[i]**2

m_j' = m_j[i] / (1 - beta1**(1 + i))
v_j' = v_j[i] / (1 - beta2**(1 + i))

p_j[i] = p_j[i - 1] - alpha * (m_j' / (sqrt(v_j') + eps) + lambda * p_j[i - 1])

The initial values of the moments are m_j[0] = v_j[0] = 0, after which they decay with rates beta1 and beta2. The default values are beta1 = 0.9 and beta2 = 0.999.

The terms m_j' and v_j' are “initialisation bias corrected” versions of m_j and v_j (see section 2 of the paper).
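To see why the correction matters, the short sketch below evaluates the first update by hand. It assumes the exponent in the correction is 1 on the first update (one reading of how the counter i is indexed), and the gradient value is an arbitrary illustration.

```python
import math

beta1, beta2 = 0.9, 0.999  # default decay rates quoted above
g = 2.5                    # arbitrary example gradient (assumption)

# First update, starting from m = v = 0
m = beta1 * 0.0 + (1 - beta1) * g
v = beta2 * 0.0 + (1 - beta2) * g**2

# Uncorrected, the moments are only about 0.1 * g and 0.001 * g**2
print(m, v)

# Bias correction, taking the exponent to be 1 on the first update
m_hat = m / (1 - beta1**1)
v_hat = v / (1 - beta2**1)

# The corrected moments recover g and g**2 up to rounding
print(math.isclose(m_hat, g), math.isclose(v_hat, g**2))
```

With both moments corrected, the first step has magnitude close to alpha in every coordinate with a non-zero gradient, regardless of the gradient's scale.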

The parameter alpha is a step size, which is set as min(sigma0) in this implementation.

The parameter lambda is the weight decay rate, which is set to 0.01 by default in this implementation.

Finally, eps is a small constant used to avoid division by zero, set to eps = np.finfo(float).eps in this implementation.
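The pseudo-code above can be sketched in NumPy as follows. This is an illustrative transcription, not the library's source: the function name adamw_step, the toy quadratic cost, and the use of a scalar alpha equal to the default sigma0 are all assumptions made for the example.

```python
import numpy as np


def adamw_step(p, g, m, v, i, alpha=0.015, beta1=0.9, beta2=0.999,
               lam=0.01, eps=np.finfo(float).eps):
    """One update following the pseudo-code; i counts updates from 0."""
    # Exponentially decaying first and second moment estimates
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2

    # Initialisation bias correction
    m_hat = m / (1 - beta1 ** (1 + i))
    v_hat = v / (1 - beta2 ** (1 + i))

    # Adaptive gradient step plus weight decay on the parameters
    p = p - alpha * (m_hat / (np.sqrt(v_hat) + eps) + lam * p)
    return p, m, v


# Toy problem (assumption): f(p) = sum((p - target)**2)
target = np.array([1.0, -2.0])
p = np.zeros(2)
m = np.zeros(2)
v = np.zeros(2)

for i in range(2000):
    g = 2 * (p - target)  # analytic gradient of the toy cost
    p, m, v = adamw_step(p, g, m, v, i)

print(p)  # expected to end up close to target
```

Because the decay term lambda * p pulls the parameters towards zero, the iterates settle very slightly short of the unregularised minimum rather than exactly on it.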

This is an unbounded method: any boundaries will be ignored.

References

ask(): Returns a list of next points in the parameter-space to evaluate from the optimiser.

f_best(): Returns the best score found so far.

f_guessed(): Returns the score of the last guessed point.

n_hyper_parameters(): The number of hyper-parameters used by this optimiser.

name(): Returns the name of the optimiser.

needs_sensitivities(): Returns True if this optimiser requires gradients (sensitivities) of the cost function, and False otherwise.

running(): Returns True if the optimisation is in progress.

set_b1(b1: float) → None: Sets the b1 momentum decay constant (beta1 in the pseudo-code above).

set_b2(b2: float) → None: Sets the b2 momentum decay constant (beta2 in the pseudo-code above).

set_lambda(lambda_: float = 0.01) → None: Sets the lambda_ decay constant. This is the weight decay rate (lambda in the pseudo-code above), which shrinks each parameter towards zero at every step and so acts as a regulariser.

tell(reply): Receives a list of cost function evaluations for the points previously returned by self.ask(), and updates the optimiser state accordingly.

x_best(): Returns the best parameter values found so far.

x_guessed(): Returns the last guessed parameter values.
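The ask() and tell() methods listed above are meant to be driven in a loop. The sketch below shows one way such a loop might look. It assumes that tell() expects a list of (value, gradient) pairs when needs_sensitivities() returns True, following the Pints convention. To stay self-contained it uses ToyGradientOptimiser, a hypothetical stand-in that performs plain gradient descent and is not part of pybop or Pints; run_ask_tell and cost_and_grad are likewise illustrative names.

```python
import numpy as np


class ToyGradientOptimiser:
    """Hypothetical stand-in with ask/tell methods (plain gradient descent)."""

    def __init__(self, x0, step=0.1):
        self._x = np.array(x0, dtype=float)
        self._step = step
        self._x_best = self._x.copy()
        self._f_best = np.inf

    def ask(self):
        # Propose a single point, returned as a list
        return [self._x.copy()]

    def tell(self, reply):
        # Assumed reply format: one (value, gradient) pair per proposed point
        fx, dfx = reply[0]
        if fx < self._f_best:
            self._f_best = fx
            self._x_best = self._x.copy()
        self._x = self._x - self._step * np.asarray(dfx)

    def x_best(self):
        return self._x_best

    def f_best(self):
        return self._f_best


def run_ask_tell(optimiser, cost_and_grad, n_iterations):
    """Drive any object exposing ask(), tell(), x_best() and f_best()."""
    for _ in range(n_iterations):
        xs = optimiser.ask()
        reply = [cost_and_grad(x) for x in xs]
        optimiser.tell(reply)
    return optimiser.x_best(), optimiser.f_best()


def cost_and_grad(x):
    # Toy quadratic cost and its analytic gradient (assumption)
    r = x - np.array([1.0, -2.0])
    return float(r @ r), 2 * r


x_best, f_best = run_ask_tell(ToyGradientOptimiser([0.0, 0.0]), cost_and_grad, 100)
print(x_best, f_best)
```

An AdamWImpl instance constructed from x0 and sigma0 could presumably be passed to run_ask_tell in place of the stand-in, although that has not been verified here.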