Optimization

Hello, and welcome to our tutorial on optimization. Here, we will explore the four optimizers that currently have Tequila interfaces: the native GD optimizer, along with interfaces to SciPy, GPyOpt, and Phoenics.

### start at the start: import statements!
import tequila as tq
import numpy as np

Overview

How to optimize an ``Objective``:

In Tequila, optimizers are accessed in several ways. They may be instantiated as objects directly, which can then be called; they are also accessible through tq.minimize. Either of these methods has one obligatory argument: an Objective. tq.minimize also requires that you supply a method, which must be a string; the call methods of the GD, SciPy, and GPyOpt optimizers accept this key as well.

As keywords to any optimization call, you can pass down the usual compilation kwargs for quantum simulation in Tequila:

* backend, a string indicating which quantum simulator to use
* samples, an int indicating how many shots of the circuit to measure (None means full wavefunction simulation)
* device, (generally) a string indicating which (real or emulated) quantum computer to sample from (requires samples to be specified)
* noise, the NoiseModel object to apply to the circuits simulated (see the tutorial on noise)

Additional keywords you might use include:

* variables, a list of the Variables you want to optimize (the default is all of them)
* initial_values, which gives a start point to the optimization. If you supply arguments to variables, you also need to supply arguments to initial_values so that all non-optimized parameters have a value to use. The default is random initialization (not recommended)
* gradient, which specifies which type of gradient will be used:
  - analytical gradients (default): gradient=None
  - numerical gradients: gradient={'method':'2-point', 'stepsize':1.e-4}
  - custom gradient objectives: gradient={tq.Variable: tq.Objective}
  - module-specific gradients: gradient='2-point' (to use, for example, SciPy finite difference stencils)
* silent, which silences outputs
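
To see how these keywords fit together, here is a schematic call (a sketch, not meant to be executed as-is: my_objective and the backend name are placeholders for whatever you have available):

### schematic only; replace the placeholders before running
# result = tq.minimize(objective=my_objective,    # a tq.Objective you have built
#                      method='bfgs',             # the optimization method, as a string
#                      backend='qulacs',          # which simulator to use
#                      samples=1000,              # shots; None means exact wavefunction simulation
#                      initial_values={'a': 0.1}, # starting values for the variables
#                      variables=['a'],           # optimize only this variable
#                      gradient={'method': '2-point', 'stepsize': 1.e-4},
#                      silent=True)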

Some of the optimizers take more, or different, keywords than the others, so check the documentation of each one. If the optimizer has some degree of verbosity (currently, they all do), you can deactivate it with silent=True.

The following optimization methods are available on your system in Tequila:

tq.optimizers.show_available_optimizers()

We will use two different Objectives for optimization in this tutorial. The first of these is a two-qubit expectation value with the tractable but nontrivial Hamiltonian \([Y(0)+Qm(0)]\otimes X(1)\), where \(Qm=\frac{1}{2} (I - Z)\) is the projector onto the |1> state.
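
As a quick check of that definition (a sketch; it assumes tq.paulis.I gives the identity), you can print both sides and compare:

### Qm(0) is the projector onto |1> on qubit 0; it should match 0.5*(I - Z(0))
print(tq.paulis.Qm(0))
print(0.5 * (tq.paulis.I() - tq.paulis.Z(0)))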

### optimizing the circuit in terms of pi makes the result of the optimization easier to interpret.

a = tq.Variable(name="a")*tq.numpy.pi
b = tq.Variable(name="b")*tq.numpy.pi
c = tq.Variable(name="c")*tq.numpy.pi
d = tq.Variable(name='d')*tq.numpy.pi

U1 = tq.gates.H(target=[0])
U1 += tq.gates.H(target=1)
U1 += tq.gates.Ry(target=0, angle=a)
U1 += tq.gates.Rz(target=1, angle=b)
U1 += tq.gates.Z(target=1,control=0)
U1 += tq.gates.Rx(target=0, angle=c)
U1 += tq.gates.Rx(target=1,angle=d)
U1 += tq.gates.Z(target=1,control=0)


### once we have a circuit, we pick a hamiltonian to optimize over
H1=(tq.paulis.Y(0)+tq.paulis.Qm(0))*tq.paulis.X(1)
O1=tq.ExpectationValue(U=U1,H=H1)

### we use the .draw function to pretty-print circuits via backend printers.
print('We will optimize the following objective: \n')
tq.draw(O1,backend='qiskit')

Our second Objective, O2, will measure a 3-qubit circuit with respect to the Hamiltonian \(Y(0)\otimes X(1) \otimes Y(2)\).

### this time, don't scale by pi

H2 = tq.paulis.Y(0)*tq.paulis.X(1)*tq.paulis.Y(2)
U2 = tq.gates.Ry(tq.numpy.pi/2,0) +tq.gates.Ry(tq.numpy.pi/3,1)+tq.gates.Ry(tq.numpy.pi/4,2)
U2 += tq.gates.Rz('a',0)+tq.gates.Rz('b',1)
U2 += tq.gates.CNOT(control=0,target=1)+tq.gates.CNOT(control=1,target=2)
U2 += tq.gates.Ry('c',1) +tq.gates.Rx('d',2)
U2 += tq.gates.CNOT(control=0,target=1)+tq.gates.CNOT(control=1,target=2)
O2 = tq.ExpectationValue(H=H2, U=U2)

print('We will optimize the following objective: \n')
tq.draw(O2, backend="qiskit")

Local Optimizers

We will begin this tutorial by focusing on local optimizers. By local optimization, we mean any optimization scheme where the suggested parameters at step t are always a transformation of the parameters suggested at step t-1. This includes a large number of the standard optimization techniques in use today, like gradient descent. Tequila comes with two local optimizers: a native gradient descent optimizer, implementing a number of the most popular gradient descent algorithms used in classical machine learning, as well as a plugin for the SciPy package (which is installed alongside Tequila), which allows the use of a number of gradient-free, gradient-based, and Hessian-based optimization methods.

The GD Optimizer

We will start by looking at the GD optimizer. Here is an overview of the available optimization methods.

tq.show_available_optimizers(module="gd")

As one sees, a variety of methods are available. Here, 'sgd' refers to standard gradient descent without momentum. Like all Tequila optimizers, the GD optimizer has a minimize function, and most of the arguments are the same. However, there is one important difference: the GD optimizer takes a learning rate, lr. This parameter mediates the step size in all of the GD optimizer methods; it is a positive float which scales the step taken in the direction of the gradient.
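
For plain gradient descent ('sgd'), the update applied at each iteration is \(\theta_{t} = \theta_{t-1} - \eta \, \nabla E(\theta_{t-1})\), where \(\theta\) collects the circuit parameters, \(E\) is the objective, and \(\eta\) is the learning rate lr; the other methods (momentum, RMSprop, Adam, and so on) modify this basic step using running statistics of previous gradients.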

We will now optimize O1, our two-qubit expectation value, choosing starting angles equivalent to \(\frac{1}{4}\pi\) for all four variables, and optimizing via the ‘Adam’ method.

init={'a':0.25,'b':0.25,'c':0.25,'d':0.25}
lr=0.1

### For even more fun, try using sampling with the samples keyword,
### or pick your favorite backend with the 'backend' keyword!

adam_result=tq.minimize(objective=O1,lr=lr,
              method='adam',
              maxiter=80,
              initial_values=init,
              silent=True)

The plots below show the trajectory of both the value of the objective and the values of the angles as a function of time.

adam_result.history.plot('energies')
adam_result.history.plot('angles')
print('best energy: ',adam_result.energy)
print('optimal angles: ',adam_result.angles)

We see that, minus a few hiccups, all the angles converge to optimum values.

Let’s repeat what we did above, but with a few of the other methods! Here’s RMSprop:

init={'a':0.25,'b':0.25,'c':0.25,'d':0.25}
lr=0.01
rms_result=tq.minimize(objective=O1,lr=lr,
              method='rmsprop',
              maxiter=80,
              initial_values=init,
              silent=True)
print('RMSprop optimization results:')
rms_result.history.plot('energies')
rms_result.history.plot('angles')
print('best energy: ',rms_result.energy)
print('optimal angles: ',rms_result.angles)

… And here’s Momentum:

init={'a':0.25,'b':0.25,'c':0.25,'d':0.25}
lr=0.1
mom_result=tq.minimize(objective=O1,lr=lr,
              method='momentum',
              maxiter=80,
              initial_values=init,
              silent=True)

print('momentum optimization results:')
mom_result.history.plot('energies')
mom_result.history.plot('angles')
print('best energy: ',mom_result.energy)
print('optimal angles: ',mom_result.angles)

Note that when using the RMSprop method, we reduced the learning rate from 0.1 to 0.01. Different methods may be more or less sensitive to choices of initial learning rate. Try going back to the previous examples, and choosing different learning rates, or different initial parameters, to gain a feel for how sensitive different methods are.
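
For instance, a small sweep like the sketch below (same objective O1, the 'rmsprop' method, and the same starting angles) prints the final energy reached for a few learning rates:

### a quick learning-rate sweep; purely illustrative
for trial_lr in [0.1, 0.01, 0.001]:
    trial = tq.minimize(objective=O1, lr=trial_lr,
                        method='rmsprop',
                        maxiter=80,
                        initial_values={'a':0.25,'b':0.25,'c':0.25,'d':0.25},
                        silent=True)
    print('lr = ', trial_lr, ' -> best energy: ', trial.energy)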

The GD optimizer, with the Quantum Natural Gradient:

The Quantum Natural Gradient, or QNG, is a novel method of calculating gradients for quantum systems, inspired by the natural gradient sometimes employed in classical machine learning. The usual gradient we employ is taken with respect to a Euclidean manifold, but this is not the only geometry of quantum space, nor even the optimal one. The QNG is, in essence, a method of taking gradients with respect to (an approximation to) the Fubini-Study metric. For information on how (and why) the QNG is used, see Stokes et al.

Using the QNG in Tequila is as simple as passing the keyword gradient='qng' to optimizers which support it, such as the GD optimizer. We will use it to optimize O2, our 3-qubit Objective, and then compare the results to optimizing the same circuit with the regular gradient.

### the optional keyword stop_count halts optimization if no improvement occurs for a given number of epochs (not used in this example).
### let's use a random initial starting point:
init={k:np.random.uniform(-2,2) for k in ['a','b','c','d']}

lr=0.01
qng_result = tq.minimize(objective=O2,
                     gradient='qng',
                     method='sgd', maxiter=200,lr=lr,
                     initial_values=init, silent=True)
qng_result.history.plot('energies')
qng_result.history.plot('angles')
print('best energy with qng: ',qng_result.energy)
print('optimal angles with qng: ',qng_result.angles)

To gain appreciation for why one might use the QNG, let’s optimize the same circuit with the same learning rate and the same method, but without QNG.

lr=0.01
sgd_noqng_result = tq.minimize(objective=O2,
                     gradient=None,
                     method='sgd', maxiter=200,lr=lr,
                     initial_values=init, silent=True)
print('plotting what happens without QNG')
sgd_noqng_result.history.plot('energies')
sgd_noqng_result.history.plot('angles')
print('best energy without qng: ',sgd_noqng_result.energy)
print('optimal angles without qng: ',sgd_noqng_result.angles)

Though the starting point was random, you will most likely see that the QNG run achieved a greater degree of improvement (it will not perform worse), and that the trajectories followed by the angles differed from those followed in the SGD-only optimization. Feel free to play around with other methods, learning rates, or circuits in the space below!

### Use this space to optimize your own circuits!
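### for example, here is one possible sketch: the same objective O2 and the GD optimizer's
### 'adam' method, run once with and once without the QNG from the same random start.
init = {k: np.random.uniform(-2, 2) for k in ['a', 'b', 'c', 'd']}
for grad in ['qng', None]:
    res = tq.minimize(objective=O2, gradient=grad,
                      method='adam', maxiter=100, lr=0.01,
                      initial_values=init, silent=True)
    print('adam with gradient =', grad, ' -> best energy: ', res.energy)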

The SciPy Optimizer

SciPy is one of the most popular optimization packages in Python, offering a wide variety of optimization strategies. We will not cover them all here; for a full exploration of the SciPy methods, see their docs. Here, we will exhibit a few of the more powerful options. Most SciPy keywords, like method_options, can be passed to minimize just as when using SciPy itself.

tq.show_available_optimizers(module="scipy")

We will try three different optimizers: COBYLA, which is gradient-free, L-BFGS-B, which employs gradients, and NEWTON-CG, which employs the Hessian.

print('As a reminder: we will optimize:')
tq.draw(O1, backend="qiskit")
init={'a':0.25,'b':0.25,'c':0.25,'d':0.25}

cobyla_result = tq.minimize(objective=O1,
                            method="cobyla",
                            initial_values=init,
                            tol=1.e-3, method_options={"gtol":1.e-3},
                            silent=True)

cobyla_result.history.plot('energies')
cobyla_result.history.plot('angles')
print('best energy with cobyla: ',cobyla_result.energy)
print('optimal angles with cobyla: ',cobyla_result.angles)
lb_result = tq.minimize(objective=O1,
                            method="l-bfgs-b",
                            initial_values=init,
                            tol=1.e-3, method_options={"gtol":1.e-3},
                            silent=True)

lb_result.history.plot('energies')
lb_result.history.plot('angles')
print('best energy with L-BFGS-B: ',lb_result.energy)
print('optimal angles with L-BFGS-B: ',lb_result.angles)
newton_result = tq.minimize(objective=O1,
                            method="newton-cg",
                            initial_values=init,
                            tol=1.e-3, method_options={"gtol":1.e-3},
                            silent=True)

newton_result.history.plot('energies')
newton_result.history.plot('angles')
print('best energy with NEWTON-CG: ',newton_result.energy)
print('optimal angles with NEWTON-CG: ',newton_result.angles)

All three of the methods converged to the same minimum, but not necessarily to the same angles; the gradient- and Hessian-based methods converged to approximately the same angles in similar time.

SciPy Extras: numerical gradients and Hessians

SciPy allows for the use of numerical gradients. To use them, pass keywords like '2-point' down to the gradient argument. When using SciPy's numerical gradients it is often crucial to determine a feasible step size for the procedure. This can be done with the method_options entry finite_diff_rel_step (for SciPy version 1.5 or higher) or eps (for SciPy versions below 1.5).

Here is one example. Please check your SciPy version!

lb_result = tq.minimize(objective=O1,
                            method="l-bfgs-b",
                            initial_values=init,
                            gradient="2-point",
                            tol=1.e-3, method_options={"gtol":1.e-3, "finite_diff_rel_step":1.e-4}, # eps for scipy version < 1.5
                            silent=True)

lb_result.history.plot('energies')
lb_result.history.plot('angles')
print('best energy with L-BFGS-B: ',lb_result.energy)
print('optimal angles with L-BFGS-B: ',lb_result.angles)

SciPy Extras: the QNG in SciPy

SciPy is also configured to use the QNG, just as the GD optimizer is. All one needs to do is set gradient='qng'. Let's see how the QNG interacts with the BFGS optimizer, using O2, the 3-qubit expectation value from before.

init={k:np.random.uniform(-2,2) for k in ['a','b','c','d']}
lr=0.01
bfgs_qng_result = tq.minimize(objective=O2,
                     gradient='qng',
                     method='bfgs', maxiter=200,lr=lr,
                     initial_values=init, silent=True)
print('plotting what happens with QNG')
bfgs_qng_result.history.plot('energies')
bfgs_qng_result.history.plot('angles')
print('best energy with qng: ',bfgs_qng_result.energy)
print('optimal angles with qng: ',bfgs_qng_result.angles)
bfgs_noqng_result = tq.minimize(objective=O2,
                     gradient=None,
                     method='bfgs', maxiter=200,lr=lr,
                     initial_values=init, silent=True)
print('plotting what happens without QNG')
bfgs_noqng_result.history.plot('energies')
bfgs_noqng_result.history.plot('angles')
print('best energy without qng: ',bfgs_noqng_result.energy)
print('optimal angles without qng: ',bfgs_noqng_result.angles)

Numerical and Customized Gradients

By default, Tequila compiles analytical gradients of the objectives using jax, internal recompilation, and the parameter-shift rule. This default applies when the gradient keyword is not set or is set to None. The keyword can also be set to a dictionary whose keys are the variables and whose values are Tequila objectives that evaluate to the corresponding gradients of the objective.
For example, gradient=tq.grad(objective) gives the same results as gradient=None or simply not setting the keyword.
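
As a quick illustration (a sketch; it relies on tq.grad returning such a dictionary when called without a variable), the following builds the gradient dictionary explicitly and passes it in, which should reproduce the default behaviour:

### explicit analytical gradients as a dictionary; equivalent to gradient=None
my_gradients = tq.grad(O2)   # dictionary: variable -> objective evaluating to dE/d(variable)
explicit_result = tq.minimize(objective=O2,
                              gradient=my_gradients,
                              method='bfgs',
                              initial_values=0.1,
                              silent=True)
print('best energy with explicit gradient dictionary: ', explicit_result.energy)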

Tequila offers its own way of compiling numerical gradients, which can then be used throughout all gradient-based optimizers. It is activated by setting gradient to a dictionary holding the finite-difference stencil as method as well as the stepsize.

Numerical gradients of this type come at a cost that scales as 2*len(variables) and can be significantly cheaper, especially if many expectation values are involved in the objective and/or if heavy recompilation of parametrized gates is necessary. Here is a small example using our O2 objective: the numerical 2-point procedure leads to 4 expectation values in the gradients, while analytical gradients would lead to 8 (to see this, set silent to False in the earlier examples or remove the gradient statement here).

lr=0.01
num_result = tq.minimize(objective=O2,
                     gradient={"method":"2-point", "stepsize":1.e-4},
                     method='sgd', maxiter=200,lr=lr,
                     initial_values=0.1, silent=False)
num_result.history.plot('energies')

Tequila currently offers 2-point as well as 2-point-forward and 2-point-backward stencils as method. The method can also be set to a Python function performing the task. Here is an example that implements the same stencil as 2-point; it can be replaced by any function with the same signature.

import copy
def my_finite_difference_stencil(obj, var_vals, key, step, *args, **kwargs):
        """
        calculate objective gradient by symmetric shifts about a point.
        Parameters
        ----------
        obj: Objective:
            objective to call.
        var_vals:
            variables to feed to the objective.
        key:
            which variable to shift, i.e, which variable's gradient is being called.
        step:
            the size of the shift; a small float.
        args
        kwargs

        Returns
        -------
        float:
            the approximated gradient of obj w.r.t variable key at point var_vals[key] as a float.

        """
        left = copy.deepcopy(var_vals)
        left[key] += step / 2
        right = copy.deepcopy(var_vals)
        right[key] -= step / 2
        return 1.0 / step * (obj(left, *args, **kwargs) - obj(right, *args, **kwargs))

num_result = tq.minimize(objective=O2,
                     gradient={"method":my_finite_difference_stencil, "stepsize":1.e-4},
                     method='sgd', maxiter=200,lr=lr,
                     initial_values=0.1, silent=True)
num_result.history.plot('energies')

The gradient keyword can also be replaced by a dictionary of Tequila objectives that evaluate to approximations of the objective's gradients.

Bayesian optimization

Bayesian optimization is a method of global optimization, often used to tune hyperparameters in classical machine learning. It has also seen use in the optimization of quantum circuits. Tequila currently supports two different Bayesian optimization packages, Phoenics and GPyOpt, both originally developed for optimizing expensive experimental procedures in chemistry. Visit their respective GitHub pages and install the optimizers before continuing this tutorial.

GPyOpt

GPyOpt can be used like any of our other optimizers. Like the GD and SciPy optimizers, it also takes a 'method' keyword. Three methods are supported: 'lbfgs', 'DIRECT', and 'CMA'. See the GPyOpt GitHub for more info.

print('As a reminder, we will optimize')
tq.draw(O1,backend='qiskit')
### let's use the lbfgs method.
init={'a':0.25,'b':0.25,'c':0.25,'d':0.25}
### note: no lr is passed here! there are fewer tunable keywords for this optimizer.
result=tq.minimize(objective=O1,
              method='lbfgs',
              maxiter=80,
              initial_values=init)

print('GPyOpt optimization results:')
result.history.plot('energies')
result.history.plot('angles')
print('best energy: ',result.energy)
print('optimal angles: ',result.angles)

Don't worry, the plot's not broken! Perhaps you are looking at the plots above in horror. But do take note: Bayesian optimization is a global, exploratory optimization method, designed to explore large portions of parameter space while still seeking out optimality. Look at the optimal energy again, and you will see that the best performance of this optimization method matched that of all the gradient descent methods. Next, we will apply GPyOpt to the QNG example circuit above and see how Bayesian optimization compares to QNG and SGD.

print('Hey, remember me?')
tq.draw(O2)
### let's use a random initial starting point:
init={k:np.random.uniform(-2,2) for k in ['a','b','c','d']}

gpy_result = tq.minimize(objective=O2,maxiter=25,method='lbfgs',
                     initial_values=init)

gpy_result.history.plot('energies')
print('best energy: ',gpy_result.energy)
print('optimal angles: ',gpy_result.angles)

In a very, very small number of steps, GPyOpt is able to match the performance of SGD with the QNG.

There are a few extras you can access if you are familiar with GPyOpt. As part of the result we return the attribute result.gpyopt_instance, an instance of the native GPyOpt BayesianOptimization object that was built and run during your optimization. It has some plotting features you can use.

obj=gpy_result.gpyopt_instance
obj.plot_convergence()

If your function has 1 or 2 parameters (but no more), you can also see a plot of its acquisition function; see the GPyOpt documentation for more info.
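
In GPyOpt this is the plot_acquisition method of the BayesianOptimization object; since our example objectives have four parameters, the call below is left commented out as a sketch:

### only available for 1- or 2-parameter models; our circuits here use four parameters
# obj.plot_acquisition()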

You can also extract the acquisition function of the model itself and play with it (it takes a single numpy array as input), using:

acq=result.gpyopt_instance.acquisition.acquisition_function
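
For example (a sketch; GPyOpt acquisition functions expect a 2D numpy array of shape (n_points, n_dims)), you can evaluate it at a random point of our four-parameter space:

random_query = np.random.uniform(0, 2*np.pi, (1, 4))   # one query point, four parameters
print('acquisition value at a random point: ', acq(random_query))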

Phoenics

Finally, we turn to `Phoenics <https://github.com/aspuru-guzik-group/phoenics>`__. This algorithm, originally developed within the Aspuru-Guzik group, can be accessed in the usual fashion. Its performance on the two-qubit optimization circuit is shown below. Note that the number of datapoints exceeds the provided maxiter; maxiter here controls the number of parameter batches suggested by Phoenics. Per batch, Phoenics suggests a number of parameter sets to try out that scales (nonlinearly) with the number of parameters, so you may want to set maxiter lower if you are only playing around.

init={'a':0.25,'b':0.25,'c':0.25,'d':0.25}
print('With phoenics we will optimize:')
print(O1)
### to see what you can pass down to phoenics, see the tequila documentation for that module.
p_result=tq.minimize(objective=O1,
            method='Phoenics',
            maxiter=5,
            initial_values=init,
            silent=False)

print('Phoenics optimization results on 2 qubit circuit:')
p_result.history.plot('energies')
p_result.history.plot('angles')
print('best energy: ',p_result.energy)
print('optimal angles: ',p_result.angles)

The phoenics object is also returned to you. One interesting thing we can extract from it is the acquisition function. You can obtain this indirectly, using result.phoenics_instance.bayesian_network.kernel_contribution. This function takes a numpy array (a point in your parameter space) and returns two numbers, x and y; the acquisition function then has the value x*y. Note: this is often zero.

kc=p_result.phoenics_instance.bayesian_network.kernel_contribution
random_point=np.random.uniform(0,2*np.pi,4)
f,s=kc(random_point)
random_ac=f*s
print('random point ', random_point, ' has acquisition function value ',random_ac)

This concludes our tutorial. Hope you had fun! Happy optimizing!