Using adjoint for optimization

At a Glance

Questions                 |Objectives                     |Key Points
--------------------------|-------------------------------|-------------------------------------
How can gradients be      |Know PETSc/TAO's capability for|Adjoint enables dynamic
computed for simulations? |adjoint and optimization       |constrained optimization.
                          |                               |
How difficult is it to    |Understand ingredients needed  |Jacobian is imperative.
use the adjoint method?   |for adjoint calculation        |
                          |                               |
                          |Understand the concern of      |Performance may depend on
                          |checkpointing                  |checkpointing at large scale.

Note: To begin this lesson…

cd handson/adjoint

Example 1: Generator Stability Analysis:

This code uses PETSc/TAO to demonstrates how to solve an ODE-constrained optimization problem with the Toolkit for Advanced Optimization (TAO), TSEvent, TSAdjoint and TS. The objective is to maximize the mechanical power input subject to the generator swing equations and a constraint on the maximum rotor angle deviation, which is reformulated as a minimization problem

$\begin{align*} \min & \{-P_m + \sigma\displaystyle \int_{t_0}^{t_F} \max\left(0, \theta - \theta_{max}\right)^\eta \ \mathrm{d}t \}\\ \nonumber ~~ \text{s.t.} & \qquad \frac{d \theta}{dt} = \omega_B\left(\omega - \omega_s\right) \\ & \qquad \frac{d \omega}{dt} = \frac{\omega_s}{2H}\left(P_m - P_{max}\sin(\theta) - D(\omega - \omega_s)\right) \end{align*}$

Disturbance (a fault) is applied to the generator at time 0.1 and cleared at time 0.2. The objective function contains an integral function. The gradient is computed with the discrete adjoint of an implicit time stepping method (Crank-Nicolson).

Compile the code

During ATPESC, participants do not need to compile code because binaries are available in the ATPESC project folder on Cooley. In case you are using your own copy of PETSc, this example is located in src/ts/examples/power_grid/. To compile, run the following in the source folder

make ex3opt

The source code is included in ex3opt.c

All the example codes need to compiled only once. Different tasks can be accomplished using command line options.

Command line options

You can determine the command line options available for this particular example by doing

./ex3opt -help

and show the options related to TAO only by doing

./ex3opt -help | grep tao

Run 1: Monitor the optimization progress

./ex3opt -tao_monitor -tao_view
iter =   0, Function value: 2.03778,  Residual: 144.125
iter =   1, Function value: -0.552947,  Residual: 43.1456
iter =   2, Function value: -0.911654,  Residual: 18.3028
iter =   3, Function value: -1.00401,  Residual: 2.48745
iter =   4, Function value: -1.00649,  Residual: 1.17916
iter =   5, Function value: -1.00732,  Residual: 0.125532
iter =   6, Function value: -1.00733,  Residual: 0.00012392
iter =   7, Function value: -1.00733,  Residual: 1.3024e-08
iter =   8, Function value: -1.00733,  Residual: 3.46501e-12
Tao Object: 1 MPI processes
type: blmvm
Gradient steps: 0
TaoLineSearch Object: 1 MPI processes
type: more-thuente
Active Set subset type: subvec
convergence tolerances: gatol=1e-08,   steptol=0.,   gttol=0.
Residual in Function/Gradient:=3.46501e-12
Objective value=-1.00733
total number of iterations=8,                          (max: 2000)
total number of function/gradient evaluations=9,      (max: 4000)
Solution converged:    ||g(X)|| <= gatol
Vec Object: 1 MPI processes
type: seq
1.00793

Questions

Examine the source code and find the user-provided functions for TAO, TS, and TSAdjoint respectively.

Essential functions we have provided are FormFunctionGradient for TAO, TSIFunction and TSIJacobian for TS, RHSJacobianP for TSAdjoint. Because of the integral in the objective function, extra functions including CostIntegrand, DRDYFunction and DRDPFunction are given to TSAdjoint.

Further information

A more complicated example for power grid application is in src/ts/examples/power_grid/stability_9bus/ex9busopt.c.

Example 2: Hybrid Dynamical System:

This code demonstrates how to compute the adjoint sensitivity for a complex dynamical system involving discontinuities with TSEvent, TSAdjoint and TS. The dynamics are described by the ODE

$\dot{x} = A_i x$

where $x = [x_1, x_2]^T$ and the matrix A change from

$A_1 = \left[ \begin{array}{c c}1 &-100\\ 10 &1 \end{array} \right] \quad \text{to} \quad A_2 = \left[ \begin{array}{c c}1 &10\\ -100 &1 \end{array} \right]$

when $x_2=2.75 x_1$ and switch back when $x_2=0.365 x_1$ .

Thus the ODE system alternates the right-hand side when a switching face is encountered. The switching surfaces are given by the algebraic constraints depending on the state variables, as shown below (left)

The parameter to which the sensitivities are computed is marked in red.
It represents the slope of the switching surface.
Intuitively the trajectory cannot be affected before it hits the surface.
The influence of the perturbation in the slope diminishes as the trajectory is approaching the equilibrium point.

Compile the code

This example is in src/ts/examples/hybrid. The source code is included in ex1adj.c

make ex1adj

Make the graghics work via interactive mode on cooley

Graphics is tricky. HPC users often do it offline. In order to make it work with cooley, your computer must have X11 (Mac users can install XQuartz). If you do not have it now, just skip the graphics parts since they are not essential.

Apply for an interactive allocation (skip this if you already got one)

$ qsub -I -t 60 -n 1 -A <project_name>

For example, if your interactive allocation gives you node cc115, open a new terminal and do the following:

$ ssh -C -X -Y cooley.alcf.anl.gov

$ ssh -X cc115

Then continue to run the applications in this new terminal.

Run 1: Monitor solution graphically with phase diagram

./ex1adj -ts_monitor_draw_solution_phase -4,-2,2,2 -draw_pause -2

Run 2: Monitor the timestepping process

./ex1adj -ts_monitor

Trailing (r) in some lines of the output indicates that a rollback happens. In this example, it is triggered by TSEvent. To check details about the event, we can use the event monitor

./ex1adj -ts_monitor -ts_event_monitor

We can also monitor the timestepping for the adjoint calculation by doing

./ex1adj -ts_monitor -ts_adjoint_monitor

Further information

The example ex1fwd.c in the same folder illustrates the forward sensitivity approach for the same problem.

Example 3: Diffusion-Reaction Problem

This code demonstrates parallel adjoint calculation for a system of time-dependent PDEs on a 2D rectangular grid. The adjoint solution corresponds to the sensitivities of one component in the final solution w.r.t. the initial conditions. We will use this example to illustrate the performance considerations for realistic large-scale applications. In particular, we will show how to play with checkpointing and how to profile/tune the performance.

Compile the code

This example is in src/ts/examples/advection-diffusion-reaction. The source code is included in ex5adj.c

make ex5adj

Run 1: Monitor solution graphically

mpiexec -n 4 ./ex5adj -forwardonly -implicitform 0 -ts_type rk \
                     -ts_monitor -ts_monitor_draw_solution

-forwardonly perform the forward simulation without doing adjoint
-implicitform 0 -ts_type rk changes the time stepping algorithm to a Runge-Kutta method
-ts_monitor_draw_solution monitors the progress for the solution at each time step
Add -draw_pause -2 if you want to pause at the end of simulation to see the plot

Run 2: Optimal checkpointing schedule

By default, the checkpoints are stored in binary files on disk. Of course, this may not be a good choice for large-scale applications running on high-performance machines where I/O cost is significant. We can make the solver use RAM for checkpointing and specify the maximum allowable checkpoints so that an optimal adjoint checkpointing schedule that minimizes the number of recomputations will be generated.

mpiexec -n 4 ./ex5adj -implicitform 0 -ts_type rk -ts_adapt_type none \
                     -ts_max_steps 10 -ts_monitor -ts_adjoint_monitor \
                     -ts_trajectory_type memory -ts_trajectory_max_cps_ram 3 \
                     -ts_trajectory_monitor -ts_trajectory_view

The output corresponds to the schedule depicted by the following diagram:

Questions

What will happen if we add the option -ts_trajectory_max_cps_disk 2 to specify there are two available slots for disk checkpoints?

Looking at the output, we will find that the new schedule uses both RAM and disk for checkpointing and takes two less recomputations.

Run 3: Implicit time integration method

Now we switch to an implicit method (Crank-Nicolson) using fixed stepsize, which is the default setting in the code. At each time step, a nonlinear system is solved by the PETSc nonlinear solver SNES.

mpiexec -n 12 ./ex5adj -da_grid_x 1024 -da_grid_y 1024 -ts_max_steps 10 -snes_monitor -log_view -ts_monitor

-snes_monitor shows the progress of SNES
-log_view prints a summary of the logging

A snippet of the summary:

...
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecDot                20 1.0 2.7505e-02 1.7 7.00e+06 1.0 0.0e+00 0.0e+00 2.0e+01  0  0  0  0  2   0  0  0  0  2  3050
VecMDot              321 1.0 2.6292e+00 1.4 6.62e+08 1.0 0.0e+00 0.0e+00 3.2e+02 25 15  0  0 34  25 15  0  0 34  3017
VecNorm              401 1.0 7.1590e-01 1.9 1.40e+08 1.0 0.0e+00 0.0e+00 4.0e+02  7  3  0  0 42   7  3  0  0 42  2349
...

Questions

Where is the majority of CPU time spent?

Of course answer may vary depending on the settings such as number of procs, problem size, and solver options. Typically most of the time should be spent on [VecMDot](http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecMDot.html) or [MatMult](http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatMult.html)

How expensive is it to do an adjoint step?

For this particular run, an adjoint step takes about 60-70% of the running time of a forward step (compare the time between TSAdjointStep and TSStep).

How can we improve performance?

1. Use memory instead of disk for checkpointing(`-ts_trajectory_type memory -ts_trajectory_solution_only 0`); 2. Tune the time stepping solver, nonlinear solver, linear solver, preconditioner and so forth.

Further information

Because this example uses DMDA, Jacobian can be efficiently approxiated using finite difference with coloring. You can use the option -snes_fd_color to enable this feature.

Out-Brief

We have used PETSc to demonstrate the adjoint capability as an enabling technology for dynamic-constrained optimization. In particular, we focused on time-depdent problems including complex dynamical systems with discontinuities and a large scale hyperbolic PDE.

We have shown the basic usage of the adjoint solver as well as functionalities that can facilitate rapid development, diagnosis and performance profiling.

HandsOnLessons

A github pages site hosting hands-on lessons in the use, design and development of scientific computing software packages

Using adjoint for optimization

At a Glance

Example 1: Generator Stability Analysis:

Compile the code

Command line options

Run 1: Monitor the optimization progress

Questions

Further information

Example 2: Hybrid Dynamical System:

Compile the code

Make the graghics work via interactive mode on cooley

Run 1: Monitor solution graphically with phase diagram

Run 2: Monitor the timestepping process

Further information

Example 3: Diffusion-Reaction Problem

Compile the code

Run 1: Monitor solution graphically

Run 2: Optimal checkpointing schedule

Questions

Run 3: Implicit time integration method

Questions

Further information

Out-Brief

Further Reading