notes

Optimisation

Instructors - Selin, Xingyin

Course Overview

| Component | Weightage (%) |
| --- | --- |
| Midterms | 32.5 |
| Finals | 32.5 |
| Quiz 1 | 15 |
| Quiz 2 | 15 |
| Class participation | 3 |
| Course evaluation | 2 |

Course Timetable

| Week | Content |
| --- | --- |
| 1 | Introduction, modeling with linear programming, geometry of linear programs. Reformulation tricks (absolute value, minimax, maximin). Canonical form of an LP and reduction to canonical form. |
| 2 | The simplex method in matrix form: simplex tableau, basic feasible solutions, reduced costs, pivoting. Phase I, Phase II. Anti-cycling rules. |
| 3 | Sensitivity analysis. Alternative optima. |
| 4 | Dual of a linear program, relationship to sensitivity analysis. Weak and strong duality theorems. |
| 5 | Two-player zero-sum games. The minimax theorem. |
| 6 | Introduction to networks and Dijkstra’s algorithm. |
| 7 | Recess week. |
| 8 | Max-flow / min-cut. Maximum matching. The min-cost flow problem. |
| 9 | Network simplex algorithm. Applications. |
| 10 | Integer programming formulations. Modeling with binary variables. LP relaxation. |
| 11 | Branch-and-bound. Cutting planes: knapsack covers, Gomory fractional cuts. |
| 12 | Dynamic programming: shortest paths, Bellman’s principle of optimality. |
| 13 | Traveling Salesman Problem. |
| 14 | Review session. Final exam. |

Linear Programming Procedure

Procedure for solving a problem with linear programming.

1) Transform the problem into a linear program.

2) Transform the linear program into a standard linear program.

3) Ensure the standard linear program has linearly independent rows.

4) Obtain a canonical form of the linear program.

5) Obtain the optimal solution of the linear program.

6) Conduct sensitivity analysis and consider edge cases.

[0] Definitions

A “general” linear program is one whose objective function is linear and whose constraints are linear equalities or inequalities.

\[\begin{align*} \max/\min &\quad \vec{c}^T\vec{x} \\ \text{s.t.} &\quad \vec{a}_i^T\vec{x} \enspace \geq \text{or} = \text{or} \leq \enspace b_i \quad \forall i \end{align*}\]

Note that all inequalities are non-strict. Please transform all strict inequalities to non-strict inequalities.

Standard form of a linear program

\[\begin{align*} \max \quad \vec{c}^T \vec{x} & \\ \text{s.t.} \quad A\vec{x} &= \vec{b} \geq \vec{0} \\ \vec{x} &\geq \vec{0} \end{align*}\]

Canonical form of a linear program - the variables and constraints can be permuted to be represented in this way:

\[\begin{align*} \max \quad c_N^T x_N &+ k \\ x_B + A_N x_N &= b \\ x_B, x_N &\geq 0 \end{align*}\]

One basic feasible solution can be read off the canonical form, with $x_B = b$ and $x_N = 0$, and with objective value equal to $k$.

If $c_N$ is nonpositive, the basic feasible solution is an optimal solution: increasing any nonbasic variable cannot improve the maximisation objective.
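As a small made-up instance, consider

\[\begin{align*} \max \quad -2x_3 - x_4 &+ 10 \\ x_1 + 3x_3 - x_4 &= 4 \\ x_2 - x_3 + 2x_4 &= 6 \\ x_1, x_2, x_3, x_4 &\geq 0 \end{align*}\]

with $x_B = (x_1, x_2)$ and $x_N = (x_3, x_4)$. The basic feasible solution read off this form is $x = (4, 6, 0, 0)$ with objective value $k = 10$, and since $c_N = (-2, -1)$ is nonpositive, this solution is optimal.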

[A] Converting linear residuals into a linear program

Please note that a linear program does not allow strict inequalities.

Least squares - cannot be converted into a linear program, because the squared objective $\sum_i \epsilon_i^2$ is not linear.

This includes the least-squares best-fit line.

Converting linear absolute residuals into a linear program.

The problem: Minimize $\sum_{i=1}^6 \vert \epsilon_i \vert$, where each $\epsilon_i$ is a linear expression of the decision variables (in this case, the intercept and slope).

Solution 1 Minimise $\sum_{i=1}^6 (r_i^- + r_i^+)$

where

\[\epsilon_i = r_i^- - r_i^+\]

with

\[r_i^-, r_i^+ \geq 0\]

(At an optimal solution, at least one of $r_i^-$ and $r_i^+$ will be zero, so $r_i^- + r_i^+ = \vert \epsilon_i \vert$.)

Solution 2 Minimise $\sum_{i=1}^6 z_i$ with

\[\begin{align} \epsilon_i &\leq z_i \\ -\epsilon_i &\leq z_i \end{align}\]

(The two constraints together force $z_i \geq \vert \epsilon_i \vert$; since we are minimising, $z_i$ is pushed down to exactly $\vert \epsilon_i \vert$ at the optimum.)

Converting maximum absolute residual into a linear program.

The problem: Minimize $\max_i \vert \epsilon_i \vert$

Solution

Minimise $r$.

For all $i$,

\[\begin{align} r &\geq \epsilon_i \\ r &\geq -\epsilon_i \end{align}\]

From the slides (notice that $\epsilon_i$ is an expression).

linear-regression

linear-regression-variants

The linear program finds the optimal values of $b_0$, $b_1$, $b_2$ and the $r$ variable(s) that optimise the objective function. $L_i$, $P_i$ and $E_i$ are constants given by the data.
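As an illustration of the reformulations above, here is a minimal sketch using scipy.optimize.linprog (not from the course materials; the data points are made up). It implements Solution 2 to fit a line $y \approx b_0 + b_1 x$ minimising the sum of absolute residuals:

```python
# Least absolute deviations via the z_i trick (Solution 2 above).
# Made-up data; decision vector is [b0, b1, z_1, ..., z_n].
import numpy as np
from scipy.optimize import linprog

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
n = len(x)

c = np.concatenate([[0.0, 0.0], np.ones(n)])   # minimise sum of z_i

# eps_i = y_i - b0 - b1*x_i; encode eps_i <= z_i and -eps_i <= z_i.
A_ub = np.zeros((2 * n, 2 + n))
b_ub = np.zeros(2 * n)
for i in range(n):
    # eps_i <= z_i  <=>  -b0 - b1*x_i - z_i <= -y_i
    A_ub[2 * i, :2] = [-1.0, -x[i]]
    A_ub[2 * i, 2 + i] = -1.0
    b_ub[2 * i] = -y[i]
    # -eps_i <= z_i  <=>  b0 + b1*x_i - z_i <= y_i
    A_ub[2 * i + 1, :2] = [1.0, x[i]]
    A_ub[2 * i + 1, 2 + i] = -1.0
    b_ub[2 * i + 1] = y[i]

bounds = [(None, None)] * 2 + [(0, None)] * n  # b0, b1 free; z_i >= 0

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print("b0, b1:", res.x[:2], "sum |eps|:", res.fun)
```

For the minimax variant, replace the $n$ variables $z_i$ with a single variable $r$ and minimise $r$.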

[B] Converting a linear program into a standard linear program

All linear programs can be transformed into a standard linear program. The standard steps: convert a minimisation into a maximisation by negating the objective; add a slack (or subtract a surplus) variable to turn each inequality into an equality; replace each free variable with the difference of two nonnegative variables; and multiply rows by $-1$ where needed so that the right-hand side is nonnegative.

[C] Simplex Phase I

PHASE I - Transforming the standard form to the canonical form \(\max\left\{ \vec{c}^T\vec{x} \mid A\vec{x} = \vec{b} \geq \vec{0},\ \vec{x} \geq \vec{0} \right\}\), where $A$ is made up of $n$ linearly independent rows.

First, we add $n$ nonnegative artificial variables (one per row) to create a different LP problem with a different objective function.

Begin with a canonical form of the new LP problem by managing the objective function. Carry out the Simplex Phase II iteration. Results: if the optimal value of the auxiliary objective is zero, the artificial variables have been driven out and we have a basic feasible solution of the original problem; otherwise, the original problem is infeasible.

Remove the artificial variables, and continue Phase II with the original objective function from the standard form.

[D] Simplex Phase II

PHASE II - Obtaining the optimal solution from the canonical form

We start with the canonical form.

We choose an entering basis variable (which is currently a non-basis variable).

We also choose a leaving basis variable (which will become a non-basis variable).

Then we pivot the basis variables. The LP problem remains equivalent, and stays in canonical form, after every pivot. We iterate until we reach one of the following conditions: no entering variable can improve the objective (the current basic feasible solution is optimal), or the entering variable can be increased without bound (the LP is unbounded).
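A compact sketch of these Phase II iterations on the full tableau, written from scratch for these notes (not the course’s reference code). It assumes a maximisation already in canonical form: the columns of `A` listed in `basis` form an identity, and `c` holds the reduced costs (zero on the basic columns). No anti-cycling rule is included, for brevity.

```python
import numpy as np

def simplex_phase2(c, A, b, basis):
    """Maximise c @ x s.t. A @ x = b, x >= 0, starting from a canonical form."""
    A = np.asarray(A, float).copy()
    b = np.asarray(b, float).copy()
    c = np.asarray(c, float).copy()
    z = 0.0
    while True:
        j = int(np.argmax(c))            # candidate entering variable
        if c[j] <= 1e-9:                 # no positive reduced cost: optimal
            x = np.zeros(len(c))
            x[basis] = b
            return x, z
        col = A[:, j]
        if (col <= 1e-9).all():          # entering column <= 0: unbounded
            raise ValueError("LP is unbounded")
        ratios = np.where(col > 1e-9, b / np.where(col > 1e-9, col, 1.0), np.inf)
        r = int(np.argmin(ratios))       # ratio test picks the leaving row
        A[r] /= A[r, j]                  # pivot: rescale the pivot row...
        b[r] = ratios[r]
        for i in range(len(b)):          # ...and eliminate column j elsewhere
            if i != r:
                factor = A[i, j]
                A[i] -= factor * A[r]
                b[i] -= factor * b[r]
        z += c[j] * b[r]                 # update the objective constant k
        c -= c[j] * A[r]                 # update the reduced costs
        basis[r] = j

# Example: max 3x1 + 2x2 with x1 + x2 <= 4, x1 + 3x2 <= 6 (slacks x3, x4).
x, z = simplex_phase2([3, 2, 0, 0], [[1, 1, 1, 0], [1, 3, 0, 1]], [4, 6], [2, 3])
print(x, z)   # x = [4, 0, 0, 2], z = 12
```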

[E] Interpretation of matrices

interpreting-matrices.png

simplex-matrices

[F] Calculate allowable change

calculate-allowable-change.png

sensitivity-report

[G] Consider edge cases

edge-cases

[I] Dual problem

Definitions

Given a “primal” LP (expressed in this form)

\[\begin{align} \max \enspace c^T x& \\ Ax& \leqslant b \\ x& \geqslant 0 \end{align}\]

we can formulate a dual LP

\[\begin{align} \min \enspace b^T y& \\ A^{T} y& \geqslant c \\ y& \geqslant 0 \end{align}\]

The dual of the dual is the primal. (You need to express the dual in primal form before you can take the dual of the dual.)

linear-programming-primal-dual

Complementary Slackness property

Let $\bar{x}$ and $\bar{y}$ be feasible solutions to the primal and dual problem, respectively. $\bar{x}$ and $\bar{y}$ are optimal solutions for the two respective problems if and only if:

\[\begin{align} \bar{y}_i (b_i - \vec{a}_i^T \bar{x}) &= 0, \enspace \forall i, \quad \text{and} \\ (A_j^T \bar{y} - c_j)\, \bar{x}_j &= 0, \enspace \forall j \end{align}\]

where $\vec{a}_i^T$ is the $i$-th row and $A_j$ the $j$-th column of $A$.

[QUESTION] Do they imply one another?

In an optimal solution

This holds for every pair of a dual variable and its primal constraint, and for every pair of a primal variable and its dual constraint.

Weak Duality Theorem

Condition - $\bar{x}$ is feasible for the primal (maximisation) problem and $\bar{y}$ is feasible for the dual (minimisation) problem.

Result - $c^T \bar{x} \leq b^T \bar{y}$, i.e. every dual feasible solution gives an upper bound on the primal objective.

One of the intuitions - Suppose that $x^*$ is optimal to the primal; then $x^*$ is feasible to the relaxation.

Strong Duality Theorem

Condition - the primal problem has an optimal solution.

Result - the dual problem also has an optimal solution, and the two optimal objective values are equal.

Fundamental Theorem of Linear Programming

For a pair of primal problem P and dual problem D, exactly one of the following is true:

  1. Both P and D have optimal solutions, and their optimal objective values are equal.
  2. One of P and D is unbounded and the other is infeasible.
  3. Both P and D are infeasible.

Formulating the dual of a general LP

linear-programming-dual-transformation

Economic interpretation

Please try to understand this.

linear-programming-dual-formulation.png

linear-programming-dual-formulation

Please also understand the optional component in W4L2 on restoring feasibility after adding a constraint.

Games

Two-player zero-sum game

zero-sum-game-two-player

Guaranteed payoff

To maximise the guaranteed payoff, the row player would play row 3.

zero-sum-game-guaranteed-payoff.png

zero-sum-game-guaranteed-payoff

Dominating strategies

zero-sum-game-dominating-strategies

Saddle point

When the game has such an outcome (a saddle point), neither player would unilaterally choose any other strategy.

zero-sum-game-saddle-point

[QUESTION] Is the saddle point always an intersection of dominating strategies?

Mixed strategy

Made up of a set of probabilities. The row player can play different rows with probabilities that do not change. Objective - to maximise the guaranteed payoff. Intuition - the opponent will detect the strategy, so the payoff will converge to the guaranteed payoff.

zero-sum-game-mixed-strategies

Solution:

\[p_1 = 7/18, \enspace p_2 = 5/18, \enspace p_3 = 1/3, \enspace z = 1/9\]

The linear programming problem for row player

Objective: maximise guaranteed payoff of the row player

\[\begin{align} \max \min\{E(C_1), E(C_2), E(C_3)\}& \\ p_1 + p_2 + p_3 &= 1 \\ p_1, p_2, p_3 &\geq 0 \end{align}\]

where

\[\begin{alignat*}{4} E(C_1) & {}={} &-2p_1 & {}+{} &2p_2 &{}+{} &p_3 \\ E(C_2) & {}={} & p_1 & {}-{} & p_2 &{} {} \\ E(C_3) & {}={} & 2p_1 & {} {} & &{}-{} &2p_3 \end{alignat*}\]

Formulating as a linear programming problem

\[\begin{align} \max z& \\ E(C_1) &\geq z\\ E(C_2) &\geq z\\ E(C_3) &\geq z\\ p_1 + p_2 + p_3 &= 1 \\ p_1, p_2, p_3 &\geq 0 \end{align}\]

(This looks like a relaxation, because $z$ is only required to be a lower bound on $E(C_1), E(C_2), E(C_3)$ rather than equal to their minimum; but maximising $z$ pushes it up until it equals the minimum, so the two problems have the same optimal solutions.)

The linear programming problem for column player

Objective: minimise the guaranteed payoff of column player

\[\begin{align} \min \max\{E(R_1),E(R_2),E(R_3)\} \\ q_1 + q_2 + q_3 &= 1 \\ q_1, q_2, q_3 &\geq 0 \end{align}\]

where

\[\begin{alignat*}{4} E(R_1) & {}={} &-2q_1 & {}+{} &q_2 &{}+{} &2q_3 \\ E(R_2) & {}={} & 2q_1 & {}-{} & q_2 &{} {} \\ E(R_3) & {}={} & q_1 & {} {} & &{}-{} &2q_3 \end{alignat*}\]

Formulating as a linear programming problem

\[\begin{align}\min v& \\ E(R_1) &\leq v\\ E(R_2) &\leq v\\ E(R_3) &\leq v\\ q_1 + q_2 + q_3 &= 1 \\ q_1, q_2, q_3 &\geq 0 \end{align}\]

This is a primal-dual pair. ([TODO] To show)
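As a sanity check (my own sketch, not course code), the row player’s LP can be solved with scipy.optimize.linprog, using the payoff matrix read off the $E(C_j)$ expressions above:

```python
# Row player's LP: variables [p1, p2, p3, z]; linprog minimises, so use -z.
import numpy as np
from scipy.optimize import linprog

A = np.array([[-2.0,  1.0,  2.0],   # a_ij = row player's payoff for (row i, col j)
              [ 2.0, -1.0,  0.0],
              [ 1.0,  0.0, -2.0]])

c = np.array([0.0, 0.0, 0.0, -1.0])          # maximise z
A_ub = np.hstack([-A.T, np.ones((3, 1))])    # z - E(C_j) <= 0 for each column j
b_ub = np.zeros(3)
A_eq = np.array([[1.0, 1.0, 1.0, 0.0]])      # probabilities sum to one
b_eq = np.array([1.0])
bounds = [(0, None)] * 3 + [(None, None)]    # p_i >= 0, z free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(res.x)   # approx [7/18, 5/18, 1/3, 1/9]
```

Solving the column player’s LP the same way returns the same optimal value $1/9$, as strong duality predicts.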

Graphical solution to small games

If a player has only two strategies available, you can use a single variable ($p$ and $1-p$) and you do not need an LP to solve the problem.

zero-sum-game-graphical-solution

You can use a graphical solution. The guaranteed payoff is a piecewise linear function of $p$, with its maximum at $(0.375, -0.125)$.

[TODO] Explain for the column player as well.

Formulating games as LP problems

Each player can decide on a combination of decisions before the game commences. (Example from cohort - combinations of decisions on whether to pass or bet on heads and tails.) The strategy is to assign a probability to each combination.

Graphs

Directed graph and undirected graph

Differences between directed graph and undirected graph

| Property | Undirected Graph | Directed Graph |
| --- | --- | --- |
| Edges at a node | Degree | Indegree, Outdegree |
| Adjacency Matrix | Symmetric | Not necessarily symmetric |
| Incidence Matrix | Columns sum to two | Columns sum to zero |

Terminologies

Special graphs

Theorems

Shortest path problem

The shortest path tree stores all shortest paths from the starting node.

Three sets of variables

Key steps of Dijkstra’s algorithm

Workings of Dijkstra’s algorithm

dijkstra-1

dijkstra-2

dijkstra-3
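For reference, a standard heap-based implementation of Dijkstra’s algorithm (my own sketch; the small graph is made up, and `parent` encodes the shortest path tree):

```python
# Dijkstra's algorithm with a binary heap; graph[u] maps each neighbour v
# to the nonnegative arc cost c_uv.
import heapq

def dijkstra(graph, source):
    dist = {source: 0}
    parent = {source: None}          # parent pointers form the shortest path tree
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                 # stale heap entry; u was already finalised
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, parent

graph = {"s": {"a": 2, "b": 5}, "a": {"b": 1, "t": 6}, "b": {"t": 2}, "t": {}}
print(dijkstra(graph, "s")[0])       # {'s': 0, 'a': 2, 'b': 3, 't': 5}
```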

Taxonomy of network problems

Taxonomy of network problems (and methods)

The max-flow problem

Given a directed network $G = (V,E)$ and a source node $s$, a sink node $t$, and a possibly infinite capacity $u_{ij}$ for each arc $(i,j)$

The max-flow problem is the problem of finding a value of flow for every arc such that flow is conserved at every intermediate node, no arc exceeds its capacity, and the total flow from the source to the sink is maximised.

Formulation of the Linear Program

\[\begin{align} &\text{Objective function} \\ &\qquad \max \enspace q \\ &\text{Outflow of the source} \\ &\qquad \sum_{(s,j) \in E} x_{sj} = q \\ &\text{Inflow of the sink} \\ &\qquad \sum_{(i,t) \in E} x_{it} = q \\ &\text{For all other nodes, inflow equals outflow} \\ &\qquad \sum_{(v,j) \in E} x_{vj} - \sum_{(i,v) \in E} x_{iv} = 0 \quad \forall v \in V,\ v \neq s,t \\ &\text{Capacity constraint} \\ &\qquad x_{ij} \leq u_{ij} \quad \forall (i,j) \in E \\ &\text{Nonnegativity constraint} \\ &\qquad x_{ij} \geq 0 \quad \forall (i,j) \in E \end{align}\]

Transforming into a max-flow problem

Ford-Fulkerson algorithm for max-flow

Problem parameters

These are the numbers that you need to keep track of

Initialise the current flow value as zero

Iterate with the following steps: find a directed path from the source to the sink in the residual network; augment the flow by the bottleneck (the minimum residual capacity along the path); and update the residual capacities, including the backward arcs.

Terminate the algorithm when there are no more directed paths from the source to the sink in the residual network. The complexity of the algorithm depends on the choice of augmenting paths.

The backflow capacity needs to be decreased if used (from 10 to 9 in the example below).

backflow.png

max-flow-backflow
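A sketch of Ford-Fulkerson with breadth-first augmenting paths (the Edmonds-Karp rule), written for these notes on a made-up capacity map; the backward residual arcs implement exactly the backflow bookkeeping described above:

```python
from collections import deque

def max_flow(cap, s, t):
    # residual[u][v] = remaining capacity of (u, v); sending flow on a
    # forward arc creates capacity on the backward arc (v, u).
    residual = {}
    for (u, v), capacity in cap.items():
        residual.setdefault(u, {})
        residual.setdefault(v, {})
        residual[u][v] = residual[u].get(v, 0) + capacity
        residual[v].setdefault(u, 0)
    flow = 0
    while True:
        parent = {s: None}                       # BFS for an augmenting path
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, r in residual[u].items():
                if r > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow                          # no augmenting path: maximal
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:                        # augment and update residuals
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck

cap = {("s", "a"): 10, ("s", "b"): 5, ("a", "b"): 15, ("a", "t"): 5, ("b", "t"): 10}
print(max_flow(cap, "s", "t"))   # 15
```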

My understanding of the intuition

min-cut

The backflow is ignored in the calculation of the value of the min-cut.

Each feasible solution of the dual problem provides an upper bound to the primal problem.

Natural bounds - if you partition right outside the source or the sink, you get trivial upper bounds: it is not possible to send more than the source can send, or more than the sink can take.

The optimal values of the primal and dual problems are equal (the max-flow min-cut theorem).

Primal-dual pair for max-flow problem

The dual of the max-flow problem is the min-cut problem. Assigning the $d$-values provides a partition.

max-flow-primal-dual-graph

max-flow-primal-fomulation

Wikipedia offers a good explanation (the symbols differ): https://en.wikipedia.org/wiki/Max-flow_min-cut_theorem

max-flow-dual-formulation

max-flow-primal-dual-table

Min-cost flow problem

This is a more general form of a max-flow problem, but still a specific class of linear programming problem (refer to taxonomy)

min-cost-flow-definition

For each node, the outflow minus the inflow equals $b_i$. If $b_i > 0$, node $i$ is a supply node; if $b_i < 0$, node $i$ is a demand node.

min-cost-flow-example

min-cost-flow-formulation

Properties

Transforming into a min-cost flow problem

In the min-cost flow problem, total demand should equal total supply, i.e. $\sum_i b_i = 0$; otherwise this is not a balanced min-cost flow problem (or there is no feasible solution). Therefore, when supply and demand do not balance, add a dummy node with zero-cost arcs to absorb the difference.

Converting a max-flow problem into a min-cost problem

Every max-flow problem can be converted into a min-cost problem (refer to taxonomy of linear programming problems).

Add an arc from the sink to the source with infinite capacity. The objective is to minimise the negative of the flow on the new sink-source arc (i.e. give it cost $-1$); the costs of all other arcs are zero.

Network simplex

A tree is a connected graph with no cycles. It can be directed or undirected, but we consider undirected trees.

A spanning tree of a connected graph is a subset of its arcs that forms an undirected tree touching every node. (Directions do not matter here.)

The basic feasible solutions of the LP formulation of a min-cost flow problem correspond to spanning trees.

A basic feasible solution is a spanning tree whose flows allow all supplies and demands to be satisfied.

min-cost-flow-bfs

The capacity limits $u_{ij}$ will not be considered in this course.

Problem parameters

These are the numbers that you need to keep track of

Algorithm

Initialising the network simplex

You start with a basic feasible solution: a spanning tree that allows all supplies and demands to be satisfied.

Compute the simplex multipliers, starting from a leaf (chosen at random, or as instructed); the multiplier of that leaf node is set to zero.

Iterate until the solution is optimal

Compute the reduced costs. If all reduced costs are nonnegative, the solution is optimal.

Pivot to improve the BFS: “we try to send as much flow as possible along that arc, while staying feasible”. In other words, you redirect flow onto the nonbasic arc, away from the (series of) basic arcs, until the flow on one of the basic arcs drops to zero.

Update the simplex multipliers. You can repeat the same process, or use the following faster procedure. Then, repeat the cycle.

min-cost-flow-fast-update-procedure

Assumptions


Phase I of network simplex

Do this to obtain a basic feasible solution for the network simplex.

min-cost-flow-phase-i

(The following comment needs to be confirmed; I am not too sure.)

The flow on the actual arcs is zero, and the flow on the artificial arcs is equal to the demand or supply of the node.

To obtain a basic feasible solution, we want to redirect the flow on the artificial arcs to the actual arcs. We set the costs on the artificial arcs to one, and the costs on the actual arcs to zero.

If you cannot reduce the cost of this modified problem to zero, you cannot obtain a spanning tree from the actual arcs alone, and there is no feasible solution.

Comparison between simplex and network simplex

simplex-and-network-simplex-1

simplex-and-network-simplex-2

Integer programming

Memorising the solutions to each problem does not scale. You need to understand the design principles behind every solution.

Example - flipping the board of five. Takeaways

Example - IKEA with setup costs. Takeaways

Modelling ILP problem

Linear programming can only express convex feasible regions with inclusive (non-strict) boundaries, and you cannot multiply decision variables together.

Strategy - transform the problem into ORs and ANDs.

Modelling less-than-or-greater-than (non-convex) inequality

\[x\leq2 \quad \text{or} \quad x \geq 6\]

You can turn off either constraint by adding a binary variable $w_i$ with a big $M$.

\[x \leq 2 + M w \quad \text{and} \quad x \geq 6 - M (1-w) \quad \text{and} \quad w \text{ binary}\]
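A minimal sketch of this trick using scipy.optimize.milp (any MILP solver would do). The objective, the bound $x \leq 10$, and the extra constraint $x \geq 3$ are made up to give the example a non-trivial answer:

```python
# Minimise x subject to x >= 3 and (x <= 2 or x >= 6), via big-M and binary w.
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

M = 100.0                                             # safely large big-M

c = np.array([1.0, 0.0])                              # variables: [x, w]
constraints = [
    LinearConstraint([[1.0, -M]], -np.inf, 2.0),      # x <= 2 + M*w
    LinearConstraint([[1.0, -M]], 6.0 - M, np.inf),   # x >= 6 - M*(1 - w)
]
bounds = Bounds([3.0, 0.0], [10.0, 1.0])              # 3 <= x <= 10, 0 <= w <= 1
integrality = np.array([0, 1])                        # w integer, hence binary

res = milp(c, constraints=constraints, integrality=integrality, bounds=bounds)
print(res.x)   # [6., 1.]: the w = 1 branch (x >= 6) is active
```

With $w = 0$ the first constraint forces $x \leq 2$, which clashes with $x \geq 3$; so the solver must take $w = 1$, turning the first constraint off and enforcing $x \geq 6$.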

Consider the geometry of the formulation.

integer-programming-geometry

The yellow region is convex.

Modelling with complement constraint

One and only one of a constraint and its complement can be fulfilled.

If the variables and coefficients are integers, we can formulate a complement. For the constraint $x_1 + 2x_2 \leq 10$, the violation is $x_1 + 2x_2 \geq 11$. (We cannot use strict inequalities in linear programming.)

integer-programming-complement-constraint

Implication statement into logical constraint

If $P \implies Q$, then $\overline{P} \lor Q$ must be true. $\overline{P}$ is the complement of $P$.

integer-programming-piecewise-constraint

Piecewise linear functions

This is used mostly in the objective function, though it is also possible in constraints.

piecewise-linear-functions

integer-programming-piecewise

$x$ is replicated into three parts. $x_i$ will be set to zero if we are not using the $i$-th piece. $w_i$ indicates whether we are using the $i$-th piece.

LP relaxations to solve IP

In this approach, we solve the IP by solving its LP relaxations. The relaxation is tightened with new constraints until we obtain the solution to the IP.

LP relaxation

You solve the LP relaxation. If the optimal solution is not all-integer, you find valid inequalities that cut off the fractional corner point without cutting off any integer feasible point.

Knapsack cover inequality

You observe a subset $S$ with $\vert S \vert$ elements whose total weight exceeds the limit. The knapsack cover inequality prevents you from choosing all of these elements: you can choose at most $\vert S \vert - 1$ elements from $S$.

\[\sum_{j \in S} x_j \leq \vert S \vert - 1\]

A minimal cover is a cover in which any $\vert S \vert - 1$ of its elements have total weight within the limit. Using minimal covers usually gives stronger inequalities.

Gomory cuts

Suppose all variables are integers, and you have an equality constraint with fractional (decimal) coefficients and right-hand side. Keeping only the fractional parts of the coefficients yields a valid inequality that every integer solution satisfies, but that the current fractional corner point violates.
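A small made-up example: suppose all variables are nonnegative integers and an optimal-tableau row reads

\[x_1 + 2.4\,x_3 + 0.9\,x_4 = 3.7\]

Splitting each coefficient into its integer and fractional parts gives $x_1 + 2x_3 - 3 = 0.7 - 0.4\,x_3 - 0.9\,x_4$. The left side is an integer, and the right side is at most $0.7$, so the right side must be at most $0$. This yields the Gomory fractional cut

\[0.4\,x_3 + 0.9\,x_4 \geq 0.7\]

which every integer solution satisfies, but which the current fractional corner point ($x_3 = x_4 = 0$) violates.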

Branch-and-Bound

(incomplete and may be inaccurate)

For a (minimising) binary (0/1) integer program with $n$ variables, brute force (full enumeration) evaluates $2^n$ candidate solutions. We do not want to do all of them.

Key idea - prune the tree: a node can be discarded when its relaxation is infeasible, when its bound is no better than the incumbent solution, or when its relaxation solution is already integral.

Properties

branch-and-bound-algorithm.png
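A sketch of the idea on a 0/1 knapsack (maximisation), using the LP relaxation of each subproblem (the fractional knapsack) as the upper bound; the data are made up:

```python
# Branch-and-bound for a 0/1 knapsack: branch on items in ratio order,
# prune when the LP bound cannot beat the incumbent.
def knapsack_bb(values, weights, capacity):
    items = sorted(zip(values, weights), key=lambda vw: vw[0] / vw[1], reverse=True)
    best = [0]                                 # incumbent objective value

    def bound(k, value, room):
        # LP relaxation: fill the remaining room greedily, fractionally.
        for v, w in items[k:]:
            if w <= room:
                room -= w
                value += v
            else:
                return value + v * room / w    # fraction of the last item
        return value

    def branch(k, value, room):
        if value > best[0]:
            best[0] = value                    # new incumbent found
        if k == len(items) or bound(k, value, room) <= best[0]:
            return                             # prune: bound cannot improve
        v, w = items[k]
        if w <= room:
            branch(k + 1, value + v, room - w) # branch x_k = 1
        branch(k + 1, value, room)             # branch x_k = 0

    branch(0, 0, capacity)
    return best[0]

print(knapsack_bb([10, 13, 8, 9], [3, 4, 3, 4], 7))   # 23 (items with weights 3 + 4)
```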

Dynamic programming

Elements of a Dynamic Program

Essentially, you need to be able to compute the values of all the states in one direction until you obtain your required answer.

Construction

You need to define the stages and the states. This breaks down the problem into subproblems.

Each stage $k$ consists of a set of states. A stage may have only one state (126 match game) or many states (shortest path - the current node). At each stage, each state has a value $f_k(j)$ (126 match game - win or lose; shortest path - cost to reach).

You use the past stage(s) to compute the next stage. This is the recursion.

Your program needs to stop when you have obtained the answer, so you need to specify a stopping condition. The problem may also be unbounded, and your program needs to be able to detect and report that.

Preflight checklist

Please provide the mathematical expression to compute the next stage.

Please remember to define the boundary conditions of all states.

126 Match game

Each stage $k$ is the number of matches remaining.

There is only one state for each stage, so we do not specify the state.

The value function $f(k)$ indicates whether the player about to move will win when $k$ matches remain.

The recursion $f(k) = 1-\min\lbrace 1, f(k-1), f(k-2), f(k-6) \rbrace$, where terms with negative arguments are omitted.

The boundary condition $f(1) = f(2) = f(4) = f(5) = f(6) = 1$ and $f(3) = 0$.

The stopping condition when $f(n)$ is computed
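A direct implementation of this recursion (a sketch; the move set $\{1, 2, 6\}$ and the boundary values follow the notes above):

```python
# f[k] = 1 if the player about to move, with k matches left, can force a win.
def match_game(n, moves=(1, 2, 6)):
    f = [0] * (n + 1)                  # f[0] = 0, consistent with f(3) = 0 above
    for k in range(1, n + 1):
        # You win iff some legal move leaves the opponent in a losing state.
        f[k] = 1 - min([f[k - m] for m in moves if m <= k] + [1])
    return f

f = match_game(126)
print(f[1:7])    # [1, 1, 0, 1, 1, 1], matching the boundary conditions
print(f[126])    # 0: with 126 matches, the player to move cannot force a win
```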

Shortest path

Each stage $k$ is the maximum number of arcs allowed in the path to each node.

Each state $j$ is the current node.

The value function $f_k(j)$ is the cost of the cheapest path from the starting node to node $j$ using at most $k$ arcs.

The recursion $f_k(j) = \min \lbrace f_{k-1} (j), \min_{(i,j) \in A}\lbrace f_{k-1} (i) + c_{ij} \rbrace \rbrace$. You either do not move, or you move from another node and incur the arc cost.

The boundary condition $f_0(1) = 0$ and for other nodes $f_0(j) = \infty$

The stopping condition - $f_k(j) = f_{k+1}(j)$ $\forall j$. If this has not happened by $k > 2J$ (with $J$ the number of nodes), there is a negative cycle.
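A direct implementation of this recursion (essentially the Bellman-Ford algorithm), on a made-up arc list:

```python
# f[j] holds f_k(j): the cheapest cost of reaching node j with at most k arcs.
def shortest_paths(n, arcs, source):
    INF = float("inf")
    f = [INF] * n
    f[source] = 0.0                    # boundary condition f_0
    for _ in range(n):
        g = f[:]                       # the "do not move" term f_{k-1}(j)
        for i, j, c in arcs:
            if f[i] + c < g[j]:
                g[j] = f[i] + c        # move from i and incur c_ij
        if g == f:
            return f                   # stabilised: these are the shortest costs
        f = g
    raise ValueError("negative cycle reachable from the source")

arcs = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1), (2, 3, 5)]
print(shortest_paths(4, arcs, 0))      # [0, 3, 1, 4]
```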

Optimal capacity expansion

Each stage $k$ is the year.

Each state $j$ is the number of plants at the beginning of the year.

The value function $f_k(j)$ is the minimum cost of meeting the requirements from stage $k$ until the end of the planning horizon, given that $j$ plants have already been built.

The recursion $f_k(i) = \min\lbrace f_{k+1}(i),\ \min_{j=1,2,3}\lbrace 1.5 + c_k j + f_{k+1}(i+j) \rbrace \rbrace$, but $f_k(i) = \infty$ if $i < d_k$. You either do not build a plant, or you build $j$ plants, incurring a fixed and a variable cost.

The boundary condition $f_6(8) = 0$, $f_6(j) = \infty$ for $j \leq 7$.

The stopping condition - when $f_0(0)$ has been computed. (This is an example of backward recursion.)

Travelling salesman problem

Find a tour in the graph that has the minimum cost.

Integer programming formulation

tsp-integer-programming-formulation

To eliminate all subtours you need $2^n$ constraints. For moderately large $n$, you cannot even write down all the constraints on your computer.

Subtour elimination

Initially, you solve without the subtour elimination constraints. When the solution contains a subtour, add the corresponding constraint and solve again. You might need to do this many times. Nevertheless, each relaxed solution provides a lower bound on the optimum. (Any feasible tour provides an upper bound.)

Dynamic programming

Assume that we know the best subtour from the starting node to each other node through a given set of cities. This can be the basis for iteration.

We say that subtour $T$ dominates $T’$ if:

  1. $T$ and $T’$ begin at 1 and end at the same city $j$.
  2. The length of $T$ is at most the length of $T’$ .
  3. $T$ and $T’$ visit the same set of cities.

tsp-dynamic-programming.png
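The dominance rule is exactly what the Held-Karp dynamic program exploits: among subtours that start at the first city, visit the same set of cities, and end at the same city, only the cheapest needs to be kept. A sketch on a made-up distance matrix (cities are 0-indexed here):

```python
from itertools import combinations

def held_karp(dist):
    n = len(dist)
    # best[(S, j)] = cheapest path 0 -> ... -> j visiting exactly the cities
    # in S (S excludes city 0 and includes the end city j).
    best = {(frozenset([j]), j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            S = frozenset(subset)
            for j in S:
                best[(S, j)] = min(best[(S - {j}, i)] + dist[i][j]
                                   for i in S - {j})
    full = frozenset(range(1, n))
    # Close the tour by returning to city 0.
    return min(best[(full, j)] + dist[j][0] for j in range(1, n))

dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]
print(held_karp(dist))   # 18, e.g. the tour 0 -> 1 -> 3 -> 2 -> 0
```

This runs in $O(n^2 2^n)$ time: far better than $(n-1)!$ enumeration, but still exponential.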

Heuristic approaches

A heuristic is an algorithm that will find some solution that may not be optimal.

Provable guarantees

| Method | Runtime | $k$ | $D$ |
| --- | --- | --- | --- |
| Nearest Neighbour | $O(n^2)$ | NA | 1 |
| Greedy | $O(n^2 \log n)$ | NA | 1 |
| Insertion | $O(n^2)$ | NA | $(n-2)!$ |
| Christofides | ? | 2* | 1 |
| Local search (2-Opt) | $O(n^2)$ | NA | $(n-2)!$ |

Nearest Neighbour

Start from any node (this is random; you can try all of them). Then always go to the unvisited node with the smallest distance from the current node. After visiting all nodes, go back to the starting node.
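A direct implementation on a made-up distance matrix (on this instance it happens to return an optimal tour, which the heuristic does not guarantee in general):

```python
# Nearest neighbour: repeatedly visit the closest unvisited node.
def nearest_neighbour(dist, start=0):
    n = len(dist)
    tour, visited = [start], {start}
    while len(tour) < n:
        u = tour[-1]
        v = min((j for j in range(n) if j not in visited), key=lambda j: dist[u][j])
        tour.append(v)
        visited.add(v)
    tour.append(start)                 # go back to the starting node
    return tour, sum(dist[a][b] for a, b in zip(tour, tour[1:]))

dist = [[0, 2, 9, 10], [2, 0, 6, 4], [9, 6, 0, 3], [10, 4, 3, 0]]
print(nearest_neighbour(dist))         # ([0, 1, 3, 2, 0], 18)
```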

Greedy Algorithm

You start with the cheapest arc, and keep taking the next cheapest arc that does not create a premature cycle, until all nodes are visited.

Insertion Algorithm

Randomly start with two nodes, so you have a 2-node subtour. You randomly choose a node to insert, and make a 3-node subtour by finding the best edge to replace. Repeat until you cover the entire map.

Christofides Algorithm

Start with the minimum spanning tree. Then do magic: add a minimum-weight perfect matching on the odd-degree nodes, find an Eulerian tour, and shortcut repeated visits.

Assumes the triangle inequality. (The triangle inequality does not always hold in real life, however; a detour may be shorter than the direct path.)

Local search Algorithms (2-Opt)

Find a pair of edges that will have a lower cost when swapped. Repeat until you cannot find such a pair.

This improves any current solution. After 2-opt terminates, no pair of crossing edges remains, so it is harder for an observer to point out an obviously better tour visually.
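A sketch of 2-opt; the tour is a closed list of nodes as returned by the nearest-neighbour sketch above, and the distance matrix is the same made-up one:

```python
# 2-opt: reverse the segment tour[i..k] whenever replacing edges
# (i-1, i) and (k, k+1) by (i-1, k) and (i, k+1) shortens the tour.
def two_opt(tour, dist):
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 2):
            for k in range(i + 1, len(tour) - 1):
                delta = (dist[tour[i - 1]][tour[k]] + dist[tour[i]][tour[k + 1]]
                         - dist[tour[i - 1]][tour[i]] - dist[tour[k]][tour[k + 1]])
                if delta < -1e-12:
                    tour[i:k + 1] = reversed(tour[i:k + 1])
                    improved = True
    return tour

dist = [[0, 2, 9, 10], [2, 0, 6, 4], [9, 6, 0, 3], [10, 4, 3, 0]]
print(two_opt([0, 2, 1, 3, 0], dist))  # [0, 1, 3, 2, 0], length 18
```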