
DOI: 10.1002/9780470400531.eorms0277


Dynamic Programming via Linear Programming

İ. Esra Büyüktahtakın

Systems and Industrial Engineering, University of Arizona

1127 E James E. Rogers Way, Tucson, AZ, 85721

esra@[Link]

Abstract

Dynamic programming (DP) has been used to solve a wide range of optimization problems. Given that dynamic programs can be equivalently formulated as linear programs, linear programming (LP) offers an efficient alternative to the functional equation approach for solving such problems. LP is also utilized with DP to characterize the polyhedral structure of discrete optimization problems. In this paper, we investigate the close relationship between the two traditionally distinct areas of dynamic programming and linear programming.

To cite this paper: İ. E. Büyüktahtakın. Dynamic Programming Via Linear Programming. In J. J. Cochran, L. A. Cox, Jr., P. Keskinocak, J. P. Kharoufeh, and J. C. Smith, editors, Wiley Encyclopedia of Operations Research and Management Science. John Wiley & Sons, Hoboken, NJ, 2011.

1 Related Literature

A wide variety of problems have been shown to be polynomially solvable via dynamic programming recursive formulations. The problem is attacked by decomposing it into a sequence of interrelated subproblems defined by a recursive function, beginning with the smallest subproblem. The problem is then enlarged by building the current optimal solution from that of the preceding subproblem until the original problem is solved in its entirety. The main strength of this approach is that it offers treatments for many different types of problems involving sequential decision-making with discrete, nonlinear, and stochastic characteristics.
Dynamic programming was introduced to solve Markov decision processes by Bellman
[1]. The basics of a dynamic program can be described as follows: Consider a system being
observed over a finite or infinite time horizon divided into periods or stages. At each stage,
a decision or an action updates the state to be observed at the next stage, and depending on
the state and the decision made, an immediate reward (cost) is observed. The value function
represents the expected total reward (cost) from the current stage through the end of the
planning horizon, while the functional equation expresses the relationship between the value
function at the present stage and at the successive stage. Optimal decisions, depending on stage and state, are determined backwards iteratively by maximizing (minimizing) the right-hand side of the functional equation.
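As a concrete sketch of this backward recursion, consider a small finite-horizon instance; the two-state, two-decision model and all of its numbers are illustrative assumptions, not taken from the text:

```python
S = [0, 1]                    # states
U = [0, 1]                    # decisions
T = 3                         # planning horizon (stages 0..T-1)
g = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 1.0, (1, 1): 3.0}    # stage costs g(i, u)
p = {(0, 0): [0.8, 0.2], (0, 1): [0.2, 0.8],                # transition probabilities
     (1, 0): [0.5, 0.5], (1, 1): [0.9, 0.1]}

J = [0.0 for _ in S]          # terminal value function: zero cost after stage T-1
policy = []
for t in reversed(range(T)):  # stages T-1, ..., 0
    Jt, mu = [], []
    for i in S:
        # minimize the right-hand side of the functional equation over decisions
        cost, best_u = min((g[i, u] + sum(p[i, u][j] * J[j] for j in S), u)
                           for u in U)
        Jt.append(cost)
        mu.append(best_u)
    J, policy = Jt, [mu] + policy

print(J)       # expected cost-to-go from each state at stage 0
print(policy)  # optimal decision for each (stage, state)
```

Each stage minimizes the right-hand side of the functional equation over the decision set, which is exactly the backward pass described above.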
The use of linear programming to solve dynamic programming formulations was first
introduced by D’epenoux [11] and Manne [21]. Manne [21] studies an undiscounted Markov
decision model with an infinite planning horizon. He specifically analyzes an inventory con-
trol problem where the “state” variable represents the initial stock level i and the “decision
variable” corresponds to the order quantity j. In the linear program in Manne [21], i and
j are introduced as subscripts to the variables xij . These variables xij represent the joint
probabilities with which the initial stock equals i and the production quantity equals j.
The steady state probabilities of inventory, production, and shortage levels are then derived.
The constraints of the LP are the requirements regarding steady state probabilities, and
the objective function is the minimization of the expected cost corresponding to the steady
state probabilities. D’epenoux [11] provides a linear program for the discounted version of the problem in [21] by linearizing the functional equations of the corresponding dynamic program.
Howard [17] combines dynamic programming with Markov chain theory to develop Markov
decision processes. He also contributes to the solution of infinite horizon problems by de-
veloping the policy iteration method as an alternative to the backward induction method
of Bellman [1], which is known as value iteration. The policy iteration algorithm generates
a sequence of stationary policies by evaluating and improving the policies until the optimal
policy is obtained. Later, Osaki and Mine [26] formulate a semi-Markovian decision process as an LP and show that the dual of this LP is equivalent to the dynamic programming formulation of Howard [17].
Another group of dynamic programming and Markov decision process research arises from the observation that finding the optimal cost function can itself be cast as a linear programming problem ([3], [4], [9], [10], [16], [21], [29]). However, the resulting linear program suffers from the curse of dimensionality: as the problem size increases linearly, the size of the state space grows exponentially. This LP has as many decision variables as there are states in the Markov decision process, and an even greater number of constraints. This difficulty is addressed by approximate dynamic programming (ADP), in which the optimal cost-to-go function of the dynamic program is approximated within the span of a pre-specified set of basis functions, as introduced by Schweitzer and Seidmann [27]. de Farias and Van Roy analyze and further develop this approach by proposing the procedure known as approximate linear programming (ALP), which reduces the dimensionality of the linear program by utilizing a linear combination of basis functions combined with sampling only a small subset of the constraints ([6], [7], [8]). de Farias and Van Roy [7] also establish strong approximation guarantees for ALP-based approximations, assuming knowledge of a Lyapunov-like function that must be included in the basis. An extension to the ALP approach that automatically generates the basis functions in the linear framework was developed by Valenti et al. [30]. Later, Desai et al. [12] study ADP via a smoothed linear program and develop an error bound that characterizes the quality of the approximations it produces.
Eppen and Martin [14] and Martin et al. [22] study the underlying DP network structure
to reformulate difficult integer programs into new models that have better bounds and solve
more quickly. Eppen and Martin [14] provide tighter mixed integer programming formula-
tions for the single and multi-item lot-sizing problems using a variable redefinition approach.
They first remove the capacity constraints from the traditional lot-sizing formulation and
represent the subproblem with the dynamic programming network structure. This shortest
path network can be written as an integer linear program (IP), with the arcs correspond-
ing to binary variables and the nodes corresponding to flow balance constraints. Eppen
and Martin then relate the variables of the traditional model to the new set of variables
through a linear transformation. By using this transformation, they insert the complicating
constraints in terms of the new variables into the new formulation. Although this reformulation has a greater number of variables and constraints, its tighter LP relaxation lower bound leads to reduced solution times. Martin et al. [22] formulate polynomially
solvable optimization problems as shortest path problems by using dynamic programming.
They then represent the dynamic program as an LP having a polynomial number of variables
and constraints. The extreme points of this LP are represented by the solution vectors of
the DP, and the dual of the LP provides the DP formulation. They also show that with
an appropriate change of variables, the LP formulation obtained from the DP provides a
polyhedral description of the model considered.
There are a number of studies that utilize dynamic programming algorithms to formalize
and solve integer linear programming problems (e.g. [2], [13], [19], [25], [28], [31]). In a
recent study, Hartman et al. [15] introduce a set of dynamic programming-based inequalities that can be used to augment the integer linear programming formulation of the capacitated lot-sizing problem (CLSP). These authors utilize iterative solutions of forward dynamic programming formulations for the CLSP to generate inequalities for an equivalent integer programming
formulation. The inequalities capture convex and concave envelopes of intermediate-stage
value functions, and can be lifted by examining potential state information at future stages.

Lawler and Wood [20] discuss the close relationship between branch and bound and dynamic
programming, while Morin and Marsten [24] study branch and bound techniques to reduce
storage and computational requirements in discrete dynamic programming. In particular,
they utilize relaxations and fathoming criteria from branch and bound to eliminate DP states that will not lead to optimal policies.
The paper is outlined as follows. In Section 2, we review the close relationship between
linear programming techniques and dynamic programming in stochastic control, while we
discuss the utilization of dynamic programming driven acyclic decision graphs to describe
polyhedral characteristics of discrete optimization problems in Section 3. Section 4 concludes
the paper and offers directions for future research.

2 LP-DP Relationship in Stochastic Control

In this section, we consider a discrete-time stochastic control problem involving a finite state space S = {1, . . . , n}. Let U be a finite decision set, common to all i ∈ S. Given state i, the use of decision u ∈ U specifies the transition probability pij(u) to the next state j, and a cost g(i, u) is incurred when decision u is taken in state i. Future costs are discounted by a factor α ∈ (0, 1). A policy of the stochastic control problem is denoted by µ : S → U. The problem is then to minimize the so-called cost-to-go function Jµ : S → R over the set of admissible policies P:
\min_{\mu \in P} J_\mu(i_0) = \min_{\mu \in P} \mathbb{E}\left[ \sum_{k=0}^{\infty} \alpha^k g(i_k, \mu(i_k)) \right], \qquad (1)

where i0 ∈ S is an initial state and the expectation E is taken over the possible future states
{i1 , i2 , . . .}, given i0 and the policy µ.
The optimal cost associated with the optimal policy µ∗ , denoted by J ∗ = Jµ∗ , satisfies
the Bellman equation:
J(i) = \min_{u \in U} \left[ g(i, u) + \alpha \sum_{j \in S} p_{ij}(u) J(j) \right], \quad \forall i \in S, \qquad (2)

Iteratively applying the recursion in (2) is called value iteration, a principal method for calculating the optimal cost function J* [4]. Once J* is found by solving (2), the optimal policy µ* can be computed.
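Value iteration on a small assumed instance of (2) can be sketched as follows; the two-state data are invented for illustration:

```python
# A minimal value-iteration sketch for the Bellman equation (2); the
# 2-state, 2-action instance below is an illustrative assumption.
S = [0, 1]
U = [0, 1]
alpha = 0.9
g = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 1.0, (1, 1): 3.0}
p = {(0, 0): [0.8, 0.2], (0, 1): [0.2, 0.8],
     (1, 0): [0.5, 0.5], (1, 1): [0.9, 0.1]}

J = [0.0, 0.0]
for _ in range(500):  # J_{k+1}(i) = min_u [ g(i,u) + alpha * sum_j p_ij(u) J_k(j) ]
    J = [min(g[i, u] + alpha * sum(p[i, u][j] * J[j] for j in S) for u in U)
         for i in S]

# Bellman residual is (numerically) zero at the fixed point J*
residual = max(abs(J[i] - min(g[i, u] + alpha * sum(p[i, u][j] * J[j] for j in S)
                              for u in U)) for i in S)
# the optimal policy attains the minimum in (2) at each state
mu_star = [min(U, key=lambda u: g[i, u] +
               alpha * sum(p[i, u][j] * J[j] for j in S)) for i in S]
print(J, mu_star, residual)
```

Since the contraction factor is α, the iterates converge geometrically, and the residual certifies that a fixed point has been reached.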
An alternative approach to solving the stochastic control problem described here is to fix
the policy µ and then to solve the following linear system:
J_\mu(i) = g(i, \mu(i)) + \alpha \sum_{j \in S} p_{ij}(\mu(i)) J_\mu(j), \quad \forall i \in S. \qquad (3)

Solving (3) is called policy evaluation; it yields Jµ, the cost-to-go function of the fixed policy µ.
After computing the policy’s cost-to-go function, a better policy can be constructed by
performing a policy improvement step. The policy iteration method repeatedly performs
policy evaluation followed by policy improvement. This procedure generates a sequence of policies that is guaranteed to converge to the optimal policy µ* after a finite number of iterations, since each new policy is strictly better than its predecessor and there are only finitely many policies in total [17].
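A compact sketch of policy iteration on the same kind of assumed two-state instance: policy evaluation solves the linear system (3) with a small Gaussian-elimination helper (also illustrative), and policy improvement re-minimizes the right-hand side of the Bellman equation:

```python
# Policy-iteration sketch; the instance and the solver helper are illustrative.
S = [0, 1]
U = [0, 1]
alpha = 0.9
g = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 1.0, (1, 1): 3.0}
p = {(0, 0): [0.8, 0.2], (0, 1): [0.2, 0.8],
     (1, 0): [0.5, 0.5], (1, 1): [0.9, 0.1]}

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

mu = [0, 0]                      # arbitrary initial stationary policy
while True:
    # policy evaluation: solve (I - alpha * P_mu) J = g_mu, i.e. system (3)
    A = [[(1.0 if i == j else 0.0) - alpha * p[i, mu[i]][j] for j in S] for i in S]
    J = solve(A, [g[i, mu[i]] for i in S])
    # policy improvement: re-minimize the Bellman right-hand side
    new = [min(U, key=lambda u: g[i, u] + alpha * sum(p[i, u][j] * J[j] for j in S))
           for i in S]
    if new == mu:
        break
    mu = new

print(mu, J)   # optimal policy and its cost-to-go
```

On this instance the loop terminates after two improvement steps, consistent with the finite-convergence argument above.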

2.1 LP Formulation of DP

Suppose that we use value iteration to generate a sequence of vectors Jk = (Jk (1), . . . , Jk (n))
given an initial condition vector J0 = (J0 (1), . . . , J0 (n)). Then the following constraints:
J(i) \le g(i, u) + \alpha \sum_{j \in S} p_{ij}(u) J(j), \quad \forall i \in S,\ u \in U, \qquad (4)

form a polyhedron in R^n. The optimal cost vector J* = (J*(1), . . . , J*(n)) solves the following problem (in the variables w1, . . . , wn):
LP1:

\max \sum_{i \in S} w_i \qquad (5)

\text{subject to: } w_i \le g(i, u) + \alpha \sum_{j \in S} p_{ij}(u) w_j, \quad \forall i \in S,\ u \in U. \qquad (6)

This is a linear program with n variables and as many as n × q constraints, where q is the number of elements of the decision set U. For very large n and q, the linear program can be solved by specialized, large-scale linear programming algorithms ([4], [29]).
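On a small instance, the LP1 characterization can be checked without calling a solver: the optimal cost vector J*, obtained here by value iteration, is feasible for (6), and for each state one constraint holds with equality, which is why J* maximizes (5). The two-state data are an illustrative assumption:

```python
# Verifying the LP1 characterization on an assumed 2-state instance.
S = [0, 1]
U = [0, 1]
alpha = 0.9
g = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 1.0, (1, 1): 3.0}
p = {(0, 0): [0.8, 0.2], (0, 1): [0.2, 0.8],
     (1, 0): [0.5, 0.5], (1, 1): [0.9, 0.1]}

w = [0.0, 0.0]
for _ in range(600):               # value iteration to the (near) fixed point
    w = [min(g[i, u] + alpha * sum(p[i, u][j] * w[j] for j in S) for u in U)
         for i in S]

tol = 1e-8
# all n * q constraints (6) are satisfied by w = J*
feasible = all(w[i] <= g[i, u] + alpha * sum(p[i, u][j] * w[j] for j in S) + tol
               for i in S for u in U)
# for each state, the constraint of the minimizing decision is active
tight = all(any(abs(w[i] - g[i, u] - alpha * sum(p[i, u][j] * w[j] for j in S)) < tol
                for u in U) for i in S)
print(feasible, tight, sum(w))     # sum(w) is the LP1 objective value at J*
```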

2.2 Dual and Policy Iteration

We now investigate the dual of LP1, which has been shown to correspond to policy iteration ([10]). Duality theory of linear programming (see, e.g., [5]) asserts that the following dual linear program:
Dual1:

\min \sum_{i \in S} \sum_{u \in U} q(i, u)\, g(i, u) \qquad (7)

\text{subject to: } \sum_{u \in U} q(i, u) - \alpha \sum_{j \in S} \sum_{u \in U} q(j, u)\, p_{ji}(u) = 1, \quad \forall i \in S, \qquad (8)

q(i, u) \ge 0, \quad \forall i \in S,\ u \in U, \qquad (9)

has the same optimal value as LP1. The variables q(i, u), i ∈ S, u ∈ U, of the dual program can be interpreted as the steady-state probabilities that state i is visited at a typical transition and that control u is then applied. The constraints of Dual1 are the requirements that the q(i, u) must satisfy in order to be feasible steady-state probabilities. The cost function

\sum_{i \in S} \sum_{u \in U} q(i, u)\, g(i, u)

is the steady-state average cost per transition.


Denardo [10] shows that the feasible bases for Dual1 are in one-to-one correspondence with the policies. Denardo also proves that the application of the simplex method to the dual program performs the same sequence of pivots as policy iteration, and that policy iteration is the same as multiple substitution (block pivoting) in the dual simplex method applied to LP1.
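This correspondence can be checked numerically on a small assumed instance: for a fixed policy µ, the occupancy measures q solving constraints (8), with q(i, u) = 0 for u other than µ(i), give a dual objective (7) equal to the sum of the components of Jµ, the quantity LP1 maximizes. The data and the Gaussian-elimination helper are illustrative:

```python
# Dual occupancy measures vs. primal cost-to-go on an assumed 2-state instance.
S = [0, 1]
alpha = 0.9
g = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 1.0, (1, 1): 3.0}
p = {(0, 0): [0.8, 0.2], (0, 1): [0.2, 0.8],
     (1, 0): [0.5, 0.5], (1, 1): [0.9, 0.1]}
mu = [1, 0]                       # a fixed stationary policy

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

# dual side: constraints (8) restricted to mu read (I - alpha * P_mu^T) q = 1
A = [[(1.0 if i == j else 0.0) - alpha * p[j, mu[j]][i] for j in S] for i in S]
q = solve(A, [1.0 for _ in S])
dual_obj = sum(q[i] * g[i, mu[i]] for i in S)   # objective (7)

# primal side: J_mu from the policy-evaluation system (3)
B = [[(1.0 if i == j else 0.0) - alpha * p[i, mu[i]][j] for j in S] for i in S]
J = solve(B, [g[i, mu[i]] for i in S])
print(dual_obj, sum(J))           # the two objectives coincide
```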

3 LP-DP Relationship through Acyclic Graphs

Many discrete optimization problems that are solvable through dynamic programming can
be represented by directed acyclic shortest path decision graphs (e.g. [10], [18], [19], [23]).

Given a finite state set S, vertices of the graph correspond to states of S reflecting the
transition phases in the solution of the underlying sequential decision process. Arcs (i, j)
between vertices i, j ∈ S represent decisions changing the state i to state j > i. Furthermore,
each arc is assigned a length equal to the cost of the corresponding decision. The problem is then to find the
shortest path from an initial state to a goal state. Generally, states are partitioned into
stages such that the decision at a stage updates the state at the stage into the state for the
next stage.
Once the shortest path formulation of a discrete optimization problem is constructed, the
shortest path problem can easily be written as a linear program where variables represent
arcs and the flow balance at each vertex is expressed as a constraint. This LP provides
the polyhedral characterization of the discrete optimization problem where every face of
the associated polytope contains the incidence vector of a decision path in the dynamic
programming graph. For any given cost function, one such incidence vector will be the
linear programming optimal solution [22].
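A minimal sketch of this construction: on a tiny assumed acyclic decision graph, the dynamic program finds the shortest path by scanning states in topological order; the same instance could equally be posed as the flow-balance LP over arc variables described above:

```python
# Shortest path on an assumed acyclic decision graph; arcs (i, j) with i < j
# represent decisions, and arc costs are invented for illustration.
arcs = {(0, 1): 2.0, (0, 2): 4.0, (1, 2): 1.0, (1, 3): 7.0, (2, 3): 3.0}
n = 4                              # states 0..3; find the shortest 0 -> 3 path

INF = float("inf")
dist = [0.0] + [INF] * (n - 1)     # cost-to-reach each state from state 0
pred = [None] * n
for j in range(1, n):              # states in topological (numeric) order
    for (i, k), c in arcs.items():
        if k == j and dist[i] + c < dist[j]:
            dist[j], pred[j] = dist[i] + c, i

path = [n - 1]                     # recover the optimal decision path
while pred[path[0]] is not None:
    path.insert(0, pred[path[0]])
print(dist[n - 1], path)
```

The incidence vector of the recovered path is exactly the kind of extreme point the LP formulation over arc variables would return for this cost function.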
One problem with the acyclic shortest path paradigm is that it is inadequate for more
complex discrete optimization problems since a typical decision involves composing two or
more partial solution elements into a single element. To overcome this difficulty, Martin et
al. [22] developed a directed acyclic decision hypergraph framework for deriving extended
formulations of combinatorial optimization problems from dynamic programming algorithms
as mentioned in Section 1. The dynamic programming algorithms considered in [22] search
hyperpaths in a directed hypergraph H = (S, H) on a finite state space S with cardinality
|S| = n, where the directed hyperarcs in H are of the form (K, j) with j ∈ S and K ⊆ S.
The states are ordered by a numbering function σ : S → {1, 2, . . . , m}, where m = |S|, such that σ(i) < σ(j) for all (K, j) ∈ H and i ∈ K. Furthermore, it is assumed that the directed hypergraph is single tailed. The set S_T ⊂ S of boundary states is the set of all j ∈ S for which there is a hyperarc (∅, j) ∈ H; all nonboundary states are called intermediate. Finally, there must be a finite reference set I(j) ≠ ∅ for each state j ∈ S such that I(i) ⊆ I(j) and I(i) ∩ I(i′) = ∅ for all i, i′ ∈ K with i ≠ i′, for all (K, j) ∈ H.
In this setting, Martin et al. [22] proved that the convex hull of the characteristic vectors of decision paths in the dynamic programming hypergraph is described by the following linear program
LP2:
\min \sum_{(K, j) \in H} c[K, j]\, z[K, j] \qquad (10)

\text{subject to: } \sum_{(K, m) \in H} z[K, m] = 1, \qquad (11)

\sum_{(K, \bar{j}) \in H} z[K, \bar{j}] - \sum_{(K, j) \in H:\ \bar{j} \in K} z[K, j] = 0, \quad \text{for } \sigma(\bar{j}) = 1, \ldots, m - 1, \qquad (12)

z[K, j] \ge 0, \quad \forall (K, j) \in H, \qquad (13)

where z[K, j] is a binary (0–1) variable that takes value 1 if the hyperarc (K, j) is selected and 0 otherwise, and c[K, j] represents the cost of selecting the hyperarc (K, j). The objective function (10) minimizes the total cost of the chosen decision path, while constraint (11) ensures that exactly one decision terminates at the global state m. Constraints (12) enforce flow balance conditions. Constraints (11) and (12), together with the nonnegativity constraints (13), are shown to be sufficient to produce a binary solution to the primal linear program LP2.
LP2 can be viewed as the dual of the computations of the dynamic program itself. Thus we provide the dual of the problem above as follows:
Dual2:

\max\ \omega \qquad (14)

\text{subject to: } \omega - \sum_{i \in K} \lambda[i] \le c[K, m], \quad \forall (K, m) \in H, \qquad (15)

\lambda[j] - \sum_{i \in K} \lambda[i] \le c[K, j], \quad \forall (K, j) \in H,\ j \ne m. \qquad (16)

Here ω denotes the dual multiplier for constraint (11), and λ[j] is the dual variable for row σ(j) of (12). Note that Dual2 is equivalent to the dynamic programming formulation.

4 Conclusions and Future Directions

In this paper we survey linear programming approaches for solving dynamic programming formulations of hard optimization problems. In particular, we analyze techniques for casting a dynamic program as a linear program. An LP with few variables and a large number of constraints is often tractable via large-scale linear programming algorithms such as constraint generation. However, for problems where the DP requires an exponential number of states, the corresponding LP formulation contains an exponential number of constraints. This difficulty necessitates research on new techniques that reduce the state space of the DP by eliminating the states and decisions that cannot lead to an optimal policy.
Since each problem requires a specific DP formulation and a program to solve it, LP offers the advantage of easily writing and solving tractable DP models using commercial software. Furthermore, LP enables sensitivity analysis for dynamic programs. LP sensitivity analysis and its relation to the states and decisions of the dynamic program could be further investigated. Another direction for research is the utilization of LP formulations of DP to obtain stronger IP formulations in discrete optimization.

References

[1] R. Bellman. A Markovian decision process. Journal of Mathematics and Mechanics,


6:679–684, 1957.

[2] R. Bellman. On a routing problem. Quarterly of Applied Mathematics, 16:87–90, 1958.

[3] D. P. Bertsekas. Dynamic Programming and Optimal Control: Volume 1. Athena


Scientific, 2005.

[4] D. P. Bertsekas. Dynamic Programming and Optimal Control: Volume 2. Athena


Scientific, 2007.

[5] G. B. Dantzig. Linear Programming and Extensions. Princeton University Press, Prince-
ton, N. J., 1963.

[6] D. P. de Farias. The Linear Programming Approach to Approximate Dynamic Program-


ming: Theory and Application. PhD thesis, Stanford University, 2002.

[7] D. P. de Farias and B. Van Roy. The linear programming approach to approximate
dynamic programming. Operations Research, 51(6):850–865, 2003.

[8] D. P. de Farias and B. Van Roy. On constraint sampling in the linear programming
approach to approximate dynamic programming. Mathematics of Operations Research,
29(3):462–478, 2004.

[9] E. V. Denardo. On linear programming in a Markov decision problem. Management


Science, 16(5):282–288, 1970.

[10] E. V. Denardo. Dynamic Programming. Prentice-Hall, 2nd edition, 2003.

[11] F. D’epenoux. A probabilistic production and inventory problem. Management Science,


10:98–108, 1963. Translation of an article published in Revue Française de Recherche Opérationnelle, 14, 1960.

[12] V. V. Desai, V. F. Farias, and C. C. Moallemi. Approximate dynamic programming via


a smoothed approximate linear program. Working paper, 2009.

[13] S. E. Elmaghraby. The concept of state in discrete dynamic programming. Journal of


Mathematical Analysis and Applications, 29(3):523–557, 1970.

[14] G. D. Eppen and R. K. Martin. Solving multi-item capacitated lot sizing problems
using variable redefinition. Operations Research, 35(6):832–848, 1987.

[15] J. C. Hartman, İ. E. Büyüktahtakın, and J. C. Smith. Dynamic programming based


inequalities for the capacitated lot-sizing problem. To appear in IIE Transactions, 2010.

[16] A. Hordijk and L. C. M. Kallenberg. Linear programming and Markov decision chains.
Management Science, 25(4):352–362, 1979.

[17] R. A. Howard. Dynamic Programming and Markov Processes. M.I.T. Press, Cambridge,
Massachusetts, 1960.

[18] T. Ibaraki. Solvable classes of discrete dynamic programming. Journal of Mathematical


Analysis and Applications, 43:642–693, 1973.

[19] R. M. Karp and M. Held. Finite state processes and dynamic programming. SIAM
Journal on Applied Mathematics, 15:693–718, 1967.

[20] E. L. Lawler and D. E. Wood. Branch-and-bound methods: A survey. Operations


Research, 14(4):699–719, 1966.

[21] A. S. Manne. Linear programming and sequential decisions. Management Science, 6(3):259–267, 1960.

[22] R. K. Martin, R. L. Rardin, and B. A. Campbell. Polyhedral characterization of discrete


dynamic programming. Operations Research, 38(1):127–138, 1990.

[23] T. L. Morin. Monotonicity and the principle of optimality. Journal of Mathematical


Analysis and Applications, 86:665–674, 1982.

[24] T. L. Morin and R. E. Marsten. Branch-and-bound strategies for dynamic programming. Operations Research, 24(4):611–627, 1976.

[25] G. Nemhauser. Introduction to Dynamic Programming. John Wiley and Sons, 1966.

[26] S. Osaki and H. Mine. Linear programming algorithms for semi-Markovian decision
processes. Journal of Mathematical Analysis and Applications, 22:356–381, 1968.

[27] P. Schweitzer and A. Seidmann. Generalized polynomial approximations in Markovian


decision processes. Journal of Mathematical Analysis and Applications, 110(2):568–582,
1985.

[28] J. F. Shapiro. Dynamic programming algorithms for the integer programming problem-
I: The integer programming problem viewed as a knapsack type problem. Operations
Research, 16(1):103–121, 1968.

[29] M. A. Trick and S. E. Zin. A linear programming approach to solving stochastic dynamic programs. Unpublished manuscript, 1993.

[30] M. Valenti, B. Bethke, J. How, D. P. de Farias, and J. Vian. Embedding health man-
agement into mission tasking for UAV teams. In American Controls Conference, New
York, NY, 2007.

[31] L. A. Wolsey. Generalized dynamic programming methods in integer programming.


Mathematical Programming, 4(1):222–232, 1973.
