Bellman's Principle of Optimality
I'm currently reading Pham's Continuous-time Stochastic Control and Optimization with Financial Applications; however, I'm slightly confused by the way the Dynamic Programming Principle is presented.
In particular, the theorem is stated in terms of both an optimal control and a stopping time. I'm familiar with the analysis when searching only for an optimal control, but not with the stopping time. From Pham:
Theorem (Dynamic Programming Principle)
Let $x \in \mathbb{R}^n$. Then we have
\begin{align}
v(x) &= \sup_{\alpha \in \mathcal{A}(x)} \sup_{\theta \in \mathcal{T}} E\bigg[\int_0^\theta e^{-\beta s} f(X_s^x, \alpha_s)\,ds + e^{-\beta\theta} v(X^x_\theta)\bigg] \\
&= \sup_{\alpha \in \mathcal{A}(x)} \inf_{\theta \in \mathcal{T}} E\bigg[\int_0^\theta e^{-\beta s} f(X_s^x, \alpha_s)\,ds + e^{-\beta\theta} v(X^x_\theta)\bigg]
\end{align}
where $\alpha$ is the control process, $\mathcal{A}(x)$ is the set of admissible controls, $\mathcal{T}$ is the set of stopping times, and $X^x_s$ denotes the controlled state process started at $x$ (at time $0$), evaluated at time $s$. The utility or reward function is $f$ and the value function is $v$. By convention, $e^{-\beta\theta}=0$ when $\theta=\infty$.
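To convince myself that the two formulations agree, I put together a small discrete-time toy check. To be clear, this is entirely my own construction, not anything from Pham: a finite discounted MDP stands in for the controlled diffusion, $\gamma = e^{-\beta}$ for the discounting, and a deterministic horizon $k$ for the stopping time $\theta$. The point is that once $v$ and an optimal policy are computed, the bracketed quantity is the same for every $k$:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 2, 0.9  # gamma plays the role of e^{-beta}

# Random rewards f(x, a) and transition kernels P[a] (rows sum to 1).
f = rng.random((n_states, n_actions))
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)

# Value iteration: v(x) = max_a [ f(x,a) + gamma * sum_y P[a][x,y] v(y) ].
v = np.zeros(n_states)
for _ in range(2000):
    q = f + gamma * np.einsum('axy,y->xa', P, v)
    v = q.max(axis=1)
policy = q.argmax(axis=1)  # greedy (hence optimal) policy at convergence

# E[ sum_{t<k} gamma^t f(X_t, a_t) + gamma^k v(X_k) ], as a vector over
# starting states, under the optimal policy, by backward induction.
def dpp_value(k):
    val = v.copy()  # terminal payoff v(X_k)
    for _ in range(k):
        val = np.array([f[x, policy[x]] + gamma * P[policy[x], x] @ val
                        for x in range(n_states)])
    return val

# The gap to v is ~0 for every k: under the optimal control the bracket
# does not depend on the stopping rule, so sup and inf over it coincide.
for k in (0, 1, 3, 10):
    print(k, np.max(np.abs(dpp_value(k) - v)))
```

Every printed gap is at machine precision, which matches the claim that, under an optimal control, the expression inside the expectation has the same value whatever stopping rule is used.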
I'm unsure of the intuition for why these should be equal. I'm happy with the interpretation of the DPP: the optimization problem can be split into two parts, so an optimal control over the whole horizon may be obtained by first searching for an optimal control from time $\theta$ onward, given the state value at $\theta$, and then maximising over controls the value
\begin{equation}
E\bigg[\int_0^\theta e^{-\beta s} f(X_s^x, \alpha_s)\,ds + e^{-\beta\theta} v(X^x_\theta)\bigg].
\end{equation}
What I'm unsure about is why the $\sup_{\theta \in \mathcal{T}}$ can be replaced by $\inf_{\theta \in \mathcal{T}}$ in the second, equivalent formulation. For the pure control problem I understand the intuition, but not once stopping times enter.
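For what it's worth, here is the heuristic I'm trying to make precise (my own reasoning, not a quote from the book). If $\alpha^*$ is an optimal control and $v$ is regular enough for optional sampling, then
\begin{equation}
M_t = \int_0^t e^{-\beta s} f(X_s^x, \alpha_s^*)\,ds + e^{-\beta t} v(X_t^x)
\end{equation}
should be a martingale, so $E[M_\theta] = M_0 = v(x)$ for every $\theta \in \mathcal{T}$; the bracket would then not depend on $\theta$ at all, making $\sup_\theta$ and $\inf_\theta$ coincide, whereas for a suboptimal $\alpha$ the process $M$ is only a supermartingale. Is this the right way to read the theorem?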
Any help would be greatly appreciated.
optimal-control
asked Nov 13 at 23:48 by mark