Bellman's Principle of Optimality























I'm currently reading Pham's Continuous-time Stochastic Control and Optimization with Financial Applications, but I'm slightly confused by the way the Dynamic Programming Principle (DPP) is presented.



In particular, the theorem is stated in terms of both a control and a stopping time. I'm familiar with the analysis when only an optimal control is sought, but not with the stopping time. From Pham:



Theorem (Dynamic Programming Principle)



Let $x\in \mathbb{R}^n$. Then we have
\begin{equation}
v(x)=\sup_{\alpha \in \mathcal{A}(x)}\sup_{\theta\in \mathcal{T}}E\bigg[\int_0^\theta e^{-\beta s}f(X_s^x, \alpha_s)\,ds+e^{-\beta \theta}v(X^x_\theta)\bigg]
\end{equation}

\begin{equation}
\phantom{v(x)}=\sup_{\alpha \in \mathcal{A}(x)}\inf_{\theta\in \mathcal{T}}E\bigg[\int_0^\theta e^{-\beta s}f(X_s^x, \alpha_s)\,ds+e^{-\beta \theta}v(X^x_\theta)\bigg]
\end{equation}



where $\alpha$ is the control process, $\mathcal{A}(x)$ is the set of admissible controls, $\mathcal{T}$ is the set of stopping times, and $X^x_s$ denotes the state at time $s$ of the controlled process started from $x$ at time $0$. The utility or reward function is $f$ and the value function is $v$. By convention $e^{-\beta\theta}=0$ when $\theta=\infty$.



I'm unsure of the intuition for why these two expressions should be equal. I'm happy with the usual interpretation of the DPP, namely that the optimization problem can be split into two
parts: an optimal control on the whole time interval $[t, T]$ may be obtained by first searching for an optimal control from time $\theta$ onwards, given the state value $X^x_\theta$, and then maximising over controls the quantity



\begin{equation}
E\bigg[\int_0^\theta e^{-\beta s}f(X_s^x, \alpha_s)\,ds+e^{-\beta \theta}v(X^x_\theta)\bigg]
\end{equation}
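The splitting described above can be checked numerically in a simplified setting. The following is only a minimal sketch under stated assumptions: a discrete-time, finite-state, discounted analogue (discount factor `gamma` playing the role of $e^{-\beta}$), stationary feedback policies in place of general controls, and deterministic times $\theta = k$ in place of general stopping times. The helper names `random_mdp`, `value_iteration` and `stopped_value` are hypothetical and do not come from Pham.

```python
import random


def random_mdp(n_actions, n, seed):
    """Build a random finite MDP: P[a][i][j] transition probs, f[a][i] rewards."""
    rng = random.Random(seed)
    P = []
    for _ in range(n_actions):
        rows = []
        for _ in range(n):
            row = [rng.random() for _ in range(n)]
            total = sum(row)
            rows.append([p / total for p in row])  # normalise to a probability row
        P.append(rows)
    f = [[rng.random() for _ in range(n)] for _ in range(n_actions)]
    return P, f


def value_iteration(P, f, gamma, tol=1e-12, max_iter=100_000):
    """Approximate the value function v and a greedy policy by value iteration."""
    n_actions, n = len(P), len(P[0])
    v = [0.0] * n
    policy = [0] * n
    for _ in range(max_iter):
        # q[a][i] = reward now plus discounted expected continuation value
        q = [[f[a][i] + gamma * sum(P[a][i][j] * v[j] for j in range(n))
              for i in range(n)] for a in range(n_actions)]
        v_new = [max(q[a][i] for a in range(n_actions)) for i in range(n)]
        policy = [max(range(n_actions), key=lambda a: q[a][i]) for i in range(n)]
        done = max(abs(x - y) for x, y in zip(v_new, v)) < tol
        v = v_new
        if done:
            break
    return v, policy


def stopped_value(P, f, gamma, policy, v, k):
    """E[sum_{s<k} gamma^s f(X_s, a_s) + gamma^k v(X_k)] for each start state.

    Actions follow the stationary `policy`; the stopping time is theta = k.
    Computed by k backward steps starting from the terminal payoff v.
    """
    n = len(v)
    w = list(v)
    for _ in range(k):
        w = [f[policy[i]][i]
             + gamma * sum(P[policy[i]][i][j] * w[j] for j in range(n))
             for i in range(n)]
    return w


P, f = random_mdp(n_actions=2, n=4, seed=0)
gamma = 0.9
v, best = value_iteration(P, f, gamma)
flipped = [1 - a for a in best]  # take the other action in every state

for k in (0, 1, 5, 20):
    opt = stopped_value(P, f, gamma, best, v, k)
    sub = stopped_value(P, f, gamma, flipped, v, k)
    print(k, [round(x, 6) for x in opt], [round(x, 6) for x in sub])
```

In this toy setting the stopped value under the greedy policy should reproduce `v` for every `k` (up to the iteration tolerance), while under any other stationary policy it should never exceed `v`.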



What I'm unsure about is why the $\sup_{\theta \in \mathcal{T}}$ becomes $\inf_{\theta \in \mathcal{T}}$ in the second, equivalent expression. For the pure control problem I understand the intuition, but not once stopping times are involved.
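For reference, here is a sketch of one standard way to unpack the two equalities. The shorthand $J(\alpha,\theta)$ is introduced here only for illustration and is not part of the statement above:

\begin{equation}
J(\alpha,\theta)=E\bigg[\int_0^\theta e^{-\beta s}f(X_s^x, \alpha_s)\,ds+e^{-\beta \theta}v(X^x_\theta)\bigg]
\end{equation}

The pair of equalities is then equivalent to the following two statements: (i) $v(x)\ge J(\alpha,\theta)$ for every $\alpha\in\mathcal{A}(x)$ and every $\theta\in\mathcal{T}$; (ii) for every $\varepsilon>0$ there exists $\alpha^\varepsilon\in\mathcal{A}(x)$ such that $v(x)-\varepsilon\le J(\alpha^\varepsilon,\theta)$ for every $\theta\in\mathcal{T}$. Since an infimum never exceeds the corresponding supremum, (i) and (ii) give the chain

\begin{equation}
v(x)-\varepsilon\le\inf_{\theta\in\mathcal{T}}J(\alpha^\varepsilon,\theta)\le\sup_{\alpha\in\mathcal{A}(x)}\inf_{\theta\in\mathcal{T}}J(\alpha,\theta)\le\sup_{\alpha\in\mathcal{A}(x)}\sup_{\theta\in\mathcal{T}}J(\alpha,\theta)\le v(x)
\end{equation}

and letting $\varepsilon\to 0$ forces all the terms to coincide. Read this way, the $\sup_\theta$ version encodes that no choice of control and stopping time can beat $v(x)$, and the $\inf_\theta$ version encodes that a near-optimal control achieves $v(x)$ up to $\varepsilon$ whatever the stopping time.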



Any help would be greatly appreciated.










      optimal-control
















      asked Nov 13 at 23:48









      mark



























