OverflowError as I try to use the value-iteration algorithm with mdptoolbox

I set up a simple MDP for a board that has 4 possible states and 4 possible actions. The board and reward setup looks as follows:



[Image: board layout with states S1-S4 and the associated rewards]



Here S4 is the goal state and S2 is the absorbing state. I have defined the transition probability matrices and the reward matrix in the code below, which I wrote to compute the optimal value function for this MDP. But when I run the code, I get the error OverflowError: cannot convert float infinity to integer, and I cannot understand the reason for it.



import mdptoolbox
import numpy as np

# Transition probabilities, shape (A, S, S): one 4x4 matrix per action.
transitions = np.array([
    # action 1 (Right)
    [[0.1, 0.7, 0.1, 0.1],
     [0.3, 0.3, 0.3, 0.1],
     [0.1, 0.2, 0.2, 0.5],
     [0.1, 0.1, 0.1, 0.7]],
    # action 2 (Down)
    [[0.1, 0.4, 0.4, 0.1],
     [0.3, 0.3, 0.3, 0.1],
     [0.4, 0.1, 0.4, 0.1],
     [0.1, 0.1, 0.1, 0.7]],
    # action 3 (Left)
    [[0.4, 0.3, 0.2, 0.1],
     [0.2, 0.2, 0.4, 0.2],
     [0.5, 0.1, 0.3, 0.1],
     [0.1, 0.1, 0.1, 0.7]],
    # action 4 (Top)
    [[0.1, 0.4, 0.4, 0.1],
     [0.3, 0.3, 0.3, 0.1],
     [0.4, 0.1, 0.4, 0.1],
     [0.1, 0.1, 0.1, 0.7]],
])

rewards = np.array([
    [-1, -100, -1, 1],
    [-1, -100, -1, 1],
    [-1, -100, -1, 1],
    [1, 1, 1, 1],
])

vi = mdptoolbox.mdp.ValueIteration(transitions, rewards, discount=0.5)
vi.setVerbose()
vi.run()

print("Value function:")
print(vi.V)

print("Policy function:")
print(vi.policy)


If I change the discount from 0.5 to 1, it works fine. What could be the reason for value iteration not working with a discount of 0.5, or any other fractional value?



Update: It looks like there is some issue with my reward matrix; I have not been able to write it the way I intended, because if I change some values in the reward matrix, the error disappears.
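
For reference, a basic validity check on the inputs (a sketch; I am assuming mdptoolbox.util.check only verifies shapes and row-stochasticity) should not complain here, since the transition rows sum to 1 and a 4x4 reward array is an acceptable (S, A) shape either way:

import mdptoolbox.util

# Using the transitions and rewards arrays defined above.
# check() raises an exception if the transition matrices are not square and
# stochastic, or if the reward array has an incompatible shape. A 4x4 reward
# array is a valid (S, A) shape regardless of whether its rows mean states or
# actions, so this alone cannot catch the mix-up.
mdptoolbox.util.check(transitions, rewards)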

python dynamic-programming markov-chains stochastic mdptoolbox

edited Nov 22 '18 at 6:37 by Suhail Gupta
asked Nov 21 '18 at 11:56 by Suhail Gupta

1 Answer

It turned out that the reward matrix I had defined was incorrect. To match the reward setup shown in the picture above, it should be of shape (S, A) as described in the documentation, where each row corresponds to a state (S1 through S4) and each column corresponds to an action (A1 through A4). The new reward matrix looks as follows:



# Reward matrix of shape (S, A): rows are states S1..S4, columns are actions A1..A4.
rewards = np.array([
    [-1, -1, -1, -1],
    [-100, -100, -100, -100],
    [-1, -1, -1, -1],
    [1, 1, 1, 1],
])


It works fine with this, but I am still not sure what was happening internally that led to the overflow error.
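
My best guess at the mechanism (a sketch only; I am assuming ValueIteration computes a Puterman-style bound on the number of iterations whenever discount < 1, and the library's exact formula may differ): with the original, mis-oriented reward matrix every row contained the same set of values, so the first Bellman backup assigned the same value to every state. The span of that first update is therefore 0, the bound divides by that span and becomes infinite, and the final cast to an integer raises the OverflowError.

import numpy as np

discount, epsilon = 0.5, 0.01

# Original, mis-oriented reward matrix from the question.
R_bad = np.array([
    [-1, -100, -1, 1],
    [-1, -100, -1, 1],
    [-1, -100, -1, 1],
    [1, 1, 1, 1],
])

# First Bellman backup from V0 = 0: Q(s, a) = R(s, a), so V1(s) = max_a R(s, a).
V1 = R_bad.max(axis=1)                   # [1, 1, 1, 1] -- identical for every state
span = np.float64(V1.max() - V1.min())   # 0.0

# A Puterman-style iteration bound divides by this span; with span == 0 the
# intermediate value is infinite (its sign is irrelevant here).
with np.errstate(divide="ignore"):
    bound = np.log(epsilon * (1 - discount) / discount / span) / np.log(discount)

try:
    max_iter = int(np.ceil(bound))       # this cast is where the error comes from
except OverflowError as exc:
    print(exc)                           # cannot convert float infinity to integer

With the corrected (S, A) reward matrix the per-state maxima are [-1, -100, -1, 1], so the span is non-zero and the bound stays finite. That would also explain why discount=1 avoids the error: the iteration bound presumably only applies when discount < 1, so with discount=1 the user-supplied max_iter is used directly.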

answered Nov 22 '18 at 9:57 by Suhail Gupta