OverflowError as I try to use the value-iteration algorithm with mdptoolbox
I set up a simple MDP for a board that has 4 possible states and 4 possible actions. The board and reward setup looks as follows:
[figure: board with states S1–S4 and their per-state rewards]
Here S4 is the goal state and S2 is the absorbing state. I have defined the transition probability matrices and the reward matrix in the code below, which is meant to compute the optimal value function for this MDP. But when I run it, I get an error: OverflowError: cannot convert float infinity to integer. I could not understand the reason for this.
import mdptoolbox
import numpy as np
transitions = np.array([
# action 1 (Right)
[
[0.1, 0.7, 0.1, 0.1],
[0.3, 0.3, 0.3, 0.1],
[0.1, 0.2, 0.2, 0.5],
[0.1, 0.1, 0.1, 0.7]
],
# action 2 (Down)
[
[0.1, 0.4, 0.4, 0.1],
[0.3, 0.3, 0.3, 0.1],
[0.4, 0.1, 0.4, 0.1],
[0.1, 0.1, 0.1, 0.7]
],
# action 3 (Left)
[
[0.4, 0.3, 0.2, 0.1],
[0.2, 0.2, 0.4, 0.2],
[0.5, 0.1, 0.3, 0.1],
[0.1, 0.1, 0.1, 0.7]
],
# action 4 (Top)
[
[0.1, 0.4, 0.4, 0.1],
[0.3, 0.3, 0.3, 0.1],
[0.4, 0.1, 0.4, 0.1],
[0.1, 0.1, 0.1, 0.7]
]
])
rewards = np.array([
[-1, -100, -1, 1],
[-1, -100, -1, 1],
[-1, -100, -1, 1],
[1, 1, 1, 1]
])
vi = mdptoolbox.mdp.ValueIteration(transitions, rewards, discount=0.5)
vi.setVerbose()
vi.run()
print("Value function:")
print(vi.V)
print("Policy function")
print(vi.policy)
If I change the value of discount from 0.5 to 1, it works fine. What could be the reason for value iteration not working with a discount of 0.5, or any other decimal value?
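For what it's worth, the error message itself comes from plain Python, independent of mdptoolbox: somewhere inside the solver an infinite float is being cast to an integer. A minimal reproduction of just that conversion:
# The traceback's final line is Python refusing to cast an infinite
# float to an integer; reproducing only that conversion:
try:
    int(float("inf"))
except OverflowError as exc:
    print(exc)  # cannot convert float infinity to integer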
Update: It looks like there is some issue with my reward matrix. I have not been able to write it as I intended, because if I change some values in the reward matrix, the error disappears.
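One way to catch malformed inputs early is pymdptoolbox's mdptoolbox.util.check helper; a minimal sketch, assuming the transitions and rewards arrays defined above are in scope. Note that it only validates structure (array shapes and row-stochastic transition matrices), so a reward matrix that is well-formed but semantically transposed, as turned out to be the case here, would still pass:
import mdptoolbox.util

# Raises an exception if `transitions` is not (A, S, S) with rows
# summing to 1, or if `rewards` has an incompatible shape; it stays
# silent when the inputs are structurally valid.
mdptoolbox.util.check(transitions, rewards)
print("inputs passed mdptoolbox's structural check")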
python dynamic-programming markov-chains stochastic mdptoolbox
asked Nov 21 '18 at 11:56 by Suhail Gupta, edited Nov 22 '18 at 6:37
1 Answer
It turned out that the reward matrix I had defined was incorrect. Given the rewards in the picture above, it should have shape (S, A) as specified in the documentation, where each row corresponds to a state (S1 through S4) and each column to an action (A1 through A4). The corrected reward matrix looks as follows:
# (S, A): rows are states S1..S4, columns are actions A1..A4
rewards = np.array([
    [-1, -1, -1, -1],          # S1
    [-100, -100, -100, -100],  # S2 (absorbing)
    [-1, -1, -1, -1],          # S3
    [1, 1, 1, 1]               # S4 (goal)
])
It works fine with this. But I am still not sure what was happening internally that led to the overflow error.
answered Nov 22 '18 at 9:57 by Suhail Gupta
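As for why the bad matrix produced this particular error: a plausible mechanism, offered as an assumption based on pymdptoolbox's documentation rather than a verified trace, is that for discount < 1 ValueIteration first estimates an upper bound on the number of iterations from the span of an initial Bellman backup. If that span happens to come out as zero, the bound evaluates to float infinity, and casting it to an integer raises exactly OverflowError: cannot convert float infinity to integer. With discount = 1 no such bound is computed, which would explain why that setting ran fine.
For completeness, a sketch of the full corrected script, with the transitions exactly as in the question and the rewards in the (S, A) orientation from the answer:
import mdptoolbox
import numpy as np

# One (S, S) transition matrix per action; each row sums to 1.
transitions = np.array([
    # action 1 (Right)
    [[0.1, 0.7, 0.1, 0.1],
     [0.3, 0.3, 0.3, 0.1],
     [0.1, 0.2, 0.2, 0.5],
     [0.1, 0.1, 0.1, 0.7]],
    # action 2 (Down)
    [[0.1, 0.4, 0.4, 0.1],
     [0.3, 0.3, 0.3, 0.1],
     [0.4, 0.1, 0.4, 0.1],
     [0.1, 0.1, 0.1, 0.7]],
    # action 3 (Left)
    [[0.4, 0.3, 0.2, 0.1],
     [0.2, 0.2, 0.4, 0.2],
     [0.5, 0.1, 0.3, 0.1],
     [0.1, 0.1, 0.1, 0.7]],
    # action 4 (Top)
    [[0.1, 0.4, 0.4, 0.1],
     [0.3, 0.3, 0.3, 0.1],
     [0.4, 0.1, 0.4, 0.1],
     [0.1, 0.1, 0.1, 0.7]],
])

# (S, A) rewards: rows are states S1..S4, columns are actions A1..A4.
rewards = np.array([
    [-1, -1, -1, -1],          # S1
    [-100, -100, -100, -100],  # S2 (absorbing)
    [-1, -1, -1, -1],          # S3
    [1, 1, 1, 1],              # S4 (goal)
])

vi = mdptoolbox.mdp.ValueIteration(transitions, rewards, discount=0.5)
vi.run()
print("Value function:", vi.V)
print("Policy:", vi.policy)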