Wasserstein GAN critic training ambiguity
I'm running a DCGAN-based GAN and am experimenting with WGANs, but I'm a bit confused about how to train the WGAN.
In the official Wasserstein GAN PyTorch implementation, the discriminator/critic is said to be trained Diters times (usually 5) for each generator update.
Does this mean that the critic/discriminator trains on Diters batches, or on the whole dataset Diters times? If I'm not mistaken, the official implementation suggests the discriminator/critic is trained on the whole dataset Diters times, but other implementations of WGAN (in PyTorch, TensorFlow, etc.) do the opposite.
Which is correct? The WGAN paper (to me, at least) indicates that it is Diters batches. Training on the whole dataset would obviously be orders of magnitude slower.
Thanks in advance!
python-3.x tensorflow machine-learning deep-learning pytorch
asked Nov 20 '18 at 20:58
krustybek
425
1 Answer
The correct interpretation is that one iteration corresponds to one batch.
In the original paper, each iteration of the critic/discriminator samples a batch of size m of real data and a batch of size m of prior samples from p(z) to work with. After the critic has been trained for Diters iterations, they train the generator, which also starts by sampling a batch of prior samples from p(z).
Therefore, each iteration works on a single batch.
The same holds in the official implementation. What may be confusing is that the variable name niter is used for the number of epochs to train the model. Although a different scheme is used to set Diters at lines 162-166:
# train the discriminator Diters times
if gen_iterations < 25 or gen_iterations % 500 == 0:
    Diters = 100
else:
    Diters = opt.Diters
the critic is still, as in the paper, trained for Diters batches.
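To make the batch-vs-epoch distinction concrete, here is a minimal sketch of the outer training loop using the same Diters schedule as the official code; the commented-out `train_critic_on_batch` / `train_generator_on_batch` calls are hypothetical placeholders, not functions from the official repository. Each critic iteration consumes exactly one batch of real data (plus one batch of noise), never a full pass over the dataset:

```python
def train_wgan(num_generator_updates, diters_default=5):
    """Count critic batches consumed per generator update (structure only)."""
    critic_batches = 0
    gen_iterations = 0
    while gen_iterations < num_generator_updates:
        # Same schedule as the official implementation: a burn-in of 100
        # critic iterations for the first 25 generator updates and every
        # 500th generator update thereafter; otherwise the usual 5.
        if gen_iterations < 25 or gen_iterations % 500 == 0:
            diters = 100
        else:
            diters = diters_default
        for _ in range(diters):
            # One critic iteration = ONE batch of real data plus one
            # batch of prior samples z, e.g.:
            # train_critic_on_batch(next(data_iter), sample_z(m))
            critic_batches += 1
        # One generator update, also starting from a batch of z, e.g.:
        # train_generator_on_batch(sample_z(m))
        gen_iterations += 1
    return critic_batches, gen_iterations
```

Running this for 30 generator updates shows why the burn-in dominates early training: the first 25 generator updates each cost 100 critic batches, the next 5 cost only 5 each.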
answered Nov 20 '18 at 23:36
K. Bogdan
1413