Wasserstein GAN critic training ambiguity

I'm running a DCGAN-based GAN and am experimenting with WGANs, but I'm a bit confused about how the WGAN critic should be trained.



In the official Wasserstein GAN PyTorch implementation, the discriminator/critic is trained Diters times (usually 5) for each generator update.



Does this mean the critic/discriminator trains on Diters batches, or on the whole dataset Diters times? If I'm not mistaken, the official implementation suggests the critic is trained on the whole dataset Diters times, while other WGAN implementations (in PyTorch, TensorFlow, etc.) do the opposite.



Which is correct? The WGAN paper (to me, at least) indicates Diters batches; training on the whole dataset would obviously be orders of magnitude slower.



Thanks in advance!

Tags: python-3.x, tensorflow, machine-learning, deep-learning, pytorch

Asked Nov 20 '18 at 20:58 by krustybek

1 Answer

The correct interpretation is that each iteration works on a single batch.
In the original paper, each critic/discriminator iteration samples a batch of size m of real data and a batch of size m of prior samples from p(z) to work with. Once the critic has been trained for Diters such iterations, the generator is trained, which likewise starts by sampling a batch of prior samples from p(z).
So each iteration operates on one batch.
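
To make that concrete, here is a minimal sketch of one critic iteration following Algorithm 1 of the WGAN paper. This is not the official code: netD, netG, optimizerD, data_iter, and nz are placeholder names, and the clip value 0.01 follows the paper's default.

import torch

# One critic iteration, per Algorithm 1 of the WGAN paper.
# netD (critic), netG (generator), optimizerD, and nz (latent size) are
# assumed to exist; data_iter yields (images, labels) batches.
def critic_iteration(netD, netG, optimizerD, data_iter, nz, clip=0.01):
    real, _ = next(data_iter)               # one batch of size m of real data
    noise = torch.randn(real.size(0), nz)   # one batch of size m from the prior p(z)
    with torch.no_grad():
        fake = netG(noise)                  # generator is not updated during critic steps
    optimizerD.zero_grad()
    # The critic maximizes E[f(real)] - E[f(fake)], so minimize the negative.
    loss_d = netD(fake).mean() - netD(real).mean()
    loss_d.backward()
    optimizerD.step()
    for p in netD.parameters():             # weight clipping for the Lipschitz constraint
        p.data.clamp_(-clip, clip)
    return loss_d.item()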



The official implementation does the same. What may be confusing is that it uses the variable name niter for the number of epochs to train the model. It does, however, use a different scheme to set Diters, at lines 162-166:



# train the discriminator Diters times
if gen_iterations < 25 or gen_iterations % 500 == 0:
    Diters = 100
else:
    Diters = opt.Diters


Either way, they are, as in the paper, training the critic on Diters batches.
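
Putting it together, the outer loop looks roughly like this. Again a sketch, not the official script: it reuses the hypothetical critic_iteration helper above, and dataloader, netD, netG, optimizerG, nz, and batch_size are assumed to be defined.

gen_iterations = 0
for epoch in range(niter):                  # niter counts epochs, as in the official repo
    data_iter = iter(dataloader)
    i = 0
    while i < len(dataloader):
        # Train the critic longer early on and periodically (cf. lines 162-166).
        if gen_iterations < 25 or gen_iterations % 500 == 0:
            Diters = 100
        else:
            Diters = 5                      # the usual opt.Diters default
        # The critic sees Diters *batches*, not Diters passes over the dataset.
        j = 0
        while j < Diters and i < len(dataloader):
            critic_iteration(netD, netG, optimizerD, data_iter, nz)
            j += 1
            i += 1
        # One generator update on a single batch of prior samples.
        optimizerG.zero_grad()
        loss_g = -netD(netG(torch.randn(batch_size, nz))).mean()
        loss_g.backward()
        optimizerG.step()
        gen_iterations += 1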

Answered Nov 20 '18 at 23:36 by K. Bogdan