Multiprocessing for TensorFlow model does not return outputs and halts

I am using multiprocessing to train a TensorFlow model: I split x_train and y_train into chunks of equal size so I can use 16 processors, train one model per chunk, and then combine the trained models into an ensemble.
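
For reference, the chunks are built roughly like this (a sketch only: np.array_split and the x_test / y_test names are illustrative, not my exact preprocessing):

    import numpy as np

    n_workers = 16

    # One chunk per worker: X0[i] / Y0[i] is worker i's training data,
    # X0_test[i] / Y0_test[i] its held-out data.
    X0 = np.array_split(x_train, n_workers)
    Y0 = np.array_split(y_train, n_workers)
    X0_test = np.array_split(x_test, n_workers)
    Y0_test = np.array_split(y_test, n_workers)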



This is the train function each worker runs (parte is the index of the chunk it trains on):



    def train(parte):
        with tf.Session(graph=graph, config=config) as sess:
            sess.run(init)
            for step in range(1, training_epochs + 1):
                # Draw a minibatch from this worker's chunk of the training data
                batch_x, batch_y = next_batch(batch_size, X0[parte], Y0[parte])
                sess.run(train_op, feed_dict={X: batch_x, Y: batch_y, is_training: True})
                if step % display_step == 0 or step == 1:
                    loss, acc = sess.run([loss_op, accuracy],
                                         feed_dict={X: batch_x, Y: batch_y, is_training: False})
                    test_data = X0_test[parte]
                    test_label = Y0_test[parte]
                    val_acc = sess.run(accuracy,
                                       feed_dict={X: test_data, Y: test_label, is_training: False})
                    if loss < 0.4:
                        print("Step " + str(step) + ", Minibatch Loss= " +
                              "{:.4f}".format(loss) + ", Training Accuracy= " +
                              "{:.3f}".format(acc) + ", Test Accuracy= " +
                              "{:.3f}".format(val_acc))
            # Predictions of this worker's model on its own test chunk
            pred00 = sess.run([prediction], feed_dict={X: X0_test[parte], is_training: False})
            return loss


Then I map the train function over the chunk indices with a pool:



    if __name__ == '__main__':
        pool = multiprocessing.Pool(16)
        results = pool.map(train, np.arange(0, 15))
        pool.close()
        pool.join()
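
Once all workers return, the ensembling step I have in mind looks roughly like this (a sketch only; it assumes each worker would return its model's predictions on one common test set rather than just the loss, which the train function above does not do yet):

    import numpy as np

    # Hypothetical: results[i] holds model i's class probabilities on the *same*
    # full test set, shape (n_test, n_classes).
    all_preds = np.stack(results)                 # (n_models, n_test, n_classes)
    ensemble_probs = all_preds.mean(axis=0)       # average the models' outputs
    ensemble_labels = ensemble_probs.argmax(axis=1)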


The model is saving weights properly, but there are two problems: 1. the output that should appear every 10 epochs is only printed sometimes, and 2. the pool never shuts down, so the Jupyter cell hangs and never finishes evaluating.
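
One variation I am considering (untested, and I am not sure it addresses the root cause) is to start each worker in a fresh interpreter with the 'spawn' start method and flush the prints inside train, roughly:

    import multiprocessing

    if __name__ == '__main__':
        # 'spawn' starts each worker in a fresh interpreter instead of forking the
        # notebook process (forking after TensorFlow/CUDA has been initialised is a
        # known source of hangs). This is a guess, not a confirmed fix.
        ctx = multiprocessing.get_context('spawn')
        pool = ctx.Pool(16)
        results = pool.map(train, range(16))   # one task per chunk
        pool.close()
        pool.join()
        # inside train(), print(..., flush=True) so worker output appears promptly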



Is there a way I can make this work?

tensorflow multiprocessing python-multiprocessing

asked Nov 13 at 12:39 (edited Nov 13 at 14:32)
Rubens_Z