VoltDB: executing multiple inserts in one invocation with the C++ API



























I currently have a model where a large number of inserts need to be done (not at startup) on the same table. For the time being I am preparing the set of insert values in the C++ code and then calling the insert stored procedure once per row.



e.g.



INSERT ... VALUES ('1','2')
INSERT ... VALUES ('3','4')
INSERT ... VALUES ('5','6')


I would like to know if it is possible (using VoltDB and the C++ client) to either:



1) Do bulk inserts
e.g.



INSERT ... VALUES ('1','2'), ('3','4'), ('5','6')


or



2) Pass an array or a string containing a custom delimiter into the stored procedure, then parse it inside and call the individual inserts inside the stored procedure itself.



INSERT ... VALUES ('1,2|3,4|5,6') or similar


then split the string inside the procedure.
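A minimal client-side sketch of the encoding in option 2, assuming two string columns per row and values that never contain ',' or '|'. The names `Pair`, `encode_rows` and `decode_rows` are hypothetical and not part of the VoltDB C++ API. Inside VoltDB the decoding would have to happen in the stored procedure; `decode_rows` is included here only to illustrate the round trip.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row type: one (first, second) pair per INSERT.
using Pair = std::pair<std::string, std::string>;

// Join rows as "a,b|a,b|...". Assumes values contain neither ',' nor '|'.
inline std::string encode_rows(const std::vector<Pair>& rows) {
    std::string out;
    for (std::size_t i = 0; i < rows.size(); ++i) {
        if (i > 0) out += '|';
        out += rows[i].first;
        out += ',';
        out += rows[i].second;
    }
    return out;
}

// Inverse of encode_rows: split on '|', then on the first ',' of each row.
inline std::vector<Pair> decode_rows(const std::string& packed) {
    std::vector<Pair> rows;
    std::size_t start = 0;
    while (start < packed.size()) {
        std::size_t end = packed.find('|', start);
        if (end == std::string::npos) end = packed.size();
        std::size_t comma = packed.find(',', start);
        assert(comma != std::string::npos && comma < end);  // each row needs a ','
        rows.emplace_back(packed.substr(start, comma - start),
                          packed.substr(comma + 1, end - comma - 1));
        start = end + 1;
    }
    return rows;
}
```

The packed string would then be passed as a single string parameter to the procedure, which is one way to compare its cost against one call per row.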



If either is possible, could you please point me either to an example, or to the C++ API syntax that would facilitate the implementation? (e.g. looping in stored procedure, in order to parse the string and/or string manipulation functions, etc.)



I would like to try one of these options, in order to test the relative performance. Although I've read that individual inserts should be fast enough, I would think this can differ based on the use case.










      c++ bulkinsert voltdb






      asked Nov 19 '18 at 16:11









      Marty
























          1 Answer
          Individual inserts would be faster if you called the default insert procedure for the table, e.g. "TABLENAME.insert", which takes the same values as INSERT ... VALUES, but bypasses the AdHoc SQL parser and is routed more directly to the partition. That gives you the best performance when inserting records with one procedure call per row.



          In the Java client, there is an API that facilitates bulk loading of a table. There is an example tutorial for it here: https://github.com/VoltDB/voltdb/tree/master/examples/HOWTOs/bulkloader



          If the data exists in a CSV or delimited file, you could leverage the csvloader application, which uses the same bulkloader API.



          The C++ client does not have an implementation of the bulkloader API, so while bulk loading from C++ is not impossible, it would be a lot more difficult.



          Bulk inserts in the form of INSERT ... VALUES ('1','2'),('3','4'),... are not supported by VoltDB.



          The other approach you describe is possible. You could write a Java stored procedure that takes a VoltTable as an input parameter, and from the C++ client build a Table object, which corresponds to the VoltTable in Java. Or, you could pass in arrays of values.

          However, neither the VoltTable nor an array can be the partitioning key parameter for the procedure. So if you are trying to do something at high scale, you would want a separate parameter value for the partition key, and you would need to send a set of records that all belong in the same partition. That can be difficult to do.

          The easiest way is to write your own simple hashing function. As you generate or receive new records, you can hash them with your function and group them into buckets, then send each set of records to the database in bulk, with the hash value as the partition key. You would have to include a column in the table for this hash value and partition the table on it, so that records with the same hash value belong in the same partition.
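The bucketing step described above could be sketched in C++ roughly as follows. This is only an illustration under stated assumptions: `Row`, `bucket_of` and `group_by_bucket` are hypothetical names, the hash is a plain FNV-1a chosen for simplicity (it is an application-level hash, not VoltDB's internal partitioning function), and the bucket count is arbitrary. Nothing here calls the VoltDB client library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical two-column row, matching the ('1','2') shape in the question.
struct Row {
    std::string a;
    std::string b;
};

// FNV-1a over the key bytes, reduced to a bucket id in [0, bucket_count).
// Assumption: this value is also stored in a dedicated table column,
// and that column is the table's partitioning column.
inline std::int32_t bucket_of(const std::string& key, std::int32_t bucket_count) {
    assert(bucket_count > 0);
    std::uint32_t h = 2166136261u;
    for (unsigned char c : key) {
        h ^= c;
        h *= 16777619u;
    }
    return static_cast<std::int32_t>(h % static_cast<std::uint32_t>(bucket_count));
}

// Group rows by bucket id so that each group can be sent in one procedure
// call, with the bucket id passed as the separate partitioning parameter.
inline std::unordered_map<std::int32_t, std::vector<Row>>
group_by_bucket(const std::vector<Row>& rows, std::int32_t bucket_count) {
    std::unordered_map<std::int32_t, std::vector<Row>> buckets;
    for (const Row& r : rows) {
        buckets[bucket_of(r.a, bucket_count)].push_back(r);
    }
    return buckets;
}
```

Each entry of the returned map would then become one invocation of the bulk-insert procedure: the map key as the partition parameter, and the vector converted to a Table or to parallel arrays.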



          Disclosure: I work at VoltDB.






          • Thank you for the reply. I am already using stored procedures instead of the AdHoc proc. However, these are procedures that I wrote, not the default ones. Would there be any difference between the default insert procedure and its equivalent one manually created?

            – Marty
            Nov 20 '18 at 8:10











          • No, an equivalent manually created procedure should perform the same as the default procedure.

            – BenjaminBallard
            Nov 20 '18 at 14:09











          answered Nov 19 '18 at 19:58









          BenjaminBallard












