How do I limit write operations to 1k records/sec?
Currently I am able to write to the database in batches of 500. But because of a memory shortage and delayed synchronization between the child aggregator and the leaf nodes of the database, I sometimes run into a Leaf Node Memory Error. The only fix I have found is to limit my write operations to 1k records per second; then the error goes away.
dataStream
  .map(line => readJsonFromString(line))
  .grouped(memsqlBatchSize)
  .foreach { recordSet =>
    val dbRecords = recordSet.map(m => (m, Events.transform(m)))
    dbRecords.map { record =>
      try {
        Events.setValues(eventInsert, record._2)
        eventInsert.addBatch()
      } catch {
        case e: Exception =>
          logger.error(s"error adding batch: ${e.getMessage}")
          val error_event = Events.jm.writeValueAsString(mapAsJavaMap(record._1.asInstanceOf[Map[String, Object]]))
          logger.error(s"event: $error_event")
      }
    }
    // Bulk Commit Records
    try {
      eventInsert.executeBatch()
    } catch {
      case e: java.sql.BatchUpdateException =>
        val updates = e.getUpdateCounts
        logger.error(s"failed commit: ${updates.toString}")
        updates.zipWithIndex.filter { case (v, _) => v == Statement.EXECUTE_FAILED }.foreach { case (_, i) =>
          val error = Events.jm.writeValueAsString(mapAsJavaMap(dbRecords(i)._1.asInstanceOf[Map[String, Object]]))
          logger.error(s"insert error: $error")
          logger.error(e.getMessage)
        }
    } finally {
      connection.commit()
      eventInsert.clearBatch()
      logger.debug(s"committed: ${dbRecords.length}")
    }
  }
The reason for 1k records is that some of the lines I am trying to write can contain tons of JSON records, so even with a batch size of 500 that may result in 30k records per second. Is there any way to make sure that only 1000 records will be written to the database in a batch, irrespective of how many records each line expands into?
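One way to guarantee at most 1,000 rows per JDBC batch, no matter how many records a single input line expands into, is to flatten the stream into individual records before grouping. A minimal sketch of the idea, assuming a hypothetical readJsonAsRecords helper that returns the individual records parsed from one line (not part of the original code):

dataStream
  .flatMap(line => readJsonAsRecords(line)) // hypothetical: one line -> Seq of parsed records
  .grouped(1000)                            // hard cap: at most 1000 records per JDBC batch
  .foreach { recordSet =>
    // addBatch / executeBatch / commit exactly as in the code above
  }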
multithreading scala performance thread-synchronization memsql
asked Nov 20 '18 at 23:00 by Aniruddha Tekade (edited Nov 21 '18 at 8:46 by Sarvesh Kumar Singh)
The way I see it, the given code does not tell us anything about the problem you mentioned. And it looks like dbRecords.map should be dbRecords.foreach.
– Sarvesh Kumar Singh, Nov 21 '18 at 8:45

@SarveshKumarSingh I came up with a solution. I think I either introduce a delay of 1.0 sec or use Thread.sleep(1000) so that I can limit it. Maybe I did not understand the requirement correctly before. According to the architecture of MemSQL, all loads into MemSQL go into a rowstore first, and from there MemSQL merges them into the columnstore. But MemSQL first collects data into local memory and then flushes to the leaves, which caused the leaf error every time. Reducing the batch size and introducing a delay let me write 120,000 records/sec. Performed testing with this benchmark.
– Aniruddha Tekade, Nov 21 '18 at 21:22
2 Answers
I don't think Thread.sleep is a good way to handle this situation. Generally we don't recommend doing that in Scala, and we don't want to block the thread in any case.

One suggestion would be to use a streaming library such as Akka Streams or Monix Observable. There are pros and cons between those libraries that I won't spend too many paragraphs on, but they do support backpressure, which controls the producing rate when the consumer is slower than the producer. In your case the consumer is the database write, and the producer might be reading some JSON files and doing some aggregations.

The following code illustrates the idea; you will need to modify it for your needs (the element type and the stand-in sink are assumptions, since I don't know what readJsonFromString returns or how you write a record):

import akka.stream.ThrottleMode
import akka.stream.scaladsl.{Flow, Keep, Sink, Source}
import scala.concurrent.duration._
// assumes an implicit ActorSystem / materializer is in scope

val sourceJson = Source.fromIterator(() => dataStream.iterator) // assuming dataStream is a collection of lines
  .map(line => readJsonFromString(line))
val sinkDB = Sink.foreach[Map[String, Any]](record => ???) // you will need to figure out how to generate the Sink
val flowThrottle = Flow[Map[String, Any]]
  .throttle(1, 1.second, 1, ThrottleMode.shaping) // one element/sec here; use (1000, 1.second, 1000, ...) for 1k records/sec
val runnable = sourceJson.via(flowThrottle).toMat(sinkDB)(Keep.right)
val result = runnable.run()

answered Nov 22 '18 at 7:10 by tomcy
But I don't understand the runnable statement in your code. My requirement is: if I have threads=4 and batchSize=500, and I want to throttle the writing on the producer side to 1k records/sec, how can I ensure this with your code?
– Aniruddha Tekade, Dec 10 '18 at 23:56
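The thread never answers this follow-up concretely. One way to enforce a single global 1k records/sec budget across several writer threads, independent of the streaming approach above, is a shared rate limiter. A minimal sketch using Guava's RateLimiter, where writeBatch is a hypothetical stand-in for the addBatch/executeBatch/commit logic from the question:

import com.google.common.util.concurrent.RateLimiter

// One process-wide limiter shared by all writer threads: 1000 permits/sec,
// where one permit corresponds to one record.
val limiter = RateLimiter.create(1000.0)

def writeThrottled(batch: Seq[Map[String, Any]]): Unit = {
  limiter.acquire(batch.size) // blocks until `batch.size` permits are available
  writeBatch(batch)           // hypothetical: addBatch + executeBatch + commit
}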
The code block is already called by a thread, and there are multiple threads running in parallel. I could use either Thread.sleep(1000) or delay(1.0) in this Scala code, but delay() would use a Promise that might have to be called outside the function. Thread.sleep() looks like the best option, along with a batch size of 1000. After testing, I could benchmark 120,000 records/thread/sec without any problem.

According to the architecture of MemSQL, all loads into MemSQL go into a rowstore first, in local memory, and from there MemSQL merges them into the columnstore on the leaves. Pushing more data each time caused a bottleneck and the leaf error. Reducing the batch size and introducing a Thread.sleep() helped me write 120,000 records/sec. Performed testing with this benchmark.

answered Nov 21 '18 at 21:34 by Aniruddha Tekade
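As a rough illustration of the sleep-based pacing described above, here is a sketch that budgets one batch per second per thread; recordBatches and writeBatch are hypothetical stand-ins for the grouped stream and the addBatch/executeBatch/commit sequence in the question:

val batchSize = 1000 // records per JDBC batch, and per-second budget per thread

recordBatches.foreach { batch =>           // assumed: batches of at most 1000 records
  val start = System.nanoTime()
  writeBatch(batch)                        // hypothetical: addBatch + executeBatch + commit
  val elapsedMs = (System.nanoTime() - start) / 1000000L
  val remainingMs = 1000L - elapsedMs      // sleep out the rest of the one-second window
  if (remainingMs > 0) Thread.sleep(remainingMs)
}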