Scala Set intersection performance












2














Using Scala's scala.collection.Set[T]. Given a small set s with only a few elements and another big set b with lots of elements, is there any performance difference between:



s & b // s intersect b


and



b & s // b intersect s.


If yes, which is the fastest?










share|improve this question


















  • 2




    "If yes, which is the fastest?" -- why not benchmark it and see?
    – John Coleman
    Nov 16 at 0:14










  • You can use filter to take advantage of the size difference to get better performance: smallSet filter bigSet. This will make a new set by adding the items in smallSet are in bigSet without you needing to check each element of bigSet. & may do this under the hood but better to be explicit in my opinion.
    – morsecodist
    Nov 16 at 6:13












  • @morsecodist; According to the source code, intersect() actually is the filter() op.
    – jwvh
    Nov 16 at 8:38










  • @jwvh It is probably overriden, as the implementation which is called eventually is scala.collection.immutable.HashSet.HashTrieSet#intersect0 and looks quite complex to me. Based on my rough benchmarking its peformance is comparable to a filter b and it is much faster than b filter a.
    – Suma
    Nov 16 at 11:38


















2














Using Scala's scala.collection.Set[T]. Given a small set s with only a few elements and another big set b with lots of elements, is there any performance difference between:



s & b // s intersect b


and



b & s // b intersect s.


If yes, which is the fastest?










share|improve this question


















  • 2




    "If yes, which is the fastest?" -- why not benchmark it and see?
    – John Coleman
    Nov 16 at 0:14










  • You can use filter to take advantage of the size difference to get better performance: smallSet filter bigSet. This will make a new set by adding the items in smallSet are in bigSet without you needing to check each element of bigSet. & may do this under the hood but better to be explicit in my opinion.
    – morsecodist
    Nov 16 at 6:13












  • @morsecodist; According to the source code, intersect() actually is the filter() op.
    – jwvh
    Nov 16 at 8:38










  • @jwvh It is probably overriden, as the implementation which is called eventually is scala.collection.immutable.HashSet.HashTrieSet#intersect0 and looks quite complex to me. Based on my rough benchmarking its peformance is comparable to a filter b and it is much faster than b filter a.
    – Suma
    Nov 16 at 11:38
















2












2








2







Using Scala's scala.collection.Set[T]. Given a small set s with only a few elements and another big set b with lots of elements, is there any performance difference between:



s & b // s intersect b


and



b & s // b intersect s.


If yes, which is the fastest?










share|improve this question













Using Scala's scala.collection.Set[T]. Given a small set s with only a few elements and another big set b with lots of elements, is there any performance difference between:



s & b // s intersect b


and



b & s // b intersect s.


If yes, which is the fastest?







scala performance






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Nov 16 at 0:12









pgrandjean

18439




18439








  • 2




    "If yes, which is the fastest?" -- why not benchmark it and see?
    – John Coleman
    Nov 16 at 0:14










  • You can use filter to take advantage of the size difference to get better performance: smallSet filter bigSet. This will make a new set by adding the items in smallSet are in bigSet without you needing to check each element of bigSet. & may do this under the hood but better to be explicit in my opinion.
    – morsecodist
    Nov 16 at 6:13












  • @morsecodist; According to the source code, intersect() actually is the filter() op.
    – jwvh
    Nov 16 at 8:38










  • @jwvh It is probably overriden, as the implementation which is called eventually is scala.collection.immutable.HashSet.HashTrieSet#intersect0 and looks quite complex to me. Based on my rough benchmarking its peformance is comparable to a filter b and it is much faster than b filter a.
    – Suma
    Nov 16 at 11:38
















  • 2




    "If yes, which is the fastest?" -- why not benchmark it and see?
    – John Coleman
    Nov 16 at 0:14










  • You can use filter to take advantage of the size difference to get better performance: smallSet filter bigSet. This will make a new set by adding the items in smallSet are in bigSet without you needing to check each element of bigSet. & may do this under the hood but better to be explicit in my opinion.
    – morsecodist
    Nov 16 at 6:13












  • @morsecodist; According to the source code, intersect() actually is the filter() op.
    – jwvh
    Nov 16 at 8:38










  • @jwvh It is probably overriden, as the implementation which is called eventually is scala.collection.immutable.HashSet.HashTrieSet#intersect0 and looks quite complex to me. Based on my rough benchmarking its peformance is comparable to a filter b and it is much faster than b filter a.
    – Suma
    Nov 16 at 11:38










2




2




"If yes, which is the fastest?" -- why not benchmark it and see?
– John Coleman
Nov 16 at 0:14




"If yes, which is the fastest?" -- why not benchmark it and see?
– John Coleman
Nov 16 at 0:14












You can use filter to take advantage of the size difference to get better performance: smallSet filter bigSet. This will make a new set by adding the items in smallSet are in bigSet without you needing to check each element of bigSet. & may do this under the hood but better to be explicit in my opinion.
– morsecodist
Nov 16 at 6:13






You can use filter to take advantage of the size difference to get better performance: smallSet filter bigSet. This will make a new set by adding the items in smallSet are in bigSet without you needing to check each element of bigSet. & may do this under the hood but better to be explicit in my opinion.
– morsecodist
Nov 16 at 6:13














@morsecodist; According to the source code, intersect() actually is the filter() op.
– jwvh
Nov 16 at 8:38




@morsecodist; According to the source code, intersect() actually is the filter() op.
– jwvh
Nov 16 at 8:38












@jwvh It is probably overriden, as the implementation which is called eventually is scala.collection.immutable.HashSet.HashTrieSet#intersect0 and looks quite complex to me. Based on my rough benchmarking its peformance is comparable to a filter b and it is much faster than b filter a.
– Suma
Nov 16 at 11:38






@jwvh It is probably overriden, as the implementation which is called eventually is scala.collection.immutable.HashSet.HashTrieSet#intersect0 and looks quite complex to me. Based on my rough benchmarking its peformance is comparable to a filter b and it is much faster than b filter a.
– Suma
Nov 16 at 11:38














3 Answers
3






active

oldest

votes


















1














The generic implementation seen in the GenSetLike using intersect is overriden for HashSet with an implementation which looks quite complex to me (see scala.collection.immutable.HashSet.HashTrieSet#intersect0). Based on my rough benchmark its performance is similar for both a & b and b & a and it is similar to the performance of a filter b, which is an order of magnitude faster than b filter a. My testing code is:



object Sets extends App {

def time[R](block: => R): R = {
val t0 = System.nanoTime()
val result = block // call-by-name
val t1 = System.nanoTime()
println("Elapsed time: " + (t1 - t0)/1e6 + "ms")
result
}

val a = (0 until 10000 by 1).toSet //smaller data
val b = (0 until 1000000 by 2).toSet


time {a & b}
time {b & a}
time {a & b}
time {b & a}
time {a & b}
time {b & a}

println("Filter")

time {a filter b}
time {b filter a}
time {a filter b}
time {b filter a}
time {a filter b}
time {b filter a}
}


Result is:




Elapsed time: 6.990442ms
Elapsed time: 4.25017ms
Elapsed time: 4.1089ms
Elapsed time: 4.480789ms
Elapsed time: 3.71588ms
Elapsed time: 3.160159ms
Filter
Elapsed time: 7.781613ms
Elapsed time: 68.33023ms
Elapsed time: 5.858472ms
Elapsed time: 42.491131ms
Elapsed time: 2.982364ms
Elapsed time: 52.762474ms





share|improve this answer































    1














    The answer is: it's complicated.



    The default implementation of an immutable set is scala.collection.immutable.Set. This has special cases for sizes 1 to 4. As soon as you have more than 4 elements, it will use scala.collection.immutable.HashSet.



    Very small s (1..4)



    So let's say you have a large set b and a small set s, with s containing <4 elements.



    Then it will make a large difference:



    b & s will filter all elements of b against membership in s and therefore takes b.count * s.count equality comparisons. Since b is large this will be quite slow.



    s & b will filter the few elements of s against a membership in b, which is s.length times a hashing and an equality comparison if the hashes match (remember b is a HashSet). Since is is small this should be very fast.



    Small s (n>4)



    As soon as s is larger than 4 elements, it also will be a HashSet. Intersection for HashSets is written in a symmetric and very efficient way. It will combine the tree structures of s and b and perform equality comparisons when the hashes match. It will use maximum structural sharing. E.g. if b contains all elements of s, the result will be the same instance as s, so no objects will be allocated.



    General advice



    If you just write generic code where you don't care much about high performance, it is fine to use the default implementations such as scala.collection.Set. However, if you care about performance characteristics it is preferable to use a concrete implementation. E.g. if s and b are declared as scala.collection.immutable.HashSet, you will have consistent high performance independent of order, provided that you have a good hash function.






    share|improve this answer





























      0














      Let us create two sets as per the condition mentioned



         val a = (0 until 10000 by 1).toSet      //smaller data
      val b = (0 until 1000000 by 2).toSet //Relatively larger data


      we can define a time function to check the execution time as below



      def time[R](block: => R): R = {
      val t0 = System.nanoTime()
      val result = block // call-by-name
      val t1 = System.nanoTime()
      println("Elapsed time: " + (t1 - t0) + "ns")
      result
      }


      Now we can check the intersection conditon



      scala> time {a & b}
      Elapsed time: 5895220ns
      res2: scala.collection.immutable.Set[Int] = Set(892, 5810, 8062, ..)

      scala> time {b & a}
      Elapsed time: 6038271ns
      res3: scala.collection.immutable.Set[Int] = Set(892, 5810, 8062, ...)


      So by this we can conclude that intersection between a smaller and large dataset has performance difference and it is better to have the smaller dataset on the left side for faster execution for Scala set






      share|improve this answer

















      • 2




        The difference seem to be quite small given the huge size difference between the two sets. As the benchmark does not contain any warm-up and it performed from REPL, I would say it is inconclusive.
        – Suma
        Nov 16 at 11:20










      • When I try this 100 Times I also get no conclusive answer.
        – pme
        Nov 16 at 11:23











      Your Answer






      StackExchange.ifUsing("editor", function () {
      StackExchange.using("externalEditor", function () {
      StackExchange.using("snippets", function () {
      StackExchange.snippets.init();
      });
      });
      }, "code-snippets");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "1"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: true,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53329633%2fscala-set-intersection-performance%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      3 Answers
      3






      active

      oldest

      votes








      3 Answers
      3






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      1














      The generic implementation seen in the GenSetLike using intersect is overriden for HashSet with an implementation which looks quite complex to me (see scala.collection.immutable.HashSet.HashTrieSet#intersect0). Based on my rough benchmark its performance is similar for both a & b and b & a and it is similar to the performance of a filter b, which is an order of magnitude faster than b filter a. My testing code is:



      object Sets extends App {

      def time[R](block: => R): R = {
      val t0 = System.nanoTime()
      val result = block // call-by-name
      val t1 = System.nanoTime()
      println("Elapsed time: " + (t1 - t0)/1e6 + "ms")
      result
      }

      val a = (0 until 10000 by 1).toSet //smaller data
      val b = (0 until 1000000 by 2).toSet


      time {a & b}
      time {b & a}
      time {a & b}
      time {b & a}
      time {a & b}
      time {b & a}

      println("Filter")

      time {a filter b}
      time {b filter a}
      time {a filter b}
      time {b filter a}
      time {a filter b}
      time {b filter a}
      }


      Result is:




      Elapsed time: 6.990442ms
      Elapsed time: 4.25017ms
      Elapsed time: 4.1089ms
      Elapsed time: 4.480789ms
      Elapsed time: 3.71588ms
      Elapsed time: 3.160159ms
      Filter
      Elapsed time: 7.781613ms
      Elapsed time: 68.33023ms
      Elapsed time: 5.858472ms
      Elapsed time: 42.491131ms
      Elapsed time: 2.982364ms
      Elapsed time: 52.762474ms





      share|improve this answer




























        1














        The generic implementation seen in the GenSetLike using intersect is overriden for HashSet with an implementation which looks quite complex to me (see scala.collection.immutable.HashSet.HashTrieSet#intersect0). Based on my rough benchmark its performance is similar for both a & b and b & a and it is similar to the performance of a filter b, which is an order of magnitude faster than b filter a. My testing code is:



        object Sets extends App {

        def time[R](block: => R): R = {
        val t0 = System.nanoTime()
        val result = block // call-by-name
        val t1 = System.nanoTime()
        println("Elapsed time: " + (t1 - t0)/1e6 + "ms")
        result
        }

        val a = (0 until 10000 by 1).toSet //smaller data
        val b = (0 until 1000000 by 2).toSet


        time {a & b}
        time {b & a}
        time {a & b}
        time {b & a}
        time {a & b}
        time {b & a}

        println("Filter")

        time {a filter b}
        time {b filter a}
        time {a filter b}
        time {b filter a}
        time {a filter b}
        time {b filter a}
        }


        Result is:




        Elapsed time: 6.990442ms
        Elapsed time: 4.25017ms
        Elapsed time: 4.1089ms
        Elapsed time: 4.480789ms
        Elapsed time: 3.71588ms
        Elapsed time: 3.160159ms
        Filter
        Elapsed time: 7.781613ms
        Elapsed time: 68.33023ms
        Elapsed time: 5.858472ms
        Elapsed time: 42.491131ms
        Elapsed time: 2.982364ms
        Elapsed time: 52.762474ms





        share|improve this answer


























          1












          1








          1






          The generic implementation seen in the GenSetLike using intersect is overriden for HashSet with an implementation which looks quite complex to me (see scala.collection.immutable.HashSet.HashTrieSet#intersect0). Based on my rough benchmark its performance is similar for both a & b and b & a and it is similar to the performance of a filter b, which is an order of magnitude faster than b filter a. My testing code is:



          object Sets extends App {

          def time[R](block: => R): R = {
          val t0 = System.nanoTime()
          val result = block // call-by-name
          val t1 = System.nanoTime()
          println("Elapsed time: " + (t1 - t0)/1e6 + "ms")
          result
          }

          val a = (0 until 10000 by 1).toSet //smaller data
          val b = (0 until 1000000 by 2).toSet


          time {a & b}
          time {b & a}
          time {a & b}
          time {b & a}
          time {a & b}
          time {b & a}

          println("Filter")

          time {a filter b}
          time {b filter a}
          time {a filter b}
          time {b filter a}
          time {a filter b}
          time {b filter a}
          }


          Result is:




          Elapsed time: 6.990442ms
          Elapsed time: 4.25017ms
          Elapsed time: 4.1089ms
          Elapsed time: 4.480789ms
          Elapsed time: 3.71588ms
          Elapsed time: 3.160159ms
          Filter
          Elapsed time: 7.781613ms
          Elapsed time: 68.33023ms
          Elapsed time: 5.858472ms
          Elapsed time: 42.491131ms
          Elapsed time: 2.982364ms
          Elapsed time: 52.762474ms





          share|improve this answer














          The generic implementation seen in the GenSetLike using intersect is overriden for HashSet with an implementation which looks quite complex to me (see scala.collection.immutable.HashSet.HashTrieSet#intersect0). Based on my rough benchmark its performance is similar for both a & b and b & a and it is similar to the performance of a filter b, which is an order of magnitude faster than b filter a. My testing code is:



          object Sets extends App {

          def time[R](block: => R): R = {
          val t0 = System.nanoTime()
          val result = block // call-by-name
          val t1 = System.nanoTime()
          println("Elapsed time: " + (t1 - t0)/1e6 + "ms")
          result
          }

          val a = (0 until 10000 by 1).toSet //smaller data
          val b = (0 until 1000000 by 2).toSet


          time {a & b}
          time {b & a}
          time {a & b}
          time {b & a}
          time {a & b}
          time {b & a}

          println("Filter")

          time {a filter b}
          time {b filter a}
          time {a filter b}
          time {b filter a}
          time {a filter b}
          time {b filter a}
          }


          Result is:




          Elapsed time: 6.990442ms
          Elapsed time: 4.25017ms
          Elapsed time: 4.1089ms
          Elapsed time: 4.480789ms
          Elapsed time: 3.71588ms
          Elapsed time: 3.160159ms
          Filter
          Elapsed time: 7.781613ms
          Elapsed time: 68.33023ms
          Elapsed time: 5.858472ms
          Elapsed time: 42.491131ms
          Elapsed time: 2.982364ms
          Elapsed time: 52.762474ms






          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Nov 16 at 12:13

























          answered Nov 16 at 11:49









          Suma

          21.3k890154




          21.3k890154

























              1














              The answer is: it's complicated.



              The default implementation of an immutable set is scala.collection.immutable.Set. This has special cases for sizes 1 to 4. As soon as you have more than 4 elements, it will use scala.collection.immutable.HashSet.



              Very small s (1..4)



              So let's say you have a large set b and a small set s, with s containing <4 elements.



              Then it will make a large difference:



              b & s will filter all elements of b against membership in s and therefore takes b.count * s.count equality comparisons. Since b is large this will be quite slow.



              s & b will filter the few elements of s against a membership in b, which is s.length times a hashing and an equality comparison if the hashes match (remember b is a HashSet). Since is is small this should be very fast.



              Small s (n>4)



              As soon as s is larger than 4 elements, it also will be a HashSet. Intersection for HashSets is written in a symmetric and very efficient way. It will combine the tree structures of s and b and perform equality comparisons when the hashes match. It will use maximum structural sharing. E.g. if b contains all elements of s, the result will be the same instance as s, so no objects will be allocated.



              General advice



              If you just write generic code where you don't care much about high performance, it is fine to use the default implementations such as scala.collection.Set. However, if you care about performance characteristics it is preferable to use a concrete implementation. E.g. if s and b are declared as scala.collection.immutable.HashSet, you will have consistent high performance independent of order, provided that you have a good hash function.






              share|improve this answer


























                1














                The answer is: it's complicated.



                The default implementation of an immutable set is scala.collection.immutable.Set. This has special cases for sizes 1 to 4. As soon as you have more than 4 elements, it will use scala.collection.immutable.HashSet.



                Very small s (1..4)



                So let's say you have a large set b and a small set s, with s containing <4 elements.



                Then it will make a large difference:



                b & s will filter all elements of b against membership in s and therefore takes b.count * s.count equality comparisons. Since b is large this will be quite slow.



                s & b will filter the few elements of s against a membership in b, which is s.length times a hashing and an equality comparison if the hashes match (remember b is a HashSet). Since is is small this should be very fast.



                Small s (n>4)



                As soon as s is larger than 4 elements, it also will be a HashSet. Intersection for HashSets is written in a symmetric and very efficient way. It will combine the tree structures of s and b and perform equality comparisons when the hashes match. It will use maximum structural sharing. E.g. if b contains all elements of s, the result will be the same instance as s, so no objects will be allocated.



                General advice



                If you just write generic code where you don't care much about high performance, it is fine to use the default implementations such as scala.collection.Set. However, if you care about performance characteristics it is preferable to use a concrete implementation. E.g. if s and b are declared as scala.collection.immutable.HashSet, you will have consistent high performance independent of order, provided that you have a good hash function.






                share|improve this answer
























                  1












                  1








                  1






                  The answer is: it's complicated.



                  The default implementation of an immutable set is scala.collection.immutable.Set. This has special cases for sizes 1 to 4. As soon as you have more than 4 elements, it will use scala.collection.immutable.HashSet.



                  Very small s (1..4)



                  So let's say you have a large set b and a small set s, with s containing <4 elements.



                  Then it will make a large difference:



                  b & s will filter all elements of b against membership in s and therefore takes b.count * s.count equality comparisons. Since b is large this will be quite slow.



                  s & b will filter the few elements of s against a membership in b, which is s.length times a hashing and an equality comparison if the hashes match (remember b is a HashSet). Since is is small this should be very fast.



                  Small s (n>4)



                  As soon as s is larger than 4 elements, it also will be a HashSet. Intersection for HashSets is written in a symmetric and very efficient way. It will combine the tree structures of s and b and perform equality comparisons when the hashes match. It will use maximum structural sharing. E.g. if b contains all elements of s, the result will be the same instance as s, so no objects will be allocated.



                  General advice



                  If you just write generic code where you don't care much about high performance, it is fine to use the default implementations such as scala.collection.Set. However, if you care about performance characteristics it is preferable to use a concrete implementation. E.g. if s and b are declared as scala.collection.immutable.HashSet, you will have consistent high performance independent of order, provided that you have a good hash function.






                  share|improve this answer












                  The answer is: it's complicated.



                  The default implementation of an immutable set is scala.collection.immutable.Set. This has special cases for sizes 1 to 4. As soon as you have more than 4 elements, it will use scala.collection.immutable.HashSet.



                  Very small s (1..4)



                  So let's say you have a large set b and a small set s, with s containing <4 elements.



                  Then it will make a large difference:



                  b & s will filter all elements of b against membership in s and therefore takes b.count * s.count equality comparisons. Since b is large this will be quite slow.



                  s & b will filter the few elements of s against a membership in b, which is s.length times a hashing and an equality comparison if the hashes match (remember b is a HashSet). Since is is small this should be very fast.



                  Small s (n>4)



                  As soon as s is larger than 4 elements, it also will be a HashSet. Intersection for HashSets is written in a symmetric and very efficient way. It will combine the tree structures of s and b and perform equality comparisons when the hashes match. It will use maximum structural sharing. E.g. if b contains all elements of s, the result will be the same instance as s, so no objects will be allocated.



                  General advice



                  If you just write generic code where you don't care much about high performance, it is fine to use the default implementations such as scala.collection.Set. However, if you care about performance characteristics it is preferable to use a concrete implementation. E.g. if s and b are declared as scala.collection.immutable.HashSet, you will have consistent high performance independent of order, provided that you have a good hash function.







                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered Nov 16 at 22:10









                  Rüdiger Klaehn

                  10k13149




                  10k13149























                      0














                      Let us create two sets as per the condition mentioned



                         val a = (0 until 10000 by 1).toSet      //smaller data
                      val b = (0 until 1000000 by 2).toSet //Relatively larger data


                      we can define a time function to check the execution time as below



                      def time[R](block: => R): R = {
                      val t0 = System.nanoTime()
                      val result = block // call-by-name
                      val t1 = System.nanoTime()
                      println("Elapsed time: " + (t1 - t0) + "ns")
                      result
                      }


                      Now we can check the intersection conditon



                      scala> time {a & b}
                      Elapsed time: 5895220ns
                      res2: scala.collection.immutable.Set[Int] = Set(892, 5810, 8062, ..)

                      scala> time {b & a}
                      Elapsed time: 6038271ns
                      res3: scala.collection.immutable.Set[Int] = Set(892, 5810, 8062, ...)


                      So by this we can conclude that intersection between a smaller and large dataset has performance difference and it is better to have the smaller dataset on the left side for faster execution for Scala set






                      share|improve this answer

















                      • 2




                        The difference seem to be quite small given the huge size difference between the two sets. As the benchmark does not contain any warm-up and it performed from REPL, I would say it is inconclusive.
                        – Suma
                        Nov 16 at 11:20










                      • When I try this 100 Times I also get no conclusive answer.
                        – pme
                        Nov 16 at 11:23
















                      0














                      Let us create two sets as per the condition mentioned



                         val a = (0 until 10000 by 1).toSet      //smaller data
                      val b = (0 until 1000000 by 2).toSet //Relatively larger data


                      we can define a time function to check the execution time as below



                      def time[R](block: => R): R = {
                      val t0 = System.nanoTime()
                      val result = block // call-by-name
                      val t1 = System.nanoTime()
                      println("Elapsed time: " + (t1 - t0) + "ns")
                      result
                      }


                      Now we can check the intersection conditon



                      scala> time {a & b}
                      Elapsed time: 5895220ns
                      res2: scala.collection.immutable.Set[Int] = Set(892, 5810, 8062, ..)

                      scala> time {b & a}
                      Elapsed time: 6038271ns
                      res3: scala.collection.immutable.Set[Int] = Set(892, 5810, 8062, ...)


                      So by this we can conclude that intersection between a smaller and large dataset has performance difference and it is better to have the smaller dataset on the left side for faster execution for Scala set






                      share|improve this answer

















                      • 2




                        The difference seem to be quite small given the huge size difference between the two sets. As the benchmark does not contain any warm-up and it performed from REPL, I would say it is inconclusive.
                        – Suma
                        Nov 16 at 11:20










                      • When I try this 100 Times I also get no conclusive answer.
                        – pme
                        Nov 16 at 11:23














                      0












                      0








                      0






                      Let us create two sets as per the condition mentioned



                         val a = (0 until 10000 by 1).toSet      //smaller data
                      val b = (0 until 1000000 by 2).toSet //Relatively larger data


                      we can define a time function to check the execution time as below



                      def time[R](block: => R): R = {
                      val t0 = System.nanoTime()
                      val result = block // call-by-name
                      val t1 = System.nanoTime()
                      println("Elapsed time: " + (t1 - t0) + "ns")
                      result
                      }


                      Now we can check the intersection conditon



                      scala> time {a & b}
                      Elapsed time: 5895220ns
                      res2: scala.collection.immutable.Set[Int] = Set(892, 5810, 8062, ..)

                      scala> time {b & a}
                      Elapsed time: 6038271ns
                      res3: scala.collection.immutable.Set[Int] = Set(892, 5810, 8062, ...)


                      So by this we can conclude that intersection between a smaller and large dataset has performance difference and it is better to have the smaller dataset on the left side for faster execution for Scala set






                      share|improve this answer












                      Let us create two sets as per the condition mentioned



                         val a = (0 until 10000 by 1).toSet      //smaller data
                      val b = (0 until 1000000 by 2).toSet //Relatively larger data


                      we can define a time function to check the execution time as below



                      def time[R](block: => R): R = {
                      val t0 = System.nanoTime()
                      val result = block // call-by-name
                      val t1 = System.nanoTime()
                      println("Elapsed time: " + (t1 - t0) + "ns")
                      result
                      }


                      Now we can check the intersection conditon



                      scala> time {a & b}
                      Elapsed time: 5895220ns
                      res2: scala.collection.immutable.Set[Int] = Set(892, 5810, 8062, ..)

                      scala> time {b & a}
                      Elapsed time: 6038271ns
                      res3: scala.collection.immutable.Set[Int] = Set(892, 5810, 8062, ...)


                      So by this we can conclude that intersection between a smaller and large dataset has performance difference and it is better to have the smaller dataset on the left side for faster execution for Scala set







                      share|improve this answer












                      share|improve this answer



                      share|improve this answer










                      answered Nov 16 at 10:34









                      prasanna kumar

                      798




                      798








                      • 2




                        The difference seem to be quite small given the huge size difference between the two sets. As the benchmark does not contain any warm-up and it performed from REPL, I would say it is inconclusive.
                        – Suma
                        Nov 16 at 11:20










                      • When I try this 100 Times I also get no conclusive answer.
                        – pme
                        Nov 16 at 11:23














                      • 2




                        The difference seem to be quite small given the huge size difference between the two sets. As the benchmark does not contain any warm-up and it performed from REPL, I would say it is inconclusive.
                        – Suma
                        Nov 16 at 11:20










                      • When I try this 100 Times I also get no conclusive answer.
                        – pme
                        Nov 16 at 11:23








                      2




                      2




                      The difference seem to be quite small given the huge size difference between the two sets. As the benchmark does not contain any warm-up and it performed from REPL, I would say it is inconclusive.
                      – Suma
                      Nov 16 at 11:20




                      The difference seem to be quite small given the huge size difference between the two sets. As the benchmark does not contain any warm-up and it performed from REPL, I would say it is inconclusive.
                      – Suma
                      Nov 16 at 11:20












                      When I try this 100 Times I also get no conclusive answer.
                      – pme
                      Nov 16 at 11:23




                      When I try this 100 Times I also get no conclusive answer.
                      – pme
                      Nov 16 at 11:23


















                      draft saved

                      draft discarded




















































                      Thanks for contributing an answer to Stack Overflow!


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.





                      Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


                      Please pay close attention to the following guidance:


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.




                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53329633%2fscala-set-intersection-performance%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      How to send String Array data to Server using php in android

                      Title Spacing in Bjornstrup Chapter, Removing Chapter Number From Contents

                      Is anime1.com a legal site for watching anime?