How can we count the r-combinations of a multiset?











up vote
0
down vote

favorite












Problem



Define a split of a multiset M to be an ordered pair of multisets of equal size whose union is M.



How can we count the splits of a multiset?



Discussion



This question stems from this one.



This problem is identical to counting the number of r-combinations of a multiset with 2r elements, for which a solution is shown in section 6.2 of Introductory Combinatorics by Richard A. Brualdi, Second Edition, 1992, Prentice-Hall, Inc. To see they are identical, observe there is a 1-1 correspondence between any r-combination X of a multiset M of 2r elements and the split (X, Y), where Y is all the elements of M that are not in X, that is, Y = M−X.



Brualdi’s solution using the inclusion-exclusion principle is well known to mathematicians and is suitable for algorithmic implementation.



In spite of the fact that it is well known to mathematicians, it was only partially answered here at Mathematics Stack Exchange and here at Quora.










share|improve this question


























    up vote
    0
    down vote

    favorite












    Problem



    Define a split of a multiset M to be an ordered pair of multisets of equal size whose union is M.



    How can we count the splits of a multiset?



    Discussion



    This question stems from this one.



    This problem is identical to counting the number of r-combinations of a multiset with 2r elements, for which a solution is shown in section 6.2 of Introductory Combinatorics by Richard A. Brualdi, Second Edition, 1992, Prentice-Hall, Inc. To see they are identical, observe there is a 1-1 correspondence between any r-combination X of a multiset M of 2r elements and the split (X, Y), where Y is all the elements of M that are not in X, that is, Y = M−X.



    Brualdi’s solution using the inclusion-exclusion principle is well known to mathematicians and is suitable for algorithmic implementation.



    In spite of the fact that it is well known to mathematicians, it was only partially answered here at Mathematics Stack Exchange and here at Quora.










    share|improve this question
























      up vote
      0
      down vote

      favorite









      up vote
      0
      down vote

      favorite











      Problem



      Define a split of a multiset M to be an ordered pair of multisets of equal size whose union is M.



      How can we count the splits of a multiset?



      Discussion



      This question stems from this one.



      This problem is identical to counting the number of r-combinations of a multiset with 2r elements, for which a solution is shown in section 6.2 of Introductory Combinatorics by Richard A. Brualdi, Second Edition, 1992, Prentice-Hall, Inc. To see they are identical, observe there is a 1-1 correspondence between any r-combination X of a multiset M of 2r elements and the split (X, Y), where Y is all the elements of M that are not in X, that is, Y = M−X.



      Brualdi’s solution using the inclusion-exclusion principle is well known to mathematicians and is suitable for algorithmic implementation.



      In spite of the fact that it is well known to mathematicians, it was only partially answered here at Mathematics Stack Exchange and here at Quora.










      share|improve this question













      Problem



      Define a split of a multiset M to be an ordered pair of multisets of equal size whose union is M.



      How can we count the splits of a multiset?



      Discussion



      This question stems from this one.



      This problem is identical to counting the number of r-combinations of a multiset with 2r elements, for which a solution is shown in section 6.2 of Introductory Combinatorics by Richard A. Brualdi, Second Edition, 1992, Prentice-Hall, Inc. To see they are identical, observe there is a 1-1 correspondence between any r-combination X of a multiset M of 2r elements and the split (X, Y), where Y is all the elements of M that are not in X, that is, Y = M−X.



      Brualdi’s solution using the inclusion-exclusion principle is well known to mathematicians and is suitable for algorithmic implementation.



      In spite of the fact that it is well known to mathematicians, it was only partially answered here at Mathematics Stack Exchange and here at Quora.







      c algorithm math






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Nov 12 at 21:39









      Eric Postpischil

      69.1k873150




      69.1k873150
























          3 Answers
          3






          active

          oldest

          votes

















          up vote
          1
          down vote













          Here is a solution. I have included explanation in the comments.



          #include <inttypes.h>
          #include <stddef.h>
          #include <stdint.h>
          #include <stdio.h>


          typedef uintmax_t UInt;
          #define UIntFormat PRIuMAX

          #define NumberOf(a) (sizeof (a) / sizeof *(a))


          /* In this code, a multiset is represented with:

          an integer N0 which is the number of types of elements in the set,

          an integer N1 which is the number of types of elements in the set that
          each appear finitely many times (N0-N1 types each appear infinitely
          many times), and

          an array M in which each M[i] is the number of times that an element
          of type i appears.

          Collectively, this is referred to as the multiset M.
          */


          /* Return the number of ways to choose r things from n things. This is
          n! / (r! * (n-r)!).
          */
          static UInt Choose(UInt r, UInt n)
          {
          UInt result = 1;
          for (UInt i = 1; i <= r; ++i)
          result = result * n-- / i;
          return result;
          }


          // Count the number of r-combinations of a multiset.
          static UInt CountRCombinations(UInt r, size_t N0, size_t N1, UInt M)
          {
          /* If we have only the unlimited types, there is a one-to-one
          correspondence between r objects with N0-1 dividers placed between
          them, each divider marking a transition from one type to another. For
          example, consider four objects of three types. Below "o" represents
          any object, and "|" is a divider. For each arrangement of four o's and
          two |s, we show how it defines a selection of four objects of three
          types:

          oooo|| -> aaaa||
          ooo|o| -> aaa|b|
          ooo||o -> aaa||c
          oo|oo| -> aa|bb|
          oo|o|o -> aa|b|c
          oo||oo -> aa||cc
          o|ooo| -> a|bbb|
          o|oo|o -> a|bb|c
          o|o|oo -> a|b|cc
          o||ooo -> a||ccc
          |oooo| -> |bbbb|
          |ooo|o -> |bbb|c
          |oo|oo -> |bb|cc
          |o|ooo -> |b|ccc
          ||oooo -> ||cccc

          Therefore, the number of combinations equals the number of ways to
          arrange r indistinguishable objects of one type with N0-1
          indistinguishable objects of a different type.
          */
          if (N1 == 0)
          return Choose(r, r+N0-1);

          /* Otherwise, we count the combinations:

          Select one of the limited types (we use the last one, N1-1, because
          it is easily removed from the array simply by reducing the size of
          the array).

          Count the number of combinations there would be if that type were
          unlimited.

          Count the number of combinations there would be if there were at
          least M[i]+1 instances of elements of that type.

          Subtract to get the number of combinations that have 0 to M[i]
          instances of elements of that type.
          */
          else
          {
          /* Let M' be the multiset M with the last type changed to unlimited.

          So, where M is represented with N0, N1, M, M' is represented with
          N0, N1-1, M.
          */

          // Change the last limited type to unlimited.
          N1 -= 1;

          // Count the r-combinations drawn from M'.
          UInt C = CountRCombinations(r, N0, N1, M);

          /* Now we count the combinations which have at least M[N1]+1 instances
          of the (formerly) last type.

          Consider that each such combination has M[N1]+1 instances of that
          type plus some combination of r - (M[N1]+1) elements drawn from M',
          including zero or more instances of the last type. (Note that if r
          <= M[N1], there are no such combinations, since we would be asking
          for a negative number of elements.)

          So the number of combinations which have at least M[N1]+1 instances
          of the last type equals the number of combinations of that type
          plus some combination of r - (M[N1]+1) elements drawn from M'.
          */
          if (M[N1] < r)
          C -= CountRCombinations(r - (M[N1] + 1), N0, N1, M);

          return C;
          }
          }


          // Count the number of splits of a multiset M that contains N types of things.
          static UInt CountSplits(size_t N, UInt M)
          {
          // Count the number of elements.
          UInt T = 0;
          for (size_t i = 0; i < N; ++i)
          T += M[i];

          // Return the number of T/2-combinations of M.
          return T % 2 ? 0 : CountRCombinations(T/2, N, N, M);
          }


          int main(void)
          {
          UInt M = { 3, 4, 5 };
          size_t N = NumberOf(M);

          printf("The number of splits of {");
          for (size_t i = 0; i < N; ++i) printf(" %" UIntFormat, M[i]);
          printf(" } is %" UIntFormat ".n", CountSplits(N, M));
          }





          share|improve this answer




























            up vote
            1
            down vote













            It's certainly the case that the original problem is isomorphic with the problem of counting r-combinations of a multiset of size 2r.



            As an alternative to the inclusion-exclusion formula (which is certainly of analytic value, but possibly of less algorithmic value), we can construct a recursive solution which computes the counts of k-combinations of the set for all values of k, using a very similar approach to recursion employed in the recursive algorithm for counting the binomial coefficients C(n,k), the counts of k-combinations of a set of n elements.



            Suppose we represent the multiset as a sorted vector V of size n (where n &equals; 2r). (Technically, it doesn't have to be sorted; it is sufficient for it to be "clumped" so that all the identical elements are consecutive. However, the easiest way to do this is to sort the vector.) We want to produce all unique k-combinations of this vector. All such combinations have one of two forms:




            • "Choose the first element". The combination starts with V1 and continues with a (k−1)-combination of (V2, V3, …, Vn).


            • "Skip the first element". The combination is a k-combination of (Vi, Vi+1, … Vn) where i is the smallest index such that ViV1. (We need to skip all elements identical to the first element in order to avoid duplication.)



            The only difference here with the binomial recursion is the use of the index i in the second option; if the set has no repeated elements, this reduces to i&equals;2, which yields the recursion C(n, k) = C(n − 1, k − 1) + C(n − 1, k).



            A naive implementation of this recursion will take exponential time because each computation requires two recursive calls; however, there are only a quadratic number of unique calls so the computation can be reduced to quadratic time by memoisation or dynamic programming. The solution below uses dynamic programming, because that only requires linear space. (Memoisation would require quadratic space since there are a quadratic number of subproblems.)



            The dynamic programming solution inverts the recursion by computing the number of k-combinations for the successive suffixes of the vector. It needs to keep the values for only two suffixes: the previous suffix and the previous suffix with a different first element, corresponding to the counts needed by the first and second option of the above recursion. (The actual code uses prefixes instead of suffixes, but it makes absolutely no difference.)



            As a minor optimisation, we only compute the counts of k-combinations for values of k between 0 and ⌈n/2⌉. As with binomial coefficients, the counts are symmetrical: the number of k-combinations is equal to the number of (nk)-combinations because every k-combination corresponds to a unique (nk)-combination consisting of all the unselected elements. An additional optimisation is possible based on the fact that we only need one count at the end, but the additional complication only obscures the basic algorithm.



            The fact that the solution is O(n2) is a little annoying, but since n will generally be a smallish number (otherwise the counts will be astronomical) the computation time seems reasonable. There are undoubtedly further optimisations possible, and it's quite possible that a sub-quadratic algorithm exists.



            Here's a basic implementation in C (using an array of strings):



            /* Given a *sorted* vector v of size n, compute the number of unique k-combinations
            * of the elements of the vector for values of k between 0 and (n/2)+1.
            * The counts are stored in the vector r, which must be large enough.
            * Counts for larger values of k can be trivially looked up in the returned
            * vector, using the identity r[k] == r[n - k].
            * If v is not sorted, the result will be incorrect. The function does not
            * check for overflow, but the results will be correct modulo (UINTMAX + 1)
            */
            void multicomb(const char** v, size_t n, uintmax_t* r) {
            size_t lim = n / 2 + 1;
            // Prev retains the counts for the previous prefix ending with
            // a different element
            uintmax_t prev[lim];
            // If there are no elements, there is 1 0-combination and no other combinations.
            memset(r, 0, sizeof prev);
            r[0] = 1;
            // Iterate over the unique elements of v
            for (size_t k = 0; k < n; ) {
            // Save the counts for this prefix
            memcpy(prev, r, sizeof prev);
            // Iterate over the prefixes with the same ending value
            do {
            for (size_t i = lim - 1; i > 0; --i)
            r[i] = r[i - 1] + prev[i];
            } while (++k < n && strcmp(v[k - 1], v[k]) == 0);
            };
            }


            In comparison with the solution in OP's self-answer, this version:




            • overflows later because it only depends on addition. (The fact that there is no division also makes it much easier to compute the count modulo a prime.)

            • takes quadratic time instead of exponential time.






            share|improve this answer






























              up vote
              0
              down vote













              I don't think that that's quite right, Eric. The original question asks, if my source array contains A, A, B, B then how many ways can it be uniquely split between two recipients - the answer in this case will be three because the array could be split as follows:



              Child Array 1 | Child Array 2
              -----------------------------
              A A | B B
              A B | A B
              B B | A A


              This source code (C++ function) works but it's horribly slow and inefficient. Brute force and ignorance.



              int countPermutations(vector<int> itemset) {

              vector<vector<int> > permutations;

              do {
              vector<int> match;

              for(int i = 0; i < itemset.size(); i+=2) {
              match.push_back(itemset.at(i));
              }
              sort(match.begin(), match.end());

              int j = 0;
              bool found = false;
              while (!found && (j < permutations.size())) {
              found = (match == permutations.at(j))?true:false;
              j++;
              }
              if (!found) {
              permutations.push_back(match);
              }

              } while (next_permutation(itemset.begin(), itemset.end()));

              return permutations.size();
              }


              Any thoughts on optimisation?






              share|improve this answer

















              • 1




                Eric's algorithm produces the value 3 for this case, as expected (you have to call the function with the array {2, 2}, indicating "two As and two Bs"). Why do you think it's wrong?
                – rici
                Nov 14 at 18:30










              • My first thought on optimisation, by the way: Don't accumulate the possibilities; just count them. For any significant problem size, there will be far too many to keep in memory, and anyway you don't do anything with them. If you actually need the individual splits, that's a very different question (because counting can be optimised a lot more).
                – rici
                Nov 14 at 18:35










              • rici is correct, to compute the count for { A, A, B, B }, you need to change UInt M = { 3, 4, 5 }; to UInt M = { 2, 2 };.
                – Eric Postpischil
                Nov 14 at 20:11











              Your Answer






              StackExchange.ifUsing("editor", function () {
              StackExchange.using("externalEditor", function () {
              StackExchange.using("snippets", function () {
              StackExchange.snippets.init();
              });
              });
              }, "code-snippets");

              StackExchange.ready(function() {
              var channelOptions = {
              tags: "".split(" "),
              id: "1"
              };
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function() {
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled) {
              StackExchange.using("snippets", function() {
              createEditor();
              });
              }
              else {
              createEditor();
              }
              });

              function createEditor() {
              StackExchange.prepareEditor({
              heartbeatType: 'answer',
              convertImagesToLinks: true,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: 10,
              bindNavPrevention: true,
              postfix: "",
              imageUploader: {
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              },
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              });


              }
              });














               

              draft saved


              draft discarded


















              StackExchange.ready(
              function () {
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53270489%2fhow-can-we-count-the-r-combinations-of-a-multiset%23new-answer', 'question_page');
              }
              );

              Post as a guest















              Required, but never shown

























              3 Answers
              3






              active

              oldest

              votes








              3 Answers
              3






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes








              up vote
              1
              down vote













              Here is a solution. I have included explanation in the comments.



              #include <inttypes.h>
              #include <stddef.h>
              #include <stdint.h>
              #include <stdio.h>


              typedef uintmax_t UInt;
              #define UIntFormat PRIuMAX

              #define NumberOf(a) (sizeof (a) / sizeof *(a))


              /* In this code, a multiset is represented with:

              an integer N0 which is the number of types of elements in the set,

              an integer N1 which is the number of types of elements in the set that
              each appear finitely many times (N0-N1 types each appear infinitely
              many times), and

              an array M in which each M[i] is the number of times that an element
              of type i appears.

              Collectively, this is referred to as the multiset M.
              */


              /* Return the number of ways to choose r things from n things. This is
              n! / (r! * (n-r)!).
              */
              static UInt Choose(UInt r, UInt n)
              {
              UInt result = 1;
              for (UInt i = 1; i <= r; ++i)
              result = result * n-- / i;
              return result;
              }


              // Count the number of r-combinations of a multiset.
              static UInt CountRCombinations(UInt r, size_t N0, size_t N1, UInt M)
              {
              /* If we have only the unlimited types, there is a one-to-one
              correspondence between r objects with N0-1 dividers placed between
              them, each divider marking a transition from one type to another. For
              example, consider four objects of three types. Below "o" represents
              any object, and "|" is a divider. For each arrangement of four o's and
              two |s, we show how it defines a selection of four objects of three
              types:

              oooo|| -> aaaa||
              ooo|o| -> aaa|b|
              ooo||o -> aaa||c
              oo|oo| -> aa|bb|
              oo|o|o -> aa|b|c
              oo||oo -> aa||cc
              o|ooo| -> a|bbb|
              o|oo|o -> a|bb|c
              o|o|oo -> a|b|cc
              o||ooo -> a||ccc
              |oooo| -> |bbbb|
              |ooo|o -> |bbb|c
              |oo|oo -> |bb|cc
              |o|ooo -> |b|ccc
              ||oooo -> ||cccc

              Therefore, the number of combinations equals the number of ways to
              arrange r indistinguishable objects of one type with N0-1
              indistinguishable objects of a different type.
              */
              if (N1 == 0)
              return Choose(r, r+N0-1);

              /* Otherwise, we count the combinations:

              Select one of the limited types (we use the last one, N1-1, because
              it is easily removed from the array simply by reducing the size of
              the array).

              Count the number of combinations there would be if that type were
              unlimited.

              Count the number of combinations there would be if there were at
              least M[i]+1 instances of elements of that type.

              Subtract to get the number of combinations that have 0 to M[i]
              instances of elements of that type.
              */
              else
              {
              /* Let M' be the multiset M with the last type changed to unlimited.

              So, where M is represented with N0, N1, M, M' is represented with
              N0, N1-1, M.
              */

              // Change the last limited type to unlimited.
              N1 -= 1;

              // Count the r-combinations drawn from M'.
              UInt C = CountRCombinations(r, N0, N1, M);

              /* Now we count the combinations which have at least M[N1]+1 instances
              of the (formerly) last type.

              Consider that each such combination has M[N1]+1 instances of that
              type plus some combination of r - (M[N1]+1) elements drawn from M',
              including zero or more instances of the last type. (Note that if r
              <= M[N1], there are no such combinations, since we would be asking
              for a negative number of elements.)

              So the number of combinations which have at least M[N1]+1 instances
              of the last type equals the number of combinations of that type
              plus some combination of r - (M[N1]+1) elements drawn from M'.
              */
              if (M[N1] < r)
              C -= CountRCombinations(r - (M[N1] + 1), N0, N1, M);

              return C;
              }
              }


              // Count the number of splits of a multiset M that contains N types of things.
              static UInt CountSplits(size_t N, UInt M)
              {
              // Count the number of elements.
              UInt T = 0;
              for (size_t i = 0; i < N; ++i)
              T += M[i];

              // Return the number of T/2-combinations of M.
              return T % 2 ? 0 : CountRCombinations(T/2, N, N, M);
              }


              int main(void)
              {
              UInt M = { 3, 4, 5 };
              size_t N = NumberOf(M);

              printf("The number of splits of {");
              for (size_t i = 0; i < N; ++i) printf(" %" UIntFormat, M[i]);
              printf(" } is %" UIntFormat ".n", CountSplits(N, M));
              }





              share|improve this answer

























                up vote
                1
                down vote













                Here is a solution. I have included explanation in the comments.



                #include <inttypes.h>
                #include <stddef.h>
                #include <stdint.h>
                #include <stdio.h>


                typedef uintmax_t UInt;
                #define UIntFormat PRIuMAX

                #define NumberOf(a) (sizeof (a) / sizeof *(a))


                /* In this code, a multiset is represented with:

                an integer N0 which is the number of types of elements in the set,

                an integer N1 which is the number of types of elements in the set that
                each appear finitely many times (N0-N1 types each appear infinitely
                many times), and

                an array M in which each M[i] is the number of times that an element
                of type i appears.

                Collectively, this is referred to as the multiset M.
                */


                /* Return the number of ways to choose r things from n things. This is
                n! / (r! * (n-r)!).
                */
                static UInt Choose(UInt r, UInt n)
                {
                UInt result = 1;
                for (UInt i = 1; i <= r; ++i)
                result = result * n-- / i;
                return result;
                }


                // Count the number of r-combinations of a multiset.
                static UInt CountRCombinations(UInt r, size_t N0, size_t N1, UInt M)
                {
                /* If we have only the unlimited types, there is a one-to-one
                correspondence between r objects with N0-1 dividers placed between
                them, each divider marking a transition from one type to another. For
                example, consider four objects of three types. Below "o" represents
                any object, and "|" is a divider. For each arrangement of four o's and
                two |s, we show how it defines a selection of four objects of three
                types:

                oooo|| -> aaaa||
                ooo|o| -> aaa|b|
                ooo||o -> aaa||c
                oo|oo| -> aa|bb|
                oo|o|o -> aa|b|c
                oo||oo -> aa||cc
                o|ooo| -> a|bbb|
                o|oo|o -> a|bb|c
                o|o|oo -> a|b|cc
                o||ooo -> a||ccc
                |oooo| -> |bbbb|
                |ooo|o -> |bbb|c
                |oo|oo -> |bb|cc
                |o|ooo -> |b|ccc
                ||oooo -> ||cccc

                Therefore, the number of combinations equals the number of ways to
                arrange r indistinguishable objects of one type with N0-1
                indistinguishable objects of a different type.
                */
                if (N1 == 0)
                return Choose(r, r+N0-1);

                /* Otherwise, we count the combinations:

                Select one of the limited types (we use the last one, N1-1, because
                it is easily removed from the array simply by reducing the size of
                the array).

                Count the number of combinations there would be if that type were
                unlimited.

                Count the number of combinations there would be if there were at
                least M[i]+1 instances of elements of that type.

                Subtract to get the number of combinations that have 0 to M[i]
                instances of elements of that type.
                */
                else
                {
                /* Let M' be the multiset M with the last type changed to unlimited.

                So, where M is represented with N0, N1, M, M' is represented with
                N0, N1-1, M.
                */

                // Change the last limited type to unlimited.
                N1 -= 1;

                // Count the r-combinations drawn from M'.
                UInt C = CountRCombinations(r, N0, N1, M);

                /* Now we count the combinations which have at least M[N1]+1 instances
                of the (formerly) last type.

                Consider that each such combination has M[N1]+1 instances of that
                type plus some combination of r - (M[N1]+1) elements drawn from M',
                including zero or more instances of the last type. (Note that if r
                <= M[N1], there are no such combinations, since we would be asking
                for a negative number of elements.)

                So the number of combinations which have at least M[N1]+1 instances
                of the last type equals the number of combinations of that type
                plus some combination of r - (M[N1]+1) elements drawn from M'.
                */
                if (M[N1] < r)
                C -= CountRCombinations(r - (M[N1] + 1), N0, N1, M);

                return C;
                }
                }


                // Count the number of splits of a multiset M that contains N types of things.
                static UInt CountSplits(size_t N, UInt M)
                {
                // Count the number of elements.
                UInt T = 0;
                for (size_t i = 0; i < N; ++i)
                T += M[i];

                // Return the number of T/2-combinations of M.
                return T % 2 ? 0 : CountRCombinations(T/2, N, N, M);
                }


                int main(void)
                {
                UInt M = { 3, 4, 5 };
                size_t N = NumberOf(M);

                printf("The number of splits of {");
                for (size_t i = 0; i < N; ++i) printf(" %" UIntFormat, M[i]);
                printf(" } is %" UIntFormat ".n", CountSplits(N, M));
                }





                share|improve this answer























                  up vote
                  1
                  down vote










                  up vote
                  1
                  down vote









                  Here is a solution. I have included explanation in the comments.



                  #include <inttypes.h>
                  #include <stddef.h>
                  #include <stdint.h>
                  #include <stdio.h>


                  typedef uintmax_t UInt;
                  #define UIntFormat PRIuMAX

                  #define NumberOf(a) (sizeof (a) / sizeof *(a))


                  /* In this code, a multiset is represented with:

                  an integer N0 which is the number of types of elements in the set,

                  an integer N1 which is the number of types of elements in the set that
                  each appear finitely many times (N0-N1 types each appear infinitely
                  many times), and

                  an array M in which each M[i] is the number of times that an element
                  of type i appears.

                  Collectively, this is referred to as the multiset M.
                  */


                  /* Return the number of ways to choose r things from n things. This is
                  n! / (r! * (n-r)!).
                  */
                  static UInt Choose(UInt r, UInt n)
                  {
                  UInt result = 1;
                  for (UInt i = 1; i <= r; ++i)
                  result = result * n-- / i;
                  return result;
                  }


                  // Count the number of r-combinations of a multiset.
                  static UInt CountRCombinations(UInt r, size_t N0, size_t N1, UInt M)
                  {
                  /* If we have only the unlimited types, there is a one-to-one
                  correspondence between r objects with N0-1 dividers placed between
                  them, each divider marking a transition from one type to another. For
                  example, consider four objects of three types. Below "o" represents
                  any object, and "|" is a divider. For each arrangement of four o's and
                  two |s, we show how it defines a selection of four objects of three
                  types:

                  oooo|| -> aaaa||
                  ooo|o| -> aaa|b|
                  ooo||o -> aaa||c
                  oo|oo| -> aa|bb|
                  oo|o|o -> aa|b|c
                  oo||oo -> aa||cc
                  o|ooo| -> a|bbb|
                  o|oo|o -> a|bb|c
                  o|o|oo -> a|b|cc
                  o||ooo -> a||ccc
                  |oooo| -> |bbbb|
                  |ooo|o -> |bbb|c
                  |oo|oo -> |bb|cc
                  |o|ooo -> |b|ccc
                  ||oooo -> ||cccc

                  Therefore, the number of combinations equals the number of ways to
                  arrange r indistinguishable objects of one type with N0-1
                  indistinguishable objects of a different type.
                  */
                  if (N1 == 0)
                  return Choose(r, r+N0-1);

                  /* Otherwise, we count the combinations:

                  Select one of the limited types (we use the last one, N1-1, because
                  it is easily removed from the array simply by reducing the size of
                  the array).

                  Count the number of combinations there would be if that type were
                  unlimited.

                  Count the number of combinations there would be if there were at
                  least M[i]+1 instances of elements of that type.

                  Subtract to get the number of combinations that have 0 to M[i]
                  instances of elements of that type.
                  */
                  else
                  {
                  /* Let M' be the multiset M with the last type changed to unlimited.

                  So, where M is represented with N0, N1, M, M' is represented with
                  N0, N1-1, M.
                  */

                  // Change the last limited type to unlimited.
                  N1 -= 1;

                  // Count the r-combinations drawn from M'.
                  UInt C = CountRCombinations(r, N0, N1, M);

                  /* Now we count the combinations which have at least M[N1]+1 instances
                  of the (formerly) last type.

                  Consider that each such combination has M[N1]+1 instances of that
                  type plus some combination of r - (M[N1]+1) elements drawn from M',
                  including zero or more instances of the last type. (Note that if r
                  <= M[N1], there are no such combinations, since we would be asking
                  for a negative number of elements.)

                  So the number of combinations which have at least M[N1]+1 instances
                  of the last type equals the number of combinations of that type
                  plus some combination of r - (M[N1]+1) elements drawn from M'.
                  */
                  if (M[N1] < r)
                  C -= CountRCombinations(r - (M[N1] + 1), N0, N1, M);

                  return C;
                  }
                  }


                  // Count the number of splits of a multiset M that contains N types of things.
                  static UInt CountSplits(size_t N, UInt M)
                  {
                  // Count the number of elements.
                  UInt T = 0;
                  for (size_t i = 0; i < N; ++i)
                  T += M[i];

                  // Return the number of T/2-combinations of M.
                  return T % 2 ? 0 : CountRCombinations(T/2, N, N, M);
                  }


                  int main(void)
                  {
                  UInt M = { 3, 4, 5 };
                  size_t N = NumberOf(M);

                  printf("The number of splits of {");
                  for (size_t i = 0; i < N; ++i) printf(" %" UIntFormat, M[i]);
                  printf(" } is %" UIntFormat ".n", CountSplits(N, M));
                  }





                  share|improve this answer












                  Here is a solution. I have included explanation in the comments.



                  #include <inttypes.h>
                  #include <stddef.h>
                  #include <stdint.h>
                  #include <stdio.h>


                  typedef uintmax_t UInt;
                  #define UIntFormat PRIuMAX

                  #define NumberOf(a) (sizeof (a) / sizeof *(a))


                  /* In this code, a multiset is represented with:

                  an integer N0 which is the number of types of elements in the set,

                  an integer N1 which is the number of types of elements in the set that
                  each appear finitely many times (N0-N1 types each appear infinitely
                  many times), and

                  an array M in which each M[i] is the number of times that an element
                  of type i appears.

                  Collectively, this is referred to as the multiset M.
                  */


                  /* Return the number of ways to choose r things from n things. This is
                  n! / (r! * (n-r)!).
                  */
                  static UInt Choose(UInt r, UInt n)
                  {
                  UInt result = 1;
                  for (UInt i = 1; i <= r; ++i)
                  result = result * n-- / i;
                  return result;
                  }


                  // Count the number of r-combinations of a multiset.
                  static UInt CountRCombinations(UInt r, size_t N0, size_t N1, UInt M)
                  {
                  /* If we have only the unlimited types, there is a one-to-one
                  correspondence between r objects with N0-1 dividers placed between
                  them, each divider marking a transition from one type to another. For
                  example, consider four objects of three types. Below "o" represents
                  any object, and "|" is a divider. For each arrangement of four o's and
                  two |s, we show how it defines a selection of four objects of three
                  types:

                  oooo|| -> aaaa||
                  ooo|o| -> aaa|b|
                  ooo||o -> aaa||c
                  oo|oo| -> aa|bb|
                  oo|o|o -> aa|b|c
                  oo||oo -> aa||cc
                  o|ooo| -> a|bbb|
                  o|oo|o -> a|bb|c
                  o|o|oo -> a|b|cc
                  o||ooo -> a||ccc
                  |oooo| -> |bbbb|
                  |ooo|o -> |bbb|c
                  |oo|oo -> |bb|cc
                  |o|ooo -> |b|ccc
                  ||oooo -> ||cccc

                  Therefore, the number of combinations equals the number of ways to
                  arrange r indistinguishable objects of one type with N0-1
                  indistinguishable objects of a different type.
                  */
                  if (N1 == 0)
                  return Choose(r, r+N0-1);

                  /* Otherwise, we count the combinations:

                  Select one of the limited types (we use the last one, N1-1, because
                  it is easily removed from the array simply by reducing the size of
                  the array).

                  Count the number of combinations there would be if that type were
                  unlimited.

                  Count the number of combinations there would be if there were at
                  least M[i]+1 instances of elements of that type.

                  Subtract to get the number of combinations that have 0 to M[i]
                  instances of elements of that type.
                  */
                  else
                  {
                  /* Let M' be the multiset M with the last type changed to unlimited.

                  So, where M is represented with N0, N1, M, M' is represented with
                  N0, N1-1, M.
                  */

                  // Change the last limited type to unlimited.
                  N1 -= 1;

                  // Count the r-combinations drawn from M'.
                  UInt C = CountRCombinations(r, N0, N1, M);

                  /* Now we count the combinations which have at least M[N1]+1 instances
                  of the (formerly) last type.

                  Consider that each such combination has M[N1]+1 instances of that
                  type plus some combination of r - (M[N1]+1) elements drawn from M',
                  including zero or more instances of the last type. (Note that if r
                  <= M[N1], there are no such combinations, since we would be asking
                  for a negative number of elements.)

                  So the number of combinations which have at least M[N1]+1 instances
                  of the last type equals the number of combinations of that type
                  plus some combination of r - (M[N1]+1) elements drawn from M'.
                  */
                  if (M[N1] < r)
                  C -= CountRCombinations(r - (M[N1] + 1), N0, N1, M);

                  return C;
                  }
                  }


                  // Count the number of splits of a multiset M that contains N types of things.
                  static UInt CountSplits(size_t N, UInt M)
                  {
                  // Count the number of elements.
                  UInt T = 0;
                  for (size_t i = 0; i < N; ++i)
                  T += M[i];

                  // Return the number of T/2-combinations of M.
                  return T % 2 ? 0 : CountRCombinations(T/2, N, N, M);
                  }


                  int main(void)
                  {
                  UInt M = { 3, 4, 5 };
                  size_t N = NumberOf(M);

                  printf("The number of splits of {");
                  for (size_t i = 0; i < N; ++i) printf(" %" UIntFormat, M[i]);
                  printf(" } is %" UIntFormat ".n", CountSplits(N, M));
                  }






                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered Nov 12 at 23:49









                  Eric Postpischil

                  69.1k873150




                  69.1k873150
























                      up vote
                      1
                      down vote













                      It's certainly the case that the original problem is isomorphic with the problem of counting r-combinations of a multiset of size 2r.



                      As an alternative to the inclusion-exclusion formula (which is certainly of analytic value, but possibly of less algorithmic value), we can construct a recursive solution which computes the counts of k-combinations of the set for all values of k, using a very similar approach to recursion employed in the recursive algorithm for counting the binomial coefficients C(n,k), the counts of k-combinations of a set of n elements.



                      Suppose we represent the multiset as a sorted vector V of size n (where n &equals; 2r). (Technically, it doesn't have to be sorted; it is sufficient for it to be "clumped" so that all the identical elements are consecutive. However, the easiest way to do this is to sort the vector.) We want to produce all unique k-combinations of this vector. All such combinations have one of two forms:




                      • "Choose the first element". The combination starts with V1 and continues with a (k−1)-combination of (V2, V3, …, Vn).


                      • "Skip the first element". The combination is a k-combination of (Vi, Vi+1, … Vn) where i is the smallest index such that ViV1. (We need to skip all elements identical to the first element in order to avoid duplication.)



                      The only difference here with the binomial recursion is the use of the index i in the second option; if the set has no repeated elements, this reduces to i&equals;2, which yields the recursion C(n, k) = C(n − 1, k − 1) + C(n − 1, k).



                      A naive implementation of this recursion will take exponential time because each computation requires two recursive calls; however, there are only a quadratic number of unique calls so the computation can be reduced to quadratic time by memoisation or dynamic programming. The solution below uses dynamic programming, because that only requires linear space. (Memoisation would require quadratic space since there are a quadratic number of subproblems.)



                      The dynamic programming solution inverts the recursion by computing the number of k-combinations for the successive suffixes of the vector. It needs to keep the values for only two suffixes: the previous suffix and the previous suffix with a different first element, corresponding to the counts needed by the first and second option of the above recursion. (The actual code uses prefixes instead of suffixes, but it makes absolutely no difference.)



                      As a minor optimisation, we only compute the counts of k-combinations for values of k between 0 and ⌈n/2⌉. As with binomial coefficients, the counts are symmetrical: the number of k-combinations is equal to the number of (nk)-combinations because every k-combination corresponds to a unique (nk)-combination consisting of all the unselected elements. An additional optimisation is possible based on the fact that we only need one count at the end, but the additional complication only obscures the basic algorithm.



                      The fact that the solution is O(n2) is a little annoying, but since n will generally be a smallish number (otherwise the counts will be astronomical) the computation time seems reasonable. There are undoubtedly further optimisations possible, and it's quite possible that a sub-quadratic algorithm exists.



                      Here's a basic implementation in C (using an array of strings):



                      /* Given a *sorted* vector v of size n, compute the number of unique k-combinations
                      * of the elements of the vector for values of k between 0 and (n/2)+1.
                      * The counts are stored in the vector r, which must be large enough.
                      * Counts for larger values of k can be trivially looked up in the returned
                      * vector, using the identity r[k] == r[n - k].
                      * If v is not sorted, the result will be incorrect. The function does not
                      * check for overflow, but the results will be correct modulo (UINTMAX + 1)
                      */
                      void multicomb(const char** v, size_t n, uintmax_t* r) {
                      size_t lim = n / 2 + 1;
                      // Prev retains the counts for the previous prefix ending with
                      // a different element
                      uintmax_t prev[lim];
                      // If there are no elements, there is 1 0-combination and no other combinations.
                      memset(r, 0, sizeof prev);
                      r[0] = 1;
                      // Iterate over the unique elements of v
                      for (size_t k = 0; k < n; ) {
                      // Save the counts for this prefix
                      memcpy(prev, r, sizeof prev);
                      // Iterate over the prefixes with the same ending value
                      do {
                      for (size_t i = lim - 1; i > 0; --i)
                      r[i] = r[i - 1] + prev[i];
                      } while (++k < n && strcmp(v[k - 1], v[k]) == 0);
                      };
                      }


                      In comparison with the solution in OP's self-answer, this version:




                      • overflows later because it only depends on addition. (The fact that there is no division also makes it much easier to compute the count modulo a prime.)

                      • takes quadratic time instead of exponential time.






                      share|improve this answer



























                        up vote
                        1
                        down vote













                        It's certainly the case that the original problem is isomorphic with the problem of counting r-combinations of a multiset of size 2r.



                        As an alternative to the inclusion-exclusion formula (which is certainly of analytic value, but possibly of less algorithmic value), we can construct a recursive solution which computes the counts of k-combinations of the set for all values of k, using a very similar approach to recursion employed in the recursive algorithm for counting the binomial coefficients C(n,k), the counts of k-combinations of a set of n elements.



                        Suppose we represent the multiset as a sorted vector V of size n (where n &equals; 2r). (Technically, it doesn't have to be sorted; it is sufficient for it to be "clumped" so that all the identical elements are consecutive. However, the easiest way to do this is to sort the vector.) We want to produce all unique k-combinations of this vector. All such combinations have one of two forms:




                        • "Choose the first element". The combination starts with V1 and continues with a (k−1)-combination of (V2, V3, …, Vn).


                        • "Skip the first element". The combination is a k-combination of (Vi, Vi+1, … Vn) where i is the smallest index such that ViV1. (We need to skip all elements identical to the first element in order to avoid duplication.)



                        The only difference here with the binomial recursion is the use of the index i in the second option; if the set has no repeated elements, this reduces to i&equals;2, which yields the recursion C(n, k) = C(n − 1, k − 1) + C(n − 1, k).



                        A naive implementation of this recursion will take exponential time because each computation requires two recursive calls; however, there are only a quadratic number of unique calls so the computation can be reduced to quadratic time by memoisation or dynamic programming. The solution below uses dynamic programming, because that only requires linear space. (Memoisation would require quadratic space since there are a quadratic number of subproblems.)



                        The dynamic programming solution inverts the recursion by computing the number of k-combinations for the successive suffixes of the vector. It needs to keep the values for only two suffixes: the previous suffix and the previous suffix with a different first element, corresponding to the counts needed by the first and second option of the above recursion. (The actual code uses prefixes instead of suffixes, but it makes absolutely no difference.)



                        As a minor optimisation, we only compute the counts of k-combinations for values of k between 0 and ⌈n/2⌉. As with binomial coefficients, the counts are symmetrical: the number of k-combinations is equal to the number of (nk)-combinations because every k-combination corresponds to a unique (nk)-combination consisting of all the unselected elements. An additional optimisation is possible based on the fact that we only need one count at the end, but the additional complication only obscures the basic algorithm.



                        The fact that the solution is O(n2) is a little annoying, but since n will generally be a smallish number (otherwise the counts will be astronomical) the computation time seems reasonable. There are undoubtedly further optimisations possible, and it's quite possible that a sub-quadratic algorithm exists.



                        Here's a basic implementation in C (using an array of strings):



                        /* Given a *sorted* vector v of size n, compute the number of unique k-combinations
                        * of the elements of the vector for values of k between 0 and (n/2)+1.
                        * The counts are stored in the vector r, which must be large enough.
                        * Counts for larger values of k can be trivially looked up in the returned
                        * vector, using the identity r[k] == r[n - k].
                        * If v is not sorted, the result will be incorrect. The function does not
                        * check for overflow, but the results will be correct modulo (UINTMAX + 1)
                        */
                        void multicomb(const char** v, size_t n, uintmax_t* r) {
                        size_t lim = n / 2 + 1;
                        // Prev retains the counts for the previous prefix ending with
                        // a different element
                        uintmax_t prev[lim];
                        // If there are no elements, there is 1 0-combination and no other combinations.
                        memset(r, 0, sizeof prev);
                        r[0] = 1;
                        // Iterate over the unique elements of v
                        for (size_t k = 0; k < n; ) {
                        // Save the counts for this prefix
                        memcpy(prev, r, sizeof prev);
                        // Iterate over the prefixes with the same ending value
                        do {
                        for (size_t i = lim - 1; i > 0; --i)
                        r[i] = r[i - 1] + prev[i];
                        } while (++k < n && strcmp(v[k - 1], v[k]) == 0);
                        };
                        }


                        In comparison with the solution in OP's self-answer, this version:




                        • overflows later because it only depends on addition. (The fact that there is no division also makes it much easier to compute the count modulo a prime.)

                        • takes quadratic time instead of exponential time.






                        share|improve this answer

























                          up vote
                          1
                          down vote










                          up vote
                          1
                          down vote









                          It's certainly the case that the original problem is isomorphic with the problem of counting r-combinations of a multiset of size 2r.



                          As an alternative to the inclusion-exclusion formula (which is certainly of analytic value, but possibly of less algorithmic value), we can construct a recursive solution which computes the counts of k-combinations of the set for all values of k, using a very similar approach to recursion employed in the recursive algorithm for counting the binomial coefficients C(n,k), the counts of k-combinations of a set of n elements.



                          Suppose we represent the multiset as a sorted vector V of size n (where n &equals; 2r). (Technically, it doesn't have to be sorted; it is sufficient for it to be "clumped" so that all the identical elements are consecutive. However, the easiest way to do this is to sort the vector.) We want to produce all unique k-combinations of this vector. All such combinations have one of two forms:




                          • "Choose the first element". The combination starts with V1 and continues with a (k−1)-combination of (V2, V3, …, Vn).


                          • "Skip the first element". The combination is a k-combination of (Vi, Vi+1, … Vn) where i is the smallest index such that ViV1. (We need to skip all elements identical to the first element in order to avoid duplication.)



                          The only difference here with the binomial recursion is the use of the index i in the second option; if the set has no repeated elements, this reduces to i&equals;2, which yields the recursion C(n, k) = C(n − 1, k − 1) + C(n − 1, k).



                          A naive implementation of this recursion will take exponential time because each computation requires two recursive calls; however, there are only a quadratic number of unique calls so the computation can be reduced to quadratic time by memoisation or dynamic programming. The solution below uses dynamic programming, because that only requires linear space. (Memoisation would require quadratic space since there are a quadratic number of subproblems.)



                          The dynamic programming solution inverts the recursion by computing the number of k-combinations for the successive suffixes of the vector. It needs to keep the values for only two suffixes: the previous suffix and the previous suffix with a different first element, corresponding to the counts needed by the first and second option of the above recursion. (The actual code uses prefixes instead of suffixes, but it makes absolutely no difference.)



                          As a minor optimisation, we only compute the counts of k-combinations for values of k between 0 and ⌈n/2⌉. As with binomial coefficients, the counts are symmetrical: the number of k-combinations is equal to the number of (nk)-combinations because every k-combination corresponds to a unique (nk)-combination consisting of all the unselected elements. An additional optimisation is possible based on the fact that we only need one count at the end, but the additional complication only obscures the basic algorithm.



                          The fact that the solution is O(n2) is a little annoying, but since n will generally be a smallish number (otherwise the counts will be astronomical) the computation time seems reasonable. There are undoubtedly further optimisations possible, and it's quite possible that a sub-quadratic algorithm exists.



                          Here's a basic implementation in C (using an array of strings):



                          /* Given a *sorted* vector v of size n, compute the number of unique k-combinations
                          * of the elements of the vector for values of k between 0 and (n/2)+1.
                          * The counts are stored in the vector r, which must be large enough.
                          * Counts for larger values of k can be trivially looked up in the returned
                          * vector, using the identity r[k] == r[n - k].
                          * If v is not sorted, the result will be incorrect. The function does not
                          * check for overflow, but the results will be correct modulo (UINTMAX + 1)
                          */
                          void multicomb(const char** v, size_t n, uintmax_t* r) {
                          size_t lim = n / 2 + 1;
                          // Prev retains the counts for the previous prefix ending with
                          // a different element
                          uintmax_t prev[lim];
                          // If there are no elements, there is 1 0-combination and no other combinations.
                          memset(r, 0, sizeof prev);
                          r[0] = 1;
                          // Iterate over the unique elements of v
                          for (size_t k = 0; k < n; ) {
                          // Save the counts for this prefix
                          memcpy(prev, r, sizeof prev);
                          // Iterate over the prefixes with the same ending value
                          do {
                          for (size_t i = lim - 1; i > 0; --i)
                          r[i] = r[i - 1] + prev[i];
                          } while (++k < n && strcmp(v[k - 1], v[k]) == 0);
                          };
                          }


                          In comparison with the solution in OP's self-answer, this version:




                          • overflows later because it only depends on addition. (The fact that there is no division also makes it much easier to compute the count modulo a prime.)

                          • takes quadratic time instead of exponential time.






                          share|improve this answer














                          It's certainly the case that the original problem is isomorphic with the problem of counting r-combinations of a multiset of size 2r.



                          As an alternative to the inclusion-exclusion formula (which is certainly of analytic value, but possibly of less algorithmic value), we can construct a recursive solution which computes the counts of k-combinations of the set for all values of k, using a very similar approach to recursion employed in the recursive algorithm for counting the binomial coefficients C(n,k), the counts of k-combinations of a set of n elements.



                          Suppose we represent the multiset as a sorted vector V of size n (where n &equals; 2r). (Technically, it doesn't have to be sorted; it is sufficient for it to be "clumped" so that all the identical elements are consecutive. However, the easiest way to do this is to sort the vector.) We want to produce all unique k-combinations of this vector. All such combinations have one of two forms:




                          • "Choose the first element". The combination starts with V1 and continues with a (k−1)-combination of (V2, V3, …, Vn).


                          • "Skip the first element". The combination is a k-combination of (Vi, Vi+1, … Vn) where i is the smallest index such that ViV1. (We need to skip all elements identical to the first element in order to avoid duplication.)



                          The only difference here with the binomial recursion is the use of the index i in the second option; if the set has no repeated elements, this reduces to i&equals;2, which yields the recursion C(n, k) = C(n − 1, k − 1) + C(n − 1, k).



                          A naive implementation of this recursion will take exponential time because each computation requires two recursive calls; however, there are only a quadratic number of unique calls so the computation can be reduced to quadratic time by memoisation or dynamic programming. The solution below uses dynamic programming, because that only requires linear space. (Memoisation would require quadratic space since there are a quadratic number of subproblems.)



                          The dynamic programming solution inverts the recursion by computing the number of k-combinations for the successive suffixes of the vector. It needs to keep the values for only two suffixes: the previous suffix and the previous suffix with a different first element, corresponding to the counts needed by the first and second option of the above recursion. (The actual code uses prefixes instead of suffixes, but it makes absolutely no difference.)



                          As a minor optimisation, we only compute the counts of k-combinations for values of k between 0 and ⌈n/2⌉. As with binomial coefficients, the counts are symmetrical: the number of k-combinations is equal to the number of (nk)-combinations because every k-combination corresponds to a unique (nk)-combination consisting of all the unselected elements. An additional optimisation is possible based on the fact that we only need one count at the end, but the additional complication only obscures the basic algorithm.



                          The fact that the solution is O(n2) is a little annoying, but since n will generally be a smallish number (otherwise the counts will be astronomical) the computation time seems reasonable. There are undoubtedly further optimisations possible, and it's quite possible that a sub-quadratic algorithm exists.



                          Here's a basic implementation in C (using an array of strings):



                          /* Given a *sorted* vector v of size n, compute the number of unique k-combinations
                          * of the elements of the vector for values of k between 0 and (n/2)+1.
                          * The counts are stored in the vector r, which must be large enough.
                          * Counts for larger values of k can be trivially looked up in the returned
                          * vector, using the identity r[k] == r[n - k].
                          * If v is not sorted, the result will be incorrect. The function does not
                          * check for overflow, but the results will be correct modulo (UINTMAX + 1)
                          */
                          void multicomb(const char** v, size_t n, uintmax_t* r) {
                          size_t lim = n / 2 + 1;
                          // Prev retains the counts for the previous prefix ending with
                          // a different element
                          uintmax_t prev[lim];
                          // If there are no elements, there is 1 0-combination and no other combinations.
                          memset(r, 0, sizeof prev);
                          r[0] = 1;
                          // Iterate over the unique elements of v
                          for (size_t k = 0; k < n; ) {
                          // Save the counts for this prefix
                          memcpy(prev, r, sizeof prev);
                          // Iterate over the prefixes with the same ending value
                          do {
                          for (size_t i = lim - 1; i > 0; --i)
                          r[i] = r[i - 1] + prev[i];
                          } while (++k < n && strcmp(v[k - 1], v[k]) == 0);
                          };
                          }


                          In comparison with the solution in OP's self-answer, this version:




                          • overflows later because it only depends on addition. (The fact that there is no division also makes it much easier to compute the count modulo a prime.)

                          • takes quadratic time instead of exponential time.







                          share|improve this answer














                          share|improve this answer



                          share|improve this answer








                          edited Nov 14 at 20:57

























                          answered Nov 14 at 19:31









                          rici

                          149k19128192




                          149k19128192






















                              up vote
                              0
                              down vote













                              I don't think that that's quite right, Eric. The original question asks, if my source array contains A, A, B, B then how many ways can it be uniquely split between two recipients - the answer in this case will be three because the array could be split as follows:



                              Child Array 1 | Child Array 2
                              -----------------------------
                              A A | B B
                              A B | A B
                              B B | A A


                              This source code (C++ function) works but it's horribly slow and inefficient. Brute force and ignorance.



                              int countPermutations(vector<int> itemset) {

                              vector<vector<int> > permutations;

                              do {
                              vector<int> match;

                              for(int i = 0; i < itemset.size(); i+=2) {
                              match.push_back(itemset.at(i));
                              }
                              sort(match.begin(), match.end());

                              int j = 0;
                              bool found = false;
                              while (!found && (j < permutations.size())) {
                              found = (match == permutations.at(j))?true:false;
                              j++;
                              }
                              if (!found) {
                              permutations.push_back(match);
                              }

                              } while (next_permutation(itemset.begin(), itemset.end()));

                              return permutations.size();
                              }


                              Any thoughts on optimisation?






                              share|improve this answer

















                              • 1




                                Eric's algorithm produces the value 3 for this case, as expected (you have to call the function with the array {2, 2}, indicating "two As and two Bs"). Why do you think it's wrong?
                                – rici
                                Nov 14 at 18:30










                              • My first thought on optimisation, by the way: Don't accumulate the possibilities; just count them. For any significant problem size, there will be far too many to keep in memory, and anyway you don't do anything with them. If you actually need the individual splits, that's a very different question (because counting can be optimised a lot more).
                                – rici
                                Nov 14 at 18:35










                              • rici is correct, to compute the count for { A, A, B, B }, you need to change UInt M = { 3, 4, 5 }; to UInt M = { 2, 2 };.
                                – Eric Postpischil
                                Nov 14 at 20:11















                              up vote
                              0
                              down vote













                              I don't think that that's quite right, Eric. The original question asks, if my source array contains A, A, B, B then how many ways can it be uniquely split between two recipients - the answer in this case will be three because the array could be split as follows:



                              Child Array 1 | Child Array 2
                              -----------------------------
                              A A | B B
                              A B | A B
                              B B | A A


                              This source code (C++ function) works but it's horribly slow and inefficient. Brute force and ignorance.



                              int countPermutations(vector<int> itemset) {

                              vector<vector<int> > permutations;

                              do {
                              vector<int> match;

                              for(int i = 0; i < itemset.size(); i+=2) {
                              match.push_back(itemset.at(i));
                              }
                              sort(match.begin(), match.end());

                              int j = 0;
                              bool found = false;
                              while (!found && (j < permutations.size())) {
                              found = (match == permutations.at(j))?true:false;
                              j++;
                              }
                              if (!found) {
                              permutations.push_back(match);
                              }

                              } while (next_permutation(itemset.begin(), itemset.end()));

                              return permutations.size();
                              }


                              Any thoughts on optimisation?






                              share|improve this answer

















                              • 1




                                Eric's algorithm produces the value 3 for this case, as expected (you have to call the function with the array {2, 2}, indicating "two As and two Bs"). Why do you think it's wrong?
                                – rici
                                Nov 14 at 18:30










                              • My first thought on optimisation, by the way: Don't accumulate the possibilities; just count them. For any significant problem size, there will be far too many to keep in memory, and anyway you don't do anything with them. If you actually need the individual splits, that's a very different question (because counting can be optimised a lot more).
                                – rici
                                Nov 14 at 18:35










                              • rici is correct, to compute the count for { A, A, B, B }, you need to change UInt M = { 3, 4, 5 }; to UInt M = { 2, 2 };.
                                – Eric Postpischil
                                Nov 14 at 20:11













                              up vote
                              0
                              down vote










                              up vote
                              0
                              down vote









                              I don't think that that's quite right, Eric. The original question asks, if my source array contains A, A, B, B then how many ways can it be uniquely split between two recipients - the answer in this case will be three because the array could be split as follows:



                              Child Array 1 | Child Array 2
                              -----------------------------
                              A A | B B
                              A B | A B
                              B B | A A


                              This source code (C++ function) works but it's horribly slow and inefficient. Brute force and ignorance.



                              int countPermutations(vector<int> itemset) {

                              vector<vector<int> > permutations;

                              do {
                              vector<int> match;

                              for(int i = 0; i < itemset.size(); i+=2) {
                              match.push_back(itemset.at(i));
                              }
                              sort(match.begin(), match.end());

                              int j = 0;
                              bool found = false;
                              while (!found && (j < permutations.size())) {
                              found = (match == permutations.at(j))?true:false;
                              j++;
                              }
                              if (!found) {
                              permutations.push_back(match);
                              }

                              } while (next_permutation(itemset.begin(), itemset.end()));

                              return permutations.size();
                              }


                              Any thoughts on optimisation?






                              share|improve this answer












                              I don't think that that's quite right, Eric. The original question asks, if my source array contains A, A, B, B then how many ways can it be uniquely split between two recipients - the answer in this case will be three because the array could be split as follows:



                              Child Array 1 | Child Array 2
                              -----------------------------
                              A A | B B
                              A B | A B
                              B B | A A


                              This source code (C++ function) works but it's horribly slow and inefficient. Brute force and ignorance.



                              int countPermutations(vector<int> itemset) {

                              vector<vector<int> > permutations;

                              do {
                              vector<int> match;

                              for(int i = 0; i < itemset.size(); i+=2) {
                              match.push_back(itemset.at(i));
                              }
                              sort(match.begin(), match.end());

                              int j = 0;
                              bool found = false;
                              while (!found && (j < permutations.size())) {
                              found = (match == permutations.at(j))?true:false;
                              j++;
                              }
                              if (!found) {
                              permutations.push_back(match);
                              }

                              } while (next_permutation(itemset.begin(), itemset.end()));

                              return permutations.size();
                              }


                              Any thoughts on optimisation?







                              share|improve this answer












                              share|improve this answer



                              share|improve this answer










                              answered Nov 14 at 15:32









                              headbanger

                              406416




                              406416








                              • 1




                                Eric's algorithm produces the value 3 for this case, as expected (you have to call the function with the array {2, 2}, indicating "two As and two Bs"). Why do you think it's wrong?
                                – rici
                                Nov 14 at 18:30










                              • My first thought on optimisation, by the way: Don't accumulate the possibilities; just count them. For any significant problem size, there will be far too many to keep in memory, and anyway you don't do anything with them. If you actually need the individual splits, that's a very different question (because counting can be optimised a lot more).
                                – rici
                                Nov 14 at 18:35










                              • rici is correct, to compute the count for { A, A, B, B }, you need to change UInt M = { 3, 4, 5 }; to UInt M = { 2, 2 };.
                                – Eric Postpischil
                                Nov 14 at 20:11














                              • 1




                                Eric's algorithm produces the value 3 for this case, as expected (you have to call the function with the array {2, 2}, indicating "two As and two Bs"). Why do you think it's wrong?
                                – rici
                                Nov 14 at 18:30










                              • My first thought on optimisation, by the way: Don't accumulate the possibilities; just count them. For any significant problem size, there will be far too many to keep in memory, and anyway you don't do anything with them. If you actually need the individual splits, that's a very different question (because counting can be optimised a lot more).
                                – rici
                                Nov 14 at 18:35










                              • rici is correct, to compute the count for { A, A, B, B }, you need to change UInt M = { 3, 4, 5 }; to UInt M = { 2, 2 };.
                                – Eric Postpischil
                                Nov 14 at 20:11








                              1




                              1




                              Eric's algorithm produces the value 3 for this case, as expected (you have to call the function with the array {2, 2}, indicating "two As and two Bs"). Why do you think it's wrong?
                              – rici
                              Nov 14 at 18:30




                              Eric's algorithm produces the value 3 for this case, as expected (you have to call the function with the array {2, 2}, indicating "two As and two Bs"). Why do you think it's wrong?
                              – rici
                              Nov 14 at 18:30












                              My first thought on optimisation, by the way: Don't accumulate the possibilities; just count them. For any significant problem size, there will be far too many to keep in memory, and anyway you don't do anything with them. If you actually need the individual splits, that's a very different question (because counting can be optimised a lot more).
                              – rici
                              Nov 14 at 18:35




                              My first thought on optimisation, by the way: Don't accumulate the possibilities; just count them. For any significant problem size, there will be far too many to keep in memory, and anyway you don't do anything with them. If you actually need the individual splits, that's a very different question (because counting can be optimised a lot more).
                              – rici
                              Nov 14 at 18:35












                              rici is correct, to compute the count for { A, A, B, B }, you need to change UInt M = { 3, 4, 5 }; to UInt M = { 2, 2 };.
                              – Eric Postpischil
                              Nov 14 at 20:11




                              rici is correct, to compute the count for { A, A, B, B }, you need to change UInt M = { 3, 4, 5 }; to UInt M = { 2, 2 };.
                              – Eric Postpischil
                              Nov 14 at 20:11


















                               

                              draft saved


                              draft discarded



















































                               


                              draft saved


                              draft discarded














                              StackExchange.ready(
                              function () {
                              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53270489%2fhow-can-we-count-the-r-combinations-of-a-multiset%23new-answer', 'question_page');
                              }
                              );

                              Post as a guest















                              Required, but never shown





















































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown

































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown







                              Popular posts from this blog

                              Biblatex bibliography style without URLs when DOI exists (in Overleaf with Zotero bibliography)

                              ComboBox Display Member on multiple fields

                              Is it possible to collect Nectar points via Trainline?