Cfrgtkky

Question

Problem

Define a split of a multiset M to be an ordered pair of multisets of equal size whose union is M.

How can we count the splits of a multiset?

Discussion

This question stems from this one.

This problem is identical to counting the number of r-combinations of a multiset with 2r elements, for which a solution is shown in section 6.2 of Introductory Combinatorics by Richard A. Brualdi, Second Edition, 1992, Prentice-Hall, Inc. To see they are identical, observe there is a 1-1 correspondence between any r-combination X of a multiset M of 2r elements and the split (X, Y), where Y is all the elements of M that are not in X, that is, Y = M−X.

Brualdi’s solution using the inclusion-exclusion principle is well known to mathematicians and is suitable for algorithmic implementation.

In spite of the fact that it is well known to mathematicians, it was only partially answered here at Mathematics Stack Exchange and here at Quora.

Eric Postpischil 69.1k873150 · Answer 1 · 2018-11-12 23:49:03Z

Here is a solution. I have included explanation in the comments.

#include <inttypes.h>

#include <stddef.h>

#include <stdint.h>

#include <stdio.h>





typedef uintmax_t UInt;

#define UIntFormat PRIuMAX



#define NumberOf(a) (sizeof (a) / sizeof *(a))





/*  In this code, a multiset is represented with:



        an integer N0 which is the number of types of elements in the set,



        an integer N1 which is the number of types of elements in the set that

        each appear finitely many times (N0-N1 types each appear infinitely

        many times), and



        an array M in which each M[i] is the number of times that an element

        of type i appears.



    Collectively, this is referred to as the multiset M.

*/





/*  Return the number of ways to choose r things from n things.  This is

    n! / (r! * (n-r)!).

*/

static UInt Choose(UInt r, UInt n)

{

    UInt result = 1;

    for (UInt i = 1; i <= r; ++i)

        result = result * n-- / i;

    return result;

}





//  Count the number of r-combinations of a multiset.

static UInt CountRCombinations(UInt r, size_t N0, size_t N1, UInt M)

{

    /*  If we have only the unlimited types, there is a one-to-one

        correspondence between r objects with N0-1 dividers placed between

        them, each divider marking a transition from one type to another.  For

        example, consider four objects of three types.  Below "o" represents

        any object, and "|" is a divider.  For each arrangement of four o's and

        two |s, we show how it defines a selection of four objects of three

        types:



            oooo|| -> aaaa||

            ooo|o| -> aaa|b|

            ooo||o -> aaa||c

            oo|oo| -> aa|bb|

            oo|o|o -> aa|b|c

            oo||oo -> aa||cc

            o|ooo| -> a|bbb|

            o|oo|o -> a|bb|c

            o|o|oo -> a|b|cc

            o||ooo -> a||ccc

            |oooo| -> |bbbb|

            |ooo|o -> |bbb|c

            |oo|oo -> |bb|cc

            |o|ooo -> |b|ccc

            ||oooo -> ||cccc



        Therefore, the number of combinations equals the number of ways to

        arrange r indistinguishable objects of one type with N0-1

        indistinguishable objects of a different type.

    */

    if (N1 == 0)

        return Choose(r, r+N0-1);



    /*  Otherwise, we count the combinations:



            Select one of the limited types (we use the last one, N1-1, because

            it is easily removed from the array simply by reducing the size of

            the array).



            Count the number of combinations there would be if that type were

            unlimited.



            Count the number of combinations there would be if there were at

            least M[i]+1 instances of elements of that type.



            Subtract to get the number of combinations that have 0 to M[i]

            instances of elements of that type.

    */

    else

    {

        /*  Let M' be the multiset M with the last type changed to unlimited.



            So, where M is represented with N0, N1, M, M' is represented with

            N0, N1-1, M.

        */



        //  Change the last limited type to unlimited.

        N1 -= 1;



        //  Count the r-combinations drawn from M'.

        UInt C = CountRCombinations(r, N0, N1, M);



        /*  Now we count the combinations which have at least M[N1]+1 instances

            of the (formerly) last type.



            Consider that each such combination has M[N1]+1 instances of that

            type plus some combination of r - (M[N1]+1) elements drawn from M',

            including zero or more instances of the last type.  (Note that if r

            <= M[N1], there are no such combinations, since we would be asking

            for a negative number of elements.)



            So the number of combinations which have at least M[N1]+1 instances

            of the last type equals the number of combinations of that type

            plus some combination of r - (M[N1]+1) elements drawn from M'.

        */

        if (M[N1] < r)

            C -= CountRCombinations(r - (M[N1] + 1), N0, N1, M);



        return C;

    }

}





//  Count the number of splits of a multiset M that contains N types of things.

static UInt CountSplits(size_t N, UInt M)

{

    //  Count the number of elements.

    UInt T = 0;

    for (size_t i = 0; i < N; ++i)

        T += M[i];



    //  Return the number of T/2-combinations of M.

    return T % 2 ? 0 : CountRCombinations(T/2, N, N, M);

}





int main(void)

{

    UInt M = { 3, 4, 5 };

    size_t N = NumberOf(M);



    printf("The number of splits of {");

    for (size_t i = 0; i < N; ++i) printf(" %" UIntFormat, M[i]);

    printf(" } is %" UIntFormat ".n", CountSplits(N, M));

}

score 1 · Answer 2 · 2018-11-14 20:57:53Z

It's certainly the case that the original problem is isomorphic with the problem of counting r-combinations of a multiset of size 2r.

As an alternative to the inclusion-exclusion formula (which is certainly of analytic value, but possibly of less algorithmic value), we can construct a recursive solution which computes the counts of k-combinations of the set for all values of k, using a very similar approach to recursion employed in the recursive algorithm for counting the binomial coefficients C(n,k), the counts of k-combinations of a set of n elements.

Suppose we represent the multiset as a sorted vector V of size n (where n = 2r). (Technically, it doesn't have to be sorted; it is sufficient for it to be "clumped" so that all the identical elements are consecutive. However, the easiest way to do this is to sort the vector.) We want to produce all unique k-combinations of this vector. All such combinations have one of two forms:

"Choose the first element". The combination starts with V₁ and continues with a (k−1)-combination of (V₂, V₃, …, V_n).

"Skip the first element". The combination is a k-combination of (V_i, V_i+1, … V_n) where i is the smallest index such that V_i ≠ V₁. (We need to skip all elements identical to the first element in order to avoid duplication.)

The only difference here with the binomial recursion is the use of the index i in the second option; if the set has no repeated elements, this reduces to i=2, which yields the recursion C(n, k) = C(n − 1, k − 1) + C(n − 1, k).

A naive implementation of this recursion will take exponential time because each computation requires two recursive calls; however, there are only a quadratic number of unique calls so the computation can be reduced to quadratic time by memoisation or dynamic programming. The solution below uses dynamic programming, because that only requires linear space. (Memoisation would require quadratic space since there are a quadratic number of subproblems.)

The dynamic programming solution inverts the recursion by computing the number of k-combinations for the successive suffixes of the vector. It needs to keep the values for only two suffixes: the previous suffix and the previous suffix with a different first element, corresponding to the counts needed by the first and second option of the above recursion. (The actual code uses prefixes instead of suffixes, but it makes absolutely no difference.)

As a minor optimisation, we only compute the counts of k-combinations for values of k between 0 and ⌈n/2⌉. As with binomial coefficients, the counts are symmetrical: the number of k-combinations is equal to the number of (n−k)-combinations because every k-combination corresponds to a unique (n−k)-combination consisting of all the unselected elements. An additional optimisation is possible based on the fact that we only need one count at the end, but the additional complication only obscures the basic algorithm.

The fact that the solution is O(n²) is a little annoying, but since n will generally be a smallish number (otherwise the counts will be astronomical) the computation time seems reasonable. There are undoubtedly further optimisations possible, and it's quite possible that a sub-quadratic algorithm exists.

Here's a basic implementation in C (using an array of strings):

/* Given a *sorted* vector v of size n, compute the number of unique k-combinations

 * of the elements of the vector for values of k between 0 and (n/2)+1.

 * The counts are stored in the vector r, which must be large enough.

 * Counts for larger values of k can be trivially looked up in the returned

 * vector, using the identity r[k] == r[n - k].

 * If v is not sorted, the result will be incorrect. The function does not

 * check for overflow, but the results will be correct modulo (UINTMAX + 1)

 */

void multicomb(const char** v, size_t n, uintmax_t* r) {

  size_t lim = n / 2 + 1;

  // Prev retains the counts for the previous prefix ending with

  // a different element

  uintmax_t prev[lim];

  // If there are no elements, there is 1 0-combination and no other combinations.

  memset(r, 0, sizeof prev);

  r[0] = 1;

  // Iterate over the unique elements of v

  for (size_t k = 0; k < n; ) {

    // Save the counts for this prefix

    memcpy(prev, r, sizeof prev);

    // Iterate over the prefixes with the same ending value

    do {

      for (size_t i = lim - 1; i > 0; --i)

        r[i] = r[i - 1] + prev[i];

    } while (++k < n && strcmp(v[k - 1], v[k]) == 0);

  };

}

In comparison with the solution in OP's self-answer, this version:

overflows later because it only depends on addition. (The fact that there is no division also makes it much easier to compute the count modulo a prime.)

takes quadratic time instead of exponential time.

headbanger 406416 · Answer 3 · 2018-11-14 15:32:32Z

I don't think that that's quite right, Eric. The original question asks, if my source array contains A, A, B, B then how many ways can it be uniquely split between two recipients - the answer in this case will be three because the array could be split as follows:

Child Array 1 | Child Array 2

-----------------------------

A A           | B B

A B           | A B

B B           | A A

This source code (C++ function) works but it's horribly slow and inefficient. Brute force and ignorance.

int countPermutations(vector<int> itemset) {



  vector<vector<int> > permutations;



  do {

    vector<int> match;



    for(int i = 0; i < itemset.size(); i+=2) {

         match.push_back(itemset.at(i));

    }

    sort(match.begin(), match.end());



    int j = 0;

    bool found = false;

    while (!found && (j < permutations.size())) {

        found = (match == permutations.at(j))?true:false;

        j++;

    }

    if (!found) {

        permutations.push_back(match);

    }



  } while (next_permutation(itemset.begin(), itemset.end()));



  return permutations.size();

}

Any thoughts on optimisation?

Eric's algorithm produces the value 3 for this case, as expected (you have to call the function with the array {2, 2}, indicating "two As and two Bs"). Why do you think it's wrong? — Nov 14 at 18:30
My first thought on optimisation, by the way: Don't accumulate the possibilities; just count them. For any significant problem size, there will be far too many to keep in memory, and anyway you don't do anything with them. If you actually need the individual splits, that's a very different question (because counting can be optimised a lot more). — Nov 14 at 18:35
rici is correct, to compute the count for { A, A, B, B }, you need to change UInt M = { 3, 4, 5 }; to UInt M = { 2, 2 };. — Nov 14 at 20:11

Eric Postpischil 69.1k873150 · Answer 4 · 2018-11-12 23:49:03Z

Here is a solution. I have included explanation in the comments.

#include <inttypes.h>

#include <stddef.h>

#include <stdint.h>

#include <stdio.h>





typedef uintmax_t UInt;

#define UIntFormat PRIuMAX



#define NumberOf(a) (sizeof (a) / sizeof *(a))





/*  In this code, a multiset is represented with:



        an integer N0 which is the number of types of elements in the set,



        an integer N1 which is the number of types of elements in the set that

        each appear finitely many times (N0-N1 types each appear infinitely

        many times), and



        an array M in which each M[i] is the number of times that an element

        of type i appears.



    Collectively, this is referred to as the multiset M.

*/





/*  Return the number of ways to choose r things from n things.  This is

    n! / (r! * (n-r)!).

*/

static UInt Choose(UInt r, UInt n)

{

    UInt result = 1;

    for (UInt i = 1; i <= r; ++i)

        result = result * n-- / i;

    return result;

}





//  Count the number of r-combinations of a multiset.

static UInt CountRCombinations(UInt r, size_t N0, size_t N1, UInt M)

{

    /*  If we have only the unlimited types, there is a one-to-one

        correspondence between r objects with N0-1 dividers placed between

        them, each divider marking a transition from one type to another.  For

        example, consider four objects of three types.  Below "o" represents

        any object, and "|" is a divider.  For each arrangement of four o's and

        two |s, we show how it defines a selection of four objects of three

        types:



            oooo|| -> aaaa||

            ooo|o| -> aaa|b|

            ooo||o -> aaa||c

            oo|oo| -> aa|bb|

            oo|o|o -> aa|b|c

            oo||oo -> aa||cc

            o|ooo| -> a|bbb|

            o|oo|o -> a|bb|c

            o|o|oo -> a|b|cc

            o||ooo -> a||ccc

            |oooo| -> |bbbb|

            |ooo|o -> |bbb|c

            |oo|oo -> |bb|cc

            |o|ooo -> |b|ccc

            ||oooo -> ||cccc



        Therefore, the number of combinations equals the number of ways to

        arrange r indistinguishable objects of one type with N0-1

        indistinguishable objects of a different type.

    */

    if (N1 == 0)

        return Choose(r, r+N0-1);



    /*  Otherwise, we count the combinations:



            Select one of the limited types (we use the last one, N1-1, because

            it is easily removed from the array simply by reducing the size of

            the array).



            Count the number of combinations there would be if that type were

            unlimited.



            Count the number of combinations there would be if there were at

            least M[i]+1 instances of elements of that type.



            Subtract to get the number of combinations that have 0 to M[i]

            instances of elements of that type.

    */

    else

    {

        /*  Let M' be the multiset M with the last type changed to unlimited.



            So, where M is represented with N0, N1, M, M' is represented with

            N0, N1-1, M.

        */



        //  Change the last limited type to unlimited.

        N1 -= 1;



        //  Count the r-combinations drawn from M'.

        UInt C = CountRCombinations(r, N0, N1, M);



        /*  Now we count the combinations which have at least M[N1]+1 instances

            of the (formerly) last type.



            Consider that each such combination has M[N1]+1 instances of that

            type plus some combination of r - (M[N1]+1) elements drawn from M',

            including zero or more instances of the last type.  (Note that if r

            <= M[N1], there are no such combinations, since we would be asking

            for a negative number of elements.)



            So the number of combinations which have at least M[N1]+1 instances

            of the last type equals the number of combinations of that type

            plus some combination of r - (M[N1]+1) elements drawn from M'.

        */

        if (M[N1] < r)

            C -= CountRCombinations(r - (M[N1] + 1), N0, N1, M);



        return C;

    }

}





//  Count the number of splits of a multiset M that contains N types of things.

static UInt CountSplits(size_t N, UInt M)

{

    //  Count the number of elements.

    UInt T = 0;

    for (size_t i = 0; i < N; ++i)

        T += M[i];



    //  Return the number of T/2-combinations of M.

    return T % 2 ? 0 : CountRCombinations(T/2, N, N, M);

}





int main(void)

{

    UInt M = { 3, 4, 5 };

    size_t N = NumberOf(M);



    printf("The number of splits of {");

    for (size_t i = 0; i < N; ++i) printf(" %" UIntFormat, M[i]);

    printf(" } is %" UIntFormat ".n", CountSplits(N, M));

}

score 1 · Answer 5 · 2018-11-14 20:57:53Z

It's certainly the case that the original problem is isomorphic with the problem of counting r-combinations of a multiset of size 2r.

As an alternative to the inclusion-exclusion formula (which is certainly of analytic value, but possibly of less algorithmic value), we can construct a recursive solution which computes the counts of k-combinations of the set for all values of k, using a very similar approach to recursion employed in the recursive algorithm for counting the binomial coefficients C(n,k), the counts of k-combinations of a set of n elements.

Suppose we represent the multiset as a sorted vector V of size n (where n = 2r). (Technically, it doesn't have to be sorted; it is sufficient for it to be "clumped" so that all the identical elements are consecutive. However, the easiest way to do this is to sort the vector.) We want to produce all unique k-combinations of this vector. All such combinations have one of two forms:

"Choose the first element". The combination starts with V₁ and continues with a (k−1)-combination of (V₂, V₃, …, V_n).

"Skip the first element". The combination is a k-combination of (V_i, V_i+1, … V_n) where i is the smallest index such that V_i ≠ V₁. (We need to skip all elements identical to the first element in order to avoid duplication.)

The only difference here with the binomial recursion is the use of the index i in the second option; if the set has no repeated elements, this reduces to i=2, which yields the recursion C(n, k) = C(n − 1, k − 1) + C(n − 1, k).

A naive implementation of this recursion will take exponential time because each computation requires two recursive calls; however, there are only a quadratic number of unique calls so the computation can be reduced to quadratic time by memoisation or dynamic programming. The solution below uses dynamic programming, because that only requires linear space. (Memoisation would require quadratic space since there are a quadratic number of subproblems.)

The dynamic programming solution inverts the recursion by computing the number of k-combinations for the successive suffixes of the vector. It needs to keep the values for only two suffixes: the previous suffix and the previous suffix with a different first element, corresponding to the counts needed by the first and second option of the above recursion. (The actual code uses prefixes instead of suffixes, but it makes absolutely no difference.)

As a minor optimisation, we only compute the counts of k-combinations for values of k between 0 and ⌈n/2⌉. As with binomial coefficients, the counts are symmetrical: the number of k-combinations is equal to the number of (n−k)-combinations because every k-combination corresponds to a unique (n−k)-combination consisting of all the unselected elements. An additional optimisation is possible based on the fact that we only need one count at the end, but the additional complication only obscures the basic algorithm.

The fact that the solution is O(n²) is a little annoying, but since n will generally be a smallish number (otherwise the counts will be astronomical) the computation time seems reasonable. There are undoubtedly further optimisations possible, and it's quite possible that a sub-quadratic algorithm exists.

Here's a basic implementation in C (using an array of strings):

/* Given a *sorted* vector v of size n, compute the number of unique k-combinations

 * of the elements of the vector for values of k between 0 and (n/2)+1.

 * The counts are stored in the vector r, which must be large enough.

 * Counts for larger values of k can be trivially looked up in the returned

 * vector, using the identity r[k] == r[n - k].

 * If v is not sorted, the result will be incorrect. The function does not

 * check for overflow, but the results will be correct modulo (UINTMAX + 1)

 */

void multicomb(const char** v, size_t n, uintmax_t* r) {

  size_t lim = n / 2 + 1;

  // Prev retains the counts for the previous prefix ending with

  // a different element

  uintmax_t prev[lim];

  // If there are no elements, there is 1 0-combination and no other combinations.

  memset(r, 0, sizeof prev);

  r[0] = 1;

  // Iterate over the unique elements of v

  for (size_t k = 0; k < n; ) {

    // Save the counts for this prefix

    memcpy(prev, r, sizeof prev);

    // Iterate over the prefixes with the same ending value

    do {

      for (size_t i = lim - 1; i > 0; --i)

        r[i] = r[i - 1] + prev[i];

    } while (++k < n && strcmp(v[k - 1], v[k]) == 0);

  };

}

In comparison with the solution in OP's self-answer, this version:

overflows later because it only depends on addition. (The fact that there is no division also makes it much easier to compute the count modulo a prime.)

takes quadratic time instead of exponential time.

headbanger 406416 · Answer 6 · 2018-11-14 15:32:32Z

I don't think that that's quite right, Eric. The original question asks, if my source array contains A, A, B, B then how many ways can it be uniquely split between two recipients - the answer in this case will be three because the array could be split as follows:

Child Array 1 | Child Array 2

-----------------------------

A A           | B B

A B           | A B

B B           | A A

This source code (C++ function) works but it's horribly slow and inefficient. Brute force and ignorance.

int countPermutations(vector<int> itemset) {



  vector<vector<int> > permutations;



  do {

    vector<int> match;



    for(int i = 0; i < itemset.size(); i+=2) {

         match.push_back(itemset.at(i));

    }

    sort(match.begin(), match.end());



    int j = 0;

    bool found = false;

    while (!found && (j < permutations.size())) {

        found = (match == permutations.at(j))?true:false;

        j++;

    }

    if (!found) {

        permutations.push_back(match);

    }



  } while (next_permutation(itemset.begin(), itemset.end()));



  return permutations.size();

}

Any thoughts on optimisation?

Eric's algorithm produces the value 3 for this case, as expected (you have to call the function with the array {2, 2}, indicating "two As and two Bs"). Why do you think it's wrong? — Nov 14 at 18:30
My first thought on optimisation, by the way: Don't accumulate the possibilities; just count them. For any significant problem size, there will be far too many to keep in memory, and anyway you don't do anything with them. If you actually need the individual splits, that's a very different question (because counting can be optimised a lot more). — Nov 14 at 18:35
rici is correct, to compute the count for { A, A, B, B }, you need to change UInt M = { 3, 4, 5 }; to UInt M = { 2, 2 };. — Nov 14 at 20:11

搜尋此網誌

Cfrgtkky

How can we count the r-combinations of a multiset?

Problem

Discussion

Problem

Discussion

Problem

Discussion

Problem

Discussion

3 Answers
3

Your Answer

Post as a guest

3 Answers
3

3 Answers
3

Post as a guest

Popular posts from this blog

Biblatex bibliography style without URLs when DOI exists (in Overleaf with Zotero bibliography)

ComboBox Display Member on multiple fields

Is it possible to collect Nectar points via Trainline?

How can we count the r-combinations of a multiset?

Problem

Discussion

Problem

Discussion

Problem

Discussion

Problem

Discussion

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

3 Answers 3

3 Answers 3

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

Biblatex bibliography style without URLs when DOI exists (in Overleaf with Zotero bibliography)

ComboBox Display Member on multiple fields

Is it possible to collect Nectar points via Trainline?

3 Answers
3

3 Answers
3

3 Answers
3