Forum Stats

  • 3,769,814 Users
  • 2,253,026 Discussions
  • 7,875,214 Comments

Discussions

Finding all possible combinations of numbers to reach a given sum

User_KI8KP
User_KI8KP Member Posts: 11 Employee

Hi,

I have a table with 66 payment records, but when I run a recursive SQL to find matching sum value(37027.73) from all combination of payment records, the query goes in a infinite loop and it does not come out. Can someone pls help on this.

Table Structure: Create table match_samples (ID NUMBER, PAYMENT NUMBER(14,2));

Table data scripts : 

Insert into match_samples (ID,PAYMENT) values (1,2140.57);

Insert into match_samples (ID,PAYMENT) values (2,2140.57);

Insert into match_samples (ID,PAYMENT) values (3,2140.57);

Insert into match_samples (ID,PAYMENT) values (4,2140.57);

Insert into match_samples (ID,PAYMENT) values (5,2140.57);

Insert into match_samples (ID,PAYMENT) values (6,2140.57);

Insert into match_samples (ID,PAYMENT) values (7,2140.57);

Insert into match_samples (ID,PAYMENT) values (8,2140.57);

Insert into match_samples (ID,PAYMENT) values (9,4424.52);

Insert into match_samples (ID,PAYMENT) values (10,4424.52);

Insert into match_samples (ID,PAYMENT) values (11,4424.52);

Insert into match_samples (ID,PAYMENT) values (12,4424.52);

Insert into match_samples (ID,PAYMENT) values (13,4424.52);

Insert into match_samples (ID,PAYMENT) values (14,4424.52);

Insert into match_samples (ID,PAYMENT) values (15,4424.52);

Insert into match_samples (ID,PAYMENT) values (16,4424.52);

Insert into match_samples (ID,PAYMENT) values (17,4424.52);

Insert into match_samples (ID,PAYMENT) values (18,4424.52);

Insert into match_samples (ID,PAYMENT) values (19,3950.12);

Insert into match_samples (ID,PAYMENT) values (20,3950.12);

Insert into match_samples (ID,PAYMENT) values (21,3950.12);

Insert into match_samples (ID,PAYMENT) values (22,3950.12);

Insert into match_samples (ID,PAYMENT) values (23,3950.12);

Insert into match_samples (ID,PAYMENT) values (24,3950.12);

Insert into match_samples (ID,PAYMENT) values (25,3123.64);

Insert into match_samples (ID,PAYMENT) values (26,3123.64);

Insert into match_samples (ID,PAYMENT) values (27,3123.64);

Insert into match_samples (ID,PAYMENT) values (28,3123.64);

Insert into match_samples (ID,PAYMENT) values (29,3123.64);

Insert into match_samples (ID,PAYMENT) values (30,3123.64);

Insert into match_samples (ID,PAYMENT) values (31,3123.64);

Insert into match_samples (ID,PAYMENT) values (32,3123.64);

Insert into match_samples (ID,PAYMENT) values (33,3909.93);

Insert into match_samples (ID,PAYMENT) values (34,3909.93);

Insert into match_samples (ID,PAYMENT) values (35,3909.93);

Insert into match_samples (ID,PAYMENT) values (36,3909.93);

Insert into match_samples (ID,PAYMENT) values (37,3909.93);

Insert into match_samples (ID,PAYMENT) values (38,3909.93);

Insert into match_samples (ID,PAYMENT) values (39,3169.57);

Insert into match_samples (ID,PAYMENT) values (40,3169.57);

Insert into match_samples (ID,PAYMENT) values (41,3169.57);

Insert into match_samples (ID,PAYMENT) values (42,3169.57);

Insert into match_samples (ID,PAYMENT) values (43,3169.57);

Insert into match_samples (ID,PAYMENT) values (44,3169.57);

Insert into match_samples (ID,PAYMENT) values (45,3021.02);

Insert into match_samples (ID,PAYMENT) values (46,3021.02);

Insert into match_samples (ID,PAYMENT) values (47,3021.02);

Insert into match_samples (ID,PAYMENT) values (48,3021.02);

Insert into match_samples (ID,PAYMENT) values (49,3021.02);

Insert into match_samples (ID,PAYMENT) values (50,3021.02);

Insert into match_samples (ID,PAYMENT) values (51,3021.02);

Insert into match_samples (ID,PAYMENT) values (52,3021.02);

Insert into match_samples (ID,PAYMENT) values (53,10207.94);

Insert into match_samples (ID,PAYMENT) values (54,10207.94);

Insert into match_samples (ID,PAYMENT) values (55,10207.94);

Insert into match_samples (ID,PAYMENT) values (56,10207.94);

Insert into match_samples (ID,PAYMENT) values (57,10207.94);

Insert into match_samples (ID,PAYMENT) values (58,10207.94);

Insert into match_samples (ID,PAYMENT) values (59,3080.42);

Insert into match_samples (ID,PAYMENT) values (60,3080.42);

Insert into match_samples (ID,PAYMENT) values (61,3080.42);

Insert into match_samples (ID,PAYMENT) values (62,3080.42);

Insert into match_samples (ID,PAYMENT) values (63,3080.42);

Insert into match_samples (ID,PAYMENT) values (64,3080.42);

Insert into match_samples (ID,PAYMENT) values (65,3080.42);

Insert into match_samples (ID,PAYMENT) values (66,3080.42);

 

Recursive SQL getting stuck

with samples (i, n) as (

 select id, payment from match_samples

),

iterator (i, n, s, o) as (

 select i, n, n, cast(n as varchar2(4000))

 from samples smp

 union all

 select smp.i

   , smp.n

   , itr.s + smp.n

   , itr.o || '~' || to_char(smp.n)

 from iterator itr

 join samples smp on smp.i > itr.i

 where itr.s + smp.n <= 37027.73

)

select *

from iterator

where s = 37027.73;

«1

Answers

  • User_H3J7U
    User_H3J7U Member Posts: 684 Silver Trophy

    66 payment records

    73786976294838206463 combinations

  • mathguy
    mathguy Member Posts: 10,163 Blue Diamond

    Obviously your brute-force approach won't work; the problem is NP-hard, so you will get to unfeasable sizes very quickly - even with the cutoff when sums go above your target, so the numbers won't be quite as high as shown in the previous reply.

    The number of combinations in your particular case is significantly less than was shown above though, because you have many duplicate values / few distinct values, and you are only looking at combinations of values (payment amounts), not id's. But is that something that you are allowed to take advantage of in your real-life problem? The approach will be different from what you tried - but it's not worth looking at it if in real life you may need to solve the same problem, with 66 distinct payment amounts (or even, say, 30 distinct amounts, even if there are some duplicate values).

    User_KI8KP
  • User_KI8KP
    User_KI8KP Member Posts: 11 Employee

    Hi Mathguy,

    Thanks for your response. I very well get your point.

    In real life scenario, there can be 30 distinct payment values. I would like to know the alternative approach and try it out with data?

    Also, do we have any mechanism by which we can know before hand that this recursive query will generate many permutations and get stuck(any time bound restriction or something), so that I can skip the processing for this particular target and continue with the next one. If i get this, then I should be good with my requirement.

    Thanks,

    Sanjeeb

  • mathguy
    mathguy Member Posts: 10,163 Blue Diamond

    Sometimes it's easy to know if the problem is feasible (both for a "yes" answer, when the problem is sufficiently "small", and for a "no" answer when there is too much data), but sometimes it's harder.

    Your recursive attempt essentially tests all POWER(2, 66) choices (each of the 66 amounts is either included in the sum or it isn't); that's about POWER(10, 20) (a number with 20 digits, as in User_[whatever]'s reply). If you are able to inspect even one trillion such combinations per second, you will still need many years to complete the task. Not doable.

    The problem may in fact be "smaller" than it appears, if the target sum isn't very large; many of the tentative sums may go above the target quickly, so the number of combinations you must REALLY check is much smaller. But that is not going to be much help here; even if you only need to test 0.000001% of all possibilities, that is still a number that's too large to be do-able in reasonable time.

    If the inputs (the individual amounts and the total) are positive integers, an alternative is to use dynamic programming. If there are n individual amounts and the target amount is T, then the order of magnitude of the dynamic programming solution is O(nT). This may be much smaller than POWER(2, n) if T is reasonably small; but if T is of the order of POWER(2, n) itself (or greater), this won't help.

    In your case, the inputs ARE integers, even though they don't look that way - just express them in "cents" rather than "dollars". This means that your target is not approx. 37,000 but rather approx. 3.7 million, but this is still overwhelmingly smaller than POWER(2, 66) ~ POWER(10, 20). So the dynamic approach has a chance. Alas, with n = 66 this means the size of the solution is over 200 million. Perhaps this is doable; I will give it a try, first in SQL, then in PL/SQL (it's possible that it would be faster), and if even that is too slow, I'll try it in C. Not doing this out of charitable leaning - I am interested in the problem too (and in implementing various solutions in different languages).

    Another angle is that in some cases you may be satisfied with a partial answer. For example, this should be feasible: with the data you provided (or even with 66 distinct amounts), what are all the solutions that involve the sum of five or fewer terms? Whether this helps you at all in your real life problem, only you can now.

    Finally, there are special cases - one example is the test data you provided (but not your real-life problem, based on your answer to my question). If the 66 values are in fact 11 distinct values, each repeated 6 times (not your data - I am simplifying), then there are in fact only POWER(7, 11) different combinations to try, and this is more manageable (about 2 billion combinations). This may or may not run in reasonable time, but at least it has a chance. I'll see if it will work for the data you provided; but the best hope is the dynamic programming approach, which is the only one that requires INTEGER inputs.

  • mathguy
    mathguy Member Posts: 10,163 Blue Diamond

    One more crucial point though: The dynamic programming solution can tell you if there is at least one solution, but it doesn't find ALL the solutions, assuming there's more than one. Even if this is not enough for your needs, it may still be something you would like to know (rather than not know ANYTHING AT ALL).

  • User_KI8KP
    User_KI8KP Member Posts: 11 Employee

    Thanks Mathguy for your detailed explanation. It makes sense. It is making me think to try out in python or java . One more question can you please let me know what is dynamic programming solution , do you mean dynamic sql ?

  • mathguy
    mathguy Member Posts: 10,163 Blue Diamond

    No, not at all. Dynamic programming has nothing to do with dynamic SQL - despite the similar name. In fact "dynamic programming" is a very general concept, not related to SQL - and really not related directly to programming either; it is rather part of "general algorithm theory" (if such a thing exists).

    Here is a link to just one presentation of the DP (dynamic programming) solution as it applies to the SUBSET SUM problem:

    You can also find a few implementations (in different languages). You will also find links to solving the problem using less space (although same time complexity), and a more complex algorithm that can also find the actual solutions, rather than just a yes/no answer.

    For what it's worth, I implemented the DP solution in SQL, using the MODEL clause. It runs reasonably fast for small enough inputs (in my tests: seven sumands between 4 and 22; target sum = 150,000 runs in 3.6 seconds - obviously the answer is "not possible" but the algorithm still does all the work) - but when I changed the target sum to 1.5 million I stopped the run after a few hours. It's possible that the issue wasn't "execution time" but rather "memory limitations" since I implemented the simplest DP solution, which requires massive memory. Still working on solutions, for my own benefit (practicing programming and understanding the fine details of the algorithms).

    For a different approach - using the "naive" recursion, but with a very significant optimization - see this older thread on this site:


  • mathguy
    mathguy Member Posts: 10,163 Blue Diamond
    edited Oct 29, 2021 7:25PM

    The PL/SQL function whose code I include at the end implements the dynamic programming algorithm for SUBSET SUM, using O(:target_sum) space, assuming that the positive integer summands are saved in a table T with column VAL and the target sum (positive integer) is passed as argument to the function.

    (In fact, I ended up writing the code to return one solution, rather than just a "yes" or "no" answer; in general this will blow up the space requirement again. So take the claim of "linear" space with a grain of salt.)


    For the data you provided for testing from the very beginning, I populated table T like this:

    truncate table t;
    
    insert into t(val)
      select 100 * payment
      from   match_samples
    ;
    commit;
    

    Remember the trick: to make the inputs (and the target sum) integers, they must all be multiplied by 100. This is seen below too, where I show how I call the function (and the output):

    select subset_sum(3702773) ss
    from   dual;
    
    SS
    ---------------------------------------------------------------
    214057+442452+395012+312364+390993+316957+302102+1020794+308042
    

    On my machine, this runs in 80 to 85 seconds.

    For your real-life application you may need to divide the summands by 100 to recover the original payment amounts. And you may need this rewritten to show which ID's contribute to the sum (in addition to, or perhaps instead of, the payment amounts).

    Here's the function code (no error handling, no comments, no additional optimization):

    create or replace function subset_sum(target_sum number) return varchar2
    as
      type s_arr is table of number index by pls_integer;
      s    s_arr;
      type memo_row is table of varchar2(4000) index by pls_integer;
      type memo     is table of memo_row index by pls_integer;
      m    memo;
    begin
      select val bulk collect into s from  t;
      m(0)(0) := '0';
      for i in 1 .. target_sum loop
        m(i)(0) := null;
      end loop;
      for j in 1 .. s.count loop
        for i in 0 .. target_sum loop
          m(i)(mod(j,2)) :=
            coalesce( m(i)(mod(j-1,2)),
                      case when s(j) <= i and m(i-s(j))(mod(j-1,2)) is not null
                           then m(i-s(j))(mod(j-1,2)) || '+' || s(j) end
                    );
        end loop;
      end loop;
      return substr(m(target_sum)(mod(s.count,2)), 3);
    end;
    /
    
  • User_KI8KP
    User_KI8KP Member Posts: 11 Employee

    Hi Mathguy,

    I could run your code and it gave me the result of a single row matching all payments with the target amount. But the requirement was to find out all possible combinations of payments matching the target amount.

    Is this algo just matching with distinct values?

  • mathguy
    mathguy Member Posts: 10,163 Blue Diamond

    The function doesn't "consider" whether the given summands are distinct or if the same value appears more than once, and it has no "preference" regarding that.

    The array S which collects the summands has them ordered whichever way they come from the SELECT statement used to populate the array. In the code I wrote, I didn't specify an ordering; you may add ORDER BY VAL, or you may add an ordering column (corresponding to your ID for example) and order by that column. Here's why ordering matters: The function as I wrote it will return the "earliest possible solution" - the solution (if there is at least one) that can be achieved with the fewest initial terms of the summands array. Moreover, if you remove the "last" summand in the solution (and subtract it from the target sum), the remaining sum again is the "earliest possible solution" for the reduced problem. Here "earliest" depends critically on the order in which the array is populated. (Choose a different ordering and, if the problem has more than one solution, you will likely find a different solution.)

    So, if the solution could use three of your first six summands (which are all equal), it WOULD use them. The fact that it doesn't means that any possible solution will use at most one of those six equal values. The same applies to the later summands - since your table has only nine distinct values, and the solution has nine distinct summands, this means that any possible solution will use at most one of the first eight values. (It is still possible that there are solutions that use the LAST value more than once - the reasoning above doesn't apply to the last value.)

    To understand how the solution works, create your own (smaller) table T and call the function for various (smaller) values of target sum.

    Regarding getting ALL the solutions: In principle that can be done, and it's not a lot harder than what I have already done. However, for many summands (like 66) and a large target sum (like 3.7 million), the question is not "reasonable". The space requirement becomes O(nT) again rather than O(T), where n is the number of summands and T is the target sum; and even finding one solution got my system close to its physical limits. (I actually had to increase the PGA aggregate limit before it would run, although I am not sure why - I don't see what is using so much memory.) In addition, even the number of distinct solutions may be so large as to be impractical to output them all - perhaps you could ask "how many distinct solutions there are", but not to actually see them all. What if there are, say, 300,000 distinct solutions - not to mention 300 million solutions?

    Something to clarify too: when there are duplicate summands, when are solutions considered "distinct"? When they use different summands, or when they use different id's? To illustrate, suppose there are four ID's and all amounts are 100.00; the target sum is 200.00. There is only one solution if we only care about values (100.00 + 100.00); but there are six different combinations of ID's that can produce the target sum: any pair of ID's out of the four. In the extreme case, when there are 66 summands and they are all equal (and equal to 100.00, say), and the target sum is 3,300.00 (the single-value sum times 33), then there is only one distinct solution if you look at values only (33 terms all equal to 100.00), but if you look at combinations of ID's there are 66C33 ("66 choose 33") distinct combinations; that is about 7 * power(10, 18). Obviously one wouldn't be able to output ALL the solutions in any meaningful way - they wouldn't even be able to count them. When the summands are not all equal (and the target sum is not exactly 33 times one such summand), there will be fewer solutions, but still the number of solutions for a single target sum may in some cases be unreasonably large.

    What I'm trying to say is that asking for ALL solutions is reasonable in some small cases, but not in the problem you posted.