Discussions
Categories
 382.6K All Categories
 2.1K Data
 213 Big Data Appliance
 1.9K Data Science
 448.4K Databases
 221.1K General Database Discussions
 3.7K Java and JavaScript in the Database
 25 Multilingual Engine
 541 MySQL Community Space
 469 NoSQL Database
 7.8K Oracle Database Express Edition (XE)
 2.9K ORDS, SODA & JSON in the Database
 499 SQLcl
 3.9K SQL Developer Data Modeler
 186.3K SQL & PL/SQL
 21.1K SQL Developer
 293.8K Development
 9 Developer Projects
 130 Programming Languages
 290.5K Development Tools
 96 DevOps
 3K QA/Testing
 645.6K Java
 24 Java Learning Subscription
 36.9K Database Connectivity
 151 Java Community Process
 104 Java 25
 22.1K Java APIs
 137.9K Java Development Tools
 165.3K Java EE (Java Enterprise Edition)
 17 Java Essentials
 146 Java 8 Questions
 85.9K Java Programming
 79 Java Puzzle Ball
 65.1K New To Java
 1.7K Training / Learning / Certification
 13.8K Java HotSpot Virtual Machine
 94.2K Java SE
 13.8K Java Security
 200 Java User Groups
 24 JavaScript  Nashorn
 Programs
 293 LiveLabs
 36 Workshops
 10.2K Software
 6.7K Berkeley DB Family
 3.5K JHeadstart
 5.8K Other Languages
 2.3K Chinese
 168 Deutsche Oracle Community
 1.2K Español
 1.9K Japanese
 234 Portuguese
Finding all possible combinations of numbers to reach a given sum
Hi,
I have a table with 66 payment records, but when I run a recursive SQL to find matching sum value(37027.73) from all combination of payment records, the query goes in a infinite loop and it does not come out. Can someone pls help on this.
Table Structure: Create table match_samples (ID NUMBER, PAYMENT NUMBER(14,2));
Table data scripts :
Insert into match_samples (ID,PAYMENT) values (1,2140.57);
Insert into match_samples (ID,PAYMENT) values (2,2140.57);
Insert into match_samples (ID,PAYMENT) values (3,2140.57);
Insert into match_samples (ID,PAYMENT) values (4,2140.57);
Insert into match_samples (ID,PAYMENT) values (5,2140.57);
Insert into match_samples (ID,PAYMENT) values (6,2140.57);
Insert into match_samples (ID,PAYMENT) values (7,2140.57);
Insert into match_samples (ID,PAYMENT) values (8,2140.57);
Insert into match_samples (ID,PAYMENT) values (9,4424.52);
Insert into match_samples (ID,PAYMENT) values (10,4424.52);
Insert into match_samples (ID,PAYMENT) values (11,4424.52);
Insert into match_samples (ID,PAYMENT) values (12,4424.52);
Insert into match_samples (ID,PAYMENT) values (13,4424.52);
Insert into match_samples (ID,PAYMENT) values (14,4424.52);
Insert into match_samples (ID,PAYMENT) values (15,4424.52);
Insert into match_samples (ID,PAYMENT) values (16,4424.52);
Insert into match_samples (ID,PAYMENT) values (17,4424.52);
Insert into match_samples (ID,PAYMENT) values (18,4424.52);
Insert into match_samples (ID,PAYMENT) values (19,3950.12);
Insert into match_samples (ID,PAYMENT) values (20,3950.12);
Insert into match_samples (ID,PAYMENT) values (21,3950.12);
Insert into match_samples (ID,PAYMENT) values (22,3950.12);
Insert into match_samples (ID,PAYMENT) values (23,3950.12);
Insert into match_samples (ID,PAYMENT) values (24,3950.12);
Insert into match_samples (ID,PAYMENT) values (25,3123.64);
Insert into match_samples (ID,PAYMENT) values (26,3123.64);
Insert into match_samples (ID,PAYMENT) values (27,3123.64);
Insert into match_samples (ID,PAYMENT) values (28,3123.64);
Insert into match_samples (ID,PAYMENT) values (29,3123.64);
Insert into match_samples (ID,PAYMENT) values (30,3123.64);
Insert into match_samples (ID,PAYMENT) values (31,3123.64);
Insert into match_samples (ID,PAYMENT) values (32,3123.64);
Insert into match_samples (ID,PAYMENT) values (33,3909.93);
Insert into match_samples (ID,PAYMENT) values (34,3909.93);
Insert into match_samples (ID,PAYMENT) values (35,3909.93);
Insert into match_samples (ID,PAYMENT) values (36,3909.93);
Insert into match_samples (ID,PAYMENT) values (37,3909.93);
Insert into match_samples (ID,PAYMENT) values (38,3909.93);
Insert into match_samples (ID,PAYMENT) values (39,3169.57);
Insert into match_samples (ID,PAYMENT) values (40,3169.57);
Insert into match_samples (ID,PAYMENT) values (41,3169.57);
Insert into match_samples (ID,PAYMENT) values (42,3169.57);
Insert into match_samples (ID,PAYMENT) values (43,3169.57);
Insert into match_samples (ID,PAYMENT) values (44,3169.57);
Insert into match_samples (ID,PAYMENT) values (45,3021.02);
Insert into match_samples (ID,PAYMENT) values (46,3021.02);
Insert into match_samples (ID,PAYMENT) values (47,3021.02);
Insert into match_samples (ID,PAYMENT) values (48,3021.02);
Insert into match_samples (ID,PAYMENT) values (49,3021.02);
Insert into match_samples (ID,PAYMENT) values (50,3021.02);
Insert into match_samples (ID,PAYMENT) values (51,3021.02);
Insert into match_samples (ID,PAYMENT) values (52,3021.02);
Insert into match_samples (ID,PAYMENT) values (53,10207.94);
Insert into match_samples (ID,PAYMENT) values (54,10207.94);
Insert into match_samples (ID,PAYMENT) values (55,10207.94);
Insert into match_samples (ID,PAYMENT) values (56,10207.94);
Insert into match_samples (ID,PAYMENT) values (57,10207.94);
Insert into match_samples (ID,PAYMENT) values (58,10207.94);
Insert into match_samples (ID,PAYMENT) values (59,3080.42);
Insert into match_samples (ID,PAYMENT) values (60,3080.42);
Insert into match_samples (ID,PAYMENT) values (61,3080.42);
Insert into match_samples (ID,PAYMENT) values (62,3080.42);
Insert into match_samples (ID,PAYMENT) values (63,3080.42);
Insert into match_samples (ID,PAYMENT) values (64,3080.42);
Insert into match_samples (ID,PAYMENT) values (65,3080.42);
Insert into match_samples (ID,PAYMENT) values (66,3080.42);
Recursive SQL getting stuck
with samples (i, n) as (
select id, payment from match_samples
),
iterator (i, n, s, o) as (
select i, n, n, cast(n as varchar2(4000))
from samples smp
union all
select smp.i
, smp.n
, itr.s + smp.n
, itr.o  '~'  to_char(smp.n)
from iterator itr
join samples smp on smp.i > itr.i
where itr.s + smp.n <= 37027.73
)
select *
from iterator
where s = 37027.73;
Answers

66 payment records
73786976294838206463 combinations

Obviously your bruteforce approach won't work; the problem is NPhard, so you will get to unfeasable sizes very quickly  even with the cutoff when sums go above your target, so the numbers won't be quite as high as shown in the previous reply.
The number of combinations in your particular case is significantly less than was shown above though, because you have many duplicate values / few distinct values, and you are only looking at combinations of values (payment amounts), not id's. But is that something that you are allowed to take advantage of in your reallife problem? The approach will be different from what you tried  but it's not worth looking at it if in real life you may need to solve the same problem, with 66 distinct payment amounts (or even, say, 30 distinct amounts, even if there are some duplicate values).

Hi Mathguy,
Thanks for your response. I very well get your point.
In real life scenario, there can be 30 distinct payment values. I would like to know the alternative approach and try it out with data?
Also, do we have any mechanism by which we can know before hand that this recursive query will generate many permutations and get stuck(any time bound restriction or something), so that I can skip the processing for this particular target and continue with the next one. If i get this, then I should be good with my requirement.
Thanks,
Sanjeeb

Sometimes it's easy to know if the problem is feasible (both for a "yes" answer, when the problem is sufficiently "small", and for a "no" answer when there is too much data), but sometimes it's harder.
Your recursive attempt essentially tests all POWER(2, 66) choices (each of the 66 amounts is either included in the sum or it isn't); that's about POWER(10, 20) (a number with 20 digits, as in User_[whatever]'s reply). If you are able to inspect even one trillion such combinations per second, you will still need many years to complete the task. Not doable.
The problem may in fact be "smaller" than it appears, if the target sum isn't very large; many of the tentative sums may go above the target quickly, so the number of combinations you must REALLY check is much smaller. But that is not going to be much help here; even if you only need to test 0.000001% of all possibilities, that is still a number that's too large to be doable in reasonable time.
If the inputs (the individual amounts and the total) are positive integers, an alternative is to use dynamic programming. If there are n individual amounts and the target amount is T, then the order of magnitude of the dynamic programming solution is O(nT). This may be much smaller than POWER(2, n) if T is reasonably small; but if T is of the order of POWER(2, n) itself (or greater), this won't help.
In your case, the inputs ARE integers, even though they don't look that way  just express them in "cents" rather than "dollars". This means that your target is not approx. 37,000 but rather approx. 3.7 million, but this is still overwhelmingly smaller than POWER(2, 66) ~ POWER(10, 20). So the dynamic approach has a chance. Alas, with n = 66 this means the size of the solution is over 200 million. Perhaps this is doable; I will give it a try, first in SQL, then in PL/SQL (it's possible that it would be faster), and if even that is too slow, I'll try it in C. Not doing this out of charitable leaning  I am interested in the problem too (and in implementing various solutions in different languages).
Another angle is that in some cases you may be satisfied with a partial answer. For example, this should be feasible: with the data you provided (or even with 66 distinct amounts), what are all the solutions that involve the sum of five or fewer terms? Whether this helps you at all in your real life problem, only you can now.
Finally, there are special cases  one example is the test data you provided (but not your reallife problem, based on your answer to my question). If the 66 values are in fact 11 distinct values, each repeated 6 times (not your data  I am simplifying), then there are in fact only POWER(7, 11) different combinations to try, and this is more manageable (about 2 billion combinations). This may or may not run in reasonable time, but at least it has a chance. I'll see if it will work for the data you provided; but the best hope is the dynamic programming approach, which is the only one that requires INTEGER inputs.

One more crucial point though: The dynamic programming solution can tell you if there is at least one solution, but it doesn't find ALL the solutions, assuming there's more than one. Even if this is not enough for your needs, it may still be something you would like to know (rather than not know ANYTHING AT ALL).

Thanks Mathguy for your detailed explanation. It makes sense. It is making me think to try out in python or java . One more question can you please let me know what is dynamic programming solution , do you mean dynamic sql ?

No, not at all. Dynamic programming has nothing to do with dynamic SQL  despite the similar name. In fact "dynamic programming" is a very general concept, not related to SQL  and really not related directly to programming either; it is rather part of "general algorithm theory" (if such a thing exists).
Here is a link to just one presentation of the DP (dynamic programming) solution as it applies to the SUBSET SUM problem:
You can also find a few implementations (in different languages). You will also find links to solving the problem using less space (although same time complexity), and a more complex algorithm that can also find the actual solutions, rather than just a yes/no answer.
For what it's worth, I implemented the DP solution in SQL, using the MODEL clause. It runs reasonably fast for small enough inputs (in my tests: seven sumands between 4 and 22; target sum = 150,000 runs in 3.6 seconds  obviously the answer is "not possible" but the algorithm still does all the work)  but when I changed the target sum to 1.5 million I stopped the run after a few hours. It's possible that the issue wasn't "execution time" but rather "memory limitations" since I implemented the simplest DP solution, which requires massive memory. Still working on solutions, for my own benefit (practicing programming and understanding the fine details of the algorithms).
For a different approach  using the "naive" recursion, but with a very significant optimization  see this older thread on this site:

The PL/SQL function whose code I include at the end implements the dynamic programming algorithm for SUBSET SUM, using O(
:target_sum
) space, assuming that the positive integer summands are saved in a tableT
with columnVAL
and the target sum (positive integer) is passed as argument to the function.(In fact, I ended up writing the code to return one solution, rather than just a "yes" or "no" answer; in general this will blow up the space requirement again. So take the claim of "linear" space with a grain of salt.)
For the data you provided for testing from the very beginning, I populated table T like this:
truncate table t; insert into t(val) select 100 * payment from match_samples ; commit;
Remember the trick: to make the inputs (and the target sum) integers, they must all be multiplied by 100. This is seen below too, where I show how I call the function (and the output):
select subset_sum(3702773) ss from dual; SS  214057+442452+395012+312364+390993+316957+302102+1020794+308042
On my machine, this runs in 80 to 85 seconds.
For your reallife application you may need to divide the summands by 100 to recover the original payment amounts. And you may need this rewritten to show which ID's contribute to the sum (in addition to, or perhaps instead of, the payment amounts).
Here's the function code (no error handling, no comments, no additional optimization):
create or replace function subset_sum(target_sum number) return varchar2 as type s_arr is table of number index by pls_integer; s s_arr; type memo_row is table of varchar2(4000) index by pls_integer; type memo is table of memo_row index by pls_integer; m memo; begin select val bulk collect into s from t; m(0)(0) := '0'; for i in 1 .. target_sum loop m(i)(0) := null; end loop; for j in 1 .. s.count loop for i in 0 .. target_sum loop m(i)(mod(j,2)) := coalesce( m(i)(mod(j1,2)), case when s(j) <= i and m(is(j))(mod(j1,2)) is not null then m(is(j))(mod(j1,2))  '+'  s(j) end ); end loop; end loop; return substr(m(target_sum)(mod(s.count,2)), 3); end; /

Hi Mathguy,
I could run your code and it gave me the result of a single row matching all payments with the target amount. But the requirement was to find out all possible combinations of payments matching the target amount.
Is this algo just matching with distinct values?

The function doesn't "consider" whether the given summands are distinct or if the same value appears more than once, and it has no "preference" regarding that.
The array
S
which collects the summands has them ordered whichever way they come from theSELECT
statement used to populate the array. In the code I wrote, I didn't specify an ordering; you may addORDER BY VAL
, or you may add an ordering column (corresponding to yourID
for example) and order by that column. Here's why ordering matters: The function as I wrote it will return the "earliest possible solution"  the solution (if there is at least one) that can be achieved with the fewest initial terms of the summands array. Moreover, if you remove the "last" summand in the solution (and subtract it from the target sum), the remaining sum again is the "earliest possible solution" for the reduced problem. Here "earliest" depends critically on the order in which the array is populated. (Choose a different ordering and, if the problem has more than one solution, you will likely find a different solution.)So, if the solution could use three of your first six summands (which are all equal), it WOULD use them. The fact that it doesn't means that any possible solution will use at most one of those six equal values. The same applies to the later summands  since your table has only nine distinct values, and the solution has nine distinct summands, this means that any possible solution will use at most one of the first eight values. (It is still possible that there are solutions that use the LAST value more than once  the reasoning above doesn't apply to the last value.)
To understand how the solution works, create your own (smaller) table T and call the function for various (smaller) values of target sum.
Regarding getting ALL the solutions: In principle that can be done, and it's not a lot harder than what I have already done. However, for many summands (like 66) and a large target sum (like 3.7 million), the question is not "reasonable". The space requirement becomes O(nT) again rather than O(T), where n is the number of summands and T is the target sum; and even finding one solution got my system close to its physical limits. (I actually had to increase the PGA aggregate limit before it would run, although I am not sure why  I don't see what is using so much memory.) In addition, even the number of distinct solutions may be so large as to be impractical to output them all  perhaps you could ask "how many distinct solutions there are", but not to actually see them all. What if there are, say, 300,000 distinct solutions  not to mention 300 million solutions?
Something to clarify too: when there are duplicate summands, when are solutions considered "distinct"? When they use different summands, or when they use different id's? To illustrate, suppose there are four ID's and all amounts are 100.00; the target sum is 200.00. There is only one solution if we only care about values (100.00 + 100.00); but there are six different combinations of ID's that can produce the target sum: any pair of ID's out of the four. In the extreme case, when there are 66 summands and they are all equal (and equal to 100.00, say), and the target sum is 3,300.00 (the singlevalue sum times 33), then there is only one distinct solution if you look at values only (33 terms all equal to 100.00), but if you look at combinations of ID's there are 66C33 ("66 choose 33") distinct combinations; that is about 7 * power(10, 18). Obviously one wouldn't be able to output ALL the solutions in any meaningful way  they wouldn't even be able to count them. When the summands are not all equal (and the target sum is not exactly 33 times one such summand), there will be fewer solutions, but still the number of solutions for a single target sum may in some cases be unreasonably large.
What I'm trying to say is that asking for ALL solutions is reasonable in some small cases, but not in the problem you posted.