7 Replies Latest reply on Dec 16, 2018 5:44 PM by rp0428 Branched from an earlier discussion.

    Power function - different ways to compute powers

    mathguy

While working on "Implementing Polynomials", and specifically with regard to the task of evaluating a polynomial at a given point (such as at x = 1.0013), the issue of computing powers of numbers comes up naturally. In this branch I would like to discuss power functions specifically.

       

      If n is a positive integer and x is a real number, we all know what  x^n  means, and how to compute it. In pseudo-code, something like this:

       

y := 1;
for i in 1..n loop
  y := y * x;
end loop;
return y;

       

      Oracle offers the function POWER(x, n) for this task; and, as in higher math, Oracle does not restrict the function to n being an INTEGER; it may be any real number, but x must be strictly positive.

       

      So, what are some of the ways to compute powers, both for n integer and for n non-integer? How does Oracle implement it? (WHO KNOWS?) And are there any techniques to compute several different (integer) powers of the same number x in parallel, to save time, so that the evaluation of a POLYnomial can be optimized?

       

      One way to express (and perhaps compute) POWER(x, n) when x and n are real numbers with x > 0 is as  EXP( n * LN(x)).  EXP can be calculated pretty fast (its Taylor series converges very quickly). I don't know how computers calculate LN - most likely not from its Taylor series, which converges very slowly. (There are several faster ways, I just don't know what is written in the C libraries or whatever Oracle, say, uses for LN).
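As a quick illustration of that identity (a Python sketch; the function name is made up here), EXP(n * LN(x)) agrees with direct exponentiation for x > 0:

```python
import math

def power_via_exp_ln(x, n):
    # x must be strictly positive, since LN(x) is undefined otherwise.
    return math.exp(n * math.log(x))

# Agrees with direct exponentiation to floating-point accuracy:
print(abs(power_via_exp_ln(1.0013, 2.5) - 1.0013 ** 2.5) < 1e-9)
```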

       

For use with polynomials specifically, let's limit the discussion to n being a non-negative integer. Here is a way to compute powers (with n a non-negative integer) that is faster than the naive approach I showed at the top of this post.

       

We want to compute x^n, given x and n. Decompose n as a sum of powers of 2. For example, say we must compute x^53. Write 53 = 32 + 16 + 4 + 1, so x^53 = x^32 * x^16 * x^4 * x^1. We can compute x^2, x^4, x^8, x^16, x^32 successively (by squaring the previous result at each step) - this requires five multiplications, and in general at most LOG(2, n). Then we need three more multiplications to get x^53; this is one less than the number of set bits in the binary representation of n, which is also at most LOG(2, n). So this approach performs no more than 2 * LOG(2, n) multiplications, instead of the n - 1 multiplications of the naive approach.

       

      Here is a pseudo-code implementation:

       

if n = 0 then
  return 1;
end if;
y := 1;
z := x;
k := n;
while k > 1 loop
  if mod(k, 2) = 1 then
    y := y * z;
  end if;
  z := z * z;
  k := trunc(k/2);
end loop;
y := y * z;
return y;

       

      This should be even better in something like C: the exponent n is already a binary integer; at each step we check whether the right-most bit is set, and then we shift one bit to the right. No need for expensive function calls like MOD(k, 2) and TRUNC(k/2).
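The bit-test version of that pseudo-code can be sketched in Python (the function name is illustrative); `n & 1` checks the right-most bit and `n >>= 1` shifts right, avoiding MOD and TRUNC:

```python
def fast_power(x, n):
    """Binary (square-and-multiply) exponentiation; n is a non-negative integer."""
    if n == 0:
        return 1
    y = 1
    z = x
    while n > 1:
        if n & 1:      # right-most bit set: this squaring belongs in the product
            y *= z
        z *= z         # z = x^2, x^4, x^8, ... on successive passes
        n >>= 1        # shift the exponent one bit to the right
    return y * z

print(fast_power(2, 53))   # 9007199254740992 = 2^53
```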

       

      (I assume all of this is very well known to anyone with formal IT training; I am certain I am reinventing the wheel here...)

       

      There is no such simplification if n is NOT an integer, by the way. Curiously, though, whatever Oracle uses, my tests show that there is no difference in execution time between POWER(1.000001, 800000) and POWER(1.000001, 800000.25). I found this very surprising; I expected the former to go much faster than the latter. I should be able to test the same in C to see if my expectation was wrong...

       

      In any case, back to what's available in PL/SQL, and polynomial evaluation. The idea would be to consider the binary representation of ALL exponents with non-zero coefficient in a polynomial, and to compute all the needed powers of x at the same time. I'll try my hand at implementing polynomials as nested tables of monomials (or perhaps as associative arrays - I'll see if one may work better than the other) and apply this idea to evaluating a polynomial at a point, but I thought I'd put this out both for critique and to encourage others to work on it as well, if interested.
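One possible sketch of that idea (hypothetical, just to make the scheme concrete): build the repeated-squaring table once, up to the highest bit any exponent needs, then assemble every monomial from it:

```python
def eval_sparse_poly(coeffs, x):
    """Evaluate sum(c * x^e) for coeffs = {exponent: coefficient},
    with non-negative integer exponents, sharing one repeated-squaring
    table across all the monomials."""
    if not coeffs:
        return 0
    max_e = max(coeffs)
    # squares[i] = x^(2^i), built once, up to the highest bit needed
    squares = [x]
    while 2 ** len(squares) <= max_e:
        squares.append(squares[-1] * squares[-1])
    total = 0
    for e, c in coeffs.items():
        term = c
        i = 0
        while e:
            if e & 1:          # bit i of this exponent is set
                term *= squares[i]
            e >>= 1
            i += 1
        total += term
    return total

# 3*x^53 - 2*x + 1 evaluated at x = 2
print(eval_sparse_poly({53: 3, 1: -2, 0: 1}, 2))
```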

        • 1. Re: Power function - different ways to compute powers

          Oracle offers the function POWER(x, n) for this task; and, as in higher math, Oracle does not restrict the function to n being an INTEGER; it may be any real number, but x must be strictly positive.

          Please clarify to conform to the doc. I think you meant your last clause to say 'but if n is a real number then x must be strictly positive'.

          https://docs.oracle.com/en/database/oracle/oracle-database/12.2/sqlrf/POWER.html#GUID-D280B322-D2C3-46D0-8076-C88F16CBED…

          POWER returns n2 raised to the n1 power. The base n2 and the exponent n1 can be any numbers, but if n2 is negative, then n1 must be an integer.

          • 2. Re: Power function - different ways to compute powers
            mathguy

            Yes - what I meant was "n may be any real number, but IF IT IS NOT AN INTEGER THEN x must be strictly positive". Exactly as the documentation says.

            • 3. Re: Power function - different ways to compute powers

              And are there any techniques to compute several different (integer) powers of the same number x in parallel, to save time, so that the evaluation of a POLYnomial can be optimized?

              Doesn't the 'Product Rule' allow you to break a single exponent into whatever number of 'pieces' you want as long as the sum of those pieces is equal to the original exponent?

              Algebra Basics - Exponents - In Depth

              Product Rule

              The exponent "product rule" tells us that, when multiplying two powers that have the same base, you can add the exponents. In this example, you can see how it works. Adding the exponents is just a short cut!

What would be involved in trying to determine an optimum set of exponents for a polynomial, so that each unique sub-exponent would only have to be computed once?

               

              So if you needed n^7, n^13, n^23 you could compute n^7 and then use it for the other two:

               

              n^13 = n^7 x n^6                    ------ 7 + 6 = 13

              n^23 = n^7 x n^7 x n^7 x n^2   ------ 7 + 7 + 7 + 2 = 23

               

              Not sure I got the above example right but I'm sure you can follow what I was trying to show.
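For what it's worth, the reuse above can be written down as a sketch (Python, names illustrative); building the needed pieces and the three powers takes 8 multiplications total:

```python
def reuse_example(n):
    """n^7, n^13, n^23 with reuse of n^7, in 8 multiplications total."""
    p2 = n * n               # n^2   (1)
    p3 = p2 * n              # n^3   (2)
    p6 = p3 * p3             # n^6   (3)
    p7 = p6 * n              # n^7   (4)
    p13 = p7 * p6            # n^13 = n^7 * n^6            (5)
    p23 = p7 * p7 * p7 * p2  # n^23 = n^7 * n^7 * n^7 * n^2 (6-8)
    return p7, p13, p23

print(reuse_example(2))   # (128, 8192, 8388608)
```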

               

              The hypothesis would be that you could reduce the number of multiplications rather significantly and gain a performance boost from that.

               

As a mathematician you might know whether there is a common set of powers that might be optimal for summing to any given value that an exponent might have.

               

              Perhaps that is a set of prime number exponents. So maybe use 2, 3, 5, 7, 11, 13, etc

               

Let's assume you choose to always use those 6. Add another attribute, and in your custom constructor (the one you were trying to avoid), pre-calculate those six powers when the exponent array is stored, and store them.

               

              Then you can use the pre-calculated values when you compute the main power value.

              • 4. Re: Power function - different ways to compute powers
                mathguy

                The "product rule" is what's behind the computation based on the binary representation of n in my initial post in this branched thread.

                 

The other question, which you discuss in your reply, is which set of powers should be computed so that they can then be combined with each other to compute other powers. For that question, it is easy to show by induction on n that the optimal choice (and unique with the optimality property) is to use powers of 2. The problem is really about decomposing n as a SUM of smaller numbers - precisely the point of the "product rule"; so prime numbers (prime exponents) really have nothing to do with it.

                • 5. Re: Power function - different ways to compute powers

                  The other question, which you discuss in your reply, is which set of powers should be computed so that they can then be combined with each other to compute other powers.

                  My take on that is that you can use ANY set of powers you want as long as they add up to the power you are trying to compute.

                  For that question, it is easy to show by induction on n that the optimal choice (and unique with the optimality property) is to use powers of 2.

                  I don't follow that reasoning at all. Possibly if you were using only binary representations of values but I don't see where you indicate that you are willing to restrict yourself to that.

                   

As my example showed any set will work. Let's assume you store the powers in ascending order. Then if the lowest power is 5 (or 23, 37 or anything greater than 2) why would you want to start with 2 and multiples of two?

                   

                  If you have 5, 23, 37 then:

                   

                  1. compute 5

                  2. compute 23 by using 5 four times and 3 one time

                  3. compute 37 by using 23 once, 5 twice and 4 one time

                   

                  For that case you only need to precompute 3, 4 and 5. The value for 37 can then also use the value computed for 23.

The problem is really about decomposing n as a SUM of smaller numbers - precisely the point of the "product rule"; so prime numbers (prime exponents) really have nothing to do with it.

What I was alluding to (but not necessarily arguing for) with the prime thing was related to being able to break down any positive integer into its prime components. That would likely only be useful if you had very large numbers of values that needed the same power - I'm surmising, since any subset of powers would work, that the determination of the prime components would be more work and trouble than it would be worth.

                   

                  It's almost like determining the 'best' subset of powers for a given polynomial is somewhat analogous to a 'bin packing' problem of its own.

                   

                  My first inclination would be to try what my example above shows: use the smallest exact power that you need and then base the other powers on a multiple of the first power and the MOD remainder.

                   

                  My next inclination would be to consider limiting the number of times a smaller power is used.

                   

                  So for 5, 23, 37 rather than use the value for power 5 four times take an alternative route:

                   

                  1. compute power 23 manually and then use it and the power of 5 to compute 37

                   

                  2. pick a larger power like 8, 12 or something and use it for the larger values.

                   

Method 2 suggests doing some tests to see if there is an 'inflection point' somewhere where it is easier to compute certain multiples over others. The more values in the breakdown the more multiplications and sums there are. But the larger the values in the 'power subset' the more multiplications there are.

                   

                  If I were researching and analyzing this I would probably try to see if some standard statistical analysis can shed any light (average, mean, mode) on what might be a generically useful subset of powers.

                   

                  Hey - it's beyond me but you're a mathematician. Can't x^n and y^p each be broken down to use some common combinations like z^q? Assuming that is possible (maybe using logs) my sense is it is getting pretty far afield.

                   

                  I find that simpler is better - at least for the first prototype. That establishes a baseline against which others can be tested.

                  • 6. Re: Power function - different ways to compute powers
                    mathguy

                    I think you are mixing up two things. It is true that 65^n = 5^n * 13^n - when we consider x^n, if x is an integer and we have its prime factorization, and we know the powers of its prime factors, then we can easily compute x^n. But we are not interested in such things; we evaluate polynomials at real numbers, not at integers. We need 1.0013^200 and there is no prime factorization of 1.0013.

                     

                    It makes no sense to store any pre-calculated powers, since we don't know what x will be when we speak of x^n, or P(x) for a polynomial P. In an iterative algorithm like Newton's method the value of x changes at each step, and it is not known before the previous step is completed.

                     

My question was quite different. Given just x, how quickly can we compute, say, x^53? This will be done at runtime, but the question is, how do we set up the computation so that we need as few multiplications as possible? You will say, perhaps, that I need x^7 and x^23; then x^53 = x^7 * x^23 * x^23 so I only need two more multiplications. But that's the wrong answer; how do we get x^7 and x^23 in the first place? They must be calculated too - what is the fastest way for THAT?

                     

                    This is where the binary representation of the exponent n is relevant. 53 = 32 + 16 + 4 + 1. We compute x^2, x^4 = x^2 * x^2  (see, here we get x^4 from a value that we already calculated earlier, x^2), then x^8, x^16 and x^32. So far we needed five multiplications. And we need three more for x^53 = x^32 * x^16 * x^4 * x. So we need a total of 8 multiplications. And this is the least possible number of multiplications to get x^53. The crucial condition here, though, which we must always observe, is that at each step we must multiply values that were already calculated earlier in the process.
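That eight-multiplication count can be verified with a short sketch (Python; the function and its counter are just for illustration):

```python
def power_counting_mults(x, n):
    """Square-and-multiply for a positive integer n, returning (x^n, multiplication count)."""
    mults = 0
    y = None        # running product of the squarings we actually need
    z = x           # invariant: z = x^(2^i) at step i
    while n:
        if n & 1:   # this power of 2 appears in the binary decomposition of n
            if y is None:
                y = z
            else:
                y *= z
                mults += 1
        n >>= 1
        if n:       # don't square past the top bit
            z *= z
            mults += 1
    return y, mults

print(power_counting_mults(2, 53))   # (9007199254740992, 8) - eight multiplications for x^53
```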

                     

                    Now, the minimum for exponent 53 is 8. For any given exponent, there is always a smallest (minimum) number of multiplications that will be needed. The minimum can always be achieved by using this "binary representation" approach. (This is not obvious on its face, perhaps - it is a theorem, with an easy proof by induction.)

                     

                    For exponent 53, we DON'T need to use the full binary representation. For example, 53 is also 16 + 16 + 16 + 4 + 1, so we can compute the powers up to x^16 instead of x^32 (so we save the last multiplication), but then in the last step we will need four multiplications instead of three. So the total number of multiplications will still be 8. So in that sense, there are ways other than the binary representation of n that will result in the same minimum number of multiplications. But the point is that the approach that will give the minimum number of multiplications for ALL n (not just for a fixed one like 53) is, in fact, unique - it requires the exponents that are successive powers of 2. It's the same as the well-known problem, "if we wanted coin denominations that will allow us to be able to make up any amount, and use the fewest denominations possible, what should those denominations be?" - the answer is "1 cent, 2 cents, 4 cents, 8 cents, ......"

                     

                    And we care about the approach that will give the fewest multiplications for many values of n simultaneously, because we want to use this to evaluate a polynomial, not just a single, isolated monomial. There may be faster ways for (very) sparse polynomials, for example something like 3 * x^53 - 2 * x + 1  -  but perhaps not for the general case.

                     

                    This is really the topic of this branch - what are other methods of computing powers of a given real number x, for positive integer exponents, where we may need to evaluate many different powers (with different exponents)?

                     

                    Here is a completely different approach, to which I alluded in the initial post in this branch:

                     

                    x^n = exp ( ln (x^n) ) = exp ( n ln(x) )

                     

                    The exponential function can be evaluated very quickly, because its Taylor series converges very rapidly, even for large values of the argument. This approach is not very promising because ln(x) is hard to evaluate.

                     

                    But now let's say we want to minimize the time to compute x^n for a fixed x but many values of n. Now the approach becomes much more promising: we only need to compute ln(x) once, then we must multiply it by various n (so there is only ONE multiplication for each n), and then we must exponentiate the results. EXP is much more expensive than a single multiplication, but it may be less expensive than the MANY multiplications needed to compute x^n for each n.
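A sketch of that amortization (Python; illustrative only): LN is computed once, then each exponent costs one multiplication plus one EXP:

```python
import math

def many_powers(x, exponents):
    """x^n for many n with a single LN: one multiplication plus one EXP per n.
    x must be strictly positive."""
    log_x = math.log(x)                      # computed once
    return [math.exp(n * log_x) for n in exponents]

vals = many_powers(1.0013, [5, 23, 37])
print(all(abs(v - 1.0013 ** n) < 1e-9 for v, n in zip(vals, [5, 23, 37])))
```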

                    • 7. Re: Power function - different ways to compute powers

                      I think you are mixing up two things.

That wasn't my intention. With the prime thing I was asking whether it 'could be' beneficial or not. Even then I added the caveat 'but not necessarily arguing for' since it seemed to be more of an issue to get the prime factors than to just do the additional multiplies on the value.

                      It is true that 65^n = 5^n * 13^n - when we consider x^n, if x is an integer and we have its prime factorization, and we know the powers of its prime factors, then we can easily compute x^n. But we are not interested in such things; we evaluate polynomials at real numbers, not at integers.

                      Forget the prime thing. That product rule applies to real numbers - not just integers.

                       

                      And unless I misunderstood each coefficient (based on x^n) uses the same x - it is just 'n' that is different.

                       

                      So any value of x^n that gets computed for use by one coefficient can be used by ALL others that need an exponent equal or greater than 'n'.

                      It makes no sense to store any pre-calculated powers, since we don't know what x will be when we speak of x^n, or P(x) for a polynomial P.

My thought there was to do the pre-calculation when the coefficients are stored. So when I said 'If you have 5, 23, 37' I was thinking that if you pass 5, 23 and 37 to a constructor, or create an array with those values in it, you could do one, or more, pre-calculations at that time.

                       

                      You might precalculate the x^5 and store it with the polynomial so it could be used 4 times to do the x^23 calc which might be done later.

                       

                      The idea was to consider getting performance by moving some of the calcs around and do them when the overhead of doing them won't be as impactful.

                       

                      For instance if the polynomials are being read from a physical disk/table in order to create instances of your custom object you can often do a LOT of calculations during that I/O because I/O is the slowest link in the chain.

                       

                      So you might be able to do several of those 'powers of 2' calculations during the I/O load and save them as additional attributes. Then that part of the calcs would already be done when it came time to compute the entire polynomial.

                       

                      Just throwing out ideas here.

                      In an iterative algorithm like Newton's method the value of x changes at each step, and it is not known before the previous step is completed.

                      Understood

                      My question was quite different. Given just x, how quickly can we compute, say, x^53?

                      Ok - but there seemed to really be TWO different, but related questions. The one in the quote just above and this one:

                      This is really the topic of this branch - what are other methods of computing powers of a given real number x, for positive integer exponents, where we may need to evaluate many different powers (with different exponents)?

                      Here is a completely different approach, to which I alluded in the initial post in this branch:

                      The first question is more of a 'one off' - 'how quickly can we compute, say, x^53' assuming that x^53 is the ONLY computation needed.

                       

                      The second question is how to compute an entire set of x^n that belong to a single polynomial. That is the area I was really replying to.

                       

                      In summary - my suggestion was to use that product rule so that calculations done for the first coefficients can be reused (rather than redone) for the later coefficients.

                       

                      And for the general case I agree that using binary powers would be the easiest to both do and to apply generally to later coefficients.

                       

                      The exponential function can be evaluated very quickly, because its Taylor series converges very rapidly, even for large values of the argument. This approach is not very promising because ln(x) is hard to evaluate.

                       

                      But now let's say we want to minimize the time to compute x^n for a fixed x but many values of n. Now the approach becomes much more promising: we only need to compute ln(x) once, then we must multiply it by various n (so there is only ONE multiplication for each n), and then we must exponentiate the results. EXP is much more expensive than a single multiplication, but it may be less expensive than the MANY multiplications needed to compute x^n for each n.

                      I would expect the relative value of using exponentiation would likely change based on the specific requirements that need to be met.

                       

                      That is, for sparse cases versus dense cases. I'd be inclined to think that you might want at least two solution branches - one for sparse and one for dense.

                       

There might also be a need to take into account either the total number of 'n' values (for x^n) involved or perhaps the size/distribution of the gaps between 'n' values. I'm thinking of 'skew' though not sure that is the right term here.

                       

You did a good job with considering the basic number of operations involved. That would probably need to be done for a couple of example variations on gap size and/or total number of 'n' values.

                      The minimum can always be achieved by using this "binary representation" approach. (This is not obvious on its face, perhaps - it is a theorem, with an easy proof by induction.)

                      I'm not sure I would agree with that without seeing/hearing that 'easy proof by induction'.

                       

Consider the following methodology. The example assumes the exponent values are ordered but that isn't required. The example does NOT use a loop - the loop is unrolled ONLY for illustration and to make it easier to understand which value is being worked on. An implementation would use a loop and recursion.

                       

                      1. exponents are 5, 47, 103

                      2. compute value for 5

                          A. use the largest known value to start the calc - since this is the first calc the only known value is 1 so use your binary method

                          B. compare twice the largest known value (1)  to the exponent being calc'd (5). Since 2 (double the known value) is less than 5 compute the value for exponent 2 (1 + 1) - x^1 times x^1 and save that new value. You now have values for exponents 1 and 2.

    C. repeat step B and compare twice the new largest known value (now 2) to 5. Since 4 is less than 5 compute the value for exponent 4 - x^2 times x^2 and save that new value. You now have values for exponents 1, 2 and 4 (your binary progression, since we started with 1).

                          D. repeat step B and compare twice the new largest known value (now 4) to 5. Since 8 is GREATER THAN 5 stop this 'recursive' process.

                          E. now the result will include ONE copy of that last largest known value (4). If that value (4) is equal to the value you need to compute you are done.

                       

                      --- the process done by the above series of steps is to find the LARGEST known value that, when doubled, is greater than the value of interest (5). One copy of that last value will be used as a component of the solution - two copies would be too large.

                       

                          F. you get here when you have found that first value that, when 'doubled', is too large.

                       

                      --- the goal now is to find the LARGEST of the remaining known values to  include next. We need a value for 5 and steps A thru F gave us the first component (4) of the result. So now we need a value for the rest (i.e. a value for 5 - 4 or 1).

                       

                      --- since the list of known values is 1, 2, 4 and we have already used 4 we repeat those A-F steps but the value being calc'd is 1

                       

    G. do the steps above starting with 2 - we start with the LARGEST value that is less than the one we found above. We have already used 4 so we start this search with 2.

                       

                          H. compare twice the largest known value (2 - we already used 4)  to the exponent being calc'd (1). Since 4 (double the known value of 2) is greater than 1 we can't use 2

                       

    I. compare twice the next lowest known value (1, the largest known value less than the 2 we just rejected as too large) to the exponent being calc'd. Since it is equal to the value we want (1) we are done.

                       

                          J. compute the result for 5 using the value for 4 we found first and the value of 1 we just found.

                       

                          K. ADD that result for 5 to the list of known values - it will become the new largest known value for the next set of calcs. The new list is now 1, 2, 4, 5

                       

                      --- you now have the needed result for exponent 5

                       

                      --- now compute the value for exponent 47 the steps to do that will be the same as the steps above.

                       

--- the KEY DIFFERENCE is that the largest value in the list of known results is now 5 where it was only 1 when we started

                       

                      3. calc the value for 47 using the same steps as before

                          A. use 5 the largest known value to start the calc

    B. compare twice the largest known value (5) to the exponent being calc'd (47). Since 10 (double the known value) is less than 47 compute the value for exponent 10 (5 + 5) - x^5 times x^5 and save that new value to the list.

    C. repeat step B and compare twice the new largest known value (now 10) to 47. Since 20 is less than 47 compute the value for exponent 20 - x^10 times x^10 and save that new value. You now have values for exponents 1, 2, 4, 5, 10, 20. NOTE - no longer a binary sequence but a doubling of the last value of 5 sequence.

    D. repeat step B and compare twice the new largest known value (now 20) to 47. Since 40 is less than 47 compute the value for exponent 40 - x^20 times x^20 and save that new value. The list is now 1, 2, 4, 5, 10, 20, 40.

    E. repeat step B and compare twice the new largest known value (now 40) to 47. Since 80 is GREATER THAN 47 stop this 'recursive' process.

                       

                      --- now the next doubling to 80 will be greater than 47. Since 40 is the largest value now in the list it becomes the first component of the result needed for value 47. So we use '47 - 40 = 7' and now need to compute a value for 7 using the same process as above.

                       

                      --- the goal now is to find the LARGEST of the remaining known values to  include next. We need a value for 7 and steps A thru E gave us the first component (40) of the result. So now we need a value for the rest (i.e. a value for 47 - 40 or 7).

                       

                      --- since the list of known values is 1, 2, 4, 5, 10, 20, 40 and we have already used 40 we repeat those A-F steps but the value being calc'd is 7

                       

    G. do the steps above starting with 20 - we start with the LARGEST value that is less than the one we found above. We have already used 40 so we start this search with 20.

                       

                      --- quick finish since you probably already know the process by now.

                       

                          H. 20 is larger than 7 so check 10

                       

                          I. 10 is larger than 7 so check 5

                       

                         J. 5 is less than 7 so use it with 40 in the result - that leaves us needing 7 - 5 = 2

                       

                         K. we find 2 in the list and use it

                       

                         L. the final needed list is thus x^40, x^5 and x^2 where 40 + 5 + 2 add up to the 47 we needed.

                       

                         M. add the value for x^47 to the known list

                       

                      --- now we need the value for 103

                       

                      --- that search will start by using 47 which doubled gives 94 and the value for 94 will be added to the  'known' list and used as the first component of the result

                       

                      --- since 103 - 94 = 9 we calc a value for x^9 using the value for 4 and 5 from the list - x^94, x^5, x^4
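The doubling process described above can be sketched as follows (Python; a rough rendering of the steps, assuming positive integer exponents, with illustrative names):

```python
def greedy_powers(x, targets):
    """A rough rendering of the 'double the largest known value' scheme.

    known maps exponent -> x^exponent. Targets are processed in ascending
    order so that earlier results seed the doubling for later ones."""
    known = {1: x}
    for target in sorted(targets):
        # Double the largest known exponent until doubling would overshoot.
        while 2 * max(known) <= target:
            m = max(known)
            known[2 * m] = known[m] * known[m]
        # Greedily cover the target with the largest usable known exponents
        # (a known exponent may be used more than once, e.g. 5 four times).
        remainder = target
        value = 1
        for e in sorted(known, reverse=True):
            while e <= remainder:
                value *= known[e]
                remainder -= e
        known[target] = value
    return {t: known[t] for t in targets}

print(greedy_powers(2, [5, 47, 103]) == {5: 2 ** 5, 47: 2 ** 47, 103: 2 ** 103})
```

Running it on the exponents 5, 47, 103 reproduces the walkthrough: x^47 is assembled from x^40, x^5 and x^2, and x^103 from x^94, x^5 and x^4.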

                       

                      SUMMARY of above method

                       

                      1. a 'doubling sequence' is used when new values are needed

                       

                      2. the initial doubling will be binary since the ONLY known value initially is '1' which naturally begins a binary doubling. But note that we do NOT intentionally choose 'binary' - we just double the last known value when we need a new one. It just happens to be binary for the first set

                       

3. we ALWAYS save the intermediate values generated and add them to the 'known' list of values

                       

                      4. we ALWAYS use the largest known value for our doubling when we need a new value. It doesn't matter if it is binary or not.

                       

                      5. we NEVER calculate values that aren't needed ahead of time. That is, we do NOT pre-calculate a set of binary doubling values with the expectation that we 'might' need/use them later.

                       

                      6. the amount of doubling for any step depends on the size of the gap between steps

                       

                      7. the list of known values could be stored in a sparse array where the index value represents the exponent value. Since the largest exponent needed in the example was 103 then an array size of 103 is all that is needed.

                       

                      8. unused array values could be either null or zero. When we search backwards  we can check every array value and just skip the null/zero entries. So in step 3G above we have used '40' and need the next lower 'max' value. We could start with 39 and skip it since it is null/zero. So we check 38, 37, ...21 and find that 20 is the value we need to use to continue.

                       

                      I would expect that a loop to check those 20 array entries (or even 1000) would be so fast it isn't worth worrying about. But it is rather trivial to have a secondary array that contains ONLY indexes of entries that actually have values. Then you could find that '20' item directly.

                       

                      Not even trying to suggest 'optimization' at this point.

                       

                      Be interested to hear your thoughts about the above 'doubling' of largest value versus a strict use of binary doubling only.

                       

Though if the intent is to use binary shifts to do multiplication on binary values rather than actual multiplication then I certainly agree that the binary shifts will be far faster and most likely would argue for using ONLY powers of two for the intermediate values in the 'known' list.