Forum Stats

  • 3,751,256 Users
  • 2,250,338 Discussions
  • 7,867,361 Comments

Discussions

Fun with abstract data types: Implement "stack" in PL/SQL

mathguy
mathguy Member Posts: 9,982 Gold Crown
edited Dec 17, 2020 6:30PM in SQL & PL/SQL

Fun challenge: implement the linear data structure known as "stack" in PL/SQL.

A stack is a totally ordered finite set of values, all of the same data type. In addition to storing those values in whatever way, "stack" must also support the following three operations (methods):


empty (<stack>) - a Boolean function telling us if the stack is empty (no values).

push(<stack>, <value>) - a procedure to add one more value AT THE END of the sequence.

pop(<stack>) - a function that returns the LAST ELEMENT of the stack, and removes it from the stack.


In addition, obviously, we need to be able to create empty stacks as needed.


The challenge is to implement this structure in PL/SQL, as efficiently as possible.

Since PL/SQL doesn't support generic data types (at least I think it doesn't), to fix ideas, let's only consider numeric stacks. That is, the data type of the values stored in the stack is NUMBER.

In theory, there should be no upper limit to the number of elements in a stack ("stack overflow" is always an implementation concept, it shouldn't exist in the ideal world none of us lives in). In practice, there is always at least the physical limit imposed by available resources; we can either implement "stack" with a user-provided max size, and handle "stack overflow" conditions, or write the implementation so that the stack size can grow dynamically (and then still handle "stack overflow" caused by resource exhaustion). I will be happy with either version; in my tests, the latter creates significant drag on performance.


There are (at least) two natural ways to attack this. ("Natural" in my view, which may be wrong.) One is to create a package with the needed type and procedure/function definitions, and the other is to create a stand-alone (schema-level) data type, encapsulating everything within an object's attributes and methods. In my tests, I found that the package approach is more efficient, but perhaps I didn't come up with the best solution (for EITHER approach). For this challenge, use whichever approach you think is best.


The implementation should support the execution of the following anonymous block (which assumes a package approach to the solution; the changes needed for an object type implementation should be trivial). Here DATA_STRUCT is the package name, N_STACK is the type name, etc.


declare
 s data_struct.n_stack;
 x number;
begin
 s := data_struct.new_n_stack(10000000);
 for i in 1 .. 10000000 loop
   data_struct.push(s, i);
 end loop;
 while not data_struct.empty(s) loop
   x := data_struct.pop(s);
--   dbms_output.put_line(x);
 end loop;
end;
/


Notice the NEW_N_STACK function call. That function returns an empty stack with a maximum capacity of 10 million values.

The commented-out PUT_LINE call is for testing (of course, test with ten values or so, not with 10 million). Otherwise the procedure doesn't generate any output; just observe the execution time. Also, while you are developing solutions, you may want to start with fewer values, but the final solution should be able to handle 10 million values in a reasonable amount of time - for example, less than a minute. Note that the block executes each of EMPTY, PUSH and POP 10 million times, and the stack itself gets to be big enough for proper testing; this should suffice for most reasonable applications.


MOTIVATION: A couple of months ago I implemented an algorithm for finding the connected components of a (possibly very large) graph. I compared against the Oracle-provided package for the same, and I found that my implementation was much faster. I wonder what else can be done better. For more complex algorithms, I will need many basic ADT's such as stack, so I decided to start from the bottom. This is the first step.

Best Answer

«134

Answers

  • Paulzip
    Paulzip Member Posts: 8,409 Blue Diamond
    edited Dec 17, 2020 10:35PM

    Not rigorously tested, but maybe something like this... (NB: This site's formatter will mangle the indentation)

    drop type ty_numberstack
    /
    
    drop type ty_numberlist
    /
    
    create or replace type ty_numberlist is table of number
    /
    
    create or replace type ty_numberstack is object (
     nlist ty_numberlist
    , top  integer
    , maxsize integer
    , constructor function ty_numberstack (maxsize integer default null) return self as result
    , member procedure Push(self in out nocopy ty_numberstack, num in number)
    , member procedure Pop(self in out nocopy ty_numberstack, cnt in integer default 1)
    , member function Pop(self in out nocopy ty_numberstack) return number
    , member function IsEmpty return boolean
    , member procedure Clear(self in out nocopy ty_numberstack)
    )
    /
    create or replace type body ty_numberstack is
    
     constructor function ty_numberstack (maxsize integer default null) return self as result is
     begin
      self.nlist := ty_numberlist();
      self.top := 0;
      self.maxsize := maxsize;  -- null means no limit i.e. limit to Oracle collection size 
      return;
     end;
    
     member procedure Push(self in out nocopy ty_numberstack, num in number) is
     begin
      if top >= maxsize then
       raise_application_error(-20001, 'Max stack size reached = '||maxsize);
      end if;
      nlist.extend;
      top := top + 1;
      nlist(top) := num;
     end;
    
     member procedure Pop (self in out nocopy ty_numberstack, cnt in integer default 1) is
      vCnt integer := least(top, cnt);
     begin
      if vCnt > 0 then
       nlist.trim(vCnt);
       top := top - vCnt;
      end if;
     end;
    
     member function Pop (self in out nocopy ty_numberstack) return number is
      vResult number;
     begin
      vResult := nlist(top);
      Pop();
      return vResult;
     end;
    
     member function IsEmpty return boolean is
     begin
      return top = 0;
     end;
    
     member procedure Clear(self in out nocopy ty_numberstack) is
     begin
      nlist.Delete;
      top := 0;
     end;
    
    end;
    /
    
    declare
     stack_size constant pls_integer := 30;
     s ty_numberstack;
     x number;
    begin
     s := new ty_numberstack(stack_size);
     for i in 1 .. stack_size loop
      s.Push(i);
     end loop;
     while not s.IsEmpty loop
      x := s.Pop;
      dbms_output.put_line(x);
     end loop;
    end;
    /
    
    30
    29
    28
    [snip..]
    1
    
  • Billy Verreynne
    Billy Verreynne Software Engineer Member Posts: 28,570 Red Diamond

    Paul's implementation looks good.

    Additional methods can be added to query (or adjust stack values via a specific algorithm) ala Panda DataFrames style:

    create or replace type ty_numberstack is object ( 
           nlist ty_numberlist 
           , top integer 
           , maxsize integer 
           , constructor function ty_numberstack (maxsize integer default null) return self as result 
           , member procedure Push(self in out nocopy ty_numberstack, num in number) 
           , member procedure Pop(self in out nocopy ty_numberstack, cnt in integer default 1) 
           , member function Pop(self in out nocopy ty_numberstack) return number 
           , member function IsEmpty return boolean 
           , member procedure Clear(self in out nocopy ty_numberstack) 
           , member function Value(offset in integer) return number 
           , member function GT(val in number) return ty_numberlist pipelined 
           , member function LT(val in number) return ty_numberlist 
    ) 
    / 
    show errors 
     
    create or replace type body ty_numberstack is 
     
           constructor function ty_numberstack (maxsize integer default null) return self as result is 
           begin 
                   self.nlist := ty_numberlist(); 
                   self.top := 0; 
                   self.maxsize := maxsize; -- null means no limit i.e. limit to Oracle collection size 
                   return; 
           end; 
     
           member procedure Push(self in out nocopy ty_numberstack, num in number) is 
           begin 
                   if top >= maxsize then 
                           raise_application_error(-20001, 'Max stack size reached = '||maxsize,true); 
                   end if; 
                   nlist.extend; 
                   top := top + 1; 
                   nlist(top) := num; 
           end; 
     
           member procedure Pop (self in out nocopy ty_numberstack, cnt in integer default 1) is 
                   vCnt integer := least(top, cnt); 
           begin 
                   if vCnt > 0 then 
                           nlist.trim(vCnt); 
                           top := top - vCnt; 
                   end if; 
           end; 
     
           member function Pop (self in out nocopy ty_numberstack) return number is 
                   vResult number; 
           begin 
                   vResult := nlist(top); 
                   Pop(); 
                   return vResult; 
           end; 
     
           member function IsEmpty return boolean is 
           begin 
                   return top = 0; 
           end; 
     
           member procedure Clear(self in out nocopy ty_numberstack) is 
           begin 
                   nlist.Delete; 
                   top := 0; 
           end; 
     
           member function Value(offset in integer) return number is 
           begin 
                   return self.nlist(offset); 
           end; 
     
           member function GT(val in number) return ty_numberlist pipelined is 
           begin 
                   for i in 1 .. self.nlist.Count loop 
                           if nlist(i) > val then 
                                   pipe row(i); 
                           end if; 
                   end loop; 
                   return; 
           end; 
     
           member function LT(val in number) return ty_numberlist is 
                   lt_array       ty_numberlist := new ty_numberlist(); 
           begin 
                   for i in 1 .. self.nlist.Count loop 
                           if self.nlist(i) < val then 
                                   lt_array.Extend(1); 
                                   lt_array(lt_array.Count) := self.nlist(i); 
                           end if; 
                   end loop; 
                   return lt_array; 
           end; 
     
     
    end; 
    /
    

    GT (GreaterThan) and LT (LessThan) methods added. Also implemented differently to show the difference between reducing private process memory usage (via a pipeline), in case enormous stacks are used, and using private process memory arrays instead.


    Example:

    SQL> declare 
     2         stack_size constant pls_integer := 30; 
     3         s ty_numberstack; 
     4         x number; 
     5         lt_array ty_numberlist; 
     6 begin 
     7         s := new ty_numberstack(stack_size); 
     8         for i in 1 .. stack_size loop 
     9                 s.Push(i); 
     10         end loop; 
     11  
     12         for i in(select t.column_value as offset from table(s.gt(25)) t) loop 
     13                 dbms_output.put_line('i='||i.offset||' val='||s.value(i.offset)); 
     14         end loop; 
     15  
     16         lt_array := s.lt(4); 
     17         for i in 1 .. lt_array.Count() loop 
     18                 dbms_output.put_line('i='||i||' val='||s.value(i)); 
     19         end loop; 
     20  
     21         while not s.IsEmpty loop 
     22                 x := s.Pop; 
     23 --             dbms_output.put_line(x); 
     24         end loop; 
     25 end; 
     26 / 
    i=26 val=26 
    i=27 val=27 
    i=28 val=28 
    i=29 val=29 
    i=30 val=30 
    i=1 val=1 
    i=2 val=2 
    i=3 val=3 
     
    PL/SQL procedure successfully completed.
    


  • mathguy
    mathguy Member Posts: 9,982 Gold Crown

    @Paulzip

    Thank you for your interest in this. Here are several observations and questions (based on my working on a few different versions over the past week).

    Your approach is very similar to my earliest attempts. On my system, it takes 74 seconds to push 10 million values on the stack, and then to pop them - checking for "empty" before each pop. I found similar performance with my first attempt - using the "package" paradigm rather than the self-contained "object".

    It seems pretty obvious (and it is supported by further work) that the biggest drag on performance is the dynamic memory allocation - needing to allocate more memory for each PUSH. This can be fixed, at the cost of allocating a (possibly very large, and possibly wasteful) amount of memory upfront. Below I will show how I set this up - as an object - with fixed memory allocation at the time the constructor is called. (I will show a few other things, too - please keep reading on if interested.) In this approach, at the cost of "wasting memory", the 10 million PUSH, EMPTY, POP procedure typically takes 5.8 seconds on my system. The cost of setting aside that much memory is high, but the benefit seems high too.

    A few more observations before showing the code (and other things). NOCOPY: I hadn't thought about using it. After I did, I found that on my version of Oracle db and PL/SQL (12.2), it doesn't make a measurable difference. Then I remembered something from Steven Feuerstein's book, I checked, and indeed I remembered correctly: he showed a significant difference in Oracle up to version 10, but he found that since Oracle 11, NOCOPY is no longer critical.

    Supporting data type: we could use any of the collection types (associative array, varray, or nested table). Of these, associative array is significantly slower. Nested table and varray have comparable performance; in my tests, varray was marginally faster. Nested table has the slight benefit that we don't need to give a max size when we create the type; however, as you recall, I am not a big fan of accessing nested table members by index, and since varray is also marginally faster, I went that route.

    Most important (as discussed in the next Reply in this thread - trying to avoid the dreaded "site is busy due to a long running script" issue), I found that the "package" approach is faster than the "object" approach. Not sure why - but as I start implementing more data types, and use them in more complex algorithms, it will be important to have basic things like stacks, queues, heaps etc. run as fast as possible.

    Note also that I didn't write error handling in PUSH and POP. I wonder if that is really needed, since PL/SQL does check for invalid indexes anyway. (That would be very different in C, for example.)

    So - here is how I wrote the stack implementation using the "object" approach. As you can see, it is very similar to what Oracle themselves show here: https://docs.oracle.com/cd/A97630_01/appdev.920/a96624/13_elems32.htm - see the Examples section.


    drop type num_stack;
    drop type num_nt;
    
    create or replace type num_nt as varray(2147483647) of number;
    /
    
    create or replace type num_stack as object (
     s   num_nt,
     top integer,
     constructor function num_stack(max_height integer default 10000) return self as result,
     member function  empty return boolean,
     member procedure push(x number),
     member function  pop(self in out num_stack) return number
    );
    /
    
    create or replace type body num_stack as
    
     constructor function num_stack(max_height integer default 10000)
       return self as result
     is
     begin
       top       := 0;
       s         := num_nt();
       s.extend(max_height);
       return;
     end num_stack;
    
     member function empty return boolean
     is
     begin
       return top = 0;
     end empty;
    
     member procedure push(x number)
     is
     begin
       top    := top + 1;
       s(top) := x;
     end push;
    
       member function pop(self in out num_stack) return number
       is
         x number;
       begin
         x   := s(top);
         top := top - 1;
         return x;
       end pop;
    
    end;
    /
    
    declare
     x number;
     ns num_stack := num_stack(10000000);
    begin
     for i in 1 .. 10000000 loop
       ns.push(i);
     end loop;
     while not ns.empty loop
       x := ns.pop;
     end loop;
    end;
    /
    


    Marwim
  • mathguy
    mathguy Member Posts: 9,982 Gold Crown
    edited Dec 19, 2020 2:00AM

    Here is the "package" implementation I came up with. Which is otherwise the same as the "object" implementation; not sure why I get a meaningful performance difference. As shown below, the same block that calls each of PUSH, EMPTY and POP 10 million times takes 4.5 seconds (vs. 5.8 seconds with the "object" approach). What is more: If I expose the TOP attribute of the stack, and I do away with the call to EMPTY (I replace the WHILE loop with "while s.top != 0 loop..."), execution time drops from 4.5 seconds to 3.4 seconds, again a significant improvement.

    Perhaps I shouldn't care about a 30% increase in execution time as the cost of encapsulating EMPTY (so I don't have to expose TOP to the outside world) - but I wonder if in the implementation of other data types, where more such small functions may be needed, the penalty will not be much greater than 30%. This seemed excessive to me. Thoughts?

    In C, a small function like EMPTY would be inlined - there would be no difference between "not empty(s)" and "s.top != 0". Now, I understand that PL/SQL is an interpreted language by default; but even after changing the compilation type to NATIVE and recompiling everything, I get the same performance. I wonder if PL/SQL supports inlining of small functions, no matter by what mechanism (even if it is not visible to us as users).

    So, here's the package implementation. Nothing surprising, I think, other than the performance difference vs. the "object" approach.

    create or replace package data_struct as
      type n_stack_data is varray(2147483647) of number;
      type n_stack      is record (a n_stack_data, top pls_integer);
      function  new_n_stack(max_height pls_integer default 10000) return n_stack;
      procedure push(s in out n_stack, x number);
      function  pop(s in out n_stack) return number;
      function  empty(s n_stack) return boolean;
    end data_struct;
    /
    
    create or replace package body data_struct as
    
     function new_n_stack(max_height pls_integer default 10000) return n_stack
     is
       s n_stack;
     begin
       s.a := n_stack_data();
       s.a.extend(max_height);
       s.top := 0;
       return s;
     end new_n_stack;
    
     procedure push(s in out n_stack, x number)
     is
     begin
       s.top      := s.top + 1;
       s.a(s.top) := x;
     end push;
    
     function pop (s in out n_stack) return number is
       x number;
     begin
       x     := s.a(s.top);
       s.top := s.top - 1;
       return x;
     end pop;
    
     function empty (s n_stack) return boolean is
     begin
       return (s.top = 0);
     end empty;
    
    end data_struct;
    /
    
    declare
     s data_struct.n_stack;
     x number;
    begin
     s := data_struct.new_n_stack(10000000);
     for i in 1 .. 10000000 loop
       data_struct.push(s, i);
     end loop;
     while not data_struct.empty(s) loop
       x := data_struct.pop(s);
     end loop;
    end;
    /
    
    Marwim
  • padders
    padders Member Posts: 1,055 Silver Trophy
    edited Dec 19, 2020 1:33PM

    > There are (at least) two natural ways to attack this

    Third way - consider not using a separate package or object type at all.

    Regarding in-lining (e.g. see 21c docs for INLINE pragma) Oracle documentation still states the following (the key bit being in brackets).

    "Inlining replaces a subprogram invocation with a copy of the invoked subprogram (if the invoked and invoking subprograms are in the same program unit)"

    Implying that the way to benefit from in-lining of the implementation of something like a stack in PL/SQL for use in shunting-yard/depth-first search (or similar) is not to abstract the stack implementation as a separate program unit (be it package or object type), however much good sense that might make, but rather to include the stack routines in the same program unit that is doing the work and explicitly hint for in-lining opportunities with pragma inline or enforce in-lining by setting plsql_optimize_level = 3.

  • Solomon Yakobson
    Solomon Yakobson Member Posts: 18,667 Black Diamond

    I am probably missing something but what's wrong with simply using associative array which is a superset of "stack"? Why would I need to use something like push(stack,value) if I can just use:

    declare
        type associative_array_type is table of ... index by pls_integer;
        associative_array associative_array_type;
    begin
        associative_array(nvl(associative_array.last,0) + 1) := value;
    

    And avoid all overhead of user defined constructors and methods?

    SY.

  • Solomon Yakobson
    Solomon Yakobson Member Posts: 18,667 Black Diamond

    And if you want to limit stack depth:

    SQL> desc obj_type
     Name                                      Null?    Type
     ----------------------------------------- -------- ----------------------------
     X1                                                 VARCHAR2(20)
     X2                                                 VARCHAR2(20)
    
    
    SQL> declare
      2      subtype constrained_pls_integer is pls_integer range 1..2; -- stack depth is limited to 2
      3      type a_type is table of obj_type index by constrained_pls_integer;
      4      v_a a_type;
      5  begin
      6      v_a(nvl(v_a.last,0) + 1) := obj_type('A','B');
      7      v_a(nvl(v_a.last,0) + 1) := obj_type('C','D');
      8      dbms_output.put_line(v_a(2).x2);
      9      dbms_output.put_line(v_a(1).x2);
     10      v_a(nvl(v_a.last,0) + 1) := obj_type('E','F');
     11  end;
     12  /
    D
    B
    declare
    *
    ERROR at line 1:
    ORA-06502: PL/SQL: numeric or value error
    ORA-06512: at line 10
    
    SQL>
    

    SY.

  • mathguy
    mathguy Member Posts: 9,982 Gold Crown

    Two reasons.

    First, performance. I just tested again (I had tested before, but using associative arrays within a package - stil encapsulating as much as possible for the second reason, shown below). I get slightly better performance by coding directly with associative array (essentially: inlining everything), but for 10 million push, check-for-empty, pop operations, the execution time is 68 seconds on my system. 20 times longer than the best I was able to do with other methods. And, as I said, I am concerned that the performance divergence may get a lot worse for more complex (but still basic) data structures, which are then relied upon in really complex algorithms.

    With varrays and nested tables, we can get much better performance - at the cost of over-allocating resources - with static memory allocation, as I showed in an earlier reply. Unfortunately, associative arrays don't work that way. But, even with dynamic memory allocation (one element at a time), associative arrays were slower than varrays and nested tables, by a non-trivial margin.

    The second reason is the ability to (easily) reuse the code. If I need stacks in many places, I want to be able to simply "create a stack" and use it where needed, rather than ad-hoc code each time I need a stack. If I find a better way to implement stack, I only need to do that in one place, rather than hunting through all the code I have written and making the same change everywhere. This, too, will become even more important for more complex (but still basic) data structures that are needed almost universally.

  • Paulzip
    Paulzip Member Posts: 8,409 Blue Diamond

    The point of a stack as an ADT or package is it has standard methods push / pop / empty - which exist across any language and any implementation. So anyone who knows about stacks would know how to use one we'd written.  

    Whereas your approach doesn't, it requires a pretty good knowledge in how to use an associative array as a stack :

    a) Such as using LAST for pushing 

    b) Remembering to nvl LAST that if it hasn't been initialised on first push.

    c) How you'd navigate the popping aspect such as remembering to use delete to clear up the top of the stack after a pop - something you didn't show.  

    d) Your implementation doesn't allow a user choice in max stack size, in your example, you needed to define your subtype up front.

    etc. etc.

    That's the whole point of these fundamental data structures, they are well established in computer science with known methods. These methods hide the internal workings of the data structure, you don't need any real implementation knowledge.