Forum Stats

  • 3,741,014 Users
  • 2,248,356 Discussions
  • 7,861,580 Comments

Discussions

Get a value from an array depending on the number of occurrence

user13117585
user13117585 Member Posts: 640 Bronze Badge
edited Jan 21, 2021 9:02PM in SQL & PL/SQL

Hello again,

As suggested in one of my previous messages, I'd better create a new question.

So, I was wondering if anyone can help me with this? I tried so many things found online...

I'm trying to randomly select a value depending on the values in the table. Anyone could help here?

Imagine I have a table of element like this one. A key may appear multiple times. In reality this table contains millions of keys and I will only keep the first 1000 or so.

CREATE TABLE elements
(
 key varchar2(10),
 value number
);

insert into elements(key, value) VALUES('a', 1);
insert into elements(key, value) VALUES('a', 2);
insert into elements(key, value) VALUES('a', 3);

insert into elements(key, value) VALUES('b', 1);
insert into elements(key, value) VALUES('b', 1);

insert into elements(key, value) VALUES('c', 1);

insert into elements(key, value) VALUES('d', 1);
insert into elements(key, value) VALUES('d', 1);
insert into elements(key, value) VALUES('d', 1);

insert into elements(key, value) VALUES('e', 1);
COMMIT;

DECLARE
 TYPE string_t IS TABLE OF VARCHAR2(50) INDEX BY PLS_INTEGER;
 elements_l string_t;
  
 rnd VARCHAR2(10) ;
BEGIN

  SELECT key
   BULK COLLECT INTO elements_l   
   FROM elements;
  
 rnd := -- get random value depending on the number of occurences that appear in my table.
  
 DBMS_OUTPUT.put_line(rnd);
  
END;
/

What I would to do is to get the keys and the number of times it appears in my set (in this elements table). Then, depending on the number of times it is there, I want to randomly give it more weight.

Somehow, it's true that if I load everything, it will work. But, I wanted to avoid that... And I'm getting stuck here...

the value that should be returned should take into account the number of values in the table. For example a appears 3 times. So if I execute my function 100 times I should get a about 30 times in the result. I know how to use dbms_random to get a value between two numbers. But I have no idea how to "weight" that random with the number of occurrences.

Any help? Any suggestions?

Any advise would be more than welcome :)

Regards,

Tagged:

Best Answer

Answers

  • Hub Tijhuis
    Hub Tijhuis Member Posts: 98 Silver Badge

    If you select all elements in the array then a random selection in that array gives a greater chance to select a value that occurs more often.

    So select a random index value between  1 and elements_l.count  (=number of all elements) and get that value in the array that corresponds.

    random_indx := -- random between 1 and elements_l.count

    dbms_output.put_line(elements_l(random_indx));

  • mathguy
    mathguy Member Posts: 9,836 Gold Crown

    In your previous thread, you were offered a few suggestions as to how to return a result like this:


    a  30
    b  50
    c  60    etc.
    


    Based on your explanation here, two things become clearer: first, the order (a, b, c, ...) is in fact irrelevant, any order would work equally well; and, you shouldn't have ranges from 1 to 100, working from 0 to 1 (with values like 0.3, 0.5, 0.6 etc.) would be just fine. And, you don't really need "ranges" like there - you only need the upper bound, as explained below.

    You do need to compute those upper bounds for each KEY, just as you were doing in the older thread. (Now I do understand how the questions were related).

    Then, in a PL/SQL block or in a SQL query, generate a random number between 0 and 1, and output a key based on where that random value falls. Like this:

    with
      prep (rand_no) as (select dbms_random.value() from dual)
    select case when rand_no < 0.3 then 'a'
                when rand_no < 0.5 then 'b'
                when rand_no < 0.6 then 'c'    --  etc.
    from .....
    


    Note that CASE branches are evaluated in order; if the second WHEN branch is reached, it is already known that RAND_NO is greater than or equal to 0.3, and in addition it is strictly less than 0.5. Etc.

    Not sure if this is all you needed to be able to finish your project yourself, or if you need more help (for example, to show how you can "connect" this idea with the result from your earlier thread: the values 0.3, 0.5 etc. shouldn't be hard-coded, and neither should the key values in the THEN clauses, but they should be based on the query in the other thread). Do think about what I showed here and let us know if you need more help. Good luck!

    user13117585
  • user13117585
    user13117585 Member Posts: 640 Bronze Badge

    hi hub,

    I agree but I wanted to avoid loading all the values in memory. I have here a quite basic example and in reality my table contains millions of rows. And, I will select only the first 1000 elements having doing a SELECT key FROM elements HAVING COUNT(*) 1000 group by key.

    So, ok I didn't explain it properly. It's because I've been so into it today that I'm getting crazy. I will update original message.

    KR,

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 40,722 Red Diamond
    edited Jan 21, 2021 9:07PM

    Hi,

    You can get a row at random like this:

    SELECT   deptno, ename
    FROM	 scott.emp
    ORDER BY dbms_random.value
    FETCH FIRST 1 ROW ONLY
    ;
    

    I used the Oracle-supplied scott.emp table to test this, and included ename (which is unique) just to see that the query was sometimes picking the same row. Of the 14 rows in the scott.emp table, 6 have deptno=30, and I noticed that, when I ran this query over and over, it picked rows with deptno=30 about 6 / 14 or 43% of the time.

    The query above is reasonably efficient if you just want to pick 1 random row, or if you want to pick any number without duplicating the same row. If you need to pick lots of random deptnos, and the table isn't changing, then it will be more efficient to build another table (or a PL/SQL array, if you prefer), which is probably what you were aiming at in your earlier thread:

    Computing a range (start/end) from a set — oracle-tech

    Here's how you can build the table:

    CREATE TABLE dept_picker  AS
    WITH  ranges  AS
    (
        SELECT   deptno
        ,        RATIO_TO_REPORT (COUNT (*)) OVER () AS rr
        FROM     scott.emp
        GROUP BY deptno
    )
     SELECT   deptno
     ,        rr
     ,	  SUM (rr) OVER (ORDER BY deptno) - rr AS low_val
     ,	  SUM (rr) OVER (ORDER BY deptno)      AS high_val
     FROM	  ranges
     ;
    

    The table looks like this:

    To use it, generate a random number between 0 and 1, then find the row in date_picker where that random number is between low_val (inclusive) and high_val (exclusive).

    For example, I did this to pick 1400 deptnos at random and summarize the results:

    WITH  random_values  AS
    (
      SELECT LEVEL             AS rn
      ,      dbms_random.value AS val
      FROM   dual
      CONNECT BY LEVEL <= 1400
    )
    SELECT   deptno
    ,	 COUNT (*)	                     AS cnt
    ,	 RATIO_TO_REPORT (COUNT (*)) OVER () AS ratio
    FROM    random_values rv
    JOIN    dept_picker  dp ON  rv.val >= dp.low_val
    	 		AND rv.val <  dp.high_val
    GROUP BY deptno
    ORDER BY deptno
    ;
    

    Output:

    I expected deptno=30 to be picked about 600 times out of 1400, and, sure enough, it was picked 562 times.

    user13117585
  • mathguy
    mathguy Member Posts: 9,836 Gold Crown
    edited Jan 21, 2021 9:21PM

    Although...

    This is probably much simpler than you or I thought.

    In a subquery, collect ROWNUM for all the rows in your base table. The order is irrelevant. In another subquery just get the count of rows in the table. Generate a random integer between 1 and the count, and select the row with rownum equal to that random value. This will achieve the same requirement, and in a much simpler way.

    with
      prep (key, rn) as (select key, rownum from elements)
    , random_no (rn) as (select 1 + trunc(dbms_random.value(0, count(*))) from elements)
    select key from prep join random_no using (rn)
    ;
    

    This essentially selects a random row from your input table (ELEMENTS) with equal probability. If key = 'a' on 30% of the input rows, then the probability that the selected key is 'a' is 30%. Exactly what you wanted.

    Note that DBMS_RANDOM.VALUE will select a value greater than or equal to 0 and strictly less than the count. Truncate to get INTEGERS with equal probability, between 0 and count - 1; then add 1 so the values are between 1 and count.

  • user13117585
    user13117585 Member Posts: 640 Bronze Badge

    Ahhhh I just understood something. I had taken the problem the wrong way. Basically, I was thinking about the large part first. But you are completely correct. I should see it from 0 to 1 and not from 1 to 0!

    Very nice!! I will try this tomorrow morning first thing.

    Thank you both for your replies :)

  • user13117585
    user13117585 Member Posts: 640 Bronze Badge

    Hi math guy,

    thank you for your reply. But this is something I'd like to avoid. There are millions of rows in the original table. I'd like to restrict first. I preferred to keep the question simple. But, you gave me good idea to try! I need to see the problem the other way around. Not from biggest number of occurrences to the smallest number. I have to start with the smallest. Generate a number and see were it fits in the total of all preceding rows. And it may do the trick.

    I will try and let you know!

    KR

  • mathguy
    mathguy Member Posts: 9,836 Gold Crown

    Ugh... It happened again. Until a few minutes ago, the only replies visible to me were my own two replies, as if they were the first and the second reply (and no others!) I see that what I proposed in my last reply was already proposed in the very first answer in this thread.

    I reported this kind of behavior (where we can only see some of the replies in a thread, until some sort of "merge" happens at who-knows-what intervals) in the "community" forum; that seems to have been ignored so far. :-(

  • mathguy
    mathguy Member Posts: 9,836 Gold Crown

    So - restrict first. In the WITH clause, write the subquery where you select from your base table, and "restrict" the rows using whatever filters you need to use. Add the ROWNUM thing TO THIS SUBQUERY in the WITH clause.

    Then select the random integer based on the count of THIS SUBQUERY, rather than the count of the whole table. Everything else should work without any changes.

    And, I don't see why you need PL/SQL anywhere in this process - you can do it all in SQL. Why "arrays"?

  • user13117585
    user13117585 Member Posts: 640 Bronze Badge

    Hello again, 


    I'm sorry but I still can't figure it out. First, i need an array because that list also need to be exported to a file.


    So, let me update the sample data:

    CREATE TABLE elements 
    (
     key varchar2(10)
    );
    
    
    BEGIN
     FOR i in 1..10
     LOOP
      INSERT INTO elements 
      SELECT 'key-'||level
       FROM dual
      CONNECT BY LEVEL <= i;
     END LOOP;
    END;
    


    When I do this, I have a table with this content

    SELECT key, count(*) FROM elements group by key;

    KEY   COUNT(*)

    key-1 10

    key-10 1

    key-2 9

    key-3 8

    key-4 7

    key-5 6

    key-6 5

    key-7 4

    key-8 3

    key-9 2


    key-1 appears 10 times, key-8 appear 3 times, etc.And I want this to reflect when I take a random value from this table. If I execute my procedure thousands of time, I want that key 2 that appears 9 times in my table is returned more times than key 7 that appear only 4 times.


    If I do a small plsql code to load the needed elements: 


    DECLARE 
      TYPE string_t IS TABLE OF VARCHAR2(50) INDEX BY PLS_INTEGER;
      elements_l string_t;
      
      rnd VARCHAR2 ;
    BEGIN
      SELECT key
       BULK COLLECT INTO elements_l  
       FROM elements
       GROUP BY key 
       HAVING COUNT(*) > 1;
       
    	
    	--rnd := 
       
    END;
    /
    


    And the select ratio to get a weight for each key, 

    select 

      key,

      sum(ratio) over (order by key) - ratio as "from"

      from ( 

        select key, ratio_to_report(count(*)) OVER() as ratio

        from elements

        group by key

        HAVING COUNT(*) > 1

    );


    I still don't see how to combine both. Do I need to load the result of the second select to another array then generate a DBMS_RANDOM value, look it up to see where it fits in the second select and then find the equivalent index in the first array? 


    Maybe I will eventually load the whole set in memory. Even if it's about 2 milion rows in reality, I think it will be easier to read afterwards. And when I export into the file, I will redo another select distinct?


    Thank you for your help anyway.

  • user13117585
    user13117585 Member Posts: 640 Bronze Badge

    Hi Frank. Impressive. Really! I will try to transpose this to PL/SQL.

    Thank you very much,

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 40,722 Red Diamond
    edited Jan 22, 2021 12:44PM

    Hi, @user13117585

    I will try to transpose this to PL/SQL.

    If you need to use PL/SQL for some reason, then you can query the element_picker table in PL/SQL, more or less as shown above. There's no need to create the element_picker table in PL/SQL; create it once the normal way, and, when necessary, truncate and re-populate it using either SQL or PL/SQL.

Sign In or Register to comment.