3 Replies Latest reply: Dec 7, 2012 2:40 PM by Frank Kulash RSS

    Remove outliers when calculating safety stock?

    573145
      I have a table with the following values used in calculating safety stock. For the calculation will be as good as possible, we need to remove the outliers. The question is how I do to remove them. In the example below, I would remove the value 30 and 137 165 as they differ very extreme.
      Mtrl          Quantity     YearWeek
      Label1             3 100          1240
      Label1          1 984          1241
      Label1          5 670          1242
      Label1          30          1243
      Label1          3 888          1244
      Label1          1 651          1245
      Label1          1 881          1246
      Label1          1 985          1247
      Label1          3 240          1248
      Label1          980          1249
      Label1          137 165          1250
        • 1. Re: Remove outliers when calculating safety stock?
          EdStevens
          user570142 wrote:
          I have a table with the following values used in calculating safety stock. For the calculation will be as good as possible, we need to remove the outliers. The question is how I do to remove them. In the example below, I would remove the value 30 and 137 165 as they differ very extreme.
          Mtrl          Quantity     YearWeek
          Label1             3 100          1240
          Label1          1 984          1241
          Label1          5 670          1242
          Label1          30          1243
          Label1          3 888          1244
          Label1          1 651          1245
          Label1          1 881          1246
          Label1          1 985          1247
          Label1          3 240          1248
          Label1          980          1249
          Label1          137 165          1250
          I would have to dig a bit, so I'll leave that as an exercise for the student. But what you need to know to get started is that to do this "properly" you will need to compute the percentile rankings and then exclude what falls above or below whatever percentile ranking you determine defines your outliers. Google ' percentile ranking' and bell curve distribution'.

          Actual implementation in code shouldn't be too difficult once you understand percentiles and the math behind them.
          • 2. Re: Remove outliers when calculating safety stock?
            Gaff
            First. Thanks for the create table and data insert script. We appreciate it when you help us to help you.

            Second. You'll need to decide what an "outlier" is. One way to do it would be n standard deviations away from your average. So, using that methodology and calling an outlier anything more than 2SD from the average, you could do something like this.
            create table my_data (
            mtrl   varchar2(10),
            quantity number,
            yearWeek integer
            );
            
            
            insert into my_data values('Label1',3100,1240);
            insert into my_data values('Label1',1984,1241);
            insert into my_data values('Label1', 5670,1242);
            insert into my_data values('Label1',30,1243);
            insert into my_data values('Label1',3888,1244);
            insert into my_data values('Label1',1651,1245);
            insert into my_data values('Label1',1881,1246);
            insert into my_data values('Label1',1985,1247);
            insert into my_data values('Label1',3240,1248);
            insert into my_data values('Label1',980,1249);
            insert into my_data values('Label1',137165,     1250);
            commit;
            
            
            
            with stats as (
              select mtrl,  avg(quantity) avg_q, stddev(quantity) stdv_q
              from my_data
              group by mtrl
            )
            select md.mtrl, md.quantity 
            from my_data md, stats
            where md.mtrl = stats.mtrl
            and quantity between (avg_q - (stdv_q * 2)) and (avg_q + (stdv_q * 2))
            This would weed out the week 1250 row.

            ==========================

            Frank's post made a good point that should probably be highlighted. It may or may not be what you want, but the analytic functions will calculate the analytics for a given partition (in this case, mtrl) value. So when done that way, there isn't one average and one stddev if you have data containing multiple mtrl values. For example. Add some more data:
            insert into my_data values('Label2',113888,1244);
            insert into my_data values('Label2',11651,1245);
            insert into my_data values('Label2',11881,1246);
            insert into my_data values('Label2',11985,1247);
            insert into my_data values('Label2',13240,1248);
            insert into my_data values('Label2',1980,1249);
            insert into my_data values('Label2',1137165,     1250);
            commit;
            and do a query similar to what Franks did with his data.
            with stats as (
              select  mtrl, 
                      quantity, 
                      yearweek, 
                      avg(quantity) over (partition by mtrl) avg_q, 
                      stddev(quantity) over (partition by mtrl) stdv_q
                      from my_data
            )
            select  stats.mtrl, 
                    stats.quantity, 
                    stats.yearweek, 
                    trunc(avg_q,2) avg_q_trunc, 
                    trunc(stdv_q,2) stdv_q_trunc
            from stats
            where quantity between (avg_q - (stdv_q * 2)) and (avg_q + (stdv_q * 2))
            These results show your columns plus the average and standard deviation value being used for that row. "Outliers" will calculated relative to the other rows with an identical mtrl value.

            Edited by: Gaff on Dec 7, 2012 4:15 PM
            • 3. Re: Remove outliers when calculating safety stock?
              Frank Kulash
              Hi,

              Oracle has a few built-in statistical functions, such as AVG and STDDEV, that you might use.
              Since I don't have your sample data, I'll use scott.emp to illustrate:
              WITH     got_analytics     AS
              (
                   SELECT       deptno
                   ,       ename
                   ,       sal
                   ,       AVG (sal)    OVER (PARTITION BY  deptno)     AS avg_sal
                   ,       STDDEV (sal) OVER (PARTITION BY  deptno)     AS stddev_sal
                   FROM       scott.emp
              )
              SELECT       a.*
              ,       CASE
                         WHEN  ABS (sal - avg_sal)     > stddev_sal * 1.0   -- Change this number
                         THEN  '*** Outlier ***'
                     END     AS outlier
              FROM       got_analytics  a
              ORDER BY  deptno
              ,            sal
              ;
              Output:
              `   DEPTNO ENAME             SAL  AVG_SAL STDDEV_SAL OUTLIER
              ---------- ---------- ---------- -------- ---------- ---------------
                      10 MILLER           1300  2916.67    1893.63
                      10 CLARK            2450  2916.67    1893.63
                      10 KING             5000  2916.67    1893.63 *** Outlier ***
              
                      20 SMITH             800  2175.00    1123.33 *** Outlier ***
                      20 ADAMS            1100  2175.00    1123.33
                      20 JONES            2975  2175.00    1123.33
                      20 FORD             3000  2175.00    1123.33
                      20 SCOTT            3000  2175.00    1123.33
              
                      30 JAMES             950  1566.67     668.33
                      30 MARTIN           1250  1566.67     668.33
                      30 WARD             1250  1566.67     668.33
                      30 TURNER           1500  1566.67     668.33
                      30 ALLEN            1600  1566.67     668.33
                      30 BLAKE            2850  1566.67     668.33 *** Outlier ***
              I gave a very tight range for outliers, just so I could show some outliers. In reality, where I used a coefficient of 1.0, a value of 3.0 is more common. You can use anything you want, depending on what you want to call an outlier.


               

              I hope this answers your question.
              If not, post a little sample data (CREATE TABLE and INSERT statements, relevant columns only), and also post the results you want from that data.
              Explain, using specific examples, how you get those results from that data.
              Always say which version of Oracle you're using (e.g., 11.2.0.2.0).
              See the forum FAQ {message:id=9360002}