3 Replies Latest reply: Dec 7, 2012 2:40 PM by Frank Kulash

# Remove outliers when calculating safety stock?

I have a table with the following values used in calculating safety stock. For the calculation will be as good as possible, we need to remove the outliers. The question is how I do to remove them. In the example below, I would remove the value 30 and 137 165 as they differ very extreme.
Mtrl          Quantity     YearWeek
Label1             3 100          1240
Label1          1 984          1241
Label1          5 670          1242
Label1          30          1243
Label1          3 888          1244
Label1          1 651          1245
Label1          1 881          1246
Label1          1 985          1247
Label1          3 240          1248
Label1          980          1249
Label1          137 165          1250
• ###### 1. Re: Remove outliers when calculating safety stock?
user570142 wrote:
I have a table with the following values used in calculating safety stock. For the calculation will be as good as possible, we need to remove the outliers. The question is how I do to remove them. In the example below, I would remove the value 30 and 137 165 as they differ very extreme.
Mtrl          Quantity     YearWeek
Label1             3 100          1240
Label1          1 984          1241
Label1          5 670          1242
Label1          30          1243
Label1          3 888          1244
Label1          1 651          1245
Label1          1 881          1246
Label1          1 985          1247
Label1          3 240          1248
Label1          980          1249
Label1          137 165          1250
I would have to dig a bit, so I'll leave that as an exercise for the student. But what you need to know to get started is that to do this "properly" you will need to compute the percentile rankings and then exclude what falls above or below whatever percentile ranking you determine defines your outliers. Google ' percentile ranking' and bell curve distribution'.

Actual implementation in code shouldn't be too difficult once you understand percentiles and the math behind them.
• ###### 2. Re: Remove outliers when calculating safety stock?
First. Thanks for the create table and data insert script. We appreciate it when you help us to help you.

Second. You'll need to decide what an "outlier" is. One way to do it would be n standard deviations away from your average. So, using that methodology and calling an outlier anything more than 2SD from the average, you could do something like this.
create table my_data (
mtrl   varchar2(10),
quantity number,
yearWeek integer
);

insert into my_data values('Label1',3100,1240);
insert into my_data values('Label1',1984,1241);
insert into my_data values('Label1', 5670,1242);
insert into my_data values('Label1',30,1243);
insert into my_data values('Label1',3888,1244);
insert into my_data values('Label1',1651,1245);
insert into my_data values('Label1',1881,1246);
insert into my_data values('Label1',1985,1247);
insert into my_data values('Label1',3240,1248);
insert into my_data values('Label1',980,1249);
insert into my_data values('Label1',137165,     1250);
commit;

with stats as (
select mtrl,  avg(quantity) avg_q, stddev(quantity) stdv_q
from my_data
group by mtrl
)
select md.mtrl, md.quantity
from my_data md, stats
where md.mtrl = stats.mtrl
and quantity between (avg_q - (stdv_q * 2)) and (avg_q + (stdv_q * 2))
This would weed out the week 1250 row.

==========================

Frank's post made a good point that should probably be highlighted. It may or may not be what you want, but the analytic functions will calculate the analytics for a given partition (in this case, mtrl) value. So when done that way, there isn't one average and one stddev if you have data containing multiple mtrl values. For example. Add some more data:
insert into my_data values('Label2',113888,1244);
insert into my_data values('Label2',11651,1245);
insert into my_data values('Label2',11881,1246);
insert into my_data values('Label2',11985,1247);
insert into my_data values('Label2',13240,1248);
insert into my_data values('Label2',1980,1249);
insert into my_data values('Label2',1137165,     1250);
commit;
and do a query similar to what Franks did with his data.
with stats as (
select  mtrl,
quantity,
yearweek,
avg(quantity) over (partition by mtrl) avg_q,
stddev(quantity) over (partition by mtrl) stdv_q
from my_data
)
select  stats.mtrl,
stats.quantity,
stats.yearweek,
trunc(avg_q,2) avg_q_trunc,
trunc(stdv_q,2) stdv_q_trunc
from stats
where quantity between (avg_q - (stdv_q * 2)) and (avg_q + (stdv_q * 2))
These results show your columns plus the average and standard deviation value being used for that row. "Outliers" will calculated relative to the other rows with an identical mtrl value.

Edited by: Gaff on Dec 7, 2012 4:15 PM
• ###### 3. Re: Remove outliers when calculating safety stock?
Hi,

Oracle has a few built-in statistical functions, such as AVG and STDDEV, that you might use.
Since I don't have your sample data, I'll use scott.emp to illustrate:
WITH     got_analytics     AS
(
SELECT       deptno
,       ename
,       sal
,       AVG (sal)    OVER (PARTITION BY  deptno)     AS avg_sal
,       STDDEV (sal) OVER (PARTITION BY  deptno)     AS stddev_sal
FROM       scott.emp
)
SELECT       a.*
,       CASE
WHEN  ABS (sal - avg_sal)     > stddev_sal * 1.0   -- Change this number
THEN  '*** Outlier ***'
END     AS outlier
FROM       got_analytics  a
ORDER BY  deptno
,            sal
;
Output:
`   DEPTNO ENAME             SAL  AVG_SAL STDDEV_SAL OUTLIER
---------- ---------- ---------- -------- ---------- ---------------
10 MILLER           1300  2916.67    1893.63
10 CLARK            2450  2916.67    1893.63
10 KING             5000  2916.67    1893.63 *** Outlier ***

20 SMITH             800  2175.00    1123.33 *** Outlier ***
20 JONES            2975  2175.00    1123.33
20 FORD             3000  2175.00    1123.33
20 SCOTT            3000  2175.00    1123.33

30 JAMES             950  1566.67     668.33
30 MARTIN           1250  1566.67     668.33
30 WARD             1250  1566.67     668.33
30 TURNER           1500  1566.67     668.33
30 ALLEN            1600  1566.67     668.33
30 BLAKE            2850  1566.67     668.33 *** Outlier ***
I gave a very tight range for outliers, just so I could show some outliers. In reality, where I used a coefficient of 1.0, a value of 3.0 is more common. You can use anything you want, depending on what you want to call an outlier.