Forum Stats

  • 3,751,465 Users
  • 2,250,366 Discussions
  • 7,867,433 Comments

Discussions

how to write a script to find the mean and median of the column only for 'number' datatype in table

User_9R7TX
User_9R7TX Member Posts: 1 Green Ribbon
edited Apr 23, 2021 3:47PM in SQL & PL/SQL

Answers

  • Solomon Yakobson
    Solomon Yakobson Member Posts: 18,670 Black Diamond
    edited Apr 23, 2021 4:05PM
    select  ename,
            sal,
            avg(sal) over(partition by deptno) department_mean_sal,
            median(sal) over(partition by deptno) department_median_sal
      from  emp
    /
    
    ENAME             SAL DEPARTMENT_MEAN_SAL DEPARTMENT_MEDIAN_SAL
    ---------- ---------- ------------------- ---------------------
    MILLER           1300          2916.66667                  2450
    CLARK            2450          2916.66667                  2450
    KING             5000          2916.66667                  2450
    SMITH             800                2175                  2975
    ADAMS            1100                2175                  2975
    JONES            2975                2175                  2975
    SCOTT            3000                2175                  2975
    FORD             3000                2175                  2975
    JAMES             950          1566.66667                  1375
    MARTIN           1250          1566.66667                  1375
    WARD             1250          1566.66667                  1375
    TURNER           1500          1566.66667                  1375
    ALLEN            1600          1566.66667                  1375
    BLAKE            2850          1566.66667                  1375
    
    
    14 rows selected.
    
    
    SQL>
    

    SY.

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 40,909 Red Diamond
    edited Apr 23, 2021 4:08PM

    Hi, @User_9R7TX

    Welcome to the Forum!

    Whenever you have a question, please post a little sample data (CREATE TABLE and INSERT statements, relevant columns only) for all the tables involved, and the exact results you want from that data, so that the people who want to help you can re-create the problem and test their ideas.

    Explain, using specific examples, how you get those results from that data.

    If you can show what the problem is using commonly available tables (like the tables in the Oracle-supplied SCOTT schema) then you don't need to post any sample data; just the results and explanation.

    Always say what version of Oracle you're using (e.g. 12.2.0.2.0).

    You can use the AVG function to get the mean, and the MEDIAN function to get the median. For example, to show the mean and median salary of all employees in each department of the Oracle-supplied scott.emp table, like this:

     DEPTNO  ENAME      SAL      MEAN_SAL    MEDIAN_SAL
    -------- ---------- -------- --------    ----------
          10 MILLER     1300     2916.667    2450
          10 CLARK      2450     2916.667    2450
          10 KING       5000     2916.667    2450
    
          20 SMITH       800     2175        2975
          20 ADAMS      1100     2175        2975
          20 JONES      2975     2175        2975
          20 SCOTT      3000     2175        2975
          20 FORD       3000     2175        2975
    
          30 JAMES       950     1566.667    1375
          30 WARD       1250     1566.667    1375
          30 MARTIN     1250     1566.667    1375
          30 TURNER     1500     1566.667    1375
          30 ALLEN      1600     1566.667    1375
          30 BLAKE      2850     1566.667    1375
    

    you can use this query:

    SELECT   deptno, ename, sal
    ,	 AVG    (sal) OVER (PARTITION BY deptno) AS mean_sal
    ,	 MEDIAN (sal) OVER (PARTITION BY deptno) AS median_sal
    FROM	 scott_emp
    ORDER BY deptno, sal
    ;
    

    This uses the analytic AVG and MEDIAN functions. Both AVG and MEDIAN work as aggregate functions as well. For details, see the SQL Language manual: MEDIAN (oracle.com)

  • mathguy
    mathguy Member Posts: 9,990 Gold Crown
    edited Apr 26, 2021 1:20AM

    It sounds like your problem is dynamic. You want to be able to pass a table name (and a schema name, to be more general) and you want to see which columns are numeric, and for each of those columns, you want to see the mean and the median.

    As you said, this can be done with a script. Namely, a SQL*Plus script - using its various features. In the screenshot below, I show what I do in my SQL*Plus session. You can do the same from many other interfaces. I created a script and I saved it in a text file called mean_and_median.sql in my SQLPATH folder. I am doing this on a Windows machine - I call the Windows more command from within the SQL*Plus host command to show the content of the script first. I do this so you can copy and paste the script, whatever your OS is. The script takes two parameters, a schema name and a table name. If you need to do this only in your own schema, you can modify the script so it doesn't expect a schema name. If you need to do this over a dblink, or if your tables have funny names (enclosed in double-quotes), etc., you can adapt the script as needed. Or, someone can help you do it, anyway.

    Since I use substitution variables for passing the schema and table name to the script, I use DBMS_ASSERT functions as a minimal form of protection against SQL injection.

    After the content of the script file, I show a few examples of calling the script, passing a schema and a table name. The first example uses the traditional EMP table in the SCOTT schema. For illustration, I include two cases that result in error. The error is exactly the same - because the list of numeric columns is empty; but the situations are different. In the first one, I actually have a table named COUNTRY_CODES in my schema (INTRO); it just doesn't have any numeric columns. In the second, I am passing a non-existent table name, TBL, in schema SCOTT. Other errors (like illegal schema or table names, etc.) may result in other errors. You will need to decide how you want to handle the different possible errors.


    SQL> host more %SQLPATH%\mean_and_median.sql
    
    set verify   off
    set feedback off
    set timing   off
    set escape   off
    set escape   \
    
    set termout off
    column col_list new_value numeric_columns
    
    select listagg(column_name, ',') within group (order by column_id) as col_list
    from   all_tab_columns
    where  owner      = dbms_assert.simple_sql_name('&1')
      and  table_name = dbms_assert.simple_sql_name('&2')
      and  data_type  = 'NUMBER'
    /
    
    set termout  on
    set numwidth 15
    column column_name format a30
    
    select  column_name, round(avg(val), 2) as mean_value, median(val) as median_value
    from    (select &numeric_columns from &1\.&2)
    unpivot (val for column_name in (&numeric_columns))
    group   by column_name
    /
    
    
    
    SQL> @mean_and_median SCOTT EMP
    
    COLUMN_NAME                         MEAN_VALUE    MEDIAN_VALUE
    ------------------------------ --------------- ---------------
    COMM                                       550             400
    DEPTNO                                   22.14              20
    EMPNO                                  7726.57            7785
    MGR                                    7739.31            7698
    SAL                                    2073.21            1550
    
    
    SQL> @mean_and_median INTRO COUNTRY_CODES
    from    (select  from INTRO.COUNTRY_CODES)
                     *
    ERROR at line 2:
    ORA-00936: missing expression
    
    
    
    
    SQL> @mean_and_median SCOTT TBL
    from    (select  from SCOTT.TBL)
                     *
    ERROR at line 2:
    ORA-00936: missing expression