This discussion is archived
6 Replies Latest reply: Feb 15, 2013 2:19 PM by 708388

# Summary by year sum or analytic

Currently Being Moderated
I would like to find out how to summarize some data and do calculations without hardcoding it. Trying to sum counts by year and then show counts for 4 years and 6 years without hard code if possible. The 4 yr would be calculated by adding 390 to the cohortterm (200540 + 390 = 200930) The 6 yr would be calculated by adding 590 to the cohortterm (200540 + 590 = 201130).

The last 2 cols, which I haven't done yet would be a percentage based on Gradyr4 divided by HeadCount and another for Gradyr6 divided by headcount. If the pcts are too tricky, I can copy the results to excel to do that part. Eventually, I will need to further break down the counts by Gender and Ethnicity, but hopefully, I can figure it out once I know the syntax for doing the overall numbers.

BANNER
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production

Sample output based on the query I have built so far using sum and hard coding for one year only. The actual results would show counts rather than 0's in the year rows displayed below:
``````COHORTTERM     HEADCOUNT     GRADYR4     GRADYR6     PCT4Yr  PCT6Yr
200340          0          0     0
200440          0          0     0
200540          600          403     539     (403/600)
200640          0          0     0
200740          0          0     0``````
SQL
``````select cohortterm,
sum(case when cohortterm = '200540' then 1 else 0 end) as headcount,
sum(
CASE WHEN cohortterm = '200540' and gradtermban < ''200930' then 1 else 0 end

sum(
CASE WHEN cohortterm = '200540' and gradtermban < '201130' then 1 else 0 end
null as pct4yr, null as pct6yr

from cohorts
where cohortterm between '200340' and '201240'
and substr(ftpt,1,1) = 'F'
group by cohortterm
order by 1``````
• ###### 1. Re: Summary by year sum or analytic
Currently Being Moderated
Since you are already grouping on COHORTTERM why are you hardcoding the values (e.g. sum(case when cohortterm = '200540' then 1 else 0 end) as headcount0 in the case statements?

If you really have different conditions for those values
``````CASE WHEN cohortterm = '200540' and gradtermban < '200930' then 1 else 0 end
then just add more conditions for the different COHORTTERM values. How many distinct values of COHORTTERM are there?

Also explain your calculations like this one: 'The 4 yr would be calculated by adding 390 to the cohortterm (200540 + 390 = 200930) '

What do those different values indicate?
• ###### 2. Re: Summary by year sum or analytic
Currently Being Moderated
Thanks for responding. I would like to get away from hard coding. I posted that query to show what I have done so far when I realized that there probably ought to be a more flexible method to summarize the data. The terms are made up of a year and a two digit code indicating spring summer and fall. We are looking to identify the start term and count how many graduated in 4 years and 6 years. Since I was given a sample query that showed 200540 as the start and < 200930 as the end (meaning it would include 200920 and not 200930 or 200940), I realized I could not simply add 4 to the 2005. I did a subtraction of 200930 from 200540 and that is where the 390 came from. For now, I've gone with your suggestion to continue adding more conditions and looks like that will provide the results.
``````with c as (
select cohortterm,
sum(
CASE WHEN cohortterm = '200540' and gradtermban < '200930' then 1
WHEN cohortterm = '200640' and gradtermban < ''201030' then 1
WHEN cohortterm = '200740' and gradtermban < '201130' then 1
--...etc
--...
else 0
end
sum(
CASE WHEN cohortterm = '200340' and gradtermban < '200930' then 1
WHEN cohortterm = '200440' and gradtermban < '201030' then 1
WHEN cohortterm = '200540' and gradtermban < '201130' then 1
--...etc
--...
else 0
end
from cohorts
where cohortterm between '200340' and '201240'
and substr(ftpt,1,1) = 'F'
group by cohortterm
)
from c
order by 1``````
• ###### 3. Re: Summary by year sum or analytic
Currently Being Moderated
I think you just need to "hard code" the meaning of the two digits at the end.

What are the possible values and their meaning?

Sample data (with CREATE TABLE and INSERT statements) would be most helpful.
• ###### 4. Re: Summary by year sum or analytic
Currently Being Moderated
>
The terms are made up of a year and a two digit code indicating spring summer and fall. We are looking to identify the start term and count how many graduated in 4 years and 6 years. Since I was given a sample query that showed 200540 as the start and < 200930 as the end (meaning it would include 200920 and not 200930 or 200940), I realized I could not simply add 4 to the 2005. I did a subtraction of 200930 from 200540 and that is where the 390 came from
>
You need to finishing nailing down the requirements before you start writing code. For that you need the answers to (post them if you have them):

1. what are the two digit codes for: spring, summer and fall

2. if the start term is for spring how do you calculate the end term that would represent 1 year? 2 years? 3 years? 'n' years?

3. do you need to take into account a term (period/semester/whatever you call it) that a student skipped and did not attend? A student might attend 2005 in the fall and 2009 in the spring but that does that mean they graduated in 4 years when they didn't even attend any other periods? They really only took one 'logical' year; it just took 4 years to do it.

The answers to the above will tell you how calculate the end period from the start period and then you don't need to hard code those values.
• ###### 5. Re: Summary by year sum or analytic
Currently Being Moderated
Based on your sample code, here's a guess at what you want. You just need to extract the year part, add 4 or 6, then put it back with the right value for the period. By the way, putting two bits of information in the same column is against "normal form" database theory.
``````DROP TABLE T;
CREATE TABLE T(COHORTTERM VARCHAR2(6), GRADTERMBAN VARCHAR2(6));
INSERT ALL
into t values('200304', '200620')
INTO T VALUES('200304', '200720')
INTO T VALUES('200304', '200820')
INTO T VALUES('200304', '200920')
INTO T VALUES('200404', '200620')
INTO T VALUES('200404', '200720')
INTO T VALUES('200404', '200820')
INTO T VALUES('200404', '200920')
SELECT NULL FROM DUAL;

SELECT
COUNT(*) CNT,
SUM(CASE WHEN GRADTERMBAN < TO_NUMBER(SUBSTR(COHORTTERM,1,4))+4||'03' THEN 1 END) YEAR4,
sum(case when gradtermban < TO_NUMBER(SUBSTR(COHORTTERM,1,4))+6||'03' then 1 end) year6
FROM T
group by cohortterm;

CNT      YEAR4      YEAR6
---------- ---------- ----------
4          2          4
4          1          3``````
• ###### 6. Re: Summary by year sum or analytic
Currently Being Moderated
Thanks Stew,

Your TO_NUMBER(SUBSTR(COHORTTERM,1,4))+4||'03' formula worked after changing '03' to '30' as the last two digits are '20','30','40'.

#### Legend

• Correct Answers - 10 points