SUM on DISTINCT value of other column?

Sarah Beckwith-Oracle · May 21, 2024 12:18AM

I have a dataset that is representative of a general population, and am trying to do weighted counts of the individuals based on some filters.

For example, my dataset would look something like this:

ID WEIGHT SEX REGION HOBBY

2 2300 M NE Basketball

2 2300 M NE Darts

3 1200 M SW Basketball

3 1200 M SW Football

4 2400 F NE Cycling

4 2400 F NE Darts

To get a weighted total of Males, I would add only the weight for each distinct ID where the SEX=M (2300 + 1200 = 3500).

Similarly, to get a weighted total of all those who like darts, I would add only the weight for each distinct ID where HOBBY = "Darts" (2300+2400 = 4700).

In OAC, I only see a COUNT(DISTINCT) option or a SUM(DISTINCT), but those do a distinct on the same column (ie: weight). I can have several different IDs with the same weight, but need to add the weight together for each distinct ID.

How would I do this?

Gianni Ceresa · May 23, 2024 8:41PM

The easy solution would be a second dataset where you apply a filter to only have distinct (ID, WEIGHT, SEX, REGION), because that's the real unique values and HOBBY is the one duplicating your rows.

Having 2 dataset would be the cleanest: (ID, WEIGHT, GENDER, REGION) and (ID, HOBBY) and you can join them on the ID = ID. This is what a dimension model would look like for that kind of data.

If you really want to do it in a single dataset you could try a dirty formula, something ranking the records by ID, giving you a unique ranking number starting with 1 for each ID. Then filtering on ranking = 1 before to SUM the measure you need.

Because your datasets repeat a number of columns, you need to add something allowing you to only take 1 row for each ID.

There isn't an existing formula doing all that, because it's a modelling issue that you have to correct…

GayathriAnand-Oracle · May 23, 2024 6:16PM

Not sure I completely get your question but would creating calculated column with formula FILTER(<measure> USING HOBBY='Darts') or FILTER(<measure> USING SEX='M'),etc. be an option to get this kind of data?

Other comments welcomed.

Thanks

Gayathri

Sarah Beckwith-Oracle · May 23, 2024 7:55PM

Maybe…? Sorry, still very new to OAC.

What I'm really trying to do is get a weighted count. So, each respondent represents x number of people in the total US population based on their demographics. This is their weight, and is already calculated in the dataset. All respondents with the same demographics (age range, sex, etc.) would have the same weight. Each respondent can have many records; one for each hobby, in this case.

If I do a straight COUNT(DISTINCT ID) filtering for SEX="M", then I'll get 2. This is the base count. But, a weighted count would SUM their weight column instead, which gives us 2300 + 1200 = 3400 population in this example.

However, when I add my filters and do a SUM on the weight column, it's aggregating on every row in the dataset where sex="M" instead of excluding those IDs it's already added their weights for, and gives me 2300+2300+1200+1200 = 6800.

Fred Kaffenberger-Oracle · May 23, 2024 10:27PM

This may work. In my testing, it worked with report filters for showing a tile with an amount on it.
Weighted Sum: sum( sum(count(ID) * WEIGHT by HOBBY, SEX, REGION) )

I started with sum(count(ID) * WEIGHT by HOBBY, SEX, REGION). If you put this measure on a table with no attributes, you can see that it is shows 3 distinct values: 1200, 2300, 2400. If you put it on a tile, however, it just shows the first value.

To get it to show the sum of all the values, I wrapped a sum around that. This sums all of the values, not just the distinct ones.

A set of visuals. Table with Hobby, Sex, Region, Weight as columns with all combinations. Weight totals to 11,800. raw calc on tile shows 1200. raw calc on table with no attributes shows 1200, 2300, 7000 (distinct values). Good Calc on table shows only 1 row with value of 11,800. Good Calc on tile shows 11,800.

Sarah Beckwith-Oracle · May 23, 2024 11:14PM

Gianni Ceresa, I think splitting the data like you suggest would make the most sense, logically. At least this way, the ID is always unique and tied to the weight and other demographics.

However, I'll also try the ranking idea to see if I can get that to work.

Sarah Beckwith-Oracle · May 24, 2024 2:15PM

Fred Kaffenberger-Oracle , is this line, (sum(count(ID) * WEIGHT by HOBBY, SEX, REGION)), basically doing an GROUP BY on the ID, HOBBY, SEX, and REGION?

I've tried adding these calculations in, but I must be missing something. I'm not getting the same totals. The sample workbook is attached.

Hobby_test.dva

Uploaded May 24, 202417.92 KB

Fred Kaffenberger-Oracle · May 24, 2024 4:48PM

@Sarah Beckwith-Oracle

Thanks for your dva. I'm new to working with OAC also!

I found my mistake. I didn't have ID set to treat as attribute. As a result, count(ID) came back as 1. When I set to attribute, I got the same results as you did.

This works, however:

Weighted SUM: sum(sum( WEIGHT by HOBBY, SEX, REGION ) )

So, then I looked at your question about the sum by and wondered if I needed that. It turns out that I didn't. 🤯

SUM( WEIGHT)

Let me know if it really works or if I missed something.

Sum by Dim in flat table- explore.dva

Uploaded May 24, 202441.32 KB

Oracle Analytics Cloud and Server

Welcome!

Quick Links

Categories

SUM on DISTINCT value of other column?

Welcome!

Best Answer

Answers

Welcome!