Categories
SUM on DISTINCT value of other column?

I have a dataset that is representative of a general population, and am trying to do weighted counts of the individuals based on some filters.
For example, my dataset would look something like this:
ID WEIGHT SEX REGION HOBBY
2 2300 M NE Basketball
2 2300 M NE Darts
3 1200 M SW Basketball
3 1200 M SW Football
4 2400 F NE Cycling
4 2400 F NE Darts
To get a weighted total of Males, I would add only the weight for each distinct ID where the SEX=M (2300 + 1200 = 3500).
Similarly, to get a weighted total of all those who like darts, I would add only the weight for each distinct ID where HOBBY = "Darts" (2300+2400 = 4700).
In OAC, I only see a COUNT(DISTINCT) option or a SUM(DISTINCT), but those do a distinct on the same column (ie: weight). I can have several different IDs with the same weight, but need to add the weight together for each distinct ID.
How would I do this?
Best Answer
-
The easy solution would be a second dataset where you apply a filter to only have distinct (ID, WEIGHT, SEX, REGION), because that's the real unique values and HOBBY is the one duplicating your rows.
Having 2 dataset would be the cleanest: (ID, WEIGHT, GENDER, REGION) and (ID, HOBBY) and you can join them on the ID = ID. This is what a dimension model would look like for that kind of data.
If you really want to do it in a single dataset you could try a dirty formula, something ranking the records by ID, giving you a unique ranking number starting with 1 for each ID. Then filtering on ranking = 1 before to SUM the measure you need.
Because your datasets repeat a number of columns, you need to add something allowing you to only take 1 row for each ID.
There isn't an existing formula doing all that, because it's a modelling issue that you have to correct…
1
Answers
-
Not sure I completely get your question but would creating calculated column with formula FILTER(<measure> USING HOBBY='Darts') or FILTER(<measure> USING SEX='M'),etc. be an option to get this kind of data?
Other comments welcomed.
Thanks
Gayathri
0 -
Maybe…? Sorry, still very new to OAC.
What I'm really trying to do is get a weighted count. So, each respondent represents x number of people in the total US population based on their demographics. This is their weight, and is already calculated in the dataset. All respondents with the same demographics (age range, sex, etc.) would have the same weight. Each respondent can have many records; one for each hobby, in this case.
If I do a straight COUNT(DISTINCT ID) filtering for SEX="M", then I'll get 2. This is the base count. But, a weighted count would SUM their weight column instead, which gives us 2300 + 1200 = 3400 population in this example.
However, when I add my filters and do a SUM on the weight column, it's aggregating on every row in the dataset where sex="M" instead of excluding those IDs it's already added their weights for, and gives me 2300+2300+1200+1200 = 6800.
0 -
This may work. In my testing, it worked with report filters for showing a tile with an amount on it.
Weighted Sum: sum( sum(count(ID) * WEIGHT by HOBBY, SEX, REGION) )I started with sum(count(ID) * WEIGHT by HOBBY, SEX, REGION). If you put this measure on a table with no attributes, you can see that it is shows 3 distinct values: 1200, 2300, 2400. If you put it on a tile, however, it just shows the first value.
To get it to show the sum of all the values, I wrapped a sum around that. This sums all of the values, not just the distinct ones.1 -
Gianni Ceresa, I think splitting the data like you suggest would make the most sense, logically. At least this way, the ID is always unique and tied to the weight and other demographics.
However, I'll also try the ranking idea to see if I can get that to work.
0 -
Fred Kaffenberger-Oracle , is this line, (sum(count(ID) * WEIGHT by HOBBY, SEX, REGION)), basically doing an GROUP BY on the ID, HOBBY, SEX, and REGION?
I've tried adding these calculations in, but I must be missing something. I'm not getting the same totals. The sample workbook is attached.
1 -
Thanks for your dva. I'm new to working with OAC also!
I found my mistake. I didn't have ID set to treat as attribute. As a result, count(ID) came back as 1. When I set to attribute, I got the same results as you did.This works, however:
Weighted SUM: sum(sum( WEIGHT by HOBBY, SEX, REGION ) )
So, then I looked at your question about the sum by and wondered if I needed that. It turns out that I didn't. 🤯
SUM( WEIGHT)
Let me know if it really works or if I missed something.
1