This discussion is archived
11 Replies Latest reply: Mar 25, 2013 11:14 AM by 781578

# Pearson Correlation of multiple values

Currently Being Moderated
Hey,

i have a problem. I want to calculate the pearson correlation of two sets which have multiple values. The function CORR(x,y) accepts only two numbers. How can i handle that?
Is there an efficient way to do that because i have a lot of sets where i want to calculate the correlation.

I hope you can help me.

bladepit
• ###### 1. Re: Pearson Correlation of multiple values
Currently Being Moderated
Hi,
bladepit wrote:
Hey,

i have a problem. I want to calculate the pearson correlation of two sets which have multiple values. The function CORR(x,y) accepts only two numbers.
CORR accepts two arguments , such as two NUMBER columns. The result that it returns reflects the correlation over a group (or partition ) which typically includes a large number of rows.
How can i handle that?
Is there an efficient way to do that because i have a lot of sets where i want to calculate the correlation.

I hope you can help me.
Post a little sample data (CREATE TABLE and INSERT statements) and the results you want from that data.
See the forum FAQ {message:id=9360003}
• ###### 2. Re: Pearson Correlation of multiple values
Currently Being Moderated
I can't post the table definition or something else but i hope i can explain my table. I thought this select will make it little bit clearer:

select value, ref, bez from table where (bez = 'A' OR bez = 'B') and ref in (200,201,202....) order by ref;

The value is the important information here. The output could be something like that:

value ref bez
0,0251;194;A
0,7063;194;B
1,2265;195;B
0,7063;198;B

Over the amount of the two sets i want to calculate the pearson.

I think if i put them into an tmptable in my database and execute this sql statement:

select CORR(x,y) from tmptable;

i will get my information but i need to do that for more then 2 million different bez and i think this will not be very efficient.
• ###### 3. Re: Pearson Correlation of multiple values
Currently Being Moderated
Hi,
bladepit wrote:
I can't post the table definition
Don't be silly. Of course you can post CREATE TABLE and INSERT statmenents for some sample data. They don't need to have the same names as your real tables and columns, any more than they need to have the same size or the same data as your real tables.
If you can show what you need to do using commonly available tables, such as those in the scott or hr schemas, then you don't need to post any tables of your own. But one way or another, you have to provide sample data to the people who want to help you.
or something else but i hope i can explain my table. I thought this select will make it little bit clearer:

select value, ref, bez from table where (bez = 'A' OR bez = 'B') and ref in (200,201,202....) order by ref;

The value is the important information here. The output could be something like that:

value ref bez
0,0251;194;A
0,7063;194;B
1,2265;195;B
0,7063;198;B

Over the amount of the two sets i want to calculate the pearson.
What are the results you would want from this small sample data set?

I'm sorry I don't know much about your application, or about statistics. This is a SQL and PL/SQL forum. If you happen to find someone here that knows all about your application and about statistics, that's great, but in general, you'll have to explain what you want. With your knowledge of your application and of statistics, and our knowlege of SQL, I'm confident we (the poeple on this forum) can solve this problem, if we work together.
I think if i put them into an tmptable in my database and execute this sql statement:

select CORR(x,y) from tmptable;

i will get my information but i need to do that for more then 2 million different bez
What is a bez? Why is the number of different bez singnificant in this problem?
and i think this will not be very efficient.
Where is the data now?
If it's not in an Oracle database, or someplace that can be read from Oracle, why are you even considering using an Oracle function?
Given that you do want to use an Oracle function, then you have to get the data into Oracle, even if that takes time.
• ###### 4. Re: Pearson Correlation of multiple values
Currently Being Moderated
``````With T1 as (
Select
Value,ref,bez
From table
Where
Bez ='A'
),
T2 as (
Select
Value, ref,bez
From table
Where
bez ='B'
)

Select
Corr(T1.value,t2.value)
Over(Order by T1.ref) cor
From
T1,T2
Wehre
T1.ref=T2.ref``````
Sorry, it is somehow RAW, behause i Worte this in an iPad, what i am Not Use to do.

Edited by: chris227 on 23.03.2013 11:47
• ###### 5. Re: Pearson Correlation of multiple values
Currently Being Moderated
Thank you that worked fine.

I have another question for a problem that i need to solve to use your solution.

I have a table:

CREATE TABLE TABLE
(
BEZ VARCHAR2(13 BYTE) NOT NULL
, REF NUMBER(10, 0) NOT NULL
, REF2 NUMBER(1, 0) NOT NULL
, VALUE NUMBER(10, 4)
, CONSTRAINT PK_K PRIMARY KEY
(
BEZ
, REF
, REF2
)
ENABLE
)

In this table are four million entries. Over all bez i want to generate the distinct pairs of them. These are the input pairs for your solution above.

I have no idea how i can do this with plsql. My first idea was so save all distinct BEZ from an sql in plsql and then generate an sql:

select distinct (a.BEZ), b.BEZ from DISTINCT_SUBSTID_SQL a, DISTINCT_SUBSTID_SQL b

This is only an idea and i thought it will slow down my complete procedure.
• ###### 6. Re: Pearson Correlation of multiple values
Currently Being Moderated
Sorry, i dont really understand which column should be distinct.
But did you try just
``````With T1 as (
Select distinct
Value,ref,bez
From table
Where
Bez ='A'
),

...``````
The distinct will slow it down in every case

Edited by: chris227 on 24.03.2013 12:37
• ###### 7. Re: Pearson Correlation of multiple values
Currently Being Moderated
I think that doesn't match my needs.

I want to get the distinct pairs of BEZ for an specific list of REFs. REF2 needs to be always 1 so that is trivial.

An example of data in table and what i want to get from the select:

BEZ;REF;REF2;VALUE

A 11 1 2
B 22 1 3
C 11 1 2
D 33 0 4

Now i want to get all combinations of BEZ pairs for an specific set of REFs for example for REF 11 and 22:

A,B
A,C
B,C

For four million BEZs this will explode and i know that so i tried to filter out the BEZ bevor i start this step (based on some criteria).
My question is: Can I handle efficiently with PLSQL?
• ###### 8. Re: Pearson Correlation of multiple values
Currently Being Moderated
you mean something like
``````with data as (
select distinct
BEZ,REF,VALUE
from your_table
where
ref in (11,22)
)

select
d1.bez , d2.bez
d1.value, d2.value
from
data d1, data d2``````
• ###### 9. Re: Pearson Correlation of multiple values
Currently Being Moderated
I think this could be the solution for my problem. But if i execute this sql i became the following error:

ORA-01652: Temp-Segment kann nicht um 128 in Tablespace TEMP erweitert werden
01652. 00000 - "unable to extend temp segment by %s in tablespace %s"
*Cause:    Failed to allocate an extent of the required number of blocks for
a temporary segment in the tablespace indicated.
*Action:   Use ALTER TABLESPACE ADD DATAFILE statement to add one or more
files to the tablespace indicated.

I guessed i have no tablespace with name TEMP. If i open my plugin insider view in sqldeveloper i didn't see this tablespace.

Can i execute this sql in another tablespace?
• ###### 10. Re: Pearson Correlation of multiple values
Currently Being Moderated
With this table this is not really surprising, because of the cartesian product.
if the data is like in the example, one might try to reduce the number of rows (aside from extending the temp-space) with something like below to exclude permutations
``````select
d1.bez , d2.bez
d1.value, d2.value
from
data d1, data d2
where
d1.bez <= d2.bez``````
You may try to find the tablespace by starting with
``select * from dba_tablespaces``
There willbe one, read the docs on this or ask your administrator.
• ###### 11. Re: Pearson Correlation of multiple values
Currently Being Moderated
Can i handle for example four million bez with an collection with type like that:

TYPE intstring IS TABLE OF VARCHAR2(64) INDEX BY BINARY_INTEGER

In my example that works. But i am sure if i put four million strings in it it could be something like outofmemory?

Something like that became i if i do the following:
``````  i:=0;
over402 := over40;
str := over40.first;
while str is not null loop
str2 := over402.first;
while str2 is not null loop
if str != str2 AND substidtosubstid.EXISTS(str || ';' || str2) = FALSE AND substidtosubstid.EXISTS(str2 || ';' || str) = FALSE then
substidtosubstid(str || ';' || str2) := 'TEST';
--dbms_output.put_line(str || ' ' || str2);
end if;
str2 := over402.next(str2);
end loop;
str := over40.next(str);
end loop;``````
over40 is an collection with 8767 strings. It throws:
``````04030. 00000 -  "out of process memory when trying to allocate %s bytes (%s,%s)"
*Cause:    Operating system process private memory has been exhausted
*Action:``````
Seems that the operating system under the database reachs end of memory?

Edited by: bladepit on 25.03.2013 11:13

#### Legend

• Correct Answers - 10 points
• Helpful Answers - 5 points