1 Reply Latest reply on Mar 7, 2010 1:31 PM by 843853

# Creating a KD Tree and only sorting the points once for each axis

So right now I have some code for creating a KD Tree (2-Dimensions) but I am sorting the x and y values for the node each time I create a new node. Then I am taking the median, setting that as the nodes point, and passing on the left half and right halves of the remaining array to their respective children. I am using Arrays.Sort which I believe takes O(n log(n)) time per sort (once per node), but I have been reading a lot of stuff that states you can get the median in linear time so that the total build time is O(n log(n)). Unfortunately I am having difficulty understanding how. Here is a snippet of an explanation I found.
Each node has both x-list (ordered in x-coord.) and y-list (ordered in y-coord.)
Each pi in x-list and y-list is linked to each other.
Both x-list and y-list can be built in linear time using the two lists of its parent.

Thus total time to build all x-lists and y-lists is:
T1 (n) = O(n) + 2T1 (n/2) = O(n log n).

Since each splitting line can be found in O(1) time (via sorted lists), and there are n &#8722; 1 splitting lines in a kd-tree,
Thus total time to find all splitting lines is
T2 (n) = (n &#8722; 1) · O(1) = O(n).

Hence, total prepossessing time
= T1 (n) + T2 (n) = O(n log n).
My question is, how can I build the arrays in linear time at each node? I have tried working it out on paper for the past few hours but lets say I sort the array using X first and I know that indices 0 - median will be on the left, however even if I had sorted the array by Y also, there is zero guarantee that all the points I need from x[0-median] will be in y[0-median]. They will obviously be included in x[0-median] but they will be sorted by x and not y, forcing me to resort at every single node.

Any idea?
• ###### 1. Re: Creating a KD Tree and only sorting the points once for each axis
swinte1 wrote:
My question is, how can I build the arrays in linear time at each node? I have tried working it out on paper for the past few hours but lets say I sort the array using X first and I know that indices 0 - median will be on the left, however even if I had sorted the array by Y also, there is zero guarantee that all the points I need from x[0-median] will be in y[0-median]. They will obviously be included in x[0-median] but they will be sorted by x and not y, forcing me to resort at every single node.

Any idea?
So for example, you have 5 points and you sort them by x and y coordinates in two separate arrays:
x-list: 6 1 4 3 2 5
y-list: 6 5 1 2 3 4

Now you want just the points to the right of 4:
x-list: 3 2 5
y-list: 5 2 3

Create the x-list by iterating from the midpoint to the end.
Add each element in the x-list to a set..
Iterate through the y-list from left to right:
If the point in the y-list is contained in the set, add it to the new y-list.

This creates the arrays in linear time while keeping the points sorted. It does iterate over twice as many points in the y-list as needed though.