1 Reply Latest reply: Mar 7, 2010 7:31 AM by 843853 RSS

    Creating a KD Tree and only sorting the points once for each axis

    843853
      So right now I have some code for creating a KD Tree (2-Dimensions) but I am sorting the x and y values for the node each time I create a new node. Then I am taking the median, setting that as the nodes point, and passing on the left half and right halves of the remaining array to their respective children. I am using Arrays.Sort which I believe takes O(n log(n)) time per sort (once per node), but I have been reading a lot of stuff that states you can get the median in linear time so that the total build time is O(n log(n)). Unfortunately I am having difficulty understanding how. Here is a snippet of an explanation I found.
      Each node has both x-list (ordered in x-coord.) and y-list (ordered in y-coord.)
      Each pi in x-list and y-list is linked to each other.
      Both x-list and y-list can be built in linear time using the two lists of its parent.
      
      Thus total time to build all x-lists and y-lists is:
      T1 (n) = O(n) + 2T1 (n/2) = O(n log n).
      
      Since each splitting line can be found in O(1) time (via sorted lists), and there are n − 1 splitting lines in a kd-tree,
      Thus total time to find all splitting lines is
      T2 (n) = (n − 1) · O(1) = O(n).
      
      Hence, total prepossessing time
      = T1 (n) + T2 (n) = O(n log n).
      My question is, how can I build the arrays in linear time at each node? I have tried working it out on paper for the past few hours but lets say I sort the array using X first and I know that indices 0 - median will be on the left, however even if I had sorted the array by Y also, there is zero guarantee that all the points I need from x[0-median] will be in y[0-median]. They will obviously be included in x[0-median] but they will be sorted by x and not y, forcing me to resort at every single node.

      Any idea?
        • 1. Re: Creating a KD Tree and only sorting the points once for each axis
          843853
          swinte1 wrote:
          My question is, how can I build the arrays in linear time at each node? I have tried working it out on paper for the past few hours but lets say I sort the array using X first and I know that indices 0 - median will be on the left, however even if I had sorted the array by Y also, there is zero guarantee that all the points I need from x[0-median] will be in y[0-median]. They will obviously be included in x[0-median] but they will be sorted by x and not y, forcing me to resort at every single node.

          Any idea?
          So for example, you have 5 points and you sort them by x and y coordinates in two separate arrays:
          x-list: 6 1 4 3 2 5
          y-list: 6 5 1 2 3 4

          Now you want just the points to the right of 4:
          x-list: 3 2 5
          y-list: 5 2 3

          Create the x-list by iterating from the midpoint to the end.
          Add each element in the x-list to a set..
          Iterate through the y-list from left to right:
          If the point in the y-list is contained in the set, add it to the new y-list.

          This creates the arrays in linear time while keeping the points sorted. It does iterate over twice as many points in the y-list as needed though.