11 Replies Latest reply: Jul 25, 2011 6:55 PM by EJP RSS

    Interesting data structure / algorithm question for mapping of visual items

    877621
      Hi all. I am trying to make a GUI text view of some data. The data is an array of items. The visual view is basically a text view; for simplicity's sake let's say an array of strings. Sounds easy so far, right? You can imagine it like this:

      Object data[];
      String visualRepresentation[];

      Well, I have a problem that gets harder when the amount of data gets much larger. Some items will require more than one line of text for each item of data. So, getLine() doesn't have an easy 1:1 mapping of the index between data[] and visualRepresentation[]. In this case, visualRepresentation is going to be as large or larger than data, because there is at least 1 item in visualRepresentation for every item in data, but maybe more. You've got a GUI that has this method:

      getLine(int index) { return visualRepresentation[index]; }

      imagine the problem of the mapping of visualRepresentation to data. If something changes to where the visual representation changes from 1 to 2 lines, then you have to insert an item in the visualRepresentation array. If you have 1 million items, this is really expensive. Also, imagine that you don't want to create the ENTIRE array of visualRepresentation given that you only need to display 50 items or less on the screen at once.

      So, here's my problem. I don't want to create the entire visualRepresentation array for the entire million (or whatever) items of data, I want to create it on demand. What is the best way to approach this, given that I want a getLine(index) method? A very brute force method would be to count the ENTIRE array from 0 to index on every getLine() call. This would be expensive, and basically impractical. An alternative would be to include a mapping of lines from data to visualRepresentation for each element in data[]. The problem there would be that every insert of a line to visualRepresentation would require an update to every mapping item in all subsequent items. So, if you inserted a line at index 500 in an array of 1000 items, you'd have to update all 500 items after index 500.

      Any ideas?
        • 1. Re: Interesting data structure / algorithm question for mapping of visual items
          Kayaman
          Did you know that there is already a mechanism for creating a String representation for any arbitrary data?

          It's the toString() method from Object class.

          Works like a charm when used with all sorts of table renderers and other GUI things.
          • 2. Re: Interesting data structure / algorithm question for mapping of visual items
            796440
            Don't use parallel arrays. Define a new class. That class has an Object for the data and a String or array of Strings for the visual representation.
            • 3. Re: Interesting data structure / algorithm question for mapping of visual items
              YoungWinston
              user12102468 wrote:
              ...The problem there would be that every insert of a line to visualRepresentation would require an update to every mapping item in all subsequent items. So, if you inserted a line at index 500 in an array of 1000 items, you'd have to update all 500 items after index 500.
              Any ideas?
              Yes, plenty. Rather than storing your lines as a String array, use:
              1. ArrayList<String>
              2. LinkedList<String>
              3. HashMap<Integer, String>
              4. ConcurrentSkipListMap<Integer, String>

              Each has its own advantages and disadvantages, so I suggest you check the docs for each. The last may provide the best mix of retrieval speed, update speed and memory size; but it's heavily targeted at multi-threaded use, which may be overkill for your requirements.

              And in general, I'd also heed jverd's advice and define your own class for this. DatumWithVisualRepresentation seems quite a mouthful though, so you might want to come up with a better name.

              Winston
              • 4. Re: Interesting data structure / algorithm question for mapping of visual items
                877621
                Thanks for your responses guys. First of all, I dumbed it down so that I could explain the problem properly. Of course I'm not using String arrays or separate arrays and no I'm not using these names either. The problem is I have a text view, and the text view calls getLine(int index) so I have to access the text in a rather quick fashion, preferably O(1). Here's something like what I have:

                class MyObject
                {
                Object data; // always 1 of these
                ArrayList<String> text; // 0:n lines of text
                }

                ArrayList<MyObject> biglist;

                Here, biglist is expected to be hundreds of thousands of items. Now, it's obviously easy to access biglist by their index, in fact it's O(1). However, imagine the problem of only referring to lines of text. If I wanted to get the 1000th line of text, I can't access biglist.get(1000), because there isn't 1 line of text for every item in biglist. A dumb way to do so would be to:

                String getLine(int index)
                {
                int curLine = 0;
                for (MyObject o : biglist)
                {
                if (index < o.text.size()) return o.text.get(index);
                index -= o.text.size();
                }
                }

                However, that's O(n) to access the nth line of text. Another way to go about this is to separate the text from the data, i.e.:

                ArrayList<MyObject> data; // now MyObject would have an index which points to its representative index in the text array.
                ArrayList<String> text;

                The problem with that solution is that, if the text grows or shrinks for an individual piece of data, it's expensive to insert/delete because you have to move the entire array back and forth. I'm trying to wrap my head around a more elegant solution which can account for both of these problems: the text has to be accessible quickly, and the text can change. In the first solution, you can't even do a binary search to make it O(log n) because you can't start in the middle of the list and know where you are. Or, if you did (by adding a field which represents which text index it is), you still have the problem of insertion/deletion which would require all of those fields to be updated.

                Even more solutions off the top of my head: initialize the array where each text line has an empty space following it, that way you can insert one time and not have to move everything (but you'll have to do so if you insert 2). The indexes would initially point to 0, 2, 4, etc. However, this method requires twice as much memory, and remember we're dealing with hundreds of thousands of items.

                I think I'm going to have to have some sort of a compromise between the ability to access text linearly, and the ability to insert/delete. Before I go decide on a solution, though, I was wondering if anyone out there had some good ideas on this.

                Thanks!
                • 5. Re: Interesting data structure / algorithm question for mapping of visual items
                  801313
                  Maybe use the master / detail view concept to just display 1 line on the master page.
                  • 6. Re: Interesting data structure / algorithm question for mapping of visual items
                    EJP
                    The problem is I have a text view, and the text view calls getLine(int index)
                    It seems to me that this is the heart of the problem. Why do you need this feature? Maybe you should just let the user find the MyObject instance that he wants and fight it out from there.

                    Otherwise you're into a forest of self-updating offset counts per object, O(N squared) algorithms, etc.
                    • 7. Re: Interesting data structure / algorithm question for mapping of visual items
                      877621
                      EJP wrote:
                      The problem is I have a text view, and the text view calls getLine(int index)
                      It seems to me that this is the heart of the problem. Why do you need this feature?
                      This is an interesting thought. Since I have a view over hundreds of thousands of objects, I created a "cache" in my text viewer. It is a simple array of 1000 (or whatever amount) objects. When the user scrolls past the currently loaded 1000 text lines, the array is nulled out.

                           /** The index of the real list of lines to which {@code mLines[0]} refers to. */
                           private int mLinesStart;

                           /** This is the cache of lines, which is loaded on demand. See {@link #mLinesStart}. */
                           private final TextLine[] mLines = new TextLine[1000];

                           private TextLine getLine(int i)
                           {
                                int targetIndex = i - mLinesStart;
                                if (targetIndex < 0 || targetIndex >= mLines.length)
                                {
                                     targetIndex = i - (mLines.length / 2); // previous 500 to next 500
                                     if (targetIndex < 0) targetIndex = 0;
                                     resetLineCache(targetIndex); // null existing array
                                     targetIndex = i - mLinesStart;
                                }

                                if (mLines[targetIndex] == null)
                                     mLines[targetIndex] = mTextPanelData.getLine(i); // only "load" TextLine until needed

                                return mLines[targetIndex];
                           }


                      So, finding a line doesn't happen in real time. It happens once every 1000 line views (or 10000 or whatever). Given that, it seems reasonable that the getLine(index) feature doesn't have to be O(1). Instead, finding the line could be O(log n), and then traversing the next 1000 lines is O(1) because they are already sorted.
                      Otherwise you're into a forest of self-updating offset counts per object, O(N squared) algorithms, etc.
                      I'm really trying to avoid that! Honestly, though, I'm not sure that it is possible to get around having a sacrifice of one problem or another. If I want to access in O(1) time, I can use an array but sacrifice insertions/deletions for O(N). If I can find an interesting way to store the text, I may be able to get access in O(log n) time and insertions/deletions in O(1). Right now I'm looking into the skip list idea... at first glance, it seems like it is made for exactly this type of problem.
                      • 8. Re: Interesting data structure / algorithm question for mapping of visual items
                        EJP
                        I was thinking of another item in MyObject that would be the sum of the line counts in all the prior items. You would have to update on average N/2 of those when you inserted or deleted a line, i.e. in every item from that line up. Then you could binary-chop on this item to find which array index contains the desired line index, which gives you your O(log N), then go O(1) from there inside the current item's list, subtracting the cumulative total from the required index of course. The downside of this is the O(N) performance on adding and deleting lines.
                        • 9. Re: Interesting data structure / algorithm question for mapping of visual items
                          YoungWinston
                          user12102468 wrote:
                          This is an interesting thought. Since I have a view over hundreds of thousands of objects, I created a "cache" in my text viewer. It is a simple array of 1000 (or whatever amount) objects. When the user scrolls past the currently loaded 1000 text lines, the array is nulled out.
                          OK, so as I see it, you have a huge array of MyObjects somewhere, and you want to build a windowing cache that presents a thousand (or whatever) number of them at a time for a screen display.

                          1. Is there any reason why you need a thousand? Couldn't you just retrieve sufficient objects to display a screenful? It doesn't alter your "indexing" issue, but it'll make the build process a awful lot quicker. Furthermore, it may reduce the cache size to a point where you're not worried about O(n) performance - the time to search through fifty or sixty items isn't likely to be markedly different whether you use brute force or a binary chop.

                          2. I think you may be worrying overly about insert performance. Don't forget that an ArrayList is only dealing with an array of references to Strings, not the strings themselves; so an insert to a 1,000-line ArrayList is only ever likely to be copying a maximum of about 4K (possibly 8) of data; and it'll do that blisteringly quickly.
                          If a single MyObject could have a 1,000,000-line list, it might be worth worrying about; but your examples so far tend to suggest not.

                          Otherwise, if you're intent on a large cache, EJP's suggestions are probably the way I'd go too.
                          The only difference I think I'd make is that rather than adding the 'cumulative count' to MyObject itself, I'd add it to a subclass that you use only for the cache (or hang two classes off a MyObject interface). After all, it only has relevance to the cache.

                          An alternative, for a small cache, might be to simply have a cache of lines, with each 'line' containing it's object and line index; and rebuild it after an insert.

                          Winston

                          Edited by: YoungWinston on Jul 25, 2011 11:29 AM
                          • 10. Re: Interesting data structure / algorithm question for mapping of visual items
                            877621
                            Okay guys, first of all THANKS for all the great conversation. I've settled on how I'm going to handle the data.

                            I am not going to keep the text representation in the same class as the data it represents. Instead, I've created a TextTable class which holds all of the information. If I need to move this information to persistence because a million items takes up too much memory, it would be rather easy and I would just keep the text count in memory so it's still quick to access where the text should be (indices) if the user scrolls to line 500,000 with the scroll bar, but load the text on demand.

                            This TextTable class holds an ArrayList of TextItems. There is a 1:1 correspondance between a datum and a TextItem. The TextItem contains an ArrayList of text and a visual index as well as an ID (which is sorted in the same fashion that the data is).

                            The result is that it will be O(1) to append new items to the TextTable. It will be O(n) to insert/delete items, but even with 100000 pieces of data this is 1/100th of a second. When inserting a line of text, each TextItem in the TextTable after the index has to have it's visual index changed by the difference (i.e. add 1 to each visualIndex after 50 if we insert a line at 50).

                            Finally, the big compromise I made was that it is O(log n) to retrieve a text by its visual index and not O(1). However, since I grab a big chunk of text lines at the same time, it is O(log n) to find the right visual index and then O(50) to retrieve the next 50 items through iteration, which is really O(1). So, the amortized time for retrieving text for the visual display is less than O(log n).

                            Thanks!
                            • 11. Re: Interesting data structure / algorithm question for mapping of visual items
                              EJP
                              Sounds reasonable. O(log N) binary search over a million items is only 20 chops so it's not too different from O(1) in practice.