Lazy Loading is Easy: Implementing a Rich Domain Model Blog

Version 2




    Lazy loading: An object that doesn't contain all of the data you need but knows how to get it. (Martin Fowler, Patterns of Enterprise Application Architecture)

    I have recently explored the wonderful world of lazy loading. The original title of this article was going to be "Lazy Loading Made Easy." It started out as an exercise to try something complicated, but as it turns out, lazy loading already is easy.

    First off: why would you want to do lazy loading? The most important reason is to be able to have a clean domain model. Consider the (pseudo) class Category:

    public class Category { public Collection<Category> subcategories = new HashSet<Category>(); public Category parent; public String name; public int creationYear; }
    

    With this class, I would be able to say stuff like:

    Category category = dao.get(1); System.out.println(category.parent.subcategories);
    

    This should print all the siblings of category 1, including category 1 itself (whatever that means). However, without lazy loading, I would have to do this:

    Category category = dao.get(1); Category parent = dao.get(category.parentId); List children = new ArrayList(); for (Long childId : parent.subcategoryIds) { children.add(dao.get(childId)); } System.out.println(children);
    

    This might not look so bad in this simple example, but it adds up. In addition, look at all the places where we need the DAO! In effect, this rips the whole business logic out of the domain objects, as they normally cannot access the DAOs directly. If this code looks unpleasantly familiar, this article is for you.

    In order to do the first, easy example efficiently, we have to support lazy loading.

    Smoke Testing the DAO

    I will start out by writing a simple test to demonstrate the implementation of a basic DAO. We will then expand this DAO to support lazy loading. We start with a simple test for saving and retrieving objects:

    public class CategoryDaoTest extends TestCase { public void testSaveRetrieve() throws Exception { Category parent = new Category("parent", 2005); CategoryDao categoryDao = new SpaceCategoryDao(); categoryDao.store(parent); Category saved = categoryDao.get(parent.getId()); assertNotSame(saved, parent); assertEquals("name", saved.getName(), parent.getName()); } }
    

    This only verifies that saving and retrieving objects without lazy loading works correctly. I have created two implementations of CategoryDao: one for JDBC and one using a simple internal structure (I call this a space, because it is vaguely inspired by JavaSpaces). I show this example using the simpler space structure. For the JDBC implementation, see the downloadable source. Here are the get() and store() methods of SpaceCategoryDao:

    public Category get(Long id) { if (!idToTuples.containsKey(id)) { return null; } Serializable[] tuple = (Serializable[]) idToTuples.get(id); return tupleToObject(tuple); }
    
    public Long store(Category category) { if (category == null) { return null; } if (idToTuples.containsKey(category.getId())) { return category.getId(); } store(category.getParent()); category.setId(new Long(nextId++)); Serializable[] tuple = objectToTuple(category); idToTuples.put(category.getId(), tuple); for (Iterator iter = category.getSubcategories().iterator(); iter.hasNext();) { store((Category) iter.next()); } return category.getId(); }
    

    The objects are saved by using objectToTuple and restored by tupleToObject:

    private Serializable[] objectToTuple(Category category) { Long parentId = category.getParent() != null ? category.getParent().getId() : null; return new Serializable[] { category.getId(), category.getName(), new Integer(category.getCreationYear()), parentId }; } private Category tupleToObject(Serializable[] tuple) { Category category = new Category((String)tuple[1], ((Integer)tuple[2]).intValue()); category.setId((Long) tuple[0]); return category; }
    

    We now have basic storing and loading of a single object implemented. Using test-driven development, I know that it basically works. We are now set up for solving the problem of lazy loading. First, I will show how we load the related objects without lazy loading, and then I will improve the implementation to load the parent relationship lazily.
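
    The tuple round trip can be tried on its own. What follows is a minimal sketch under stated assumptions, not the downloadable source: TupleStoreDemo, Item, and TupleStore are hypothetical names, and the tuple holds only an id, a name, and a creation year (no parent reference yet).

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo entry point; not part of the article's source.
class TupleStoreDemo {
    public static void main(String[] args) {
        TupleStore store = new TupleStore();
        Item original = new Item("parent", 2005);
        Long id = store.store(original);
        Item loaded = store.get(id);
        System.out.println(loaded != original); // prints true: a copy comes back
        System.out.println(loaded.name);        // prints parent
    }
}

// Hypothetical stand-in for the article's Category class.
class Item {
    Long id;
    final String name;
    final int creationYear;

    Item(String name, int creationYear) {
        this.name = name;
        this.creationYear = creationYear;
    }
}

// Stores objects as tuples, so every get() rebuilds a fresh object.
class TupleStore {
    private final Map<Long, Serializable[]> idToTuples = new HashMap<>();
    private long nextId = 1;

    Long store(Item item) {
        if (item.id == null) {
            item.id = nextId++;
        }
        idToTuples.put(item.id,
                new Serializable[] { item.id, item.name, item.creationYear });
        return item.id;
    }

    Item get(Long id) {
        Serializable[] tuple = idToTuples.get(id);
        if (tuple == null) {
            return null;
        }
        Item item = new Item((String) tuple[1], (Integer) tuple[2]);
        item.id = (Long) tuple[0];
        return item;
    }
}
```

    Because only tuples are stored, two calls to get() with the same id return two different objects. That is harmless here, but it is the root of the referential integrity problem that a session cache solves.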

    Lazy Loading Interfaces

    The code above does not deal with the relationships, but the test passes, so I write a new test:

    public void testCompareParent() { Category saved = categoryDao.get(category.getId()); assertNotNull("parent", saved.getParent()); assertEquals("name", saved.getParent().getName(), category.getParent().getName()); }
    

    Note that I am now keeping the category as an instance variable in the test. This test fails, of course. Let's see how we can make it pass.

    For a first attempt, we use an eager loading approach.

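    private Category tupleToObject(Serializable[] tuple) {
        // objectToTuple stores the creation year as an Integer
        Category category = new Category((String) tuple[1],
                ((Integer) tuple[2]).intValue());
        category.setId((Long) tuple[0]);
        // Eager loading: the parent is fetched immediately
        category.setParent(get((Long) tuple[3]));
        return category;
    }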
    

    This works, but it doesn't do lazy loading. However, we can replace the eagerly loaded parent with a dynamic proxy. In order to do so, I have to extract an interface (called CategoryItf) from the Category class that contains those methods that I want to use on the parent. This is a bit of a hassle, but as we shall see later, we can simplify this further. Here is the new implementation of tupleToObject:

    private Category tupleToObject(Serializable[] tuple) {
        Category category = new Category((String) tuple[1],
                ((Integer) tuple[2]).intValue());
        category.setId((Long) tuple[0]);
        // Lazy loading: the parent is only fetched when it is first used
        category.setParent(lazyGet((Long) tuple[3]));
        return category;
    }
    

    Okay, I am pulling your leg. Here is the implementation of lazyGet:

    protected CategoryItf lazyGet(final Long id) {
        if (id == null) {
            return null;
        }
        // id must be final to be visible inside the anonymous class
        return (CategoryItf) Proxy.newProxyInstance(
            CategoryItf.class.getClassLoader(),
            new Class[] { CategoryItf.class },
            new LazyLoadedObject() {
                protected Object loadObject() {
                    return get(id);
                }
            });
    }
    

    In order to understand this code, you have to understand dynamic proxies, which were introduced in Java 1.3. The java.lang.reflect.Proxy.newProxyInstance() method will return a dynamically generated object that implements the interfaces given to the method call (in this case CategoryItf), and calls an invocation handler no matter what method is called on this interface. The code passes in an anonymous subclass of a custom class named LazyLoadedObject. Here is the LazyLoadedObject invocation handler:

    public abstract class LazyLoadedObject implements InvocationHandler { private Object target; public Object invoke(Object proxy, Method method, Object[] args) throws Throwable { if (target == null) { target = loadObject(); } return method.invoke(target, args); } protected abstract Object loadObject(); }
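
    The proxy technique can be exercised in isolation. The following is a minimal sketch, assuming Java 8 or later; Named, SimpleNamed, LazyHandler, and LazyProxyDemo are hypothetical names, not classes from the downloadable source. It also unwraps InvocationTargetException, which Method.invoke uses to wrap any exception thrown by the target, so callers see the original exception.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.function.Supplier;

// Hypothetical demo entry point.
class LazyProxyDemo {
    static int loads = 0;

    public static void main(String[] args) {
        Named lazy = LazyHandler.proxy(Named.class, () -> {
            loads++; // count how often the real object is loaded
            return new SimpleNamed("parent");
        });
        System.out.println(loads);          // prints 0: nothing loaded yet
        System.out.println(lazy.getName()); // prints parent
        lazy.getName();
        System.out.println(loads);          // prints 1: loaded only once
    }
}

// Hypothetical interface playing the role of CategoryItf.
interface Named {
    String getName();
}

class SimpleNamed implements Named {
    private final String name;

    SimpleNamed(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }
}

// Loads the target on first use and forwards every call to it.
class LazyHandler implements InvocationHandler {
    private final Supplier<?> loader;
    private Object target;
    private boolean loaded;

    LazyHandler(Supplier<?> loader) {
        this.loader = loader;
    }

    static <T> T proxy(Class<T> itf, Supplier<? extends T> loader) {
        Object proxy = Proxy.newProxyInstance(
                itf.getClassLoader(), new Class<?>[] { itf }, new LazyHandler(loader));
        return itf.cast(proxy);
    }

    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        if (!loaded) {
            target = loader.get();
            loaded = true;
        }
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // rethrow what the target actually threw
        }
    }
}
```

    The separate loaded flag is a design choice in this sketch: it avoids calling the loader again when the loader legitimately returns null.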
    

    This makes the lazy loading work, but we have to implement an extra interface just for the dynamic proxy. Let's see if we can do it the way the persistence frameworks do it: with bytecode generation.

    Lazy Loading Classes

    In order to lazy load classes, we will need to use the cglib bytecode manipulation framework, or a similar library. cglib conveniently has an interface to be used for lazy loading, so this code is extremely short.

    private Category lazyGet(final Long key) { if (key == null) { return null; } return (Category) Enhancer.create( Category.class, new LazyLoader() { public Object loadObject() throws Exception { return get(key); } }); }
    

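    Enhancer and LazyLoader are classes from cglib. So this code contains almost nothing beyond what we need to express our business problem: what to do when the parent reference is accessed. Note that cglib builds the proxy as a generated subclass, so Category must not be final and needs a no-argument constructor.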

    I have now implemented the simple relationship between a category and its parent category using lazy loading. Next, let's look at collections.

    Lazy Loading Collections

    I still have to show you, gentle reader, how to implement lazy loading for collections. After that, I will show how to make sure that category == category.getSubcategory(0).getParent(). (This is called referential integrity.) After referential integrity is in place, we will examine how to create a paged lazily loaded collection: As we iterate through the collection, we will load and unload objects as they are needed.

    Like before, we start off with a unit test. The unit test introduces some new test data, but the critical part looks like this:

    public void testCompareChildren() { Category parentSaved = categoryDao.get(parent.getId()); Set expectedNames = new HashSet(Arrays.asList( new String[] { "sibling", "category" })); Set actualNames = new HashSet(); for (Iterator iter = parentSaved.getSubcategories().iterator(); iter.hasNext();) { Category subcategory = (Category) iter.next(); actualNames.add(subcategory.getName()); } assertEquals("children", expectedNames, actualNames); }
    

    This bit of logic ends up comparing the names of the children of parent to the expected names of sibling and category. Again, it's good to start by implementing this without using lazy loading, like so:

    private Category tupleToObject(Serializable[] tuple) {
        Category category = new Category((String) tuple[1],
                ((Integer) tuple[2]).intValue());
        category.setId((Long) tuple[0]);
        category.setParent(lazyGet((Long) tuple[3]));
        // Eager loading of the children, for now
        category.setSubcategories(findByParentId(category.getId()));
        return category;
    }
    public Collection findByParentId(Long parentId) { ArrayList result = new ArrayList(); for (Iterator iter = getChildIdListFor(parentId).iterator(); iter.hasNext();) { Long element = (Long) iter.next(); result.add(get(element)); } return result; }
    

    You should try this out and make sure it works before doing the lazy loading. The findByParentId method relies on an index from each parent id to the ids of its children. We can update this index in the store method:

    public Long store(Category category) { if (category == null) { return null; } if (idToTuples.containsKey(category.getId())) { return category.getId(); } store(category.getParent()); category.setId(new Long(nextId++)); Serializable[] tuple = objectToTuple(category); idToTuples.put(category.getId(), tuple); 
    if (category.getParent() != null) { getChildIdListFor(category.getParent().getId()). add(category.getId()); }  for (Iterator iter = category.getSubcategories().iterator(); iter.hasNext();) { store((Category) iter.next()); } return category.getId(); } private Collection getChildIdListFor(Long parentId) { if (!parentIdToIdList.containsKey(parentId)) { parentIdToIdList.put(parentId, new ArrayList()); } return (Collection) parentIdToIdList.get(parentId); }
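
    The index itself is nothing more than a map from a parent id to a list of child ids. As a minimal sketch, assuming Java 8 or later (ChildIndex and ChildIndexDemo are hypothetical names), the same idea can be written with computeIfAbsent. Unlike getChildIdListFor above, this version does not create an empty list as a side effect of a lookup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo entry point.
class ChildIndexDemo {
    public static void main(String[] args) {
        ChildIndex index = new ChildIndex();
        index.add(1L, 2L);
        index.add(1L, 3L);
        System.out.println(index.childrenOf(1L)); // prints [2, 3]
        System.out.println(index.childrenOf(2L)); // prints []
    }
}

// Maps each parent id to the ids of its children, in insertion order.
class ChildIndex {
    private final Map<Long, List<Long>> parentIdToIdList = new HashMap<>();

    void add(Long parentId, Long childId) {
        parentIdToIdList.computeIfAbsent(parentId, k -> new ArrayList<>()).add(childId);
    }

    List<Long> childrenOf(Long parentId) {
        List<Long> ids = parentIdToIdList.getOrDefault(parentId, Collections.emptyList());
        return Collections.unmodifiableList(ids);
    }
}
```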
    

    This should be pretty simple if you're used to creating DAOs (except maybe for the index). Now, let's add lazy loading:

    private Category tupleToObject(Serializable[] tuple) {
        Category category = new Category((String) tuple[1],
                ((Integer) tuple[2]).intValue());
        category.setId((Long) tuple[0]);
        category.setParent(lazyGet((Long) tuple[3]));
        category.setSubcategories(lazyFindByParentId(category.getId()));
        return category;
    }

    private Collection lazyFindByParentId(final Long parentId) {
        // The children are only looked up when the collection is first used
        LazyLoadedObject lazySubcategories = new LazyLoadedObject() {
            protected Object loadObject() {
                return findByParentId(parentId);
            }
        };
        return (Collection) Proxy.newProxyInstance(
            Collection.class.getClassLoader(),
            new Class[] { Collection.class },
            lazySubcategories);
    }
    

    That was simple enough. The LazyLoadedObject invocation handler I constructed for dealing with interfaces is a perfect match for what we're doing here. It's considered a good thing to use interfaces when dealing with collections, so I am pretty happy with this code. Unlike with the lazily loaded simple relationship, there is no need to do bytecode instrumentation.

    Referential Integrity

    The following tests illustrate a problem with the current implementation:

    public void testReferenceIntegrity() { Category saved1 = categoryDao.get(category.getId()); Category saved2 = categoryDao.get(category.getId()); assertSame("multiply loaded objects should be the same", saved1, saved2); } public void testCollectionReferenceIntegrity() { Category saved = categoryDao.get(category.getId()); Category savedSub = (Category) saved.getSubcategories().iterator().next(); assertSame(savedSub.getParent(), saved); }
    

    Both of these tests fail. This is a real problem: we have two copies of the parent Category. If we change one, they will get out of sync. It is important that the same object is returned in both cases.

    In order to implement referential integrity, we implement a session cache. This is a Map instance variable on the DAO. Here are the updated store, get and lazyGet methods:

    public Long store(Category category) { if (category == null) { return null; } if (idToTuples.containsKey(category.getId())) { return category.getId(); } store(category.getParent()); category.setId(new Long(nextId++)); Serializable[] tuple = objectToTuple(category); idToTuples.put(category.getId(), tuple); 
    sessionCache.put(category.getId(), category); if (category.getParent() != null) { getChildIdListFor(category.getParent().getId()). add(category.getId()); } for (Iterator iter = category.getSubcategories().iterator(); iter.hasNext();) { store((Category) iter.next()); } return category.getId(); } public Category get(Long id) { if (!idToTuples.containsKey(id)) { return null; } 
    if (!sessionCache.containsKey(id)) { Serializable[] tuple = (Serializable[]) idToTuples.get(id); sessionCache.put(id, tupleToObject(tuple)); } return (Category) sessionCache.get(id);  } private Category lazyGet(final Long key) { if (key == null) { return null; } 
    if (sessionCache.containsKey(key)) { return (Category) sessionCache.get(key); }  return (Category) Enhancer.create(Category.class, new LazyLoader() { public Object loadObject() throws Exception { return get(key); } }); }
    

    The tests pass, and we're done.
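
    Stripped of the Category details, the session cache is what Fowler calls an Identity Map: at most one object per id within a session. Here is a minimal sketch, assuming Java 8 or later; IdentityMap, Node, and IdentityMapDemo are hypothetical names, and the loader function stands in for the tuple lookup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical demo entry point.
class IdentityMapDemo {
    public static void main(String[] args) {
        IdentityMap<Node> cache = new IdentityMap<>(id -> new Node(id));
        Node first = cache.get(1L);
        Node second = cache.get(1L);
        System.out.println(first == second);       // prints true
        cache.clear();
        System.out.println(first == cache.get(1L)); // prints false: reloaded
    }
}

// Hypothetical stand-in for a loaded domain object.
class Node {
    final Long id;

    Node(Long id) {
        this.id = id;
    }
}

// Returns the same instance for the same id until the cache is cleared.
class IdentityMap<T> {
    private final Map<Long, T> sessionCache = new HashMap<>();
    private final Function<Long, T> loader;

    IdentityMap(Function<Long, T> loader) {
        this.loader = loader;
    }

    T get(Long id) {
        T cached = sessionCache.get(id);
        if (cached == null) {
            cached = loader.apply(id);
            if (cached != null) {
                sessionCache.put(id, cached);
            }
        }
        return cached;
    }

    boolean isLoaded(Long id) {
        return sessionCache.containsKey(id);
    }

    void clear() {
        sessionCache.clear();
    }
}
```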

    Implementing a Lazy JDBC DAO

    While creating SpaceCategoryDao, I have also been building up a CategoryDao interface:

    /**
     * Save and retrieve Categories.
     */
    public interface CategoryDao {

        /**
         * After this is called, get with the return value should
         * always return an object identical to the argument.
         */
        Long store(Category category);

        /**
         * Returns a Category with the argument id.
         */
        Category get(Long id);

        /**
         * Returns a subset of the subcategories of the category
         * with the specified id.
         */
        Collection findByParentId(Long parentId, int offset, int length);

        /**
         * Returns the count of subcategories of the category
         * with the specified id.
         */
        int countByParent(Long parentId);

        /**
         * After this is called, the schema should be created in the
         * underlying data store (if appropriate).
         */
        void initialize();

        /**
         * After this is called, no previously constructed object
         * shall be returned from get.
         */
        void clearSessionCache();
    }
    

    I will not cover implementing this interface for JDBC in detail, but the downloadable source includes the source for JdbcCategoryDao. It is quite prosaic. I will use this interface to implement paged lazy loading in the final part of the article.

    Paged Lazy Loading

    I have shown that creating a lazily loaded collection is just as easy as creating a lazily loaded relationship. In order to make lazy loading work well, I have implemented a session cache for storing objects. This introduces a few issues with regard to performance: if we have too many objects in the cache, we can run out of memory. This is mostly an issue for one-to-really-many collections, and most object-relation mapping tools ignore it beyond providing a way to remove objects from the cache. The rest of this article is about supporting what I would like to call one-to-very-very-many relationships.

    There are several issues with one-to-very-very-many relationships. One problem that can be solved is that of memory consumption. In the systems I am currently working on, we have one-to-many relationships that can contain tens of thousands to a million objects. Loading all of these up when the first is needed might not be what you want. In order to fix this problem, I will use a strategy of paging: only a subset of the objects will be held in memory at the same time. For the purpose of illustration, I will use a page size of 5 (which is pointless in the real world, but it makes it easier to illustrate the technique).

    public void testPagedObjects() { Category saved = categoryDao.get(largeCategory.getId()); Collection savedChildren = saved.getSubcategories(); Iterator iter = savedChildren.iterator(); assertLoadedCount(0, largeCategory.getSubcategories()); assertEquals("child[0].name", "subcategory 0", ((Category)iter.next()).getName()); assertLoadedCount(5, largeCategory.getSubcategories()); }
    

    In this code, largeCategory is a test Category object with 20 subcategories (initialized in the test's setUp). The method assertLoadedCount verifies that exactly the specified number of objects from the collection have been loaded.

    private void assertLoadedCount(int expectedLoaded, Collection subcategories) { int actuallyLoaded = 0; for (Iterator iter = subcategories.iterator(); iter.hasNext();) { Category element = (Category) iter.next(); if (categoryDao.isLoaded(element.getId())) { actuallyLoaded++; } } assertEquals("loaded subcategories", expectedLoaded, actuallyLoaded); }
    

    In order to support this test, I have expanded the CategoryDAO interface with the method boolean CategoryDAO.isLoaded(Long) to check whether an object with the given id has been inserted in the session cache. This method can also be used to improve the other lazy-loading tests.

    Here's a paged implementation for the SpaceCategoryDAO:

    protected Collection lazyFindByParentId(final Long parentId) {
        return new PagedLazyCollection(countByParent(parentId)) {
            // loadPage is the abstract method declared by PagedLazyCollection
            public Collection loadPage(int offset, int pageSize) {
                return findByParentId(parentId, offset, pageSize);
            }
        };
    }
    

    I have introduced several new concepts. First, CategoryDAO now supports a countByParent method, so we can implement java.util.Collection.size without loading all of the objects. Second, CategoryDAO.findByParentId has to support paging. The implementation of these methods is trivial, and the one in SpaceCategoryDAO doesn't really work. So instead, I will show you a typical JDBC implementation:

    public int countByParent(Long parentId) {
        String sql = "SELECT count(*) FROM category WHERE parent = ?";
        return getJdbcTemplate().queryForInt(sql, new Object[] { parentId });
    }

    public Collection findByParentId(Long parentId, int offset, int length) {
        // Note: without an ORDER BY clause, the database does not guarantee
        // a stable row order between pages.
        return getJdbcTemplate().query(
            "SELECT * FROM category WHERE parent = ? LIMIT ? OFFSET ?",
            new Object[] { parentId, new Integer(length), new Integer(offset) },
            new RowMapper() {
                // code to create a Category from a Recordset row omitted for brevity
            });
    }
    

    Finally, I have introduced a PagedLazyCollection class:

    public abstract class PagedLazyCollection extends AbstractCollection { private class PagedLazyCollectionIterator implements Iterator { // .... } private final int size; public PagedLazyCollection(int size) { this.size = size; } public Iterator iterator() { return new PagedLazyCollectionIterator(); } public int size() { return size; } public abstract Collection loadPage(int offset, int pageSize); }
    

    All the magic happens in the PagedLazyCollectionIterator:

    private class PagedLazyCollectionIterator implements Iterator { private final int pageSize = 5; private int offset = 0; // Trick to avoid null checks private Collection currentCollecton = new ArrayList(); private Iterator currentIterator = currentCollecton.iterator(); public boolean hasNext() { boolean onLastPage = offset >= size; return currentIterator.hasNext() || !onLastPage; } public Object next() { if (!currentIterator.hasNext()) { nextIterator(); } return currentIterator.next(); } 
    private void nextIterator() { currentCollecton = loadPage(offset, pageSize); offset += currentCollecton.size(); this.currentIterator = currentCollecton.iterator(); }  public void remove() { throw new UnsupportedOperationException("remove not implemented"); } }
    

    The interesting bit is in PagedLazyCollectionIterator.nextIterator, where we load the next page. This code should be sufficient to get the test to pass.
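
    To see the paging behavior without a DAO, here is a self-contained sketch, assuming Java 8 or later. PagedCollection and PagedCollectionDemo are hypothetical names, the page size is a constructor argument rather than a constant, and an in-memory list stands in for the data store. The recorded offsets show which pages were actually requested.

```java
import java.util.AbstractCollection;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical demo entry point.
class PagedCollectionDemo {
    public static void main(String[] args) {
        List<String> backing = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            backing.add("subcategory " + i);
        }
        List<Integer> requestedOffsets = new ArrayList<>();

        Collection<String> paged = new PagedCollection<String>(backing.size(), 5) {
            protected List<String> loadPage(int offset, int pageSize) {
                requestedOffsets.add(offset); // record each page request
                int end = Math.min(offset + pageSize, backing.size());
                return new ArrayList<>(backing.subList(offset, end));
            }
        };

        Iterator<String> iter = paged.iterator();
        System.out.println(requestedOffsets); // prints []
        System.out.println(iter.next());      // prints subcategory 0
        System.out.println(requestedOffsets); // prints [0]
        while (iter.hasNext()) {
            iter.next();
        }
        System.out.println(requestedOffsets); // prints [0, 5, 10]
    }
}

// A read-only collection that fetches its elements one page at a time.
abstract class PagedCollection<T> extends AbstractCollection<T> {
    private final int size;
    private final int pageSize;

    PagedCollection(int size, int pageSize) {
        this.size = size;
        this.pageSize = pageSize;
    }

    protected abstract List<T> loadPage(int offset, int pageSize);

    public int size() {
        return size; // known up front, so no page has to be loaded
    }

    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private int offset = 0;
            private Iterator<T> current = Collections.emptyIterator();

            public boolean hasNext() {
                return current.hasNext() || offset < size;
            }

            public T next() {
                if (!current.hasNext()) {
                    if (offset >= size) {
                        throw new NoSuchElementException();
                    }
                    List<T> page = loadPage(offset, pageSize);
                    offset += page.size();
                    current = page.iterator();
                }
                return current.next();
            }
        };
    }
}
```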

    Unloading Pages

    This code, as it stands, will not help much with conserving memory. Even though we defer reading a page until it is needed, we never throw pages out again! Here's a test that illustrates the problem:

    public void testPagedCollectionPagesOutUnusedObjects() { Category saved = categoryDao.get(largeCategory.getId()); Collection savedChildren = saved.getSubcategories(); Iterator iter = savedChildren.iterator(); while (iter.hasNext()) iter.next(); assertLoadedCount(5, largeCategory.getSubcategories()); }
    

    This test will fail, as the whole set of subcategories of largeCategory will be loaded into the session cache at this point. In order to fix it, the Iterator has to be able to unload pages as well:

    public abstract class PagedLazyCollection extends AbstractCollection { private class PagedLazyCollectionIterator implements Iterator { // .... private void nextIterator() { unloadPage(currentCollecton); // .... } // ... } public abstract void unloadPage(Collection collection); }
    

    So SpaceCategoryDAO.lazyFindByParentId has to be expanded to tell the collection how to unload objects:

    protected Collection lazyFindByParentId(final Long parentId) { return new PagedLazyCollection(countByParent(parentId)) { public Collection loadPage(int offset, int pageSize) { return findByParentId(parentId, offset, pageSize); } public void unloadPage(Collection collection) { unloadFromSessionCache(collection); } }; }
    

    SpaceCategoryDAO.unloadFromSessionCache could hardly be easier:

    public void unloadFromSessionCache(Collection categories) { for (Iterator iter = categories.iterator(); iter.hasNext();) { Category element = (Category) iter.next(); sessionCache.remove(element.getId()); } }
    

    At this point, we have actually implemented a lazily loaded collection that pages objects in and out as we iterate over it. The amount of code needed is surprisingly small.
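
    The effect on memory can be checked with a small standalone sketch, assuming Java 8 or later. PageSource, UnloadingPageIterator, and PageUnloadDemo are hypothetical names, and a plain Set of ids stands in for the DAO's session cache. With 20 elements and a page size of 5, the cache never holds more than one page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical demo: a Set of ids stands in for the session cache.
class PageUnloadDemo {
    public static void main(String[] args) {
        Set<Integer> sessionCache = new HashSet<>();
        PageSource<Integer> source = new PageSource<Integer>() {
            public List<Integer> loadPage(int offset, int pageSize) {
                List<Integer> page = new ArrayList<>();
                for (int id = offset; id < Math.min(offset + pageSize, 20); id++) {
                    page.add(id);
                }
                sessionCache.addAll(page); // loading a page fills the cache
                return page;
            }

            public void unloadPage(List<Integer> page) {
                sessionCache.removeAll(page); // leaving a page evicts it
            }
        };

        Iterator<Integer> iter = new UnloadingPageIterator<>(source, 20, 5);
        int maxCached = 0;
        while (iter.hasNext()) {
            iter.next();
            maxCached = Math.max(maxCached, sessionCache.size());
        }
        System.out.println(maxCached);           // prints 5
        System.out.println(sessionCache.size()); // prints 5 (the last page)
    }
}

interface PageSource<T> {
    List<T> loadPage(int offset, int pageSize);

    void unloadPage(List<T> page);
}

// Unloads the previous page before loading the next one.
class UnloadingPageIterator<T> implements Iterator<T> {
    private final PageSource<T> source;
    private final int size;
    private final int pageSize;
    private int offset = 0;
    private List<T> currentPage = Collections.emptyList();
    private Iterator<T> current = currentPage.iterator();

    UnloadingPageIterator(PageSource<T> source, int size, int pageSize) {
        this.source = source;
        this.size = size;
        this.pageSize = pageSize;
    }

    public boolean hasNext() {
        return current.hasNext() || offset < size;
    }

    public T next() {
        if (!current.hasNext()) {
            if (offset >= size) {
                throw new NoSuchElementException();
            }
            source.unloadPage(currentPage);
            currentPage = source.loadPage(offset, pageSize);
            offset += currentPage.size();
            current = currentPage.iterator();
        }
        return current.next();
    }
}
```

    As in the test above, the last page stays in the cache after iteration finishes, because nothing triggers another page switch.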

    Limitations

    The code as I have shown has several quite severe limitations. I will explain the strategy for solving these limitations, but if you want to see a full implementation, you have to nag me to write a follow-up article.

    • Adding objects to the collection: If we add or remove objects from Category.getSubcategories, the changes will have inconsistent effects. First, the modification will not happen through the category at all, but only by modifying the parent category of the subcategories in question. This will be reflected when we search for new pages in the collection. If the modification changes something that affects the paging, things will go wrong: we might get the wrong objects in the next page. The simplest way to handle modifications to the collection is to introduce constraints. For example: if we could introduce an immutable listIndex field in the subcategory, we would know that all new subcategories will be added at the end.
    • Removing objects from the collection: Supporting remove can be even simpler: just keep a collection of removed objects in memory, and skip these objects when iterating. Neither of these strategies will work perfectly in all situations. For example, if you have an operation that will remove most of the collection, keeping the removed objects in memory is not such a good idea. In the simple case, we can ignore this problem. In the not-so-simple cases, more research is needed.
    • Reference integrity: If an object in a subcategory collection is also used from other places in the code, it will accidentally be thrown out of the session cache when the collection iterates over it. This is a bona fide bug, but it is actually not too hard to solve (use two types of session cache), and the code is pretty uninteresting, so I will leave that as an exercise for the reader, if you really need it. Or you can nag me to fix it for you.
    • Intelligent iteration: There are several things we might want to do with the collection in order to use it better. For example, we really need to be able to skip when iterating. Sadly, neither Iterator nor ListIterator supports this. To avoid expanding the standard Java collection interfaces, I have not implemented my own skip method. Doing so is easy, though. Alternatively, we could implement lazy behavior in PagedLazyCollectionIterator.next.
    • Sorting and subqueries: If you want to sort the subcategories or search for a subset of them, the current structure does not support this well. There are two good approaches to follow. The most straightforward approach is to just use custom methods on the DAO (like findByParentIdAndNameOrderedByName()) and return paged collections. In this case, we have abandoned the domain model. If you want to keep the domain model, you need to use something more intelligent than SQL to query your objects. Hibernate criteria or TopLink expressions provide object libraries for queries. Use decorators on top of the collection to modify the criteria or expressions.
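
    As one illustration of the skip idea from the list above, here is a minimal sketch, assuming Java 8 or later. It is not the implementation from the downloadable source: SkippingPageIterator, its skip method, and SkipDemo are hypothetical names. The iterator tracks an absolute position and only loads a page when an element is actually requested, so skipped pages are never fetched.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

// Hypothetical demo entry point.
class SkipDemo {
    public static void main(String[] args) {
        List<Integer> requestedOffsets = new ArrayList<>();
        SkippingPageIterator<String> iter = new SkippingPageIterator<String>(
                (offset, pageSize) -> {
                    requestedOffsets.add(offset); // record each page request
                    List<String> page = new ArrayList<>();
                    for (int i = offset; i < Math.min(offset + pageSize, 20); i++) {
                        page.add("subcategory " + i);
                    }
                    return page;
                }, 20, 5);

        System.out.println(iter.next());      // prints subcategory 0
        iter.skip(12);
        System.out.println(iter.next());      // prints subcategory 13
        System.out.println(requestedOffsets); // prints [0, 13]
    }
}

// Iterator with a skip method; pages are loaded only when next() needs them.
class SkippingPageIterator<T> implements Iterator<T> {
    private final BiFunction<Integer, Integer, List<T>> loader;
    private final int size;
    private final int pageSize;
    private int position = 0;  // index of the next element to return
    private int pageStart = 0; // index of the first element in the current page
    private List<T> page = Collections.emptyList();

    SkippingPageIterator(BiFunction<Integer, Integer, List<T>> loader, int size, int pageSize) {
        this.loader = loader;
        this.size = size;
        this.pageSize = pageSize;
    }

    public boolean hasNext() {
        return position < size;
    }

    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        boolean inCurrentPage = position >= pageStart && position < pageStart + page.size();
        if (!inCurrentPage) {
            pageStart = position;
            page = loader.apply(pageStart, pageSize);
            if (page.isEmpty()) {
                throw new NoSuchElementException();
            }
        }
        T element = page.get(position - pageStart);
        position++;
        return element;
    }

    // Moves forward by n elements without loading the pages in between.
    public void skip(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must not be negative");
        }
        position = (int) Math.min((long) size, (long) position + n);
    }
}
```

    One consequence of this design is that pages loaded after a skip start at the new position rather than on a multiple of the page size.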

    Conclusion

    Implementing a lazily loaded, rich domain model isn't just possible, it is practically feasible. Even really big collections can be dealt with. The simple case is surprisingly simple, and depending upon your requirements, even the complex cases are manageable.

    In this article, I have shown how you can construct a lazily loaded domain model that will support reference integrity, collections, and paged loading of very large collections. This is surprisingly easy to do. So the question is: should we abandon third-party object-relation mapping tools like Hibernate, TopLink, and JDO and just write our domain model ourselves?

    Of course not. If you look at the full JDBC implementation in the downloadable source, you can see that there are a lot of concerns I had to deal with that an ORM would do for me. Creating the queries and mapping the results to the objects is a tedious and error-prone job, and I don't recommend doing it yourself, even with a JDBC framework like Spring-JDBC. Secondly, even though there isn't that much code involved in writing a lazily loaded collection, it is easy enough to get it wrong. Leaving the job to a third party is a good idea. I trust Ted Neward when he says that ORM is the Vietnam of computer science.

    However, the ORMs available today do not even attempt to solve the one-to-very-very-many relationship. You can use the approach I showed with ORMs just as easily as with JDBC. If you create your own TopLinkCategoryDAO, use TopLink as normal, but instead of mapping subcategory, add a subcategory paged collection to the newly created objects. Maybe your domain model has a file name that references a file on disk. Why not make it into a lazy relationship? And if you're an ORM developer reading this article: please, the one-to-very-very-many relationship problem is a real need, and you can solve it.

    You might also be using technology like JavaSpaces, which as far as I know does not support an ORM-like approach for lazy loading relationships. My implementation of SpaceCategoryDAO shows an approach that can help you make your domain model more intelligent. Again, I wish the vendors within this space would start supporting lazy loading more.

    Hopefully, this article has made lazy loading seem less magical to you. Java's capabilities to create proxies to lazily load a rich domain model are surprisingly strong and easy to use.

    Happy hacking!
