Agile Legacies: Using Iterative Methods to Import Legacy Data

Version 2


    Modern iterative development methodologies have proved their value in many software development projects. Nevertheless, despite the benefits of an iterative development cycle, this approach is often not applied to importing legacy data. This article discusses how an iterative, agile approach can improve the legacy data import process and, as a result, reduce analysis errors and bugs while increasing product quality and user satisfaction.

    Nothing gives a development team more pleasure than to be able to create a new application completely from scratch--to let their creativity run wild, unhindered by ancient systems and archaic data structures. This is the stuff that developer dreams are made of!

    However, in the real world, software development projects are rarely done in a vacuum. Your company or client will inevitably already have an existing IT system in place. It may be a CICS mainframe application still maintained at great expense by some large IT corporation. It may be a fully fledged ERP solution, or an in-house client-server application built with whatever technology happened to be the flavor of the month five or ten years ago. It may just be an Access database application designed by a canny user. But it will exist.

    Indeed, importing legacy data is a crucial part of most software projects. It is also a task that rarely evokes a great deal of enthusiasm among developers. And yet it is of vital importance for the end user. These old databases often contain years of valuable business records that the user needs to access from the new application.

    So Just What Is Iterative Development, Anyway?

    Iterative development is a cornerstone of many modern software development methodologies. In iterative development, software is designed, built, and tested incrementally, in a series of "iterations." Each iteration aims to implement a working version of the application with a subset of the requirements. Iterative development has an important place in use-case-driven, architecture-centric approaches such as the Rational Unified Process (RUP), and in the more lightweight methodologies of the Agile family such as XP (Extreme Programming), Feature-Driven Development, Scrum, DSDM, and others.

    Let's take a look at some of the essential activities common to most iterative development methodologies, and where legacy imports come into play.


    Planning

    In iterative methodologies, planning is done at two levels. The first level is the global project plan, which estimates the overall project size and structure, and breaks the project up into iterations and milestones. Each iteration is typically "timeboxed," with the number of functionalities implemented being less important than meeting the deadline.

    The second level of planning is done at the iteration level. An iteration typically lasts from two to five weeks, depending on the methodology used and the project size and complexity, and produces a working version of the application that correctly implements a small number of extra functionalities.

    Often, during the global planning phase, importing legacy data is relegated to the end of the project, shortly before going into production. Some DBAs tend to be traditionalists, and (understandably) like to work with a mature, stable target database structure into which they will try to import the legacy data. So they may prefer to wait until the new domain model and database schema are stable before attempting to import legacy data.

    However, there are some very good reasons to consider a more iterative approach.

    • During analysis and design, studying and importing the legacy data often reveals useful business details that you got wrong or may have overlooked, or that the client forgot to mention: details that might otherwise show up a week before going into production. Remember, in many cases the people who wrote the legacy application will have spent years working on it, and they may well know your client and their specific business needs much better than you do. Underestimate this at your own risk!
    • Imported legacy data lets you test your new database model against real user data. Again, it can bring up database issues that wouldn't be found until much later.
    • Imported legacy data can give you a perfect test database for first-level performance testing against a realistic volume of data.

    So, the global project plan should allow time and resources for legacy data analysis and import activity in each iteration. Doing this incrementally will raise legacy-data-related issues earlier, so that they can be resolved faster and more easily.
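
    To make the second point in the list above concrete, here is a minimal sketch of a trial import that loads legacy rows into the new schema and records every row the new constraints reject. It assumes an SQLite database and a hypothetical customer table; the table, column names, and sample rows are invented for illustration and do not come from any particular project.

```python
import sqlite3

# Hypothetical target schema for the current iteration (illustrative names).
SCHEMA = """
CREATE TABLE customer (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    email TEXT NOT NULL UNIQUE
);
"""


def trial_import(rows):
    """Insert legacy rows into the new schema and collect the rejects.

    Returns (number_imported, [(row, reason), ...]).
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    imported, rejected = 0, []
    for row in rows:
        try:
            conn.execute(
                "INSERT INTO customer (id, name, email) VALUES (?, ?, ?)", row
            )
            imported += 1
        except sqlite3.IntegrityError as exc:
            # A constraint in the new model does not hold for the real data.
            rejected.append((row, str(exc)))
    conn.close()
    return imported, rejected


if __name__ == "__main__":
    legacy_rows = [
        (1, "Acme Ltd", "info@acme.example"),
        (2, None, "sales@widgets.example"),    # missing name
        (3, "Acme Pty", "info@acme.example"),  # duplicate email
    ]
    count, rejects = trial_import(legacy_rows)
    print(f"imported {count} rows, rejected {len(rejects)}")
    for row, reason in rejects:
        print(row, "->", reason)
```

    Each rejected row is a question for the business analyst or the client: either the legacy data is dirty, or the new model encodes a rule that the business does not actually follow.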

    Design and Coding

    Some Agile methodologies, such as XP, tend to minimize the need to build formal design models, and rely on very lightweight, code-based design techniques. Others, such as FDD (Feature-Driven Development), use an overall domain model to guide and coordinate development efforts. Model-based iterative methodologies such as RUP are, on the other hand, very much model- and architecture-driven.

    In my experience, when legacy data is involved, a formal domain model is essential. It can (and should) be built incrementally, but it should exist.

    Understanding the legacy database model is often of vital importance when designing this new domain model. And understanding the behavior of the legacy application can reveal important details and business rules that the client may have trouble expressing explicitly.

    During the iteration design phase, the list of functionalities to be implemented in the iteration is finalized. The DBA leads the analysis of the legacy database, and works with the development team in building a domain model for the new application for the targeted features. The business analyst and customer should also play an active role, as an in-depth knowledge of the legacy application is often needed to understand the legacy data structure. Once the legacy domain model is understood, and the new domain model established, the DBA should work on importing the corresponding legacy data for use by the rest of the development team for this iteration.

    The advantages of this approach are twofold:

    • Understanding the corresponding parts of the legacy domain model and application lets you verify the new domain model and iteration design.
    • The imported legacy data lets you properly test the new application features with production-like data.

    This approach gives an important role to the DBA involved in the legacy data import process. There are a few things to consider in this regard:

    • Legacy data import should improve the new design, not penalize it. Don't compromise good design just to make legacy data easier to import. For example, you should not store date values as strings in your new database just because you are importing data from a text file.
    • On the other hand, don't neglect legacy data issues in the new design. For example, if the DBA can't find where a legacy field should go in the new design, something may have been overlooked. Apparently unused legacy fields sometimes turn out to be necessary later on (in other screens or in reports, for example). Also, if the new structure is radically different from the legacy structure for identical requirements, you should at least review why each model was designed the way it was. Here are some issues you may have to consider:
      • Is the legacy data structure better adapted to some of the screens or reports in the legacy application? Do those screens or reports exist in the new application?

      • Is the legacy data structure subject to technical constraints or ill-advised design decisions that don't apply for the new application?

    • Developers will have to be more flexible concerning the database model, which will evolve and change throughout the life of the project.
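
    The first two points can be illustrated with a small sketch: a mapping table that converts legacy text values into proper types (dates as dates, not strings), and that reports any legacy field with no home in the new model. The field names, the DD/MM/YYYY date format, and the helper names are all assumptions made for this example.

```python
from datetime import date, datetime
from decimal import Decimal
from typing import Optional


def parse_legacy_date(text: str) -> Optional[date]:
    """Convert an assumed 'DD/MM/YYYY' legacy string to a date (None if blank)."""
    text = text.strip()
    return datetime.strptime(text, "%d/%m/%Y").date() if text else None


# Hypothetical mapping: legacy field -> (new field, converter).
FIELD_MAP = {
    "CUST_NAME": ("name", str.strip),
    "ORDER_DT": ("order_date", parse_legacy_date),
    "AMT": ("amount_cents", lambda s: int(Decimal(s.strip()) * 100)),
}


def convert_record(legacy):
    """Return (new_record, unmapped_legacy_fields) for one legacy record."""
    new_record, unmapped = {}, []
    for field, raw in legacy.items():
        if field in FIELD_MAP:
            target, convert = FIELD_MAP[field]
            new_record[target] = convert(raw)
        else:
            # No destination in the new model: worth a design review.
            unmapped.append(field)
    return new_record, sorted(unmapped)


if __name__ == "__main__":
    legacy = {
        "CUST_NAME": " Acme Ltd ",
        "ORDER_DT": "03/11/2004",
        "AMT": "19.99",
        "REGION_CD": "NW",  # not mapped anywhere yet
    }
    record, unmapped = convert_record(legacy)
    print(record)
    print("unmapped legacy fields:", unmapped)
```

    Reviewing the list of unmapped fields at each iteration is a cheap way to catch the overlooked screens and reports mentioned above before they become production issues.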

    Nevertheless, in my experience, integrating legacy data in each iteration helps to avoid some nasty surprises at the end of the project, when the new application has to deal with real client data.


    Testing

    Testing legacy data imports is just as important as any other application testing.

    Testing should be done against an agreed set of imported legacy data. This helps to ensure that no regressions occur during the legacy import procedure, and that the code is suitable for use with production-type data. Work with the client to agree on an official snapshot of legacy data to be used for tests. Though this often requires some effort to coordinate and to obtain from the client, it is essential for reliable testing, especially as testers will often use the legacy application alongside the new one to verify data imports and application behavior.
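
    One way to detect regressions in the import procedure is to record a fingerprint of each imported table once the import of the agreed snapshot has been verified, and to compare later import runs against it. The sketch below assumes SQLite and a hypothetical customer table; the function names are invented for this example, and a real project might prefer its database vendor's own comparison tools.

```python
import hashlib
import sqlite3


def table_fingerprint(conn, table, key_column):
    """Return (row_count, sha256 digest) of a table read in key order."""
    digest = hashlib.sha256()
    count = 0
    # Names come from trusted test code here, never from user input.
    query = f'SELECT * FROM "{table}" ORDER BY "{key_column}"'
    for row in conn.execute(query):
        digest.update(repr(row).encode("utf-8"))
        count += 1
    return count, digest.hexdigest()


def changed_tables(conn, baseline, key_columns):
    """List the tables whose contents differ from the recorded baseline."""
    return sorted(
        table
        for table, expected in baseline.items()
        if table_fingerprint(conn, table, key_columns[table]) != expected
    )


def make_demo_db(rows):
    """Build an in-memory database with a hypothetical customer table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    keys = {"customer": "id"}
    verified = make_demo_db([(1, "Acme"), (2, "Widgets")])
    baseline = {"customer": table_fingerprint(verified, "customer", "id")}

    rerun = make_demo_db([(2, "Widgets"), (1, "Acme")])  # same data, new order
    broken = make_demo_db([(1, "Acme"), (2, "Widget")])  # truncated name
    print(changed_tables(rerun, baseline, keys))
    print(changed_tables(broken, baseline, keys))
```

    Because rows are read in key order, the fingerprint depends only on the table contents, not on the order in which the import script happened to insert them.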

    When possible, it is useful to give team members training and access to a version of the legacy application running against this set of test data. This may take the form of an emulator (for mainframe applications) or the client installation of a client-server solution. This lets team members check their understanding of business logic, business rules, and calculations against the "real thing."

    Having legacy data available early can have some impact on developer work habits, and requires some organization. Here is one possible way of doing things:

    • Each developer has their own copy (or instance) of the test database for unit testing.
    • The DBA provides a script and an installation procedure for setting up or reinstalling a database.
    • When the DBA imports new legacy data, the DBA provides an initialization script and/or database dump so that developers can update their local development database instances. Alternatively, the DBA may prefer to update the databases directly.
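
    As a rough illustration of such a reinstallation script, the sketch below rebuilds a developer's local database from a schema script and a data script. It assumes a file-based SQLite database; the schema, the sample data, and the rebuild_database name are hypothetical, and on a server database the same role would usually be played by the vendor's dump and restore tools.

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical scripts that the DBA would hand out with each new import.
SCHEMA_SQL = """
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
"""
DATA_SQL = """
INSERT INTO customer (id, name) VALUES (1, 'Acme Ltd');
INSERT INTO customer (id, name) VALUES (2, 'Widgets Inc');
"""


def rebuild_database(db_path, schema_sql=SCHEMA_SQL, data_sql=DATA_SQL):
    """Throw away the local database file and recreate it from the scripts."""
    db_file = Path(db_path)
    if db_file.exists():
        db_file.unlink()
    conn = sqlite3.connect(str(db_file))
    try:
        conn.executescript(schema_sql)
        conn.executescript(data_sql)
        conn.commit()
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "dev.db"
        rebuild_database(path)
        rebuild_database(path)  # safe to run again at any time
        conn = sqlite3.connect(str(path))
        print(conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0])
        conn.close()
```

    Rebuilding from scratch, rather than patching an existing instance, means every developer ends up with the same structure and data regardless of what their local tests did to the previous copy.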

    It is of course important to maintain strictly identical database structures and test data on the development, integration, and testing servers. Database structure updates should be subject to change control and preferably done by only one person.
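
    Checking that two environments really do share the same structure can be automated. The following sketch compares the schema definitions of two SQLite databases and lists the objects that differ; it relies on SQLite's sqlite_master catalog table, and the function names and sample tables are assumptions for this example. Other database engines expose similar catalog views.

```python
import sqlite3


def schema_snapshot(conn):
    """Map each schema object name to its whitespace-normalized definition."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE sql IS NOT NULL"
    )
    return {name: " ".join(sql.split()) for name, sql in rows}


def schema_differences(reference, other):
    """Names of objects that are missing or defined differently."""
    names = sorted(set(reference) | set(other))
    return [name for name in names if reference.get(name) != other.get(name)]


if __name__ == "__main__":
    dev = sqlite3.connect(":memory:")
    dev.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")

    integration = sqlite3.connect(":memory:")
    integration.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )
    integration.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")

    print(schema_differences(schema_snapshot(dev), schema_snapshot(integration)))
```

    An empty list means the two structures match; anything else points to a change that bypassed change control.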


    Conclusion

    Importing legacy data is an important part of most IT projects. It should not be relegated to the end of the project, just before the application goes into production. Instead, the process of analyzing and importing legacy data should be fully integrated into the project iteration cycles. Integrating legacy data early and often will improve the development process, product quality, and customer satisfaction.