This discussion is archived
13 Replies Latest reply: Jul 23, 2012 1:29 AM by jiri.machotka

Migrating 2 million contents from Database to UCM.

user1175496 Newbie
Hi,
We want to migrate approximately 2 million content items from the database to the UCM system. We are planning to run a POC using the following approaches:
Approach 1: Using the CHECKIN_UNIVERSAL service. Custom Java code will read the files from the existing database and check them into UCM via the RIDC API. Drawback: a time-consuming process.
Approach 2: Bypassing the check-in API. We have identified the set of tables and indexes that get updated when a content item is checked into UCM, so we would skip the service call and insert the data directly into those tables. Drawback: high risk of content loss.

I have the following questions:
1. What migration strategy would the experts here suggest and propose?
2. Has anyone tried bypassing the service call and using direct DB inserts?

Any pointers and help are highly appreciated. Please respond ASAP, as this is a bit urgent and critical.
  • 1. Re: Migrating 2 million contents from Database to UCM.
    jiri.machotka Guru
    I'd strongly discourage you from direct DB inserts. 2 million items (what size? will they be full-text indexed/converted?) is not so many that it is worth the risk of forgetting something and ending up with inconsistent data.

    I think you should start from your limiting conditions - namely, in what timeframe you need to migrate your data. We have a project (on 10g) where a similar number of content items is regularly checked in within one business day each month. Our files are relatively small (PDFs, tens of KB).

    You might also turn on "fast checkin" - see this answer for how to do it and what it can do for you: Re: Difference between UCM and IPM
  • 2. Re: Migrating 2 million contents from Database to UCM.
    user1175496 Newbie
    Thanks for your reply Jiri.
    I need to migrate the documents in 2 days, and the average document size is nearly 16 MB; the file format is PDF only. I'm not sure how long migrating the complete set will take for documents of that size.
    Moreover, we tried a direct DB insert, but the content does not appear in UCM search, even after rebuilding the indexes. We disabled indexing before inserting the documents; after rebuilding the index I can see the documents in the tables, but I cannot find them through the search console.
    Can you shed some more light on the timeframe for this kind of migration?
    TIA
  • 3. Re: Migrating 2 million contents from Database to UCM.
    jiri.machotka Guru
    Can you throw some more light on the time frame for such kind of migration
    This is something that you need to estimate yourself - it may differ per hardware, etc.

    Before writing any program, I'd use the standard Batch Loader, turn on the fast checkin mentioned earlier, and take some 1,000 files - that will give you a rough idea. Note that you have not mentioned the timeframe you need to meet: if you have a month, or even a week, you are probably good to go anyway. If you have a weekend, I guess it will be close. If you have one day, you will need a customization.

    From our experience, a custom loader can improve speed (compared to Batch Loader) by 2-5 times.
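    As a sanity check on the timeframe discussion, the required sustained throughput is easy to compute: 2 million files in a 2-day (48-hour) window works out to roughly 11.6 check-ins per second. A quick sketch of the arithmetic:

```java
public class ThroughputEstimate {
    // Sustained check-ins per second needed to load `files` items in `hours`.
    public static double filesPerSecond(long files, double hours) {
        return files / (hours * 3600.0);
    }

    public static void main(String[] args) {
        // 2 million files in a 2-day (48-hour) window, as in the question above
        System.out.printf("%.1f files/sec required%n",
                          filesPerSecond(2_000_000L, 48));
        // prints "11.6 files/sec required"
    }
}
```

    Any loader whose measured batch throughput falls well below that rate will not fit the window, whatever tool it uses.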
  • 4. Re: Migrating 2 million contents from Database to UCM.
    user1175496 Newbie
    Hi,
    I tried using the OOTB fast checkin component, but it is taking 4 seconds for a 124 KB file, which is far too slow for migrating 2 million content items in just 2 days. Is there any other option for this?
  • 5. Re: Migrating 2 million contents from Database to UCM.
    jiri.machotka Guru
    Speaking about the component, are you on 10g? (In 11g it is more of a parameter.)

    Can you also describe how you arrived at that number? Did you check in one file and measure when it became available (indexed)? Or have you tried Batch Loader, and is it an average (out of how many files)?

    4 seconds for one 124 KB file is nonsense - even my slow laptop running UCM in a VM would beat that, so there must be a flaw somewhere. If it really is true, someone should look at your environment and optimize it first.
  • 6. Re: Migrating 2 million contents from Database to UCM.
    user1175496 Newbie
    I checked in the file using the custom Java code and recorded the check-in time in that same code. I have configured fast checkin, so indexing is skipped. I have absolutely no clue why it is taking so much time. Can you give me some pointers on optimizing it?
    TIA.
  • 7. Re: Migrating 2 million contents from Database to UCM.
    jiri.machotka Guru
    I did some investigation.

    First of all, I still do not fully understand what you did and what you measured. Nevertheless, (I guess) you should not measure the check-in of a single file. You need a statistical sample to get rid of the noise caused by one-off tasks such as connecting. Another question is what exactly you measured (as you wrote your own custom code).

    I adjusted my RIDC sample program to read files from one directory. The main method looks as follows:
        public static Results CheckIn(File f) {
            // create the binder
            DataBinder checkinDoc = idcClient.createBinder();
            // populate the binder with the parameters
            checkinDoc.putLocal("IdcService", "CHECKIN_UNIVERSAL");
            checkinDoc.putLocal("dDocTitle",
                                "Document checked in through RIDC at " +
                                new Date());
            checkinDoc.putLocal("dDocType", "Document");
            checkinDoc.putLocal("dDocAccount", "");
            checkinDoc.putLocal("dSecurityGroup", "Public");
            // add a file
            try {
                checkinDoc.addFile("primaryFile", f);
            } catch (IOException e) {
                myExecutable.logEvent("File " + f.getName() + " not found.");
                return null;
            }
            // execute the request
            ServiceResponse checkinResponse;
            try {
                checkinResponse = idcClient.sendRequest(userContext, checkinDoc);
                myExecutable.logEvent("Check-in successful. Size: " + f.length() + " bytes");
            } catch (IdcClientException e) {
                myExecutable.logEvent("Check-in failed.");
                e.printStackTrace();
                return null;
            }
            DataBinder checkinData;
            try {
                checkinData = checkinResponse.getResponseAsBinder();
                Results res =
                    new Results(checkinData.getLocal("dID"), checkinData.getLocal("dDocName"));
                myExecutable.logEvent("Successfully got response - dID is " +
                                      res.getDID() + ", dDocName is " +
                                      res.getDDocName());
                return res;
            } catch (IdcClientException e) {
                myExecutable.logEvent("Unable to get response.");
                e.printStackTrace();
            }
            return null;
        }
    Note that the last try { } catch { } is not necessary - the program just waits for dDocName and dID for reporting purposes (I originally used it with other services).

    This program lets me measure the time needed for the whole batch as well as for individual files. As I had quite varied files in my directory (from a few-byte .txt to several-MB PowerPoints), I could see how the measured times varied (from a few ms per file to a few seconds). I also did several runs - I'm not sure whether files could be cached somewhere, but from 3 min for approx. 150 files in the first run, I got down to 1:20 min in the last. Note that I ran UCM in a VM and the program on its host - the delay could also be caused by disk. Nevertheless, even in my slowest run I got approx. 1 file per second, and only the several-MB files took more than 2 seconds.

    If you want, drop me an email and I can send you my program as well as the resulting log.
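    To get the statistical sample mentioned above without re-paying one-off costs per measurement, the per-file call can be wrapped in a small timing harness that reports the batch average. A generic sketch (the Runnable is a stand-in for a real check-in call such as the CheckIn method above):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchTimer {
    // Runs `task` n times, returning per-run elapsed milliseconds.
    // Averaging over the batch amortizes one-off costs such as
    // connection setup or JIT warm-up that distort single-file timings.
    public static List<Long> timeEach(int n, Runnable task) {
        List<Long> times = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            long start = System.nanoTime();
            task.run();
            times.add((System.nanoTime() - start) / 1_000_000);
        }
        return times;
    }

    public static double average(List<Long> millis) {
        return millis.stream().mapToLong(Long::longValue).average().orElse(0.0);
    }

    public static void main(String[] args) {
        // Stand-in workload; in real use the Runnable would call CheckIn(file).
        List<Long> times = timeEach(100, () -> { });
        System.out.printf("avg %.2f ms over %d runs%n", average(times), times.size());
    }
}
```

    Comparing the batch average against the slowest individual runs also shows whether a few large files or a constant per-file overhead dominates the total.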
  • 8. Re: Migrating 2 million contents from Database to UCM.
    user1175496 Newbie
    Hi,
    I wrote the following code, which records the time taken by each check-in along with the file size:
    long start = System.currentTimeMillis();
    System.out.println("Starting time >>>>>>>>> " + start);

    // read the document BLOB from the source database
    oracle.sql.BLOB blob = (oracle.sql.BLOB) resultSet.getBlob("PAGE");
    int intLength = (int) blob.length();
    InputStream is = blob.getBinaryStream();
    java.sql.Date date = resultSet.getDate(4);

    // build the check-in binder
    DataBinder binder = idcClient.createBinder();
    binder.putLocal("IdcService", "CHECKIN_NEW");
    binder.putLocal("dDocAuthor", USER);
    binder.putLocal("dDocTitle", "pdf check in test");
    // binder.putLocal("dDocName", "test-checkin-6");
    binder.putLocal("dDocType", "Document");
    binder.putLocal("dSecurityGroup", "Public");
    binder.putLocal("dDocAccount", "");  // fixed: the key had a stray colon
    binder.putLocal("xComments", "Document For DataConversion");
    // the content type should match the file - "application/pdf" for a PDF
    binder.addFile("primaryFile", new TransferFile(is, "pdf check in test", intLength, "application/pdf"));

    // check in the file
    ServiceResponse response = idcClient.sendRequest(userContext, binder);

    // get the response as a string
    String responseString = response.getResponseAsString();
    System.out.println("Response >>>>>>> " + responseString);

    long now = System.currentTimeMillis();
    System.out.println("Finish time >>>>>>>>> " + now);
    long gap = now - start;
    System.out.println("Check-in time in ms >>>>>>>>> " + gap);
    This gives me the size of each file and the time taken by the check-in process.
    A 124 KB file takes 4 seconds, and a 19 MB file takes 4.5 minutes.
    I ran this analysis for approximately 500 documents of various sizes (60 KB - 25 MB), and it took almost 3 hours.
    Let me know if there is something wrong in this code.
  • 9. Re: Migrating 2 million contents from Database to UCM.
    jiri.machotka Guru
    No, the code is OK.

    If you tried a whole batch of documents, then that really does seem to be the performance you get. That's bad. I can't imagine a customer who would accept such performance. You may have to persuade them to upgrade the hardware, or to optimize the infrastructure. That will require someone experienced on site; I'm afraid this kind of issue is beyond this forum.
  • 10. Re: Migrating 2 million contents from Database to UCM.
    user1175496 Newbie
    This was done on a dev box with 2 virtual CPUs and 16 GB of RAM. Even I am not sure why the check-in is taking so much time; we are simply doing a POC to arrive at a rough estimate of the time needed.
    Well, I will give it one more try with all other processes shut down.
  • 11. Re: Migrating 2 million contents from Database to UCM.
    jiri.machotka Guru
    My guess is that it is either disk or networking that causes the slow-downs.

    If you have a test environment, which should be closer to PROD, I'd suggest trying that one as well.

    P.S. What is the purpose of performing a load test on an environment which is obviously far from your real PROD environment?
  • 12. Re: Migrating 2 million contents from Database to UCM.
    user1175496 Newbie
    Hey Jiri,
    It was a networking issue that was causing the slow-down. I would also like to try the Batch Loader - can you please shed some light on using the Batch Loader for this approach?
    Do we need to write all the files to disk first, somehow attach the metadata to each file, and then use the Batch Loader to check the files in? Will that have any major effect on performance?
    TIA
  • 13. Re: Migrating 2 million contents from Database to UCM.
    jiri.machotka Guru
    As for Batch Loader, see here http://docs.oracle.com/cd/E23943_01/doc.1111/e10792/c03_processes.htm#CHDDFCHE or here http://docs.oracle.com/cd/E23943_01/doc.1111/e10792/e01_interface.htm#CSMSP360

    Yes, you've got the basic idea of how the Batch Loader works, and yes, do expect effects on performance. Ideally, you'd have a test environment where you can do tests like these. If you don't have one, you could also try doing the tests in quiet hours (weekends, or overnight, if those are quiet hours for you). Think twice before you start loading data into a PROD system - namely, will you want to revert your actions when finished? If so, how?

    Still, it is better to start small than to find out you have blocked your systems for days without being able to do anything.
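    To illustrate how metadata is attached to files on disk: a Batch Loader batch load file is a plain-text file with one name=value record per document, each record terminated by <<EOD>>, as described in the documentation linked above. A sketch (the document names, paths, and metadata values below are made up for illustration):

```
Action=insert
dDocName=MIG-000001
dDocType=Document
dDocTitle=Migrated document 1
dDocAuthor=sysadmin
dSecurityGroup=Public
dDocAccount=
primaryFile=/migration/files/mig-000001.pdf
<<EOD>>
Action=insert
dDocName=MIG-000002
dDocType=Document
dDocTitle=Migrated document 2
dDocAuthor=sysadmin
dSecurityGroup=Public
dDocAccount=
primaryFile=/migration/files/mig-000002.pdf
<<EOD>>
```

    The metadata you would otherwise set on the RIDC binder goes into each record, so the export step from your source database only has to write the files plus generate this control file.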
