4 Replies Latest reply on Apr 29, 2018 4:35 PM by Gasimov Hafiz

Data loss after daemon restart

    Gasimov Hafiz

Hi everyone.

I have a TimesTen DB.

My table is abm_prod_bal, and I begin:

       

      Command> update  fcbsabm.abm_prod_bal set bal_lvl=1234 where bal_key=-471;

      1 row updated.

       

      Command> select bal_lvl from fcbsabm.abm_prod_bal where bal_key=-471;

      < 1234.00000000000 >

      1 row found.

       

Then I stop and start the daemon.

Now I see that my data is as below:

      Command> select bal_lvl from fcbsabm.abm_prod_bal where bal_key=-471;

      < 0.00000000000 >

1 row found.  ----------------- the update is lost, even though my DB uses the default setting AutoCommit=1

       

My ODBC parameters are as below:

      [TT_TENANT]

      Driver=/home/timesten/TimesTen/ttfcbs/lib/libtten.so

      DataStore=/home/u02/timesten/DataStore/TT_TENANT

      DatabaseCharacterSet=AL32UTF8

      PermSize=5000

      TempSize=64

      AutoCreate=1

      Logging=1

      DurableCommits=0

      LockLevel=0

      Temporary=0

      LogBufMB=32

      PrivateCommands=1

      CkptFrequency=0

      CkptLogVolume=0

      Isolation=1

      Connections=64

       

       

Why is the updated data lost after the daemon restart?

  • 1. Re: Data loss after daemon restart
          ChrisJenkins-Oracle

          2 questions:

           

          1.    Was the database loaded in memory when you stopped the daemon?

           

2.    Is this the same database where you removed the log files manually? And if so, did you destroy and re-create the database before stopping/starting the daemon?

           

You should understand that you should not stop the daemon if you have any active (i.e. loaded in memory) databases. Stopping the daemon is a rare event (there is little reason to ever stop it other than for a software update) and needs to be done in a co-ordinated fashion - you need to make sure all your databases are cleanly shut down before you stop the daemon. Stopping the daemon with an active database is essentially performing a 'crash stop' of the database; a bit like powering off the machine without first shutting it down! Since you are using DurableCommits=0 this will likely result in data loss. Even worse, you have disabled checkpointing completely (CkptFrequency=0, CkptLogVolume=0), which is absolutely not a good idea as you now have little or no disk persistence at all.
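For illustration, a coordinated shutdown might look like the sketch below, using the standard ttStatus, ttAdmin and ttDaemonAdmin utilities against the TT_TENANT DSN from this thread. Disconnect all applications/utilities first; under the default inUse RAM policy the database then unloads by itself, while under the manual policy you unload it explicitly:

ttStatus                        <- confirm no databases are loaded and no connections remain
ttAdmin -ramUnload TT_TENANT    <- manual policy only: unload cleanly (takes a final checkpoint)
ttDaemonAdmin -stop             <- only now is it safe to stop the main daemon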

           

What happened here is as follows:

           

          1.    You made the update, which changed the data in the in-memory table.

           

2.   The update was automatically committed (because AutoCommit=1) so the lock on the row was released making the change visible to other sessions/transactions. Because you have DurableCommits=0 the log records representing the update and the commit are not immediately written to disk (they will get written at some point depending on several factors).

           

          3.   You stopped the daemon while the database was active; this simply kills the database instantly.

           

          4.   You restarted the daemon and reconnected to the database. The database was loaded into memory (ramPolicy = inUse by default) from the last checkpoint (which would have been its state at the last *clean* shutdown since you have disabled automatic checkpointing) and then recovered back to a consistent state using the log files. Of course, the update and the commit record are not in the logs so the update does not get applied during recovery. Your database is self-consistent but is missing the update.

           

This is all expected behaviour based on your parameter settings and the actions you have taken. If you were to change the database setting to DurableCommits=1 then every commit operation would do a forced (synchronous) log buffer flush to disk. In that case your data would still be there after the above sequence of events (but of course all commit operations would be much, much slower).
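To illustrate: DurableCommits is a connection attribute, so for a quick test you could simply add it to the connection string (a sketch using the DSN from this thread; it could equally be set in the odbc.ini stanza):

ttIsql -connStr "DSN=TT_TENANT;DurableCommits=1"

Every commit made through that connection then flushes the log buffer to disk before returning, so the update above would survive a daemon stop or crash.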

           

You need to understand the different configuration parameters, especially those that relate to persistence, and configure them in a way that meets your requirements in terms of both persistence and performance. TimesTen offers a lot of options in this area compared to most other databases (including programmatic options controllable by the application) in order to allow developers to balance data safety against performance. TimesTen also has high-availability capability using replication, which can also be used as part of an overall architectural approach to data safety and availability.

           

          Chris

           

          • 2. Re: After Daemon restart be data loss
            Gasimov Hafiz

I have read this answer carefully:

             

1. I do not understand this question: "Was the database loaded in memory when you stopped the daemon?" What does 'database loaded in memory' mean?

2. Yes, regrettably, it is the same DB.

             

A) I also do not understand: "you need to make sure all your databases are cleanly shut down before you stop the daemon".

What is the difference between "all your databases are cleanly shut down" and "you stop the daemon"?

            ------------------------------------------------------

B) Additionally: what are the best-practice values for the settings below?

            {Since you are using DurableCommits=0 this will likely result in data loss. Even worse, you have disabled checkpointing completely

            (CkptFrequency=0, CkptLogVolume=0), which is absolutely not a good idea as you now have little or no disk persistence at all.}

            ------------------------------------------------------

C) Which factors do you mean here?

"the log records representing the update and the commit are not immediately written to disk (they will get written at some point depending on several factors)."

            ------------------------------------------------------

And a final question:

If DurableCommits=0, after how much time is the commit written to disk? How many minutes?

             

Thanks.

             

• 3. Re: Data loss after daemon restart
              ChrisJenkins-Oracle

This is a long answer, sorry, but if you want to understand how TimesTen works then I recommend you read it carefully. I would also recommend that you study the TimesTen documentation and also the QuickStart Tutorials...

               

              When you install TimesTen you create a TimesTen 'instance'. An instance is managed by a management process called the main daemon. When the main daemon is running the instance is 'up' and when the main daemon is not running the instance is 'down'. You can only perform useful actions in an instance that is up.

               

Each instance can manage one or more independent databases. Each database consists of some files on disk (checkpoint and log files) which comprise the persistent copy of the database. Since TimesTen is an in-memory database, the database has to be loaded into memory (from the on-disk checkpoint and log files) in order to be accessed. This process is called 'RAM load' and by default it occurs automatically when some application or utility connects to the database.

               

Answer A) At connection time, if the database is 'down' (i.e. not loaded in memory) then the main daemon creates a shared memory segment for the database, allocates a dedicated sub-daemon process to manage the individual database and tells it to load the database into the shared segment from disk. Once that process has completed, the connection request completes and the application/utility can access the database. If the database is already up at connection time (e.g. due to other connections) then the connection can proceed immediately as there is no need to load the database into memory.

               

When there are no longer any application/utility connections the database is automatically shut down; a checkpoint is taken and the shared memory segment is released back to the OS. This process is called 'RAM unload'.

               

              This default behaviour is referred to as the 'inUse' RAM policy. Other policies can be set using the ttAdmin command, the most useful being 'manual'. With manual RAM policy the administrator must explicitly startup (RAM load) a database before it can be used and must explicitly shut it down (RAM unload) when required. The default inUse policy is okay for messing around but for serious usage the manual policy is strongly recommended.
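For example (a sketch; ttAdmin is the standard utility, and the DSN is the one from this thread):

ttAdmin -ramPolicy manual TT_TENANT   <- set the manual policy (done once)
ttAdmin -ramLoad TT_TENANT            <- explicit startup (RAM load)
ttAdmin -ramUnload TT_TENANT          <- explicit shutdown (RAM unload)

Remember that all applications/utilities must disconnect before the ramUnload can succeed.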

               

              NOTE:   It is NOT possible to RAM unload (shutdown) a database when there are active application/utility connections. You must disconnect all applications/utilities first.

               

Since the main daemon is the master controller for everything within the instance, if you stop the main daemon while one or more databases are loaded in memory then those databases will be 'invalidated'; they will be forced out of memory without any checkpoint etc. and unless the databases are configured with HA (replication) or DurableCommits=1 you will very likely lose some data. If you ever need to stop the daemon, make sure all of your databases are shut down cleanly first (you can check using the ttStatus command).

               

              Answer B)   The TimesTen durability mechanism relies on a combination of (fuzzy) checkpointing and transaction logging. Both are required!

               

Checkpointing: Every TimesTen database has a pair of checkpoint files on disk. These files are essentially an image of the in-memory database as of a specific moment in time (that is an oversimplification but conceptually it is good enough). When a checkpoint operation occurs, the contents of the oldest checkpoint file on disk are synchronised with the current contents of the in-memory database. This is an optimised mechanism (it doesn't write out the entire database, only areas that have changed) which does not block queries or DML. The next checkpoint will write to the other file (which is now the oldest) and the files will be used in flip-flop fashion for subsequent checkpoints. Checkpoints normally occur automatically and are triggered either based on a 'time interval' (CkptFrequency - default is 600 seconds) or 'amount of data changes' (CkptLogVolume - default is 0 - not used to trigger checkpoints). One or the other, or even both, should be configured; setting both to zero disables automatic checkpointing, which is highly inadvisable.
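For example, in the odbc.ini stanza from the original post, re-enabling automatic checkpointing could look like this (illustrative values only; tune them for your workload):

CkptFrequency=600    <- checkpoint at least every 600 seconds (the default)
CkptLogVolume=32     <- also checkpoint after roughly 32 MB of log data

You can also take a checkpoint on demand from ttIsql using the ttCkpt built-in procedure:

Command> call ttCkpt;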

               

              Why do we have checkpoints? When a database is started up after a clean shutdown or after some kind of failure it automatically goes through recovery. There are several steps involved in recovery:

               

              1.    Load the most recent complete checkpoint image from disk into memory. This is a physical process and the time taken depends on (a) the size of the checkpoint file and (b) the speed of the disk storage. If using SSD/flash storage TimesTen has an option to use multiple threads to read the checkpoint file to further speed up this step.

               

2.    Roll forward (re-apply) all transactions from the transaction logs from the time of the checkpoint image used in step (1) until the end of the last log, then roll back any uncommitted transactions. This is a logical process which uses a single thread.

               

              3.    Drop and rebuild any indexes that would have been modified during step (2) (index modifications are not logged for performance reasons). This step can utilise parallel processing based on the configured RecoveryThreads value.

               

              4.    Take a checkpoint.

               

              5.     Open the database for application connections.

               

The time taken for step (1) is relatively constant for a given size of database and type of storage device. After a clean shutdown the time taken for steps (2), (3) and (4) is very low, but after a crash the time taken for each step varies depending on many factors and can be substantial. The time taken for step (2) can be large, depending on how often the system is checkpointing and how much data is changing per second. The time for step (3) depends on how many indexes were modified during step (2), how large the underlying tables are and how many RecoveryThreads are configured.

               

              Tuning checkpointing is important for production systems but for development systems you can leave the settings at the defaults.

               

If you disable checkpointing entirely, several serious problems may arise:

               

a)    Transaction log files will only be purged after a database shutdown/restart cycle. This can easily lead to you running out of disk space (see the example after this list for a way to check what is holding the logs).

               

b)     If you have any kind of unclean database shutdown, the time to recover the database could be extremely long since we will have to replay all of the work performed since the database was last started up. Recovery might take hours or even days if a lot of work was done and the database has been up for a long time.
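Regarding point (a): you can see what is preventing log files from being purged with the ttLogHolds built-in procedure (the output lists each hold on the log; with checkpointing disabled you would expect to see a long-standing checkpoint hold):

Command> call ttLogHolds;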

               

Logging: Checkpoints provide regular points of physical recovery but on their own they are not sufficient for data protection, which is why TimesTen also has transaction logging. Whenever any persistent data is changed within the in-memory database, one or more log records are generated reflecting those changes. These records are staged to an in-memory log buffer (a multi-ring buffer supporting parallel concurrent append, read and flush) and then flushed to disk. Log records are posted to the log buffer as they are generated. For changes to table row data TimesTen also uses locking and versioning (depending on isolation level) to prevent changes made by one transaction from being visible to other sessions/queries/transactions until they are committed.

When a commit is issued a 'commit marker' for the transaction is posted to the log buffer and then any locked rows/tables are unlocked, old versions are reclaimed etc. and the transaction's changes become visible to other sessions/queries/transactions. If DurableCommits=0 then at this point the commit call returns to the application with 'success' and the application moves on. If DurableCommits=1 then a synchronous log buffer flush is initiated; once that flush returns success the commit call returns to the application with 'success'.

               

So, with DurableCommits=0 (the default), commits are basically 'in memory' commits only and while the transaction is committed in terms of the visibility of its changes to other transactions, it is not immediately durable. It will of course become durable, usually fairly quickly, but how?

               

              In the managing sub-daemon there is a dedicated log flusher thread whose job is to flush pending log data from the log buffer to disk. The flusher adopts a dynamic algorithm to try to balance the delay in flushing data against excessive use of system resources. As a result there is no fixed time interval between when a log record or commit marker gets posted to the log buffer and when it gets written to disk and so becomes truly persistent.

               

DurableCommits=0 is very good for performance and response time; an in-memory commit takes a few nanoseconds while even with the very best NVMe flash storage a commit to disk takes at least tens to hundreds of microseconds, and with regular spinning HDDs it may even take several milliseconds. The downside is that there is a delay between when a change is successfully committed in the database and that change becoming recoverable via the TimesTen persistence mechanism. If there is some kind of system or database crash some committed changes may be lost.

               

So why use DurableCommits=0? As I said, performance! The difference in performance for DML (strictly speaking for commits) between DurableCommits=0 and DurableCommits=1 is several orders of magnitude. If you can accept some data loss on failure (which you can mitigate to some degree by using replication) then you can use DurableCommits=0. If you are not using replication and cannot tolerate any data loss then you should use DurableCommits=1 and accept the reduced performance for DML (it has no effect on reads, of course).

               

              NOTES:

               

1.   The DurableCommits attribute is a connection level attribute. Different connections can use different values. Also, TimesTen provides a built-in procedure, ttDurableCommit, that an application can call to ensure that the next commit it issues will be a durable commit regardless of the setting of DurableCommits (see the sketch after these notes).

               

              2.   The log buffer is logically a single buffer shared by all connections to the database. Thus if a durable commit operation is issued by *any* connection, all transactions, from all connections, that committed prior to the durable commit will be made fully persistent by the synchronous buffer flush arising from the durable commit.
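Regarding note 1, a minimal ttIsql sketch, reusing the table from this thread (ttDurableCommit does not itself commit anything; it just makes the transaction's eventual commit durable):

Command> autocommit 0;
Command> update fcbsabm.abm_prod_bal set bal_lvl=1234 where bal_key=-471;
1 row updated.
Command> call ttDurableCommit;
Command> commit;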

               

              By understanding the behaviour of TimesTen and using the capabilities that it provides it is possible for architects and application developers to design systems with any desired balance between performance and data safety across the entire spectrum.

               

Answer C) See Answer B for details. Please do not confuse 'auto commit' with durability. 'Auto commit' is a driver setting that determines whether the ODBC/JDBC driver will automatically issue a commit after every SQL statement execution (using auto commit mode is generally a bad practice but that is a different discussion). It has nothing directly to do with durability; that is determined by the DurableCommits parameter as described above.
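For example, ttIsql enables auto commit by default; you can switch to explicit commits like this (a sketch; the value 9999 is just an example). Note that this only changes when the commit happens - whether that commit is also flushed to disk is still governed by DurableCommits:

Command> autocommit 0;
Command> update fcbsabm.abm_prod_bal set bal_lvl=9999 where bal_key=-471;
1 row updated.
Command> commit;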

• 4. Re: Data loss after daemon restart
                Gasimov Hafiz

Thanks, Mr Chris. These answers are very helpful.

                 

Thanks again.