Category: processes

About Checkpoints and Commits

 What is Checkpoint (CKPT) process :

  • At specific times, all modified database buffers in the SGA are written to the datafiles by DBWn. This event is called a checkpoint. The checkpoint process is responsible for signaling DBWn at checkpoints and updating all the datafiles and control files of the database to indicate the most recent checkpoint.
  • When a checkpoint occurs, Oracle must update the headers of all datafiles to record the details of the checkpoint. This is done by the CKPT process. The CKPT process does not write blocks to disk; DBWn always performs that work.
  • Control files also record information about checkpoints. Every three seconds, the checkpoint process (CKPT) records information in the control file about the checkpoint position in the online redo log. This information is used during database recovery to tell Oracle that all redo entries recorded before this point in the online redo log group are not necessary for database recovery; they were already written to the datafiles.

Further discussions about this can be found at AskTom website also (Questions regarding checkpoint, version 8.1.6 – AskTom)


A common problem for people with very active systems who use filesystems to store their datafiles is the error, “Thread cannot allocate new log, sequence ; Checkpoint not complete” The most commonly recommended remedies for this situation are either to use larger or more online redologs.
Unfortunately, if ‘Checkpoint not complete’ is a chronic problem, neither of these solutions will eliminate the problem.  They may forestall, or even reduce the frequency of the error, but the problem will not be solved.
‘Checkpoint not complete’ is a result of an instance filling and switching through all available online redologs before one checkpoint can complete.  Because Oracle must always be able to recover from instance failure from the last checkpoint forward, the rules of recovery prevent an instance from reusing any online redolog that contains changes newer than the last checkpoint.
A checkpoint is the writing of all dirty blocks in the buffer cache to the datafiles as of a particular SCN.  In general, if a checkpoint cannot complete in a timely fashion, it is a result of slow I/O to the datafiles.  The possible solutions to this problem seek to eliminate the I/O bottleneck, or compensate for a slow I/O subsystem.

The problem with the recommendations to increase the size or number of redologs is that if the rate of DML activity is so high that checkpoints cannot keep up, there is no reason to think that by increasing the amount of online redologs, that it will make the I/O subsystem any more able to keep up.  That is to say, it will take a longer time to fill up all the online logs, but the basic problem of not being able to write out dirty blocks as fast as the database is changing will still be there.
The first step in diagnosing ‘checkpoint not complete’ is to determine if the problem is chronic or not.  If the error appears in the alert log many times a day, or consistently during peak hours, then the problem is chronic.  If the error appears at the same time every day or every week, or if the problem is only occasional, it is not chronic.

Non-chronic ‘checkpoint not complete’ probably doesn’t require any re-engineering of the systems architecture.  It is most likely the result of a single application suddenly making a large amount of DML (inserts, updates, deletes) to the database in a short time. The best way to solve this problem is to find out if the application can reduce its generation of redo by performing its changes ‘nologging.’  Any bulk inserts can be done using append mode unrecoverable, and generate no significant redo.  Deletes that clear a whole table or a whole class of records can be converted to truncates of the table or of a partition.  It very least, the application can be modified to throttle the rate of change back to a rate that the I/O subsystem can keep up with.  Even the crude solution of increasing the number or size of redologs may solve sporadic, non-chronic occurrences of  ‘checkpoint not complete.’
Chronic ‘checkpoint not complete’ is a more complicated problem.  It means that overall, the rate of DML of the instance is higher than the I/O subsystem can support.  In systems with chronically slow I/O, application performance will be degraded, because the buffer cache is not purged of dirty blocks fast enough or frequently enough.  Such systems show relatively long time_waited for the “buffer busy wait” and “write complete wait” events in v$system_event. The solution to such a problem is either to compensate for the problem by making the checkpoint more aggressive, or to solve the problem by making the I/O more efficient.

To understand the solution to this problem, it is first necessary to understand something about how checkpoints work.  When a periodic checkpoint is being performed, a certain portion of the database writer’s capacity, or “write batch,” is made available for the checkpoint to use.  If the checkpoint can’t complete in time, it is valid to infer that Oracle is not using enough of the database writer’s write batch for the checkpoint, and that it should probably use a larger portion.  Note that none of this has anything to do with the CKPT background process.  Checkpoints are performed by the database writer.  CKPT just relieves the log writer from updating file header SCNs when checkpoints complete.

In Oracle8, a new feature, sometimes called “incremental checkpoint” or “fast start instance recovery” was introduced.  This feature is enabled with the initialization parameter FAST_START_MTTR_TARGET in 9i (FAST_START_IO_TARGET in 8i), and completely changes the behavior of Oracle checkpointing.  Instead of performing large checkpoints at periodic intervals, the database writer tries to keep the number of dirty blocks in the buffer cache low enough to guarantee rapid recovery in the event of a crash.  It frequently updates the file headers to reflect the fact that there are not dirty buffers older than a particular SCN.  If the number of dirty blocks starts to grow too large, a greater portion of the database writer’s write batch will be given over to writing those blocks out.  Using FAST_START_MTTR_TARGET is one way to avoid ‘checkpoint not complete’ while living with a chronically slow I/O subsystem.
In Oracle7, although there is no incremental checkpoint feature, there is an “undocumented” initialization parameter that can be set to devote a larger portion of the write batch to checkpoints when they are in progress.  The parameter is _DB_BLOCK_CHECKPOINT_BATCH, and to set it you need to find out the size in blocks of the write batch and the current checkpoint batch.  This can be obtained from the internal memory structure x$kvii.
Another way to compensate for slow I/O is to increase the number of database writers.  By dedicating more processes to writing the blocks out, it may be possible to allow checkpoints to keep up with the rate of DML activity on the database.  Bear in mind that certain filesystems, such as AdvFS on Compaq Tru64 Unix, obtain no benefit, from multiple database writers.  Such filesystems exclusively lock a file for write when any block is written to that file.  This causes multiple database writers to queue up behind each other waiting to write blocks to a particular file.  Oracle has provided notes on Metalink regarding such filesystems.
If you are more inclined to address the root cause of the problem than to compensate for it, then there are a few measures you can take.  Oracle supports asynchronous I/O on most platforms, if datafiles are stored in raw or logical volumes.  Conversion to raw or LVs requires significant engineering, but is much easier than totally replacing the storage hardware.  Using asynchronous I/O also relieves the aforementioned file-locking bottleneck on certain types of filesystems.
I/O hardware upgrade or replacement is the most complex approach to solving the problem of slow I/O.  Using RAID disk arrays allows data to be “striped” across many disks, allowing a high rate of write-out. Caching disk controllers add a battery-protected cache for fast write-out of data.
reference : Misconceptions on Checkpoints

What happens when a transaction commits  :

When a transaction is committed, the following occurs:
1. The internal transaction table for the associated rollback segment records that the transaction has committed, and the corresponding unique system change number (SCN) of the transaction is assigned and recorded in the table.
2. The log writer process (LGWR) writes redo log entries in the SGA’s redo log buffers to the online redo log file. It also writes the transaction’s SCN to the online redo log file. This atomic event constitutes the commit of the transaction.
3. Oracle releases locks held on rows and tables.
4. Oracle marks the transaction complete.

Other references :
Oracle 9.2 Concepts Documentation


Reading the AWR Report

This section contains detailed guidance for evaluating each section of an AWR report.  An AWR report is very similar to the STATSPACK report from Oracle9i, and it contains vital elapsed-time information on what happened during particular snapshot range.  The data in an AWR or STATSPACK report is the delta, or changes, between the accumulated metrics within each snapshot.

The main sections in an AWR report include:

Report Summary: This gives an overall summary of the instance during the snapshot period, and it contains important aggregate summary information.

Cache Sizes (end): This shows the size of each SGA region after AMM has changed them.  This information can be compared to the original init.ora parameters at the end of the AWR report.

Load Profile: This important section shows important rates expressed in units of per second and transactions per second.

Instance Efficiency Percentages: With a target of 100%, these are high-level ratios for activity in the SGA.

Shared Pool Statistics: This is a good summary of changes to the shared pool during the snapshot period.

Top 5 Timed Events: This is the most important section in the AWR report.  It shows the top wait events and can quickly show the overall database bottleneck.

Wait Events Statistics Section: This section shows a breakdown of the main wait events in the database including foreground and background database wait events as well as time model, operating system, service, and wait classes statistics.

Wait Events: This AWR report section provides more detailed wait event information for foreground user processes which includes Top 5 wait events and many other wait events that occurred during the snapshot interval.

Background Wait Events: This section is relevant to the background process wait events.

Time Model Statistics: Time mode statistics report how database-processing time is spent. This section contains detailed timing information on particular components participating in database processing.

Operating System Statistics: The stress on the Oracle server is important, and this section shows the main external resources including I/O, CPU, memory, and network usage.

Service Statistics: The service statistics section gives information about how particular services configured in the database are operating.

SQL Section: This section displays top SQL, ordered by important SQL execution metrics.

SQL Ordered by Elapsed Time: Includes SQL statements that took significant execution time during processing.

SQL Ordered by CPU Time: Includes SQL statements that consumed significant CPU time during its processing.

SQL Ordered by Gets: These SQLs performed a high number of logical reads while retrieving data.

SQL Ordered by Reads: These SQLs performed a high number of physical disk reads while retrieving data.

SQL Ordered by Parse Calls: These SQLs experienced a high number of reparsing operations.

SQL Ordered by Sharable Memory: Includes SQL statements cursors which consumed a large amount of SGA shared pool memory.

SQL Ordered by Version Count: These SQLs have a large number of versions in shared pool for some reason.

Instance Activity Stats: This section contains statistical information describing how the database operated during the snapshot period.

Instance Activity Stats (Absolute Values): This section contains statistics that have absolute values not derived from end and start snapshots.

Instance Activity Stats (Thread Activity): This report section reports a log switch activity statistic.

I/O Section: This section shows the all important I/O activity for the instance and shows I/O activity by tablespace, data file, and includes buffer pool statistics.

Tablespace IO Stats

File IO Stats

Buffer Pool Statistics

Advisory Section: This section show details of the advisories for the buffer, shared pool, PGA and Java pool.

Buffer Pool Advisory

PGA Aggr Summary: PGA Aggr Target Stats; PGA Aggr Target Histogram; and PGA Memory Advisory.

Shared Pool Advisory

Java Pool Advisory

Buffer Wait Statistics: This important section shows buffer cache waits statistics.

Enqueue Activity: This important section shows how enqueue operates in the database. Enqueues are special internal structures which provide concurrent access to various database resources.

Undo Segment Summary: This section gives a summary about how undo segments are used by the database.

Undo Segment Stats: This section shows detailed history information about undo segment activity.

Latch Activity: This section shows details about latch statistics. Latches are a lightweight serialization mechanism that is used to single-thread access to internal Oracle structures.

Latch Sleep Breakdown

Latch Miss Sources

Parent Latch Statistics

Child Latch Statistics

Segment Section: This report section provides details about hot segments using the following criteria:

Segments by Logical Reads: Includes top segments which experienced high number of logical reads.

Segments by Physical Reads: Includes top segments which experienced high number of disk physical reads.

Segments by Buffer Busy Waits: These segments have the largest number of buffer waits caused by their data blocks.

Segments by Row Lock Waits: Includes segments that had a large number of row locks on their data.

Segments by ITL Waits: Includes segments that had a large contention for Interested Transaction List (ITL). The contention for ITL can be reduced by increasing INITRANS storage parameter of the table.

Dictionary Cache Stats: This section exposes details about how the data dictionary cache is operating.

Library Cache Activity: Includes library cache statistics describing how shared library objects are managed by Oracle.

SGA Memory Summary: This section provides summary information about various SGA regions.

init.ora Parameters: This section shows the original init.ora parameters for the instance during the snapshot period.


ref :