The JCC LogMiner Loader and TSN’s
This article contains notes on two important issues regarding Rdb TSN’s. While most customers will not experience the issues described, some high-performance databases are likely to experience the issues discussed. The article is a little long because of the context that needs to be set.
The entire point to using the JCC LogMiner Loader is to extract changes made to selected tables in the source database and apply them appropriately to a target. Within the concept of an ACID database, therefore, one expects to obtain all the rows changed in every transaction, as specified in the options file you provide. For this reason the LogMiner must always start at a point in the AIJ where complete transactions will be guaranteed. This is the definition of a journal quiet point. An alternative definition is a location in the journal where all transactions that have written to the journal have also written their commit records or rolled back.
A quiet point can be forced by the DBA using the RMU/BACKUP/AFTER/QUIET DCL command. The active journals that remain in the database at that point start at a quiet point.
There is another case though. Because of the way Rdb writes changes, flushing them at the end of transactions (for smaller transactions), there are many instances when quiet points occur in the middle of the journal. These are called micro-quiet points. There is no way to detect them except through analysis of the journal a record at a time as is done by the LogMiner.
For each transaction, the LogMiner reports something called the AERCP (AIJ Extract Restart Control Point) which contains the “address” of the last micro quiet point that the LogMiner encountered. This structure appears in the commit record that the LogMiner produces and contains the following fields:
Field | Length | Content |
Version Number | 1 | Structure Version number, currently 1 |
Structure Length | 1 | Length of Structure (currently 28) |
MBZ | 2 | Must be zero |
MQP_VNO | 4 | Micro Quiet Point VNO |
MQP_VBN | 4 | Micro Quet Point VBN |
MQP_TSN | 8 | Micro Quiet Point TSN |
LCP_TSN | 8 | Last extracted TSN |
The important fields in this are the VNO (journal sequence number) and the MicroQuietPoint TSN. The MQP_TSN is where the LogMiner beings sending data upon restart. The block offset into the journal (VBN) is a hint as to where this point is for journals in the live database (but not for backup journals because of space compression.)
When presented with the AERCP, the LogMiner can start in the middle of a live journal and continue sending records to the Loader. It will locate the records for the last TSN sent and then send them and all subsequent transactions.
It is entirely possible that on restart the LogMiner will end up resending transactions that the Loader has already processed. This can happen if, for instance, there are very long running transactions. For this reason, as a part of the Loader’s restart context, The Loader will store not only the AERCP of its transaction but also the TSN of the transaction that it commits. This TSN is another field in the commit record that the Loader processes. In this way, when a Loader session is restarted it can utilize the AERCP to direct the LogMiner where to start in the journals or backup journals and it can use the TSN provided in the commit record to skip processing records until it reaches the point where it left off.
Until Rdb version 7.0, with the advent of very fast computers with lots of cores/CPUs and high performance tools like commit to journal optimization which can allocate up to 1024 TSNs at a single request, TSNs can tick over very quickly. So quickly, in fact, that some databases could run into the limit of the traditional 32-bit TSN value range within reasonable timeframes. Fixing that would involve completely rebuliding the database, a formidable problem for large databases. (The first production deployment of the Loader was a rebuild of a very large mission critical database with less than 5 minutes of down-time.)
The TSN field in V7.0 and later was engineered to be a three-field structure. The traditional portion (pre Rdb 7.0) is an unsigned 32-bit integer and is broken into two parts. Bit 31 of that field is used as an indicator and bits 0-30 are used as the traditional TSN. This means that the traditional portion can represent values up to 2,147,483,647. Roughly think of this as two thousand million, a very large value. The second field is a two byte structure that is stored in the page tail and is the same for all lines on the page.
Bit 31 of the traditional TSN field in the TSN index is used to indicate what to do with the prefix of the TSN in the page tail. This prefix is called the base TSN for the page and is 16 bits long. When the indicator bit is zero, the TSN of a line is measured relative to the base previous to the current base. When bit 31 is set the TSN stored with the line then bits of the TSN will be appended to the current page base TSN. All of this is a long way to say that there are really 48 (or 47 not counting the indicator) bits which actually specify the TSN of a line. The resulting maximum TSN is 140,737,488,355,327 or 140 million million values, a truely enormous number.
We have become aware of a bug in the Rdb LogMiner version V7.3-210 whereby the LogMiner somehow produces signed arithmetic when the TSN exceeds the 31-bit length. We have not investigated this further but believe this probably extends backward in time to Rdb 7.2.5 which became available in the spring of 2011. It may indeed be older.
That means that TSNs that have a value larger than 2,147,483,647 (bit 31 is set) are returned to the LogMiner as negative values. (The TSN field in the commit record is distinct from the AERCP field. The AERCP field is not turned into a negative number.)
If this condition occurs while you are running a LogMiner Loader session you will not be exposed to any issues of loss of data. You can stop and restart the Loader as you normally would. This is because the Loader control process passes the (correct) AERCP to the LogMiner to restart and then itself skips incoming data until it receives the (negative) TSN from the LogMiner. At that point it will continue its work.
You can determine whether you are exposed to this problem by using the JCC_LML_DUMP_CHECKPOINT tool to examine whether it displays negative TSNs as in the following. The result looks like the following (taken from one of our regression cycles):
$ jcc_lml_dump_checkpoint REGTESTRDBSKIP loader_regression_test_db rdb
JCC LML Dump Checkpoint V03.05.01 (built 6-FEB-2018 16:00:57.71)
-- Checkpoint restart information --
--- Parallel mode ---
Write Timestamp: 1-MAR-2018 10:35:09.31
LoaderName: REGTESTRDBSKIP
Completion Flag: N
Checkpoint Interval: 1
Input Data Source: LML_CONT_REGTESTRDBSKIP
Last Transaction:
Start Time: 1-MAR-2018 10:21:32.23
Commit Time: 1-MAR-2018 10:21:32.23
TSN: 751219399
LSN: 27257
AERCP: 1-28-27223-72623-751219399-751219399
RM TID:
This shows that this database is about one-third of the way towards completely using the available 31-bit TSN pool.
It is absolutely imperative that you follow the following protocol before upgrading Rdb, even when applying a patch. It is also absolutely imperative that you follow this protocol before resetting the TSNs in the database.
- Before upgrading Rdb or resetting TSNs in the database root, completely exhaust all data in the journals.
- When the database upgrade is complete or the TSNs are reset, explicitly restart the Loader LogMining in the active journals, not the backup journals. Don't forget to make sure that your continuing activations of the Loader start in the backup journals.