Loader Performance #1 - Network Latency
We occasionally receive reports that "The Loader is slow". All such reports require investigation. We have been able to identify the cause of these performance issues in all cases.
The first report we ever received came from a customer who had two data centers one in Saint Louis and the other in the Los Angeles area. The report said something like the following:
The Loader runs fine during the week, but on the weekend it gets very slow.
Well, that was a challenge. What was going on over the weekend that caused the slowness? It turned out that during the week the Loader was replicating normal TP operations. Transactions were small having a few or a few dozen rows. Over the weekend big batch jobs would fire off and transaction sizes would grow to a million rows or more.
In this case the replication was to Oracle as a target, but the issues here are generic to all targets. The Loader operates transactionally sending all rows in a transaction before doing a commit in the target. The interface defines how this happens. In the case of Oracle the native interface is OCI and that operates synchronously a row at a time.
In the OCI case, the Loader makes an OCI call to prepare a generated SQL procedure that actually does the investigation in the target as to whether this is an update or insert or delete. It then uses this prepared statement over and over to execute on all rows for a table. Calls are made to OCI sending the columns in the row plus indicator variables that carry NULL characteristics to the prepared procedure. The cycle is: transmission, execution in the target and then receipt of acknowledgement. It turns out that network latency becomes important here.
A proxy for the latency would be to determine what the round trip ping time is between nodes. In the case of this customer this was about 50 milliseconds.
If the Loader has 20 rows to send in a transaction, the 50 millisecond wait for each row adds up to 1 second. However, because the Loader is scalable being able to deploy multiple Loader threads (processes), if 16 threads are active concurrently the throughput is 16 transactions per second which is pretty fast even for a big TP system as this represents only the update portion of the database work.
However, if a single transaction encompasses a million rows then the parallelism inherent in the Loader does not seriously mitigate the throughput issue. A single transaction will take 50,000 seconds to transmit which is almost 14 hours. Of course if one were replicating a large number of such transactions concurrently then they could be done in parallel and you could get an average throughput of slightly better than one million-row transaction per hour.
This is an inherent limitation of the speed of light and the interface model which is synchronous. This is a good lesson that network bandwidth is a different measurement of performance than network latency. Latency can be critical.
There are alternatives however. For instance JDBC has a concept of "batches" in which rows are packaged into a single packet called a batch and the entire packet sent. In this case the throughput is increased by a factor proportional to the number of rows in each batch. Be aware that you are dealing with a JAVA engine here and that large batch sizes require proportionally larger JAVA memory allocations.
Batch size is configurable so you can almost use a dial to speed up the transfer. Increase the batch size and increase throughput by a similar factor.
Not all JDBC drivers support batches. Those that do are Rdb itself, Microsoft SQL Server and the companion SourceForge driver as well as the MySQL driver. Oracle as of 11.1 does not have the ability to return to the Loader the number of rows that were inserted or modified for each record in the batch. You will need to investigate the documentation for your driver to verify that it does support batches.
There is one Loader interface that supports asynchronous transfer and that is when the target is a Tuxedo application. These are relatively rare and if you are interested there is a good discussion in the documentation.
There is another solution that we found for a different customer. This customer required that their Rdb reporting database, again in California, be updated once per day from transactions done during the day in an another data center several thousand miles away. Again the transactions were big. Replication would not work because of the size and number of transactions and because of network latencies.
So, what was done in this case was to use RMU/UNLOAD/AFTER and extract the AIJs at the end of the day into an output file. This reduces the size by a very large amount as all index information is removed from the journal. This is then compressed and sent by asynchronous transfer to the California data center where it is decompressed. Transfer speed is increased because the asynchronous nature of the transfer allows many packets to be in flight concurrently without waiting for acknowledgement.
The Loader is run on the target machine in a hybrid mode called COPY mode. In COPY mode, a special copy process takes the place of the LogMiner in the Loader family tree. This COPY process then reads the file extracted by the LogMiner and writes it into the mailbox that the Loader uses for inter-process communication. Data is then read from the mailbox by a number of concurrent Loader thread processes just as if one were running the Loader against the production database. This process looks exactly like a normal real-time Loader replication.
This solution replaced a once-per-week transfer of physical tapes by a secure courier (human) who flew with them to California in a secure pouch. A major benefit was that more timely information was available from the reporting database in California.