This is the third segment of my series of posts on interesting new features in Oracle 12c. In this part I will cover enhancements to RAC, DataGuard (DG) and RMAN.
Application Continuity (AC) is the next logical step in Oracle’s existing Transparent Application Failover (TAF) capability. TAF works as follows: you start a big SELECT on one node of a RAC, and midway through, the node fails. The TAF-aware driver reconnects to another node and re-submits the SELECT, discarding the rows that it has already retrieved, and finally delivers the complete result set to the application. Other than the query taking a bit longer because it has actually started again, the application should not be aware that anything has happened, hence “transparent”. When performing DML, however, TAF is less capable: it is up to the application to catch the exception and retry the transaction on a connection which has been re-established by the driver.
AC encapsulates much of this logic, enabling a session’s state to be reproduced on reconnection, and its work resumed or retried. However, it requires the use of WebLogic or Universal Connection Pool; the result is basically a transaction processing monitor like CICS. It’s great that we have this capability natively in the Oracle stack now, but I wish it had been implemented in OCI so that it happened “for free” in any application that was simply recompiled against the 12c libraries. Perhaps that will be possible in a later version.
AC is built on top of an enabling technology called Transaction Guard (TG), which is in OCI. TG gives each transaction a unique identifier, which is stored in the session in OCI and can be used to reliably determine the commit state of that transaction even following a database outage. This means there is no risk of a TG-aware application duplicating the same work and, even more cleverly, no risk of another session doing some work that would make a subsequent attempt impossible (e.g. if a customer has bought the last item that we had in stock). For the DBA there is a new table, LTXID_TRANS, in the SYSAUX tablespace for viewing TG activity, and to enable the feature there is some additional configuration to do. I am going to see about incorporating this into OCI*ML eventually.
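To make the TG idea a little more concrete, here is a minimal sketch of the database-side check, assuming the client saved its logical transaction ID (LTXID) before the outage and passes it in from a new session. The procedure name report_ltxid_outcome and the user app_user are placeholders of mine, and the service is assumed to have already been configured to record commit outcomes.

```sql
-- Assumption: app_user is a hypothetical application schema.
GRANT EXECUTE ON DBMS_APP_CONT TO app_user;

-- Hypothetical helper: reports whether the transaction identified by
-- p_ltxid (saved by the client from the failed session) was committed.
-- It must be called from a different session than the one that failed.
CREATE OR REPLACE PROCEDURE report_ltxid_outcome (p_ltxid IN RAW) AS
  l_committed           BOOLEAN;
  l_user_call_completed BOOLEAN;
BEGIN
  -- Arguments: the client's LTXID (IN), committed (OUT),
  -- user call completed (OUT).
  DBMS_APP_CONT.GET_LTXID_OUTCOME(p_ltxid, l_committed, l_user_call_completed);

  IF l_committed AND l_user_call_completed THEN
    DBMS_OUTPUT.PUT_LINE('Committed and call completed: do not resubmit');
  ELSIF l_committed THEN
    DBMS_OUTPUT.PUT_LINE('Committed, but the call did not finish returning');
  ELSE
    DBMS_OUTPUT.PUT_LINE('Not committed: safe to resubmit');
  END IF;
END report_ltxid_outcome;
/
```

As far as I understand it, asking for the outcome also prevents that in-flight transaction from committing afterwards, which is what makes the "safe to resubmit" branch trustworthy.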
The next interesting feature is Global Data Services (GDS), an intelligent load balancing and connection routing layer for use in replication scenarios. Sometimes replication is used for resilience, and sometimes it is used to improve performance, either by offloading read-only work from a read-write database, or by making a copy of the data available locally in a remote location to cut down on the latency of round trips (with the added bonus that that site can continue to work even if it loses connectivity). GDS works with both physical replicas (Active DataGuard) and logical ones (GoldenGate); indeed, as mentioned in part 1, these two products are now licensed together. At a previous employer, we had a very sophisticated replication topology implemented on Oracle 10g with Quest Shareplex, and I know a few old hands who still swear that Sybase Replication Server is the best thing since sliced bread. Oracle’s own story in this space has been a bit mixed; GG was an acquisition which deprecated Streams, which in turn obsoleted Logical Standbys. Perhaps now there will be a solid platform on which we can build going forwards, with GDS, DG and GG. Certainly the licensing gives the customer more flexibility.
Continuing with DG, there is a slew of new features here, including Far Sync (FS), the DBMS_ROLLING package for leveraging DG in upgrade scenarios, writeable temporary tables in read-only Active DataGuard DBs, unique sequences across a DG topology and automatic validation of readiness for DG role transitions.
FS is designed for the “zero data loss” scenario. To meet this requirement, typically we would deploy across two geographically dispersed datacentres, call them A and B. When a transaction is committed at A, it is replicated to B; when B acknowledges the commit, A commits and the call returns to the application. The latency on this system can quickly add up: double the disk I/O time, plus the network round trip, plus a small computational overhead. Craig “The Hammer” Shallahamer has shown in his work on queuing theory how even a small increase in response time can massively impact a system’s throughput. FS therefore is a proxy that sits between databases A and B, close enough to A that the network round trip is short. Now the process is: a transaction is committed at A and replicated synchronously to FS, which acknowledges it quickly so that A can proceed as normal; FS then passes it on asynchronously to B, which performs its disk I/O and commits it there, thus reducing overall response time while still getting the data safely away. I don’t fully understand yet how FS is related to or interacts with the fast start failover observer, if at all, but this is something I will study before attempting to deploy!
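A rough sketch of how the redo routing for FS might be set up, assuming a far sync instance that A can reach over a low-latency link. The names db_b and fs_a (used as both net service names and DB_UNIQUE_NAMEs) and the control file path are placeholders of mine, and a real configuration needs more than this (LOG_ARCHIVE_CONFIG, standby redo logs, the protection mode and so on).

```sql
-- On the primary (A): create a control file for the far sync instance.
-- The path is a placeholder.
ALTER DATABASE CREATE FAR SYNC INSTANCE CONTROLFILE AS '/tmp/fs_a.ctl';

-- On the primary (A): ship redo synchronously to the far sync instance.
ALTER SYSTEM SET LOG_ARCHIVE_DEST_2 =
  'SERVICE=fs_a SYNC AFFIRM VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=fs_a'
  SCOPE=BOTH;

-- On the far sync instance: forward the redo asynchronously to the
-- remote standby (B).
ALTER SYSTEM SET LOG_ARCHIVE_DEST_2 =
  'SERVICE=db_b ASYNC VALID_FOR=(STANDBY_LOGFILES,STANDBY_ROLE) DB_UNIQUE_NAME=db_b'
  SCOPE=BOTH;
```

The point of the split is that only the short SYNC hop sits in the commit path; the long-haul hop to B is ASYNC.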
The next DG enhancement is the DBMS_ROLLING (DR) package, which as the name suggests is used for performing rolling upgrades: upgrading a group of databases that together offer a service one at a time, while the others maintain the service for the users. I have done this before in RAC by taking one node at a time out of service, and it was always possible to do manually with an Active DG configuration, but now the tools are provided to make it safer and faster. You designate one database in the configuration as the “future primary”; it and its immediate standbys form the leading group, which is upgraded first. When everything is OK, the future primary takes over and the current primary (part of the trailing group) becomes the next upgrade target. As good DBA practice I like to cue up all the commands I will need in a spreadsheet or in scripts, having tested the procedure with VMs and had it “code reviewed” by another DBA, so that on the day of the production upgrade I can just execute the plan. DR builds this plan in a DB table. It looks like it will be a great tool for production DBAs; however, it will be a long time before we can use it, not until 12.2 is out I expect, since you need to be on 12.1 to use it, and you need something to upgrade to… :-) Just as TG enables AC, DR is enabled by some underlying code that pre-validates DG role changes, similar to the way VCS assesses a node to see if it is a viable failover target. Again, this is something you can do yourself (the “one button failover” control panel I developed in my current job does this with a nice Bootstrap interface), but it is nice that there is an “official” solution now, as it frees up the DBA to concentrate on value-adding tasks.
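To give a flavour of the workflow, here is a minimal sketch of a DR session in SQL*Plus, run from the current primary. The name db_b is a hypothetical DB_UNIQUE_NAME for the standby chosen as the future primary, and the sketch leaves out the parameter tuning and the upgrade itself.

```sql
-- Nominate the future primary; 'db_b' is a placeholder DB_UNIQUE_NAME.
EXEC DBMS_ROLLING.INIT_PLAN(future_primary => 'db_b');

-- Generate the plan, then review it before running anything.
EXEC DBMS_ROLLING.BUILD_PLAN;

SELECT instid, target, phase, description
  FROM dba_rolling_plan
 ORDER BY instid;

-- Prepare the leading group for the upgrade.
EXEC DBMS_ROLLING.START_PLAN;

-- (Upgrade the leading group here, outside of DBMS_ROLLING.)

-- Make the upgraded future primary the new primary.
EXEC DBMS_ROLLING.SWITCHOVER;

-- Once the trailing group is running on the new binaries,
-- bring it back into the configuration.
EXEC DBMS_ROLLING.FINISH_PLAN;
```

Reviewing DBA_ROLLING_PLAN before START_PLAN is the equivalent of my spreadsheet: the steps are laid out in order before anything touches production.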
The most interesting enhancement to RMAN is the new ability to recover a single table. It is normal in a well-run DB to have several layers protecting the data. There are regular RMAN backups of the datafiles and, more frequently, of the archived redo logs, enabling a database or a tablespace to be recovered to any point in time. But in the event that a user requires a single table back (and it can’t be gotten via Flashback or a lagging standby), this has a significant overhead: the DB must be restored to a different system with sufficient storage, and then the table can be copied over a DBLINK or via DataPump. This is a tedious process that doesn’t offer great turnaround times to the user, so many experienced DBAs also like to schedule regular logical backups using DataPump or, previously, exp. This has been normal practice for as long as I have been a DBA! At the cost of some (perhaps a lot of) extra disk space, this makes retrieving a table as it existed at the time of the export very easy, but it is more stuff to manage. The new RMAN merges these capabilities, meaning that it can all be done with a single tool. This is something that will make the DBA’s life easier and make the end user happier too.
One final thing I would like to mention is the new ability to move a datafile online. We have been able to move a table from one tablespace to another (and hence from one or more datafiles to different ones), but moving a datafile itself has meant going offline, however briefly. My usual approach has been to use RMAN to make a copy in the new location, set the original to read-only, recover the copy to apply changes made since the copy started, then switch them over with RMAN. This is a much better way than offlining the tablespace, moving the datafiles, renaming and re-onlining, as it minimizes the unavailability. With ASM, there is a technique for moving an entire diskgroup from one set of LUNs to another online by exploiting a behavior of the rebalance operation, which I have used when upgrading storage hardware. But now we can do it online with the ALTER DATABASE MOVE DATAFILE statement for an individual datafile on any filesystem. This is a feature that is genuinely new in the sense that it wasn’t possible to do it at all prior to 12c.
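For illustration, a minimal example of the statement follows. The file paths and the USERS tablespace are placeholders of mine; substitute your own.

```sql
-- Move the datafile while it stays online; both paths are placeholders.
ALTER DATABASE MOVE DATAFILE '/u01/oradata/orcl/users01.dbf'
  TO '/u02/oradata/orcl/users01.dbf';

-- Confirm the new location and that the file is still online.
SELECT file_id, file_name, online_status
  FROM dba_data_files
 WHERE tablespace_name = 'USERS';
```

By default the old copy is removed once the move completes; adding KEEP at the end of the statement leaves it in place.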
In part 4, performance and tuning enhancements.