RAC, or Real Application Clusters, is Oracle’s proprietary clustering solution for highly available databases. Let’s break down the name. The “real”, I assume, refers to the fact that it is active-active: all nodes in the cluster are available to do useful work, in contrast with active-passive systems such as Microsoft’s, where one node sits idle, awaiting the failure of the first before taking over its services. The “application” is the interesting bit; although most implementations I have seen use RAC only for the database (and code running inside it), it can easily be used as a general-purpose failover clustering solution for any in-house or third-party code, saving the administrative overhead of having one type of cluster for the database and another for the applications. Just run the whole lot on your RAC!
The key to this is, to borrow the VCS terminology with which most people are familiar, the agent script. This acts as a proxy between RAC (or CRS, really) and your own code, allowing the cluster to start and stop it, and to check its health†. It is very similar to the scripts found on typical Unix systems in /etc/init.d/ that are invoked when changing runlevels: it takes the action to perform as its first parameter, and the body of each action can be any reasonable Unix commands. These must be sufficient to execute the program with no prerequisites, e.g. setting up LD_LIBRARY_PATH, the ORACLE_HOME, and so forth, and to check that the environment is sane, e.g. that necessary directories exist and are writeable. The script should live on a filesystem shared by all RAC nodes; let’s assume it is mounted as /common and the agent is called myapp.sh.
#!/bin/bash
# a simple CRS agent script
#
# 19-OCT-2011 Gaius Initial version

# set up the environment - on each node, /home/oracle/this-node.env is a
# symlink to the environment variables (e.g. ORACLE_SID for the instance,
# PATH, LD_LIBRARY_PATH etc)
. /home/oracle/this-node.env

# check that the environment is sane, e.g. we can write to the log dir
# CRS looks at the exit code of this script to see if the operation was
# a success
if [ ! -w /common/log ]; then
    exit 1
fi

# parse the command line to see what CRS wants to do
case $1 in
    'start')
        # do any prior cleanup, then start myapp and store its PID
        mv /common/log/myapp.log{,.old}
        myapp >/common/log/myapp.log 2>&1 &
        EXITCODE=$?
        echo $! >/common/pids/myapp.pid
        ;;
    'stop')
        kill `cat /common/pids/myapp.pid`
        EXITCODE=$?
        ;;
    'check')
        # check that 1 process named myapp is running - CRS will
        # automatically do this check on the correct node every 60s (default)
        # and if it returns non-zero take corrective action
        NUMPROCS=`ps -ef | awk '/[m]yapp/ {X += 1} END {print X+0}'`
        if [ "$NUMPROCS" -eq 1 ]; then
            EXITCODE=0
        else
            EXITCODE=1
        fi
        ;;
    *)
        echo "Usage: $0 [start|stop|check]"
        EXITCODE=1
        ;;
esac

exit $EXITCODE
# End of file
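Since CRS sees nothing but exit codes, it is worth exercising the start/stop/check contract by hand before wiring anything into the cluster. A minimal stand-in with the same shape makes that easy to poke at on any machine (the /tmp path is a throwaway for demonstration, not the real agent):

```shell
# Create a minimal stand-in agent and drive each action by hand, checking
# the exit codes that CRS itself will look at. Unknown actions must fail.
cat > /tmp/demo-agent.sh <<'EOF'
#!/bin/bash
case $1 in
    'start') echo "started"; exit 0 ;;
    'stop')  echo "stopped"; exit 0 ;;
    'check') exit 0 ;;
    *)       echo "Usage: $0 [start|stop|check]"; exit 1 ;;
esac
EOF
chmod +x /tmp/demo-agent.sh

/tmp/demo-agent.sh start; echo "start rc=$?"   # expect rc=0
/tmp/demo-agent.sh check; echo "check rc=$?"   # expect rc=0
/tmp/demo-agent.sh stop;  echo "stop rc=$?"    # expect rc=0
```

The real agent can be smoke-tested the same way, action by action, before CRS ever touches it.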
Next, on each RAC node, put in a symlink $CRS_HOME/crs/script/myapp.sh → /common/myapp.sh. This ensures that any node can execute the script, but there is only a single copy of it to maintain. Make sure it is executable with chmod. The script can be tested on each node by calling it manually in the shell with each parameter in turn and seeing what it does. Next, we register the agent script with the cluster:
$ crs_profile -create myapp -a $CRS_HOME/crs/script/myapp.sh -t application
$ crs_register myapp
This creates a cluster resource called myapp, with an agent script defined by -a, of type application.
Now, we can start to manipulate our own program with the standard Oracle commands:
$ crs_start myapp
Attempting to start `myapp` on member `oel1`
Start of `myapp` on member `oel1` succeeded.
$ crs_stat myapp
NAME=myapp
TYPE=application
TARGET=ONLINE
STATE=ONLINE on oel1

$ crs_stop myapp
Attempting to stop `myapp` on member `oel1`
Stop of `myapp` on member `oel1` succeeded.
$ crs_stat myapp
NAME=myapp
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE
You can see this is running on my Oracle Enterprise Linux test system rather than my usual Debian.
This is a trivial example, not suitable for real Production usage. For example, the check action should do more than just verify the process exists (it could be stuck), the stop action should try a clean shutdown and then a hard kill if that does not complete within a certain threshold, and all steps should write comprehensive logging to enable quick troubleshooting (you can see CRS’s own log in $CRS_HOME/log/`hostname`/crsd/crsd.log). Nevertheless, it serves to demonstrate how simple it is to HA your own applications (assuming you have RAC already!), and provides a basis for further development. I have been using this technique in Production for several years now for a variety of purposes.
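As one example of hardening the stop action, the escalation from clean shutdown to hard kill can be sketched like this. This is my suggestion rather than anything Oracle-specific, and the 30-second default timeout is a guess; tune it to how long your application really needs to shut down cleanly:

```shell
#!/bin/bash
# Sketch of a more robust 'stop' action: ask the process to exit with
# SIGTERM, poll for up to $2 seconds, and only then escalate to SIGKILL.
graceful_stop() {
    local pid=$1 timeout=${2:-30} i
    kill -TERM "$pid" 2>/dev/null || return 0     # already gone
    for ((i = 0; i < timeout; i++)); do
        kill -0 "$pid" 2>/dev/null || return 0    # exited cleanly
        sleep 1
    done
    kill -KILL "$pid" 2>/dev/null                 # hard kill as a last resort
    return 0
}

# in the agent's 'stop' branch you might then call:
#   graceful_stop "$(cat /common/pids/myapp.pid)" 30
```

A similar idea applies to check: rather than just counting processes, probe something the application actually does, such as connecting to its listening port.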
The official documentation is here. If your application is itself a server, it will also need a VIP of its own, so that clients can connect to it on whichever RAC node it is running. A very useful parameter for crs_profile in this case is -r, to make the application depend on its VIP, so the cluster knows to start them in the correct order, on the same node. There are many more options viewable with crs_profile -help, including the check interval, the number of restart attempts, and so on.
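Registration with a VIP dependency might then look something like the following. The resource names and the VIP details (interface, address, netmask) are placeholders for illustration, and the exact flags can vary between CRS versions, so check crs_profile -help on yours:

```shell
# create and register an application VIP (addresses are placeholders),
# then make myapp depend on it with -r
$ crs_profile -create myapp.vip -t application -a $CRS_HOME/bin/usrvip \
      -o oi=eth0,ov=192.168.1.100,on=255.255.255.0
$ crs_register myapp.vip
$ crs_profile -create myapp -a $CRS_HOME/crs/script/myapp.sh \
      -t application -r myapp.vip
$ crs_register myapp
```

With the dependency in place, crs_start myapp brings the VIP online first, on the same node, and a failover moves both together.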
† Alert VCS operators will have noted the absence of the “clean” action. You could implement this as part of stop, or in start before actually starting.