How To Run Your Own Apps on RAC

RAC, or Real Application Clusters, is Oracle’s proprietary clustering solution for highly available databases. Let’s break down the name. The “real”, I assume, refers to the fact that it is active-active: all nodes in the cluster are available to do useful work, in contrast with active-passive systems such as Microsoft’s, where one node sits idle, awaiting failure of the first before taking over its services. The “application” is the interesting bit. Although most implementations I have seen only use RAC for the database (and code running inside it), it can easily be used as a general-purpose failover clustering solution for any in-house or third-party code, saving the administrative overhead of having one type of cluster for the database and another for the applications. Just run the whole lot on your RAC!

The key to this is, to borrow the VCS terminology with which most people are familiar, the agent script. This acts as a proxy between RAC (or CRS, really) and your own code, allowing the cluster to start and stop it, and to check its health†. This is very similar to the scripts found on typical Unix systems in /etc/init.d/ that are invoked when changing runlevels: it takes the action to perform as the first parameter, and the body of the action can be any reasonable Unix commands. These must be sufficient to execute the program with no prerequisites, e.g. setting up LD_LIBRARY_PATH, the ORACLE_HOME, and so forth, and checking that the environment is sane, e.g. that necessary directories exist and are writable. This script should live on a filesystem shared by all RAC nodes. Let’s assume it is mounted as /common and the agent is called myapp.sh.

#!/bin/bash

# a simple CRS agent script
#
# 19-OCT-2011   Gaius       Initial version

# set up the environment - on each node, /home/oracle/this-node.env is a
# symlink to the environment variables (e.g. ORACLE_SID for the instance,
# PATH, LD_LIBRARY_PATH etc)
. /home/oracle/this-node.env

# check that the environment is sane, e.g. we can write to the log dir
# CRS looks at the exit code of this script to see if the operation was
# a success
if [ ! -w /common/log ]; then
    exit 1
fi

# parse the command line to see what CRS wants to do
case $1 in
'start')
    # do any prior cleanup, then start myapp and store its PID
    mv /common/log/myapp.log{,.old}
    myapp >/common/log/myapp.log 2>&1 &
    EXITCODE=$?
    echo $! >/common/pids/myapp.pid
    ;;
'stop')
    kill `cat /common/pids/myapp.pid`
    EXITCODE=$?
    ;;
'check')
    # check that 1 process named myapp is running - CRS will
    # automatically do this check on the correct node every 60s (default)
    # and if it returns non-zero take corrective action
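    # count processes named myapp, ignoring this agent script (myapp.sh);
    # X+0 makes awk print 0 rather than an empty string when nothing matches
    NUMPROCS=$(ps -ef | awk '/[m]yapp/ && !/myapp\.sh/ {X += 1} END {print X+0}')
    if [ "$NUMPROCS" -eq 1 ]; then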
        EXITCODE=0
    else
        EXITCODE=1
    fi
    ;;
*)
    echo "Usage: $0 [start|stop|check]"
    EXITCODE=1
    ;;
esac

exit $EXITCODE
# End of file

Next, on each RAC node, create a symlink $CRS_HOME/crs/script/myapp.sh → /common/myapp.sh. This ensures that any node can execute the script, but there is only a single copy of it to maintain. Make sure it is executable with chmod. You can test it on each node by calling it manually in the shell with each action as the parameter and checking what it does and what exit code it returns. Next, we register the agent script with the cluster:

$ crs_profile -create myapp -a $CRS_HOME/crs/script/myapp.sh -t application
$ crs_register myapp

This creates a cluster resource called myapp, of type application, with the agent script given by -a.
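The per-node setup that comes before registration (symlink, chmod, manual test) could be scripted roughly as follows. This is only a sketch: link_agent and try_agent are hypothetical helper names of my own, not Oracle tools, and the paths in the usage comment are the ones assumed in this post.

```shell
#!/bin/bash
# Sketch only: link_agent and try_agent are hypothetical helper names.

# link_agent SHARED_SCRIPT LINK_DIR
# Make the shared agent executable and symlink it into LINK_DIR
# (on a real node LINK_DIR would be $CRS_HOME/crs/script).
link_agent() {
    local shared=$1 link_dir=$2
    chmod +x "$shared" || return 1
    ln -sfn "$shared" "$link_dir/$(basename "$shared")"
}

# try_agent AGENT ACTION
# Run one action by hand and print the exit code CRS would see.
try_agent() {
    local agent=$1 action=$2 rc=0
    "$agent" "$action" || rc=$?
    echo "$action -> exit $rc"
}

# Example, run on each node:
#   link_agent /common/myapp.sh "$CRS_HOME/crs/script"
#   for a in start check stop; do
#       try_agent "$CRS_HOME/crs/script/myapp.sh" "$a"
#   done
```

Since CRS only looks at the exit code, printing it for each action is a quick way to see what the cluster will conclude before you let it loose on the script.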

Now, we can start to manipulate our own program with the standard Oracle commands:

$ crs_start myapp
Attempting to start `myapp` on member `oel1`
Start of `myapp` on member `oel1` succeeded.
$ crs_stat myapp
NAME=myapp
TYPE=application
TARGET=ONLINE
STATE=ONLINE on oel1
$ crs_stop myapp
Attempting to stop `myapp` on member `oel1`
Stop of `myapp` on member `oel1` succeeded.
$ crs_stat myapp
NAME=myapp
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE

You can see this is running on my Oracle Enterprise Linux test system rather than my usual Debian.
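If you want to consume that status from your own scripts, the NAME=VALUE output shown above is easy to pick apart. A minimal sketch, assuming the output format is exactly as in the transcript; resource_state is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: resource_state is a hypothetical helper name.
# Reads crs_stat-style NAME=VALUE lines on stdin and prints the STATE value.
resource_state() {
    awk -F= '$1 == "STATE" { print $2 }'
}

# On a cluster node:
#   crs_stat myapp | resource_state
```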

This is a trivial example, not suitable for real Production usage. For example, the check action should do more than just verify that the process exists (it could be stuck), and the stop action should try a clean shutdown first, then a hard kill if that does not complete within a certain threshold. All steps should write comprehensive logging to enable quick troubleshooting (you can see CRS’s own log in $CRS_HOME/log/`hostname`/crsd/crsd.log). Nevertheless, it serves to demonstrate how simple it is to HA your own applications (assuming you have RAC already!), and provides a basis for further development. I have been using this technique in Production for several years now for a variety of purposes.
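As a starting point for a sturdier stop action, here is a rough sketch of "clean shutdown, then hard kill after a threshold", with logging. It assumes the application exits on SIGTERM when it is healthy; the names LOGFILE, log and stop_with_timeout, the log path and the 30 second default are all my own illustrative choices.

```shell
#!/bin/bash
# Sketch only: LOGFILE, log and stop_with_timeout are hypothetical names.
LOGFILE=${LOGFILE:-/common/log/myapp-agent.log}

# Append a timestamped line to the agent log (logging failures are ignored).
log() {
    { echo "$(date '+%Y-%m-%d %H:%M:%S') $*" >>"$LOGFILE"; } 2>/dev/null || true
}

# stop_with_timeout PIDFILE [TIMEOUT_SECONDS]
# Send SIGTERM, wait up to TIMEOUT seconds for the process to exit,
# then send SIGKILL. Returns 0 once the process has exited, has been
# sent SIGKILL, or was not running in the first place.
stop_with_timeout() {
    local pidfile=$1 timeout=${2:-30} pid waited=0
    pid=$(cat "$pidfile" 2>/dev/null) || pid=""
    if [ -z "$pid" ] || ! kill -0 "$pid" 2>/dev/null; then
        log "stop: no running process, nothing to do"
        rm -f "$pidfile"
        return 0
    fi
    log "stop: sending SIGTERM to $pid"
    kill "$pid" 2>/dev/null
    while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$timeout" ]; do
        sleep 1
        waited=$((waited + 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
        log "stop: $pid still alive after ${timeout}s, sending SIGKILL"
        kill -9 "$pid" 2>/dev/null
    fi
    rm -f "$pidfile"
    return 0
}
```

In the agent above, the body of the stop action would then become something like stop_with_timeout /common/pids/myapp.pid 30 followed by EXITCODE=$?.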

The official documentation is here. If your application is a server itself, it will also need a VIP of its own, so clients can connect whichever RAC node it is on. A very useful parameter for crs_profile in this case is -r, to make the application depend on its VIP, so the cluster knows to start them in the correct order, on the same node. There are many options viewable with crs_profile -help, including check interval, number of restart attempts, and so on.
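To make the -r dependency concrete, here is a hedged sketch of creating the profile with a VIP requirement. It assumes a VIP resource already exists under the hypothetical name myapp.vip and that myapp has not been registered yet. The -o ci=...,ra=... values (check interval and restart attempts) are illustrative, and the option names are from my memory of the help text, so confirm them with crs_profile -help on your version. By default the function only prints the commands.

```shell
#!/bin/bash
# Sketch only. myapp.vip is a hypothetical VIP resource assumed to be
# registered already; the -o values are illustrative, not recommendations.
# RUN defaults to "echo" so the commands are printed rather than executed;
# call with RUN= (empty) on a real cluster to actually run them.
register_with_vip() {
    local run=${RUN-echo}
    local script_dir=${CRS_HOME:-/path/to/crs_home}/crs/script
    # -r makes myapp require its VIP (start order, same node).
    # -o ci=...,ra=... is assumed to set the check interval (seconds) and
    # the number of restart attempts; verify with crs_profile -help.
    $run crs_profile -create myapp -t application \
        -a "$script_dir/myapp.sh" \
        -r myapp.vip -o ci=30,ra=3
    $run crs_register myapp
}

# Dry run:  register_with_vip
# For real: RUN= register_with_vip
```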

† Alert VCS operators will have noted the absence of the “clean” action. You could do this in stop, or in start before actually starting the application.
