Learning the Wrong Lessons

I’m going to start with an assumption here, which I think is pretty reasonable, that when everyone learns to program, or to operate a system, they do so in an environment much simpler than one that is used for commercial-level work. There is of course real value in this experience: many of the concepts, once grasped in a simple environment, provide a firm foundation for more advanced concepts that you need later. The canonical example of this is the Hello, world! program. It doesn’t do anything useful, but to get those words on your screen, you have to have correctly installed and configured and run a basic toolchain or IDE: shell/desktop → editor → compiler → new program. But there are some bits of the toolchain that a professional developer uses every day, that you don’t need at all at this stage; probably you don’t need any sort of build tool nor do you need a debugger nor a profiler nor version control. The wrong lesson to learn at this stage would be that actually, these things aren’t ever needed. I have seen it happen, but in general, I don’t think this does happen much, people do grow into the rest of the toolchain as and when they need to and are ready for it. The first time you single-step through your own program in a debugger is like magic! The sysadmin aspect of this is, for example, you quickly learn how upgrading one package can break another – but not until you are ready to understand it, you don’t need to worry about that to get started.

When learning to program with databases, it’s a bit more complex because there are more prerequisites (e.g. you need to have a database to talk to, and you have to know how to include or at least refer to external libraries in your program and so on) but the same basic principle applies: a minimal system on which a Hello, world! equivalent can be written. One application, with one user, talking to one database, also with one user. You can create tables, INSERT and SELECT data, and at this stage, for the programs you write for yourself, you don’t need all that fancy stuff like stored procedures, triggers, referential integrity constraints and so on. And why would you? You’re writing the code yourself, so why not keep it all in the one file (no build tools or version control at this stage remember)? And if you want to run some code before every INSERT, why not just do it? But unfortunately something strange seems to happen here that doesn’t happen with compilers, and I don’t know why: this experience gets wrongly carried forward into the real world, and people insist on trying to do everything in their own code, and using the database only as a dumb datastore. You can perhaps get by without a debugger if you are willing to pepper your C code with #ifdef and printf() statements every other line, and it’s possible to do a big project without real version control. But let’s see how the database fallacy plays out in the real world.

In an enterprise setting, it is not at all unusual to have 25, 50 or 100 applications connecting to a single database, representing thousands of connected users. Some of those applications are mission-critical: without them, the business comes grinding to a halt and starts hemorrhaging money. Changing these is a big deal and requires the say-so of senior management. Some of those applications are as old as the organization itself; some were written before the developers now working on them were born (really!). They are probably written in a dozen or more languages. Even programs written in the same language a decade or two apart might as well be in different languages when dependencies, standards and coding styles are taken into consideration. Some are 100,000+ lines of code in hundreds of source files that over years of modification, aren’t as well structured as they once were, and pre-date widespread unit testing. Now let’s say we have some piece of business logic that we need to enforce across the organization. Maybe it’s a new regulatory requirement, maybe it’s a strategic shift in the business, maybe it’s something trivial seeming that is important to the CEO. So how do we go about doing this?

Change every application. This is a massive, massive undertaking, requiring people skilled in all the languages, familiar with the domain knowledge, developers, QA, budget for all these people, all their existing projects put on hold and the cost associated with that, business sign-off, risk management if there is some unforeseen interaction between the old code and new. That’s even if there is one single person in the organization who actually knows what all the applications are. Seeing who and what is connected right now doesn’t help – what about that batch job that runs once a week, or once a year? And you have to do it all again next time!
Implement a new business rules engine and modify every application to check with that before making any changes to the database. We have now got one central place in which to update our business logic! But we still need to change every application to now also connect to our new service (and over what protocol? COM / CORBA / RPC / some proprietary thing? The new service needs to speak them all fluently) and get its approval for any database operation, and new code for if that permission is refused. It needs to be as reliable and performant as the most mission-critical application. And it still doesn’t enforce anything, since all the applications still need to talk to the database anyway. Are you going to tell the CFO he can no longer use his beloved Excel® because you haven’t figured out how to write a plugin for it for the new service? Not to mention all the third-party apps that expect a standard database connection and don’t have an API.
Add integrity/check constraints to the underlying database, with more complex code implemented in the database’s stored procedure language. From this we get one central location to maintain and we get absolute enforcement of the rules even for applications that we don’t know about! We get one language to express the business rules in, throughout the entire application portfolio and lifecycle, since the language du jour changes far more often than the database. And it runs on a system that is already as mission-critical as the most important applications. Errors are handled by the existing code that handles database errors in those applications. There is still the risk that an application might break of course, but of the three scenarios, this is the least, and only the broken application needs any modification, not all of them (and most SP/database mechanisms will allow an exception to be made for one application if that is really, really necessary). Think this doesn’t matter in your greenfield site or small company? Well if your business succeeds, in 30 years time you will wish you had paid heed to my wisdom!

I don’t think I have said anything too far-fetched or controversial here – yet still there is huge resistance to working this way, option 3. Option 2 I see or hear about all the time – one household name IT vendor even used to recommend it! Until they added the capabilities you needed for 3, then they changed their minds. Very few people are crazy enough to attempt option 1. Some things I often hear:

It’s difficult to version control SP code deployed in the DB. I don’t think that’s any truer than saying it’s difficult to version control Java code deployed in an app server, which is to say, it isn’t difficult at all, it’s commonplace. And over in Ruby-land, entire books are written just about how to get your code from a development environment into production, something that no other language community seems to struggle with. Yet version controlling a stored procedure is apparently too difficult.
Stored procedures are hard to test. This is a strange one. For a start, SPs are strongly typed; the compiler will tell you if there’s a code path in or out that doesn’t make sense, and in Oracle at least, will calculate all the dependencies for you. So that’s one set of common unit tests you might need in Ruby eliminated off the bat. To test OO code requires mocking to coerce the object into the internal state required to represent the test scenario – how is setting up test data any different? There is a TAP producer for PL/SQL and other tools besides. There are debuggers and profilers too.
A stored procedure language is not a fully-featured language. Well, we aren’t trying to write an entire application just in stored procedures! Most dedicated SP languages have all the modern constructs that you would expect, and in Oracle at least, you can use Java Stored Procedures with all the language features OO developers are familiar with, or external procedures in any language. What matters is where the logic is implemented – in one place, close to the data – the actual language is just a detail. PL/SQL compiles to native code and runs in-process with the database; there is no higher-performance architecture than that.
I don’t want to have to learn another language. Overlooking for a second this is a huge red flag in any developer (especially one that proposes modifying production apps which might be in other languages anyway!) there is a lot to learn to work in any modern environment: a typical Java shop might have Eclipse, WebLogic, Maven, Hudson, Anthill, Subversion, and a whole plethora of others, that you need to learn before writing a single line of application code. A working knowledge of a very high level SP language is straightforward in comparison, and there will more than likely be a specialist or a DBA around to help you too. Not to mention that developer favourite Hibernate comes with its own query language…

Speaking of Hibernate leads me onto ORMs in general, another attempt to avoid writing SQL at any cost. Some say that the use of ORM is necessary due to the impedance mismatch tho’ in my experience this is only an issue when developing in a particularly dogmatic OO style. This comes back to my earlier point: a real enterprise has hundreds of applications, and is writing new ones all the time. A normalized relational database doesn’t paint you into any corners – any query that you haven’t even thought of today can be executed against the data, because the underlying maths is sound. If you have “persisted” “objects” into the database, then you will have to deal with true impedance mismatch: any application that doesn’t use the same ORM layer in the same language can’t see your data, at least not without jumping through hoops – at best you have taken on a burden of technical debt and at worst you have frozen your organization in time technologically. Real programs mix procedural, OO and functional styles as appropriate; this is simply a non-issue for any competent, experienced programmer.

And that’s before we come onto the notorious performance problems of ORMs: it doesn’t know when it is retrieving your objects that you only actually wanted 1 of 20 columns in the underlying table, and that you were planning to sum them – something the DB could have done for you and sent only the result over the network – so it grabs them all, then allocates a full-fledged object for each of them in-memory. Hibernate is particularly bad at joins; it would rather query two whole tables and stitch them together itself – and the developer thinks the database is slow! If your GPS sends you on a roundabout route and you’re late do you say “my car is too slow”?

So what is really happening here? Well, I believe that it just boils down to an old-fashioned demarcation dispute (and two can play at that game – but I prefer to identify with the overall development effort, not a language tribe). For the ubiquitous scenario I have described there is a very clear “right way” to do it, at both technical and business levels, just as no serious developer now would argue that the GOTO statement is the right way to program, nor that we should be writing applications in assembly language. SQL, referential integrity constraints, triggers and stored procedures are powerful tools; if you are a developer and improvising your own solution, you are taking on huge technical debt, perhaps without even realizing. If you are a manager and your developers won’t use them, you might ask why you are paying them to reinvent a wheel that Oracle has had since 1992 (version 7), and IBM long before, and making it square as they do so…

Learning the Wrong Lessons

About Gaius

4 Responses to Learning the Wrong Lessons

Leave a comment Cancel reply

Subscribe via RSS

Follow me on Twitter

Recent Posts

Archives

Learning the Wrong Lessons

Share this:

Related

About Gaius

4 Responses to Learning the Wrong Lessons

Leave a comment Cancel reply

Subscribe via RSS

Follow me on Twitter

Recent Posts

Archives