ML in the Real World

About a decade ago now, I was doing a lot of what we would now call ML†, using what is now called the data exhaust‡ from the production infrastructure of an exchange, both the OLTP and DW sides. It was simple timeseries stuff, just lots of it. I could look at the storage arrays, say, and make very accurate predictions about when some threshold would be breached, very far in advance. I could get from the ticketing system when a purchase order for more capacity was raised, and when it was fitted, and say exactly when to place the order with the vendor to get the parts delivered on time. Same with the time taken to fetch a tape from offsite. I looked at batch job completion times vs CPUs: not only did I know well in advance when we would need more, my algos had worked out for themselves that there were periodic spikes such as end-of-month reporting, and knew that there was no need to alert on them. All sorts of stuff like this. I thought it was pretty clever and I was quite pleased with myself.
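To give a flavour of the storage forecasting, something along these lines (a minimal sketch with made-up numbers, not the original code): fit a straight line to daily usage samples with NumPy and extrapolate to the day a capacity threshold gets crossed.

```python
# Sketch of the capacity-forecast idea: fit a linear trend to daily
# storage usage and extrapolate to estimate when a threshold is breached.
# The data here is invented for illustration.
import numpy as np

days = np.arange(90)                                               # 90 days of samples
used_tb = 40 + 0.35 * days + np.random.normal(0, 0.5, days.size)  # fake usage in TB
threshold_tb = 95.0                                                # capacity alarm level

# Ordinary least squares fit: used_tb ~ slope * day + intercept
slope, intercept = np.polyfit(days, used_tb, 1)

if slope > 0:
    breach_day = (threshold_tb - intercept) / slope
    days_remaining = breach_day - days[-1]
    print(f"Projected breach in ~{days_remaining:.0f} days "
          f"(growth ~{slope:.2f} TB/day)")
else:
    print("Usage flat or shrinking; no breach projected")
```

Crude as it looks, with enough history this kind of extrapolation gives you a lead time measured in months rather than the days a threshold alert gives you.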

In practice tho’, no-one cared. We went on ordering more disks and shelves when the dumb Nagios alert fired; so long as they could be added before there was an actual, during-trading-hours outage, that was good enough, so why change? We added more CPUs when the moaning of the analysts reached the ears of the CEO and he in turn moaned to our boss about it; there was no formal SLA on job completions. And everyone who had been there six months or more knew which alerts to ignore and simply did that; no-one even bothered to black them out (partly because there was nothing that could be done about them anyway).

I had a lot of fun doing all this, and I learnt a lot: this was the time I got seriously into Python, NumPy, Matplotlib and so on, skills that have served me well ever since, and I applied linear regression, PCA, and various other techniques to real data. But the real lesson is: if you’re going to try to use ML in the real world, you have to use it to solve a problem that you actually have, and generally, existing problems already have a solution that is good enough that ML doesn’t tell anyone anything they hadn’t already figured out for themselves or that wasn’t already embedded in institutional knowledge. Maybe if we hadn’t already had industry-leading uptime and transaction volumes on human intuition alone, it might have been taken more seriously. I think many if not most ML practitioners are going to run into this scenario at some point, and they need to have a story ready, which I didn’t.

† It was just called applied or predictive statistics back then
‡ It was just called metrics back then, or logging, gotta keep up with the buzzwords!
