Big Data Is All Relative—or Relational
It is probably an understatement to say that the tech world is obsessed with Big Data: tons of articles are generated each week about automated computational analytics. You can certainly understand the fervor, given the game-changing insights Big Data initiatives promise to bestow on business, consumers and society.
But as we foam at the mouth over the next great revelation to emerge from the Hadoop cluster, a new wave of cloud-enabled applications is testing the limits of our traditional relational database systems. SQL-based databases are buckling under a tidal wave of tens of millions of good old-fashioned ACID transactions.
But as tech pundits yawn at the RDBMS market for its lack of an automated-analysis angle, peel back the onion and you will find that the relational database's inability to handle heavy OLTP activity is tripping up consumers under the brightest of the business world's spotlights.
High-profile RDBMS failures
Remember Amazon Prime Day? It didn't get off to an auspicious start when a multitude of shoppers received an "Add to cart failed... Retrying" message at checkout. Similarly, this summer servers initially crashed when throngs of gamers rushed to download Pokémon Go. We see the same thing each year on Black Friday and Cyber Monday. While these companies have made astronomical profits, website failures cost them millions more in dollars that consumers were willing to spend.
Mobile and cloud confound the big-box RDBMS
How is it that relational databases are struggling with such massive workloads? Conventional thinking says that Oracle, IBM, Informix and other large database vendors solved this problem long ago. For one, the workloads are increasing by factors many might not have thought possible a decade ago thanks to the proliferation of mobile devices. Now, gamers are playing each other on the go, shoppers are buying goods while riding the bus to work and marketers are sifting through way more social media data than ever to report on basic user activity in real time.
An equally important factor: the emergence of the cloud has introduced a new blind spot. In a previous era, companies that reached the limits of the cheaper MySQL database would simply upgrade to the aforementioned vendors, all of which excelled at these transaction volumes in the data center. Fortunately for database customers, the need for these expensive databases usually signaled that revenues were growing at a pace that could more than offset the higher cost.
Today, however, up-and-coming retailers and social commerce brands utilize the cloud in the early stages of their development to save on data center, application development and computing costs. Once they hit the big time, big-box vendors’ data-center-oriented products are often not practical in AWS, Google, Azure, Rackspace or other major clouds due to issues of portability and often compatibility. Besides, these same budding stars aren’t exactly eager to give up the cloud’s flexibility to sign a long-term contract anyway. It’s often against the ethos of many “born-in-the-cloud” companies, and runs counter to the flexible business models that have fueled their rapid growth in the first place. Furthermore, it’s very, very expensive.
The cloud—it’s not as easy as adding instances
That brings us to another misconception: running your application in the cloud does not make it "automatically scalable." Yes, it's much easier to scale hardware and network capacity when needed, but that's not necessarily the case at the database layer. Although next-generation database services such as Amazon Aurora may work for most of the use cases that exist today, the fundamentally single-server design of most of these modern alternatives all but guarantees site hiccups when trying to process 40 million concurrent write transactions arriving from the aforementioned mobile devices.
No matter how large an instance you provision, even the biggest single server will eventually reach its limit, causing site latency or even outright failure. Avoiding that fate under the duress of heavy web traffic entails complex technical workarounds that demand the time, expertise and cost of specialized database administrators, which is the very thing most companies are trying to avoid by turning to the cloud in the first place.
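To give a concrete sense of what those workarounds involve, one common approach is application-level sharding, where the application itself routes each write to one of several independent database servers by hashing a key. The sketch below is a minimal illustration in plain Python; the shard endpoints are hypothetical, and in practice the hard part is everything this snippet omits: rebalancing data when shards are added, cross-shard queries and failover, all of which demand exactly the specialized DBA expertise described above.

```python
import hashlib

# Hypothetical shard endpoints. In a real deployment each would be a
# separate MySQL server that someone must provision, monitor and rebalance.
SHARDS = [
    "mysql://db0.example.com/orders",
    "mysql://db1.example.com/orders",
    "mysql://db2.example.com/orders",
    "mysql://db3.example.com/orders",
]

def shard_for(customer_id: str) -> str:
    """Route a customer's writes to one shard via a stable hash."""
    digest = hashlib.md5(customer_id.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# The same customer always lands on the same shard, so single-shard
# transactions keep their ACID guarantees.
assert shard_for("cust-42") == shard_for("cust-42")
```

Note the trap in this design: changing the shard count remaps most keys, so growing the cluster under load means a careful, manual data migration rather than simply "adding instances."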
As more companies hit the big time, we will see that even their cloud-based infrastructures will initially fail to accommodate their rapid success. For example, new superstar online marketplaces serving emerging countries or niche industries will experience growing pains if they can’t achieve the highest database performance in the cloud as bids get placed from multitudes of smartphones and tablets.
Marketers that reach the Holy Grail of a successful viral campaign will find that their cloud database service is ill-prepared for their genius. Those that can successfully leverage Facebook, Twitter and other social networks to generate popular contests, giveaways and other prize-incentivized promotions will leave many contestants frustrated by the failure to process their registrations from iPhones and Android devices. Advertisers who put the right ads in front of a million consumers at the right moment in time will be crestfallen when their employers or clients can't monetize the demand they generate.
The relational database’s data avalanche only promises to grow
RDBMSs are going to see greater tests in the near future as more and more consumers execute their transactions from handheld devices; mobile usage has now overtaken desktop, 51 percent to 42 percent, according to Kleiner Perkins Caufield & Byers. And those folks are getting less patient: according to CPC Strategy, almost half of all consumers now expect a web page to load in two seconds or less, and 27 percent of cart abandonment occurs due to time constraints.
No, relational databases still won't be counted on to synthesize massive streams of telemetry feeds, social posts and other mountains of semi-structured and unstructured data into tremendous discoveries, but they are going to be counted on to handle "Big Data" workloads of a different sort going forward. Internet-scale applications are going to require a database architecture that delivers the performance of Oracle RAC or DB2 outside the data center.
Our global economy will likely see the rapid explosion of new companies, mobile apps and viral marketing phenomena, and if these entities can't process the millions of simultaneous cart checkouts, fund transfers and/or new registrations in the cloud, they won't be able to handle their own rapid success.
Many emerging "Internet-scale" players—particularly in the international market—are dealing with this by rethinking their database model. No longer are they faced with a binary choice. Rather than choosing solely between MySQL and a pricey enterprise vendor, they're turning to what are collectively referred to as "NewSQL" databases, designed to retain the familiarity and data integrity of SQL yet scale beyond the limits of a single server.
Some of these options are drop-in, MySQL-compatible replacements, and many of them are integrating in-memory computing. These attributes make it easy to migrate while providing a level of processing speed that is typically associated with the previously mentioned Big Data applications and analytics. This not only solves the scaling and performance issues, but also opens the door to a host of real-time reporting options on transactional data.
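As a hedged illustration of what "drop-in, MySQL-compatible" can mean in practice (the hostnames and settings below are hypothetical): because such databases speak the MySQL wire protocol, the application's queries, driver and schema typically stay the same, and migration reduces to pointing the existing connection configuration at the new cluster endpoint.

```python
# Existing application settings for a single MySQL server
# (all values hypothetical).
DB_CONFIG = {
    "host": "mysql-primary.internal",
    "port": 3306,
    "user": "app",
    "database": "shop",
}

def migrate_to_newsql(config: dict, cluster_host: str) -> dict:
    """Point the same MySQL-protocol config at a NewSQL cluster endpoint.

    Only the endpoint changes; queries, the client driver and the schema
    are untouched because the cluster speaks the MySQL wire protocol.
    """
    updated = dict(config)
    updated["host"] = cluster_host
    return updated

NEW_CONFIG = migrate_to_newsql(DB_CONFIG, "newsql-cluster.internal")
```

That low switching cost is the "drop-in" appeal; the scaling, in-memory processing and real-time reporting benefits come from what sits behind the endpoint rather than from any change in the application.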
(About the author: Michael Azevedo is chief executive officer at Clustrix, Inc.)