Credit is due to new takes on software and data optimization techniques (virtualization, compression and columnar storage among them), but the buzz around "big data" owes most of its momentum to hardware advancements, notably 64-bit CPUs, multicore processing, server parallelism, and new storage formats like solid-state disk and flash.
Where CPUs once held the performance high ground, storage (and the time, money and labor it takes to read, write, manage and move data) had become the engineer's throughput wall.
But now that barrier is crumbling. It was the topic of DM Radio last week, where host Eric Kavanagh called a panel of experts to discuss how leaps in RAM and main memory are causing enterprises to reconsider the way they transact, integrate and analyze data going forward. It's something we all need to pay attention to, because it's evolving very quickly.
In fact, said Roger Gaskell of Kognitio, removing the disk bottleneck has put the burden back on parallel CPUs, particularly in ad hoc or unpredictable environments where data needs to be processed as quickly as possible.
Commoditization, the effect of units of things becoming very inexpensive and interchangeable, is one contributor to this change. Though it was on lab benches not long ago, solid-state disk storage is now the backbone of a new inexpensive big data offering from Amazon Web Services released just last week. That couldn't have happened economically even a year ago.
In the late 90s it cost about $600 to add 4 MB of RAM to my hotshot 486 PC. That was roughly the equivalent of a song or two on my MP3 player. Today, companies hand out multi-gigabyte flash drives like candy. Memory is no longer a step-up luxury item. In fact, you can expect a tricky time for corporate buyers: even as they save increasing amounts of money, it becomes harder to justify more than dipping their toes into pain points for the time being. The opportunities are growing, but so is the chance of buyer's remorse over private infrastructure that is going to fall to lesser uses before too long. If you have a database that's twice too big to fit in memory right now, data legend Michael Stonebraker said on our show, "just wait two years" and it affordably will.
"We're seeing a bunch of startups geared toward main memory that are on average 50 times faster than disk systems," said Stonebraker, most recently a founder at VoltDB. He also says we can look to open source as a wave of the future that will bring "wildly cheaper" pricing to in-memory.
On the transactional side, OLTP is already destined to live in main memory, Stonebraker said. “OLTP is used when you click 'buy it now' on eBay or transact for an airline ticket. Since it does represent a business transaction, OLTP does need to be bulletproof, and ACID compliant. It is a SQL ACID market exclusively.”
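Stonebraker's point about why OLTP must be ACID compliant comes down to atomicity: a purchase either fully happens or it doesn't. A minimal sketch of that guarantee, using Python's built-in sqlite3 (the schema, item names and buyer are invented for illustration, not drawn from any vendor discussed here):

```python
import sqlite3

# Hypothetical "buy it now" purchase, shown as one atomic transaction.
conn = sqlite3.connect(":memory:")  # an in-memory database, fittingly
conn.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, qty INTEGER)")
conn.execute("CREATE TABLE orders (item TEXT, buyer TEXT)")
conn.execute("INSERT INTO inventory VALUES ('widget', 1)")

try:
    with conn:  # commits both writes together, or rolls both back on error
        cur = conn.execute(
            "UPDATE inventory SET qty = qty - 1 "
            "WHERE item = 'widget' AND qty > 0")
        if cur.rowcount == 0:
            # Raising inside the block rolls back the whole transaction,
            # so a failed purchase never leaves a half-written order.
            raise RuntimeError("out of stock")
        conn.execute("INSERT INTO orders VALUES ('widget', 'alice')")
except RuntimeError:
    pass

qty = conn.execute("SELECT qty FROM inventory").fetchone()[0]
print(qty)  # 0: the decrement and the order record committed as one unit
```

The same atomic all-or-nothing behavior is what a main-memory OLTP engine must preserve even without a disk in the write path.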
Main memory right now does have a downside: it requires redundant hardware and replication to ensure data is not lost when power fails in a flood or earthquake. For now that is a cost of doing business. When and if non-volatile RAM becomes available, it will remove the need to write redundantly to disk.
“It’s not going to be too many years before non-volatile RAM is available cheaper than flash memory which ends the need to persist databases at the risk of power failures or system crashes,” said Stonebraker. “And, remember, if you run 50 times faster, your power consumption just went down 50 times too.”
This is where the economics of data management take another turn. Todd Walter of Teradata said there is a long tail of data whose value diminishes the less it is used. So we'll see hardware tiering across multiple disk, solid-state and flash devices going forward. The problem there is that multilevel platforms are very manual to manage. "We think the key to make it happen is automatic management to place data in the right tier based on access patterns and the value of the data."
That's what he says Teradata is working on now: the ability to support many types of storage, from the slowest disk up to RAM, and manage it as one system, with no humans shuffling data from platform to platform.
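The placement idea Walter describes can be sketched as a toy policy that maps access patterns to tiers. The tier names, thresholds and function below are invented for illustration; they are not Teradata's actual algorithm:

```python
# Hypothetical tier-placement policy: hot, frequently touched data lands
# in fast, expensive tiers; cold long-tail data sinks to cheap slow disk.
# All names and cutoffs here are illustrative assumptions.

def choose_tier(accesses_per_day: float, days_since_last_access: int) -> str:
    if accesses_per_day >= 100 and days_since_last_access <= 1:
        return "ram"        # hottest working set lives in main memory
    if accesses_per_day >= 10:
        return "ssd"        # warm data on solid state
    if days_since_last_access <= 30:
        return "flash"      # cooling data, still reasonably quick
    return "slow_disk"      # the long tail Walter describes

print(choose_tier(500, 0))   # "ram"
print(choose_tier(0.1, 90))  # "slow_disk"
```

A real system would re-evaluate placement continuously and fold in the business value of the data as well as raw access counts, which is exactly what makes automating it hard.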
This raises another question about the future of the utility computing model as well. With data, it seems we always overflow our closets no matter how many we build, but complexity could limit how commoditized data can become. And if data is an asset, we may decide we want much of it nearby.
That's a lot easier to do when the closets we are building are virtual, the data is highly concentrated, and there's no brick-and-mortar burden of laying foundations for whole new data centers.