Who in the World Needs a Hard Drive?
Why would you need a hard drive in a business intelligence implementation? The question is deliberately exaggerated, but sometimes exaggeration is necessary to understand your needs more fully. After all, as Hamlet said, "I must be cruel, only to be kind."
Let’s look at a quick example. When a consumer needs to buy something, his or her priorities usually include speed (rapid access to services), flexibility (not having to know where the data he or she needs to see or generate is stored), price (cost efficiency) and ease of use.
How do we superimpose these four criteria onto the world of business intelligence? We want access to the data stored in our computer systems, and we want to turn that data into useful information. This process should be fast, flexible and inexpensive, and it should be accessible to users at various levels.
One technology that fulfills these criteria is called in-memory analytics. (Memory is sometimes referred to as RAM, or random access memory.)
The typical BI questions that users ask are depicted in Figure 1. In-memory systems can effectively respond to these types of questions.
Disk Drive versus In-Memory
The first point to understand is the physical nature of disk drive storage. When our data is stored on a disk drive, we retrieve it, manipulate, calculate and format it in memory, and then store it again. Each time this cycle is repeated, we revisit the hard drive, because the memory is physically separate from the hard drive. Consider the connection between the memory and the hard drive as a sort of “pipe.” Accessing the data stored on the hard drive through this pipe is called physical input/output, or I/O.
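To make the “pipe” concrete, here is a toy Python sketch (not taken from any BI product) that runs the same retrieve, manipulate, calculate and format cycle twice: once writing the intermediate result to a file and reading it back around every step, and once keeping it in RAM. The function names, the sample values and the use of a JSON file as a stand-in for the hard drive are illustrative assumptions. Each file read or write is counted as one physical I/O, although in practice the operating system's cache can hide part of that cost.

```python
import json
import os
import tempfile


def run_on_disk(values, steps):
    """Keep the data on disk and re-read/re-write it around every step.

    Returns the final data and the number of file reads plus writes, used
    here as a toy proxy for trips through the memory-to-disk "pipe".
    """
    io_count = 0
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "data.json")  # stand-in for the hard drive
        with open(path, "w") as f:  # initial store
            json.dump(values, f)
        io_count += 1
        for step in steps:
            with open(path) as f:  # retrieve from disk into memory
                data = json.load(f)
            io_count += 1
            data = step(data)  # the actual work happens in memory
            with open(path, "w") as f:  # store the result again
                json.dump(data, f)
            io_count += 1
        with open(path) as f:  # final retrieval for the report
            result = json.load(f)
        io_count += 1
    return result, io_count


def run_in_memory(values, steps):
    """Apply the same steps while the data never leaves RAM."""
    data = list(values)
    for step in steps:
        data = step(data)
    return data, 0  # no file reads or writes at all


# Hypothetical pipeline: manipulate (filter), calculate, format.
STEPS = [
    lambda rows: [x for x in rows if x >= 100],
    lambda rows: [x * 2 for x in rows],
    lambda rows: [f"{x:,}" for x in rows],
]

print(run_on_disk([1200, 50, 3400], STEPS))    # (['2,400', '6,800'], 8)
print(run_in_memory([1200, 50, 3400], STEPS))  # (['2,400', '6,800'], 0)
```

Both runs produce the same report lines; only the disk version crosses the pipe, and its count (two I/Os per step plus the initial store and final read) grows with every step added to the cycle.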
The physical I/O process is far more time-consuming than “electronic” activity in memory. Once the data is in RAM, it makes comparatively little difference whether a query has to access 10 relational tables or 100. In the conventional memory-and-hard-drive environment, by contrast, we manipulate, calculate and format data in memory and then send it “out” again for storage. Storage and retrieval are physical activities, slowed by the movement of the hard drive’s “arm.” (It might be useful to open a broken hard drive and look at the arm and the phonograph-like stack of platters that store the data; seeing how the arm has to move makes it clear why this is called “physical” I/O.)
This might make you wonder whether, at runtime, the report generation software can perform all the needed analytical functions. Can it retrieve, manipulate, calculate, format and store the data entirely within the memory of a 64-bit server (which has a much larger addressable memory space than a 32-bit server)?
Advantages of In-Memory Analytics
The answer lies in the primary advantages of in-memory analytics, which include the following:
Speed: It stands to reason that if network access and disk I/O were eliminated, or if disk-based querying were greatly reduced, reports generated from the data residing in the server’s memory would be produced significantly faster.
For most BI applications, the required data is stored either in back-end databases or in data warehouses (please refer to my article, “Who in the World Needs a Data Warehouse?”), typically in relational or OLAP-based databases. Most dashboard applications generate large answer sets. (Please refer to my article, “Who in the World Uses Only Words and Numbers in Reports?”) But regardless of the complexity or size of the query, or the amount of data it returns, in-memory analytics retrieves answers faster.
One reason for this is that in-memory analytics makes building indexes and cubes unnecessary. Further, the operational systems are not slowed down, because analytical queries no longer compete with them for disk drive resources; contention arises only if the operational systems also need more memory.
Additionally, less memory is required to hold the same data. In a disk-based system, pre-calculation and pre-aggregation are done to reduce data access and data transfer, so on a data warehouse disk approximately one-third of the storage space is reserved for indexes, another third for aggregations and only the remaining third for the actual data. Because data in memory is accessed electronically, there is no need for those aggregations: in RAM, one has the luxury of scanning entire tables in the blink of an eye, so essentially only the actual data has to be held.
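The following Python sketch illustrates the alternative to pre-aggregation described above: the detail rows sit in RAM and each total is computed at query time with one full scan, so no aggregate table or index has to be built or stored. The sample rows and the `total_by` helper are hypothetical, chosen only to show the idea.

```python
from collections import defaultdict

# Hypothetical detail rows held in memory: (region, product, revenue).
SALES = [
    ("East", "Widgets", 120),
    ("East", "Gadgets", 80),
    ("West", "Widgets", 200),
    ("West", "Widgets", 50),
    ("North", "Gadgets", 30),
]


def total_by(rows, key_index, value_index=2):
    """Sum one column grouped by another using a single full scan.

    No index and no pre-built summary table is consulted; the aggregate
    exists only for the duration of the query.
    """
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[value_index]
    return dict(totals)


print(total_by(SALES, 0))  # {'East': 200, 'West': 250, 'North': 30}
print(total_by(SALES, 1))  # {'Widgets': 370, 'Gadgets': 110}
```

A disk-based warehouse would typically store both of these summaries as aggregations; here, a new grouping costs nothing more than another scan of the same rows.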
Flexibility: Another significant advantage of in-memory analytics is that the reporting software can mix and match data held in memory from various data sources by “piggybacking” on cloud computing. (Please refer to my article, “Who in the World Doesn’t Want to Reach for the Clouds?”)
Cost savings: Over the years, electronics have become more affordable, and memory has become a commodity item. The 32-bit architecture has evolved into the 64-bit architecture, which can address far more memory. This makes the BI environment a very affordable endeavor. In the not-too-distant future we may well see 128-bit architectures, and the progression may continue to 256-bit and beyond! With technological progress, the sky is the limit!
Reduction in IT maintenance: Because the OLAP requirements of disk-based BI systems are eliminated, in-memory analytics enables business analysts and line-of-business managers to build their own reports and dashboards. Disk-based BI systems require a lot of time for balancing data size against response time, and much work goes into sizing the cache and managing caching across the application servers and the database layer. With in-memory technology and the 64-bit architecture handling much of this, users don’t need expertise in performance tuning. This translates into ease of use for the users!
Additionally, users no longer need to turn to IT every time they want a report. In-memory analytics can potentially eliminate the need to build and maintain OLAP cubes for reporting, which eases the IT workload and drastically reduces data warehouse maintenance.
How Much Data Can Be Analyzed In-Memory?
A 32-bit system can address at most 4 GB, so in practice it can manage only two to three gigabytes of data in RAM. A 64-bit CPU, however, can manage more than 100 gigabytes of data in RAM, and a 64-bit operating system running on a four-CPU server with one terabyte of addressable RAM can support an entire data warehouse (depending on its size) or a number of data marts.
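The gap between the 32-bit and 64-bit figures follows from powers of two: a flat N-bit address can reach 2^N bytes. The short Python sketch below, using a hypothetical helper name, does that arithmetic. The practical two-to-three-gigabyte figure for 32-bit systems is lower than the theoretical 4 GiB because operating systems typically reserve part of a 32-bit process's address space for the kernel. Likewise, current 64-bit processors implement fewer than 64 address bits (x86-64 commonly uses 48-bit virtual addresses), so the 64-bit value is a theoretical ceiling rather than a practical limit.

```python
GIB = 2 ** 30  # bytes in one gibibyte


def addressable_gib(address_bits):
    """Theoretical memory reachable with a flat address of this width, in GiB."""
    return 2 ** address_bits // GIB


for bits in (32, 48, 64):
    print(f"{bits}-bit: {addressable_gib(bits):,} GiB")
# 32-bit: 4 GiB
# 48-bit: 262,144 GiB
# 64-bit: 17,179,869,184 GiB
```

In other words, 48 bits already reach 256 TiB and the full 64 bits reach 16 EiB, far beyond the one terabyte of RAM in the server described above.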
For optimal performance, 20 percent of the total data to be analyzed should be in RAM. The amount of RAM needed depends on the data compression rate achieved by the software used. The ability to allocate RAM cache across the application servers, as well as across the database management system, has not yet been developed.
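The sizing guideline above can be turned into a back-of-the-envelope estimate. The Python sketch below is only an illustration under stated assumptions: it applies the 20 percent figure to the uncompressed data volume and then divides by a compression ratio, which in reality depends on the product and on the data itself. The function name and the 4:1 ratio in the example are hypothetical.

```python
def ram_needed_gb(total_data_gb, compression_ratio, resident_percent=20):
    """Estimate RAM for the in-memory share of the data.

    Assumes `resident_percent` of the raw data must be held in RAM and that
    the software compresses it by `compression_ratio` (4 means 4:1).
    Both values are assumptions to be replaced with real product figures.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    resident_gb = total_data_gb * resident_percent / 100
    return resident_gb / compression_ratio


# Hypothetical 1,000 GB of data to analyze, compressed 4:1.
print(ram_needed_gb(1000, 4))  # 50.0
# The same data with no compression at all.
print(ram_needed_gb(1000, 1))  # 200.0
```

A higher compression ratio shrinks the estimate proportionally, which is why the compression rate of the chosen software matters as much as the raw data volume.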
In the end, in-memory analytics provides faster service at a lower cost. With the successful development of this technology, perhaps it now makes a bit more sense to ask, “Who in the world needs a hard drive?”
Editor's note: This is the eighth in a series of articles by Shaku Atre. The other recent articles are: "Who in the World Wants to Stay Locked Up?"; "Who in the World Doesn't Want to Reach for the Clouds?"; "Who in the World Wouldn’t Want a Collaborative BI Architecture?"; "Who in the World Wants More Data?"; "Who in the World Needs a Data Warehouse?"; "Who in the World Wouldn’t Want to Evaluate BI Products?"; "Who in the World Uses Only Words and Numbers in Reports?"; and "Who in the World Wants to Just Be Structured?"