ABSTRACT

Computer memories continue to serve the role that they first served in the electronic discrete variable automatic computer (EDVAC) machine documented by John von Neumann, namely that of supplying instructions and operands for calculations in a timely manner.

As technology has made possible significantly larger and faster machines with multiple processors, the latency of this memory, measured in processor cycles, has increased considerably. Microarchitectural techniques have evolved to share this memory across ever-larger systems of processors with deep cache hierarchies, and have managed to hide this latency for many applications, but they are proving to be expensive and energy-inefficient for newer types of problems that work on massive amounts of data.

New paradigms include scale-out systems distributed across hundreds and even thousands of nodes, in-memory databases that keep data in memory much longer than the duration of a single task, and near-data computation, where some of the computation is off-loaded to the location of the data to avoid wasting energy in the movement of data. This paper provides a historical perspective on the evolution of memory architecture, and suggests that the requirements of new problems and new applications are likely to fundamentally change processor and system architecture away from the currently established von Neumann model.

INTRODUCTION

The central processing unit (CPU) is often considered the heart of a computer, the part through which data are streamed and transformed in the course of a calculation. As such, it is also given pride of place, with the arithmetic power of a system often used as a metric of the system's performance.

However, the computational capability of a system depends not just on the number of calculations that it can perform in a second, but also on the capability and efficiency of staging data to these calculations. These data include not only the input data and parameters needed to solve the problem but also auxiliary tables that need to be referenced, as well as data produced in the course of the calculation that need to be fed back through the computational units.

There is another important item needed in completing calculations, and that is the recipe precisely describing the calculations needed, their order, and their dependence on the results of previous calculations. This recipe for the successful completion of a task, or program, is specified as a sequence of steps called instructions, typically saved in some external medium and brought into the computer along with other items of input data when the program needs to be executed.

All this information needed by a computation is today brought into a central place called the memory of the computer, much as in the early electronic discrete variable automatic computer (EDVAC) [1]. Access to instructions and data must be fast, so that the expensive calculating units are well utilized during the course of the program.

Indeed, in the early days of computing, access to this memory was as fast as, if not faster than, the rate of performing the actual calculations. But as computers became faster and as problems grew both in input size and in the complexity of their instructions, memory access became steadily slower relative to the rate of computation. This led to proposals, such as data-flow architectures [2], in which the recipe for the calculation was not a sequential set of steps and in which the data were not centralized, but rather were distributed in parallel among the computational elements of the computer.

Such proposals did not succeed commercially, as computer engineers found ways of predicting the instructions and data that the computational elements would need, and located them appropriately in time for them to be processed on the various units of the machine. Thus, today’s computers supplement the main memory of the computer with a set of local registers and one or more levels of cache in a memory hierarchy, where the cache level closest to the calculating engine is small and fast, but contains only a subset of the contents of memory.
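To make the effect of this hierarchy concrete, the following sketch (not from the original paper; the sizes, step counts, and chase() helper are illustrative assumptions) times a pointer-chasing loop over working sets of increasing size on a POSIX system. While the working set fits in the small, fast caches closest to the processor, each dependent load completes in a few nanoseconds; once it spills into main memory, the same load takes far longer.

    /*
     * A minimal sketch (not from the paper) of the memory-hierarchy effect
     * described above: the same dependent-load loop is fast while its working
     * set fits in the caches near the processor, and much slower once it
     * spills into main memory. Sizes and step counts are arbitrary choices.
     * Build on a POSIX system, e.g.:  cc -O2 chase.c -o chase
     */
    #define _POSIX_C_SOURCE 199309L
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static volatile size_t sink;  /* keeps the loop from being optimized away */

    /* Time a random pointer chase over n slots. Each load depends on the
     * previous one, so hardware prefetchers cannot hide the latency.
     * Returns the average nanoseconds per dependent load. */
    static double chase(size_t n, size_t steps)
    {
        size_t *next = malloc(n * sizeof *next);
        for (size_t i = 0; i < n; i++)
            next[i] = i;
        /* Sattolo's shuffle: produces a single random cycle over all slots. */
        for (size_t i = n - 1; i > 0; i--) {
            size_t j = (size_t)rand() % i;
            size_t t = next[i]; next[i] = next[j]; next[j] = t;
        }
        struct timespec t0, t1;
        size_t p = 0;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t s = 0; s < steps; s++)
            p = next[p];
        clock_gettime(CLOCK_MONOTONIC, &t1);
        sink = p;
        free(next);
        double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
                  + (double)(t1.tv_nsec - t0.tv_nsec);
        return ns / (double)steps;
    }

    int main(void)
    {
        /* Working sets from roughly L1-cache size up to DRAM size. */
        for (size_t kib = 32; kib <= 128 * 1024; kib *= 4) {
            size_t n = kib * 1024 / sizeof(size_t);
            printf("%8zu KiB working set: %6.1f ns per dependent load\n",
                   kib, chase(n, 10u * 1000 * 1000));
        }
        return 0;
    }

On a typical contemporary machine the reported time per load grows by well over an order of magnitude between the smallest and largest working sets; that growing gap is precisely what the cache hierarchy, prefetching, and prediction techniques described here are designed to hide.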

The resulting machines have become quite sophisticated -- and complex -- and the community is once again asking whether it is efficient to keep data in a central location and move it to processing units that keep getting further away. Cost and energy were not serious concerns as long as Moore's law [3] kept delivering steadily increasing transistor densities and as long as Dennard scaling [4] kept power requirements low.

With the end of Dennard scaling, the focus of concern is shifting to the energy expended in moving data back and forth from memory shared by multiple processors across large systems. Thus, there is a trend toward partitioning the memory across distinct compute nodes and performing multiple tasks in parallel, analogous to the old data-flow proposals. This paradigm appears to particularly suit the computation needs of workloads dealing with large volumes of unstructured data.

This paper will describe this new trend and examine specifically the evolution of the role of main memory. The organization of the paper is as follows. Section II presents a brief history of how the organization of the computer evolved into its present form. Section III describes the memory hierarchy developed by processor designers to combat the problem of increasing memory latency. Section IV discusses memory reliability, while Section V introduces storage-class memories, which bridge the gap between DRAM and disks. Section VI then describes the change in the volume and nature of data and the changing role of data in the modern world. Section VII describes near-data computation, the trend toward offloading computation to processing units nearer the data, while Section VIII describes how advances in 3-D technology are reviving interest in processing-in-memory. Section IX argues that dense memories in new technologies are likely to be exploited better when applications tolerate approximate solutions. Section X examines the possibilities of integrating memories and computing more tightly in a post-von Neumann world.


(Note: This article appears courtesy of the Proceedings of the IEEE)
