In my September 2001 column, I defined data architecture as an orderly arrangement of parts that do four things: organize, store, access and move data. In this column, I want to look at the subject from a different perspective: the vantage point of the systems development life cycle (SDLC).

This exercise is important because people too rarely think comprehensively about the data-focused deliverables that a data architect needs to worry about. The documentation and activities are diverse and scattered, and few people bring them together and think about them as a set. Generally, a team's perspective is application-oriented, and unless we think more systematically, data architecture work risks becoming ad hoc and uncoordinated.

Because everyone has a favorite SDLC, I won't discuss the details. Instead, I'll share my 24 years of experience with Accenture, where we have always used a simple phasing of planning, analysis/design and development. By looking at data architecture across these phases, I want to shed some light on its unique requirements.

Figure 1: Data Architecture Phases

In the planning phase, the data architect is concerned only with conceptual designs: subject-area data models, the choice of a database management system, the matching of subject areas to processes and the geographic distribution of subject areas. There is also a focus on approaches to data distribution: replication, synchronization, messaging and batch-extract processes. All of these considerations are finalized at the conceptual level as the guiding principles to direct subsequent design work.

The majority of a data architect's responsibility lies in the analysis/design phase of the SDLC. In the organize component, the data architect completes deliverables that structure data relationships: data models, database designs, test plans for stress testing and performance testing, and the "ility" strategies.

The first three elements to complete should be obvious to most readers. However, then there are what some of us call the "ilities": scalability, reliability, maintainability and extensibility. These important application characteristics don't happen by themselves. During the design phase of systems building, the entire team needs to address these characteristics. The data architect is responsible for the characteristics that affect the databases as well as the application's ability to continue using the databases over time.

In the store component, the data architect completes deliverables that define how the data will be physically stored: data distribution design, partitioning/segmentation schemes or location transparency approaches.
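To make the partitioning idea concrete, here is a minimal sketch of a range-based segmentation scheme. The boundary values, partition names and the customer-key example are my own illustrative assumptions, not deliverables from the column; the point is that a routing function of this kind is also how location transparency can be achieved, since callers never name a physical partition directly.

```python
from bisect import bisect_right

# Hypothetical range-partitioning scheme: route each customer record to a
# physical segment based on its customer_id. Boundaries and segment names
# are illustrative assumptions only.
PARTITION_BOUNDS = [100_000, 200_000, 300_000]            # upper bounds (exclusive)
PARTITION_NAMES = ["cust_p1", "cust_p2", "cust_p3", "cust_p4"]

def partition_for(customer_id: int) -> str:
    """Return the name of the segment that holds this customer_id."""
    return PARTITION_NAMES[bisect_right(PARTITION_BOUNDS, customer_id)]

print(partition_for(42))        # cust_p1
print(partition_for(250_000))   # cust_p3
```

Because every program obtains the segment name through `partition_for`, the physical layout can be changed by editing the boundary table alone, which is the essence of a location-transparency approach.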

In the access component, the data architect addresses techniques for delivering information to users, whether ad hoc users or application programs: domain definitions, data access approaches, indexing, stored procedures and triggers, and performance management approaches.

Why performance management? Admittedly, it's a debatable topic. As with the "ilities," I hold that it is the responsibility of the entire team because DBAs or technical support personnel alone cannot deliver high-performing applications. Databases and applications must be designed from the beginning with performance considerations in mind. Performance results need to be estimated for all critical and high-volume components during the design phase. The performance estimation techniques must be developed under the guidance of the data architect, who is in a unique position to understand the implications of various design approaches. In my experience, the data architect is frequently the lead voice insisting on the importance of early attention to performance, which is the reason for including it as a data architect responsibility. However, I repeat, high-performing applications cannot be delivered unless the entire development team addresses performance from the earliest days of design.
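A design-time performance estimate can be as simple as the sketch below. The per-I/O and per-call costs and the transaction profile are made-up numbers for illustration; the technique, as described above, is to estimate every critical and high-volume component before any code exists.

```python
# Back-of-envelope performance estimate for one critical transaction at
# design time. All cost figures and access counts are illustrative
# assumptions, not measured values.
MS_PER_RANDOM_IO = 8.0   # assumed cost of one random disk read (ms)
MS_PER_DB_CALL = 0.2     # assumed CPU cost per database call (ms)

def estimate_ms(db_calls: int, random_ios: int) -> float:
    """Estimated elapsed time in milliseconds for one execution."""
    return db_calls * MS_PER_DB_CALL + random_ios * MS_PER_RANDOM_IO

# A hypothetical order-entry transaction assumed to issue 12 database
# calls and read 5 pages that miss the buffer pool:
print(round(estimate_ms(db_calls=12, random_ios=5), 1))  # 42.4
```

Crude as it is, an estimate like this flags a design that cannot meet its response-time target while the design is still cheap to change.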

For the final component, move, the design phase delivers transformation rules and data conversion approaches.

During the development phase, the data architect's remaining responsibilities are primarily tuning and traffic analysis. Traffic analysis is a systematic review of the data accesses of each program, undertaken to determine the indexing requirements of the programs collectively. It allows indexes to be defined with the most critical accesses in mind, rather than simply accommodating the most recent request, which can undermine existing programs.
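The traffic analysis just described can be sketched as a simple tally. The program names, search columns and execution frequencies below are invented for illustration; a real analysis would cover every program that touches the table.

```python
from collections import Counter

# Sketch of traffic analysis: tally which columns each program searches on,
# weighted by its estimated executions per day. All entries are
# hypothetical examples.
accesses = [
    # (program, columns searched on, executions per day)
    ("order_entry",   ("customer_id",),            50_000),
    ("order_lookup",  ("customer_id", "order_dt"), 20_000),
    ("month_end_rpt", ("order_dt",),                    1),
]

demand = Counter()
for program, columns, freq in accesses:
    for col in columns:
        demand[col] += freq

# Columns ranked by total daily accesses; the critical accesses drive the
# index design, not the most recently requested one.
for col, freq in demand.most_common():
    print(col, freq)
```

Here the tally shows that `customer_id` carries far more daily traffic than `order_dt`, so it leads the index design even though the month-end report was the most recent request.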

I have described these components as if the data architect personally completes the work. Of course, this is not true. In many cases, developers are doing the work, but the data architect has a keen interest in ensuring that it is being done, and being done correctly. The data architect is often the advocate, persuading developers or technical support personnel of the importance of addressing these elements in a timely fashion.

There is also a wider view that needs to be considered because the components of the data architecture sit on a backbone of data management. Data management addresses questions of data ownership, security and data administration. That's beyond our scope for this column, but stay tuned.

Next month I'll address a question I'm often asked: Where does meta data sit in this data architecture framework?
