Agile development, lean approaches and NoSQL technology have reinvented data modeling, which in turn is radically altering software development.

Today’s data modeling is vastly different from the decades-old approach, which concerned relational databases in a waterfall development context. Schema design for NoSQL has become more of an art than the traditional data modeling science used to normalize SQL databases, and fully embracing it requires something of a mind-shift.

Agile development adepts rightfully feel that the traditional process of “conceptual -> logical -> physical data modeling” is too linear to fit the fast cycle of sprints, particularly given the difficulty of modifying a relational database structure. Formerly an intensive process accomplished early on by a separate group of specialists, data modeling has become a continuous, iterative, and collaborative process, centered on a query-driven approach that forces data modelers, designers, and developers to forget the guiding principles of the past.

Data modeling now plays a continuous role throughout the application development lifecycle—including long after the initial roll out. And since agile team members are multi-skilled and wear many hats, schema design has become a shared task that requires a new generation of tools.

More critical
As it turns out, data modeling is even more critical to the success of NoSQL-based applications than it has been for the relational world.

Previously, organizations primarily stored structured data in relational databases, but the advent of big data has changed all that. Businesses must now interpret massive volumes of structured, semi-structured and unstructured data in different formats from all kinds of sources. This is one big reason for the newfound popularity of NoSQL, which is known for its flexibility and scalability.

There is also the drive to develop quick iterations of minimally viable software products, as developers aim to deliver applications to customers much earlier in the development cycle and then continuously fine-tune the offering along the way.

But while development technology and methodology have been revolutionized, some traditional best practices remain relevant. Documenting the application’s purpose, requirements, workflows and screen mockups still leads to better quality software, greater customer satisfaction and a lower total cost of ownership (TCO).

Another constant: Underlying each application there is still a database that supports the system and the business. Data modeling describes the business and the structure of the application and the database, regardless of whether it is based on SQL or NoSQL. And although the application’s blueprint can change during development, data modeling facilitates collaboration among the architect, designers, developers and end users at all stages of the life cycle.

Best practices
Data modeling should take place at each step throughout the application lifecycle, including domain-driven design or conceptual modeling; application design or physical data modeling; and iterative development, deployment and optimization.

Agile team members perform the modeling in small chunks, and then work to develop and implement the model. But given the autonomy of these smaller teams, it becomes essential to capture requirements such as data privacy and regulatory compliance. With a command-line interface that infers the database schema by sampling large production datasets, the modeling process can also monitor the introduction of new fields and structures.
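To make the idea of schema inference by sampling concrete, here is a minimal Python sketch. It is not the interface of any particular modeling tool: the function names `infer_schema` and `new_fields`, the sample size and the example documents are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def infer_schema(documents, sample_size=100, seed=0):
    """Map each field path to the set of type names seen in a random sample.

    Hypothetical helper: real tools sample directly from the database.
    """
    rng = random.Random(seed)
    docs = list(documents)
    sample = docs if len(docs) <= sample_size else rng.sample(docs, sample_size)
    schema = defaultdict(set)

    def walk(value, path):
        if isinstance(value, dict):
            # Descend into sub-documents, building a dotted path.
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            # Array elements share one path, marked with "[]".
            for item in value:
                walk(item, f"{path}[]")
        else:
            schema[path].add(type(value).__name__)

    for doc in sample:
        walk(doc, "")
    return dict(schema)


def new_fields(baseline, current):
    """Return field paths present in `current` but absent from `baseline`."""
    return sorted(set(current) - set(baseline))


# Illustrative data: the second sample introduces two fields.
baseline = infer_schema([{"name": "Ada", "address": {"city": "London"}}])
current = infer_schema([
    {"name": "Ada", "address": {"city": "London"}},
    {"name": "Bob", "email": "bob@example.com", "tags": ["vip"]},
])
print(new_fields(baseline, current))  # ['email', 'tags[]']
```

Comparing the inferred schema against a documented baseline is one way a team might notice that a field such as `email` has appeared, which matters when privacy or compliance rules apply to it.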

When moving to NoSQL, it would be silly to just copy a relational data structure. To leverage NoSQL’s benefits, it is necessary to denormalize and join the data at write time rather than at query time. This technique is more an art than a science, as there are no definite rules for denormalizing. Instead, there are many different patterns for accomplishing the same thing: storing the data in a way that optimizes performance. The goal of NoSQL data modeling is to identify all the ways that the data will be retrieved, design the storage structure accordingly and evaluate different what-if scenarios.
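As a rough illustration of joining at write time, the Python sketch below embeds customer and product details into an order document when it is saved, so that the "show order summary" query becomes a single lookup. The entities, field names and in-memory dictionaries are assumptions standing in for a real document store; this is one denormalization pattern among many, not a prescribed one.

```python
# Hypothetical reference data, standing in for separate collections.
customers = {"c1": {"name": "Ada Lovelace", "city": "London"}}
products = {
    "p1": {"title": "Notebook", "price": 4.5},
    "p2": {"title": "Fountain pen", "price": 12.0},
}

# Hypothetical order store: one document per order, keyed by order id.
orders = {}


def build_order_document(order_id, customer_id, lines):
    """Join customer and product data into one document at write time."""
    customer = customers[customer_id]
    items = []
    for product_id, quantity in lines:
        product = products[product_id]
        # Copy the fields the order screen needs; this is the denormalization.
        items.append({
            "product_id": product_id,
            "title": product["title"],
            "unit_price": product["price"],
            "quantity": quantity,
        })
    total = round(sum(i["unit_price"] * i["quantity"] for i in items), 2)
    return {
        "_id": order_id,
        "customer": {"id": customer_id, "name": customer["name"],
                     "city": customer["city"]},
        "items": items,
        "total": total,
    }


def save_order(order_id, customer_id, lines):
    """Write path: the join happens here, once."""
    orders[order_id] = build_order_document(order_id, customer_id, lines)


def get_order_summary(order_id):
    """Read path: a single key lookup, no joins."""
    return orders[order_id]


save_order("o1", "c1", [("p1", 2), ("p2", 1)])
summary = get_order_summary("o1")
print(summary["customer"]["name"], summary["total"])  # Ada Lovelace 21.0
```

The trade-off is that the embedded copies do not change when the customer or product record changes later. Whether that is acceptable depends on the query, which is why the access patterns have to be identified first.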

Complicating matters further, there is no set standard for NoSQL, and each database vendor has its own terminology, data types, approach to storage, way of defining its primary keys and even its own query language.

Domain-Driven Design
A Domain-Driven Design (DDD) approach is closer to traditional conceptual modeling and provides a ubiquitous language for the application, the database, the developers and all of the business users.

With DDD, even if they’re not fully detailed, flowcharts are developed to define the application workflows, and mockups are drawn up to describe application screens and reports. This process forces the developers to think through the user experience, the application’s functionality and the business rules that underlie it. A physical data model is derived from the queries necessary to serve the application logic, the user interface and the reports.

The full palette of denormalization techniques and NoSQL design patterns is applied, based on the needs and constraints that have been identified. Using a Test-Driven Development (TDD) approach, tests for constraints and business rules are written before the application code is developed, and the data modeling tool generates large sets of model-compliant test data.
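The test-first workflow might look something like the following Python sketch. The model format, the `validate` and `generate_orders` helpers and the order fields are hypothetical and are not the output of any specific modeling tool; the point is only that the constraints are written down and tested before any application code exists.

```python
import random

# Hypothetical model: field name -> (expected type, constraint check).
ORDER_MODEL = {
    "order_id": (str, lambda v: v.startswith("ord-")),
    "quantity": (int, lambda v: 1 <= v <= 100),
    "status": (str, lambda v: v in {"new", "paid", "shipped"}),
}


def validate(doc, model):
    """Return a list of violations; an empty list means the document complies."""
    errors = []
    for field, (expected_type, check) in model.items():
        if field not in doc:
            errors.append(f"{field}: missing")
        elif not isinstance(doc[field], expected_type):
            errors.append(f"{field}: wrong type")
        elif not check(doc[field]):
            errors.append(f"{field}: constraint violated")
    return errors


def generate_orders(count, seed=0):
    """Generate documents intended to comply with ORDER_MODEL."""
    rng = random.Random(seed)
    return [
        {
            "order_id": f"ord-{i}",
            "quantity": rng.randint(1, 100),
            "status": rng.choice(["new", "paid", "shipped"]),
        }
        for i in range(count)
    ]


def test_generated_data_is_model_compliant():
    assert all(validate(doc, ORDER_MODEL) == [] for doc in generate_orders(1000))


def test_invalid_quantity_is_rejected():
    bad = {"order_id": "ord-1", "quantity": 0, "status": "new"}
    assert validate(bad, ORDER_MODEL) == ["quantity: constraint violated"]


# Run directly here; a test runner such as pytest would also collect them.
test_generated_data_is_model_compliant()
test_invalid_quantity_is_rejected()
```

Because the generated data and the validation rules come from the same model, a later change to the model shows up immediately as a failing test rather than as a production surprise.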

While agile development allows for details to be added and changed at later stages in the project, the sooner the application’s foundations are defined, the less rework will be needed. All of this supports better product quality, a shorter time-to-market and a lower TCO.

The physical data model and application code take into account specific characteristics of the entire technology stack, including the NoSQL database. As the application functionality starts to mature, potential operational, scalability and performance issues also need to be considered, and this often leads to further adjustments to the data model as well as the application code. Documenting these changes to the model can be very helpful for all the business users, database administrators, data governance and privacy officers who have a stake in the new application.

Although the starting point (business requirements) and the end point (a physical data model) are similar to those employed by the traditional Conceptual-Logical-Physical development methodology, everything in between is executed differently. Most significantly, the flexibility of NoSQL allows for rapid changes to the model, but this requires a next-generation data modeling tool that supports the new technology.

As application development moves towards an MVP/fail-fast approach, and software development shifts towards an Agile/Continuous Integration/DevOps approach, data modeling needs to be done more often, in smaller chunks, by more people and over a longer time period, including well after the application has gone into production.

