With the emergence and maturity of agile development, lean approaches, and NoSQL technology, the field of “data modeling” has re-invented itself to play a much more significant role in the development of software products and projects.
But although the “data modeling” name has remained the same, the practice is now vastly different -- in terms of lifecycle, people, process, and tools -- from the decades-old approach, which concerned relational databases in a waterfall development context. Schema design for NoSQL has become more of an art than the traditional science of data modeling for normalized SQL databases, and a mind-shift is essential to fully leverage the benefits of NoSQL technology and agile development.
Practitioners of agile development have rightly felt that the traditional process of “conceptual -> logical -> physical data modeling” was too linear and sequential to fit in the fast cycle of sprints, particularly given the difficulty of modifying the structure of relational databases. It used to be a heavy process carried out early on by specialized and often dedicated resources, who then passed the output to developers. Nowadays, data modeling has become a continuous, iterative, and collaborative process distributed among more team players. It is centered on a query-driven approach that forces data modelers, designers, and developers to set aside many of the guiding principles of the past.
Data modeling no longer helps only during the early phases of a project; it now holds a continuous role throughout the application development lifecycle, providing value every step of the way, including in production, long after the initial delivery of the application. And since agile team members are multi-skilled and wear many hats, schema design becomes a shared task and responsibility that requires a new generation of tools adapted to both the technology and the development practice.
The benefits of data modeling are plentiful and quite quantifiable. As it turns out, data modeling for NoSQL is even more critical to the success of NoSQL-based applications than it has been for decades in the relational world.
Needs, technology, and methodology shifts
Previously, organizations were storing mostly structured data in relational databases, but the needs have changed dramatically with the emergence of Big Data and the advancement of database technology.
As we know, organizations now have a proliferation of structured, semi-structured, and unstructured data coming to them in different formats from all kinds of sources. This new dynamic is one big reason for the rise of NoSQL, which is known for its flexibility and scalability.
The application development world has also been undergoing major changes. It is now about minimum viable products and quick iterations: developers aim to deliver value to customers much earlier, with quicker payback, and with constant fine-tuning of the product and business model along the way.
While revolutions have taken place on the technology and methodology fronts, several best practices and guiding principles remain relevant. It still greatly reduces Total Cost of Ownership (TCO) and enhances quality and customer satisfaction to document a clear vision for the application; to detail requirements, workflows and screen mockups; and to discover conceptual issues and defects early on in the application development lifecycle.
Also, what hasn’t changed is that under each application is a database to support the system and business. Data modeling describes the understanding of the business and the structure of the application and database (no matter whether SQL or NoSQL). And although the blueprint can change during development, it’s imperative to still have data modeling in place to facilitate the communication and collaboration between the architect, designers, developers, and end users before, during, and after design.
Data modeling is now a continuous process through every step of the application lifecycle and in production, as data governance and privacy regulations dictate that compliance be documented and maintained, even for semi-structured and unstructured data, through constant evolutions of application features and underlying data.
Data modeling occurs in the different steps in the lifecycle, including during domain-driven design (or conceptual modeling); application design with physical data modeling; development with iterative evolution; testing with the generation of relevant test data; deployment and optimization; production while maintaining data governance and compliance; and continuous enhancements with documentation and facilitation for next versions.
With agile development, team members are multi-skilled and cross-functional, and one of those essential skills is data modeling. Agile team members now perform data modeling in small chunks, and then work to develop and implement that model. But with the autonomy of smaller teams, it becomes essential to capture, after the fact, potential impacts such as privacy or compliance requirements. With a command-line interface that infers the database schema by sampling large production datasets, data modeling can monitor the introduction of new fields and structures by agile developers.
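The schema inference by sampling mentioned above can be illustrated with a minimal sketch. It is not the command-line tool the article refers to: the function names `infer_schema` and `new_fields`, the dotted field paths, and the in-memory list standing in for a production collection are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def infer_schema(documents, sample_size=100, seed=0):
    """Return {field_path: set of type names} from a random sample of documents.

    Hypothetical helper: a real tool would sample from the database itself.
    """
    rng = random.Random(seed)
    docs = list(documents)
    sample = docs if len(docs) <= sample_size else rng.sample(docs, sample_size)
    schema = defaultdict(set)

    def walk(value, path):
        # Recurse into nested objects; record the type name of every leaf.
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        else:
            schema[path].add(type(value).__name__)

    for doc in sample:
        walk(doc, "")
    return dict(schema)


def new_fields(documented_paths, inferred_schema):
    """List field paths found in the data but missing from the documented model."""
    return sorted(set(inferred_schema) - set(documented_paths))


docs = [
    {"name": "Ada", "address": {"city": "Paris"}},
    {"name": "Bob", "email": "bob@example.com"},  # field added by a developer
]
schema = infer_schema(docs)
undocumented = new_fields({"name", "address.city"}, schema)
```

Here `undocumented` contains the `email` path, which is the kind of after-the-fact discovery that a privacy or compliance review would want to see.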
When moving to NoSQL, it would be a mistake to just copy a relational data structure. To leverage the benefits of NoSQL, it is necessary to denormalize and join data at the time of writing. This technique is more an art than a science, as there are no definite rules for how one should denormalize. Instead, there are many different patterns for denormalizing data, all with one goal in mind: store the data in the way that best serves performance when it matters most, which is when users need access to the data. That means pre-aggregating or joining data “on-write.”
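As a rough sketch of joining and pre-aggregating “on-write,” the example below keeps one document per customer with the customer name, the orders, and running totals embedded in it. The plain dictionary standing in for a document store, and the names `record_order` and `customer_summary`, are hypothetical; this is one possible denormalization pattern among many, not a prescribed one.

```python
def record_order(store, customer, order):
    """Write path: embed the order and update aggregates in the customer document."""
    doc = store.setdefault(customer["id"], {
        "customer_id": customer["id"],
        "customer_name": customer["name"],  # copied in, so reads need no join
        "order_count": 0,
        "total_spent": 0.0,
        "orders": [],
    })
    doc["orders"].append({"order_id": order["id"], "amount": order["amount"]})
    doc["order_count"] += 1                  # pre-aggregated at write time
    doc["total_spent"] += order["amount"]
    return doc


def customer_summary(store, customer_id):
    """Read path: a single lookup returns everything the screen needs."""
    doc = store[customer_id]
    return doc["customer_name"], doc["order_count"], doc["total_spent"]


store = {}  # stand-in for a document collection keyed by customer id
ada = {"id": "c1", "name": "Ada"}
record_order(store, ada, {"id": "o1", "amount": 10.0})
record_order(store, ada, {"id": "o2", "amount": 5.5})
summary = customer_summary(store, "c1")
```

The write does more work so that the read is a single lookup. The cost is that a change to the customer name must be propagated to every place it was copied.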
The goal of data modeling for NoSQL is therefore to identify all the ways that the data will be retrieved, design the storage structure accordingly, and evaluate different what-if scenarios.
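One simple way to picture such a what-if evaluation is to list the access patterns and estimate, for each candidate design, how many database operations each pattern costs. The sketch below does only that. The access patterns, call volumes, and per-call operation counts are invented placeholders, not measurements, and the design names `referenced` and `embedded` are assumptions for the example.

```python
# Each access pattern lists how often it runs and how many database
# operations one call costs under each candidate design (illustrative numbers).
ACCESS_PATTERNS = [
    {"name": "show order with product names", "calls_per_day": 1000,
     "ops": {"referenced": 2, "embedded": 1}},
    {"name": "look up product by id", "calls_per_day": 100,
     "ops": {"referenced": 1, "embedded": 1}},
    {"name": "rename a product", "calls_per_day": 10,
     "ops": {"referenced": 1, "embedded": 50}},  # must touch every embedded copy
]


def daily_operations(patterns, design):
    """Total estimated operations per day for one candidate design."""
    return sum(p["calls_per_day"] * p["ops"][design] for p in patterns)


def cheapest_design(patterns, designs):
    """Pick the design with the lowest estimated daily operation count."""
    return min(designs, key=lambda d: daily_operations(patterns, d))


totals = {d: daily_operations(ACCESS_PATTERNS, d) for d in ("referenced", "embedded")}
best = cheapest_design(ACCESS_PATTERNS, ("referenced", "embedded"))
```

With these placeholder figures the embedded design wins because reads dominate. Changing the volumes, for example making renames far more frequent, is the what-if scenario.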
To make matters a bit more difficult, NoSQL is not guided by a universal standard, and every database vendor has its own terminology, storage approach, way of defining primary keys, data types, and even its own query language.
Based on a global vision and strategy for the application, and functional requirements for the early phases, a Domain-Driven Design (DDD) approach reaches goals similar to those of more traditional conceptual modeling, with the difference that DDD focuses on the problem rather than the solution. It is also interested in behaviors. The output is a bounded context, including a ubiquitous language shared at all levels by business users, developers, and the underlying application and database.
Even if not detailed all the way through, flowcharts are developed to define application workflows, and mockups are drawn up to describe application screens and reports. The formalization of flowcharts and mockups brings issues and challenges to the surface. It forces team members to think through the user experience, the functionality, and the business rules. As team discussions progress, the domain and language get fine-tuned, and so do the flowcharts and mockups.
A physical data model gets derived from the queries necessary to serve the application logic, the User Interface, and the reports. The full palette of denormalization techniques and NoSQL design patterns is applied according to the identified needs and constraints. In a Test-Driven Development (TDD) approach, tests for the constraints and business rules are written even before application code is developed to pass the tests, and the data modeling tool can generate large sets of model-compliant test data.
From then on, the following artifacts will continually get updated and synchronized: domain model, flowcharts, mockups, architecture design, data model, and ideally test plan. These artifacts are catalysts for regular discussions and collaboration between team members. Ideally, these debates, discussions, and fine-tuning of the artifacts take place early on in each cycle -- before any line of code gets written.
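To make the idea of model-compliant test data concrete, here is a minimal sketch that generates documents from a small model description and checks them against the same rules. The `MODEL` format, the field names, and the functions `generate_documents` and `is_compliant` are hypothetical and do not describe any particular data modeling tool.

```python
import random
import string

# Hypothetical model description: one rule per field.
MODEL = {
    "sku": {"type": "string", "length": 8},
    "quantity": {"type": "int", "min": 1, "max": 100},
    "status": {"type": "enum", "values": ["new", "paid", "shipped"]},
}


def generate_value(rule, rng):
    """Produce one random value that satisfies the given field rule."""
    kind = rule["type"]
    if kind == "string":
        return "".join(rng.choices(string.ascii_uppercase, k=rule["length"]))
    if kind == "int":
        return rng.randint(rule["min"], rule["max"])
    if kind == "enum":
        return rng.choice(rule["values"])
    raise ValueError(f"unsupported type: {kind}")


def generate_documents(model, count, seed=0):
    """Generate `count` documents; the seed keeps test runs repeatable."""
    rng = random.Random(seed)
    return [{field: generate_value(rule, rng) for field, rule in model.items()}
            for _ in range(count)]


def is_compliant(doc, model):
    """Check that a document has exactly the modeled fields and obeys each rule."""
    if set(doc) != set(model):
        return False
    for field, rule in model.items():
        value, kind = doc[field], rule["type"]
        if kind == "string":
            ok = isinstance(value, str) and len(value) == rule["length"]
        elif kind == "int":
            ok = isinstance(value, int) and rule["min"] <= value <= rule["max"]
        elif kind == "enum":
            ok = value in rule["values"]
        else:
            ok = False
        if not ok:
            return False
    return True


test_docs = generate_documents(MODEL, 50)
```

In a TDD setting, a checker like `is_compliant` can be written first as the test, and the generated documents then feed the application code under test.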
While agile development allows for details to be defined at later stages in the project, the sooner sound foundations get defined, the less rework will be necessary down the road. All of this is in the interest of quality deliverables, time-to-market, and TCO. Pretty soon, enough of the foundation is clarified to see patterns emerge to allow the creation of a prototype based on a first version of the data model and application code.
All application stakeholders collaborate to ensure relevance of the application being built. The physical data model and application code will take into account the specific aspects of the entire technology stack, including the NoSQL database. As the application functionality starts to mature, concerns about operations, scalability, and performance need to be considered, often leading to adjustments to the data model and the code.
The data model gets validated according to the specific NoSQL database vendor’s capabilities, and optimized to leverage those as well. This includes storage approach, caching, indexing, querying capabilities, and sharding concerns.
The data model documentation becomes a living artifact in production, helpful to business users, database administrators, data governance, and privacy officers. When application enhancements are imagined, impacts on the data model need to be studied and thought through along with the other project artifacts described above, before code gets developed.
While the starting point (business requirements) and the end point (a physical data model) are similar to those in the traditional Conceptual-Logical-Physical methodology, everything in between is executed differently. As the flexibility of NoSQL allows for rapid changes, the entire data modeling methodology needs to be rethought to suit the new technology and development approach. A next-generation data modeling tool is required in support of the new technology and methodology.
In the end, product development has changed towards a more MVP/fail-fast approach, and software development has changed towards an Agile/Continuous Integration/DevOps approach. Data modeling becomes an exercise done more often, in smaller chunks, by more people, and over a longer period of time, including well after the application has gone into production.