The critical success factor for all these initiatives is the clear and precise definition of both the functional requirements, which define the behavior of the system, and the nonfunctional requirements, which describe how well the system must perform that behavior and the constraints under which it must operate.
Business users generally define functional requirements well but struggle when detailing nonfunctional requirements. Asking the business the right questions can achieve better business-IT alignment.
Nonfunctional Requirements – Criticality and Lifecycle
In any software development effort, gathering requirements is the first step. Functional requirements describe the behaviors (functions or services) of the system that support user goals, tasks or activities, while nonfunctional requirements (NFRs) include constraints and qualities. Just as functional requirements are tracked to closure during the various phases of the software development lifecycle, it is imperative to elicit, document and track NFRs to closure for the success of any software development project.
An NFR lifecycle usually proceeds as follows:
Identify NFRs applicable to an application situation:
- Identify the key/critical NFRs for the given engagement.
- Understand the significance of the critical NFRs; assess the impact if a given requirement is not met.
Define what is required by preparing a questionnaire for NFR stakeholders:
- Categorize NFRs into “should have,” “could have” and “nice to have.”
- Frame questions for qualitative and quantitative NFRs.
- Validate the practicality of the NFRs.
Decide on potentially conflicting NFRs based on constraints:
- Perform a cost-benefit analysis of building a system that complies with all NFRs versus forgoing a few aspects of them.
Track NFRs captured from requirement to testing:
- Validate architecture, design and coding practices at various software development lifecycle phases.
- Create adequate test plans to validate that the end solution passes all NFRs stated/agreed upon.
Close/Execute test cases:
- Collect and document metrics to confirm compliance to NFRs.
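The lifecycle above can be sketched as a simple tracking structure. This is a hypothetical illustration, not a prescribed tool; the stage names and fields are assumptions that mirror the steps listed.

```python
from dataclasses import dataclass

# Hypothetical lifecycle stages, mirroring the steps above.
STAGES = ["identified", "defined", "prioritized", "tracked", "closed"]

@dataclass
class NFR:
    name: str
    category: str        # e.g., "performance", "reliability"
    priority: str        # "should have" / "could have" / "nice to have"
    target: str          # a measurable target, e.g., "batch cycle < 4h"
    stage: str = "identified"

    def advance(self) -> str:
        """Move the requirement to the next lifecycle stage."""
        i = STAGES.index(self.stage)
        if i < len(STAGES) - 1:
            self.stage = STAGES[i + 1]
        return self.stage

nfr = NFR("batch window", "performance", "should have", "cycle < 4h")
nfr.advance()  # identified -> defined
```

Recording each NFR with a measurable target and an explicit stage is what makes it trackable to closure rather than an informal statement.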
Data Integration Initiatives and NFRs
In a typical data integration initiative, it is not uncommon for functional requirements to be incompletely defined early in the project lifecycle; they get clarified as the project progresses. The clarity on nonfunctional aspects is much lower. NFRs are stated informally during requirements gathering, are often contradictory, are difficult to enforce during software development and are hard to validate when the software system is ready for delivery.
Nonfunctional requirements seem less clear because:
- There’s a lack of understanding about what to include as nonfunctional requirements;
- No one is sure what should be specified, which results in the exclusion of NFRs or the specification of unrealistic ones; and
- People assume NFRs are implicit and need not be stated.
NFRs Critical to a Data Integration Initiative
Nonfunctional requirements are as critical as functional requirements to the success of a data integration initiative. It is imperative that the right set of questions are asked during the requirements phase and that each of the nonfunctional requirements is tracked during the complete lifecycle of the project.
Performance is about the resources used to service a request and how quickly an operation can be completed, e.g., response time or number of events processed per second. In the context of a batch application, per-second measures may be insignificant, but performance remains important for meeting SLAs to downstream systems or applications. The key questions that could be used to help document performance requirements are:
What is the batch window available for completion of the complete batch cycle? Knowing the batch window can help the project team validate if the SLA can be met with the given input volume, process complexity and resources.
What is the batch window available for completion of individual jobs? There are scenarios where the in-process data is consumed by some of the downstream systems. Availability of this information can help define the batch dependency and design the batch more efficiently.
What is the frequency of the batch? The frequency may be different for different jobs. Not all days of the week will have the same incoming volume. The hardware on which the batch cycle runs could have varying loads based on the day of the week. A clear understanding of the frequency will help in tuning the batch based on incoming volume and expected resource availability.
What are the data availability SLAs provided by the source systems of the batch load? Detailed information of the availability service level agreements can help validate if the batch cycle can meet the required SLAs or not.
What is the expected load (peak, off-peak, average)? This key information needs to be obtained for all source systems, and the details gathered should include when data volume is expected to be high or low.
Are there any constraints imposed by the source system? The typical constraints relate to the time window when data can (or cannot) be pulled, or to adding filters while querying data from a source system (because the query may otherwise overload the source system).
What is the hardware and software configuration of the environment where the batch cycle needs to run? This is a very critical input for the design process, because this information can be used to suitably leverage the capabilities of the environment. Examples include exploiting features of a specific software version and exploiting the hardware through multithreading.
Are the resources available exclusive or shared? If shared, what percent is available? If the deployment is in a shared environment, the availability of resources can be a useful input to determine if the batch SLAs can be met.
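Several of the performance questions above feed a single feasibility check: does the estimated batch cycle fit the available window? A minimal sketch, assuming sequential jobs, illustrative per-job durations from profiling, and a safety buffer for run-to-run variance:

```python
# Sketch: validate a batch window against estimated job durations.
# Job names, durations and buffer percentage are illustrative assumptions.

def cycle_fits_window(job_minutes: dict, window_minutes: float,
                      buffer_pct: float = 10.0) -> bool:
    """Assume jobs run sequentially; pad the total with a safety buffer."""
    total = sum(job_minutes.values()) * (1 + buffer_pct / 100)
    return total <= window_minutes

jobs = {"extract": 45, "transform": 90, "load": 60}   # minutes (assumed)
print(cycle_fits_window(jobs, window_minutes=240))    # 4-hour window -> True
```

If jobs can run in parallel, the sum would be replaced by the critical-path duration derived from the job dependencies captured earlier.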
Scalability is about how resource consumption increases when more requests are to be serviced and how well the application can be scaled up to handle greater usage demands (e.g., number of input files, number of users, request rates, volume of data, etc.). It is important to answer the following questions:
What is the expected annual growth in volume (for each of the source systems) in the next five years? This information is useful in validating if the batch cycle can meet SLAs in years ahead.
What is the projected increase in number of source systems? If the number of source systems providing data to the batch cycle is likely to increase, this knowledge can be used to build the batch process to have flexibility to accommodate the same loads with minimal change.
How many jobs need to be executed in parallel? There is a difference between executing jobs sequentially and doing so in parallel. The answer to this question impacts performance and could affect the degree to which the given application can scale.
Is there any need for distributed processing? If yes, details around distributed processing can also impact the expectations around scalability. If the data integration batch process is done centrally, it will have fewer complexities compared to when done in a distributed environment because synchronization between instances could potentially become critical.
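The growth question above reduces to compounding the expected annual increase and comparing it against sustainable throughput. A small sketch with assumed figures (current volume, growth rate and capacity are illustrative, not from the text):

```python
def projected_volume(current: float, annual_growth_pct: float,
                     years: int) -> float:
    """Compound the expected annual growth over the planning horizon."""
    return current * (1 + annual_growth_pct / 100) ** years

# Assumed figures: 10M rows/day today, 20% annual growth, 5-year horizon.
vol_y5 = projected_volume(10_000_000, 20, 5)

# Compare against the throughput the batch can sustain in its window (assumed).
throughput_capacity = 30_000_000
print(vol_y5 <= throughput_capacity)   # True: SLAs still hold in year 5
```

Running this check during design, rather than after go-live, is what turns the growth question into a validated scalability requirement.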
Reliability is the confidence that the system (and its processes) work correctly with the given set of inputs. Reliability is also assessed by how quickly a process can be restored to deliver the expected output if it fails or aborts for any reason. Important questions include:
What is the tolerance level or percentage of erroneous data that is permitted? This helps in setting the error-handling threshold and the strategy for restarting versus aborting a process.
Is there any manual process defined to handle potential failure cases? This helps in deciding what needs to be taken care of at the system level and what needs to be taken care of manually.
Under what conditions is it permissible for the system/process to not be completely accurate, and what is the frequency/probability of such occurrences? This helps in deciding the boundary conditions for the system, to what extent precautions need to be taken within the system, and what needs to be addressed outside it.
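The tolerance question above typically becomes a concrete threshold in the load process. A minimal sketch, assuming a percentage tolerance agreed with the business (the 1% default and row counts are illustrative):

```python
def restart_or_abort(error_rows: int, total_rows: int,
                     tolerance_pct: float = 1.0) -> str:
    """Abort when the erroneous-data rate exceeds the agreed tolerance;
    otherwise quarantine the bad rows and let the cycle continue."""
    if total_rows == 0:
        return "abort"
    rate = 100 * error_rows / total_rows
    return "abort" if rate > tolerance_pct else "continue"

print(restart_or_abort(error_rows=40, total_rows=10_000))  # 0.4% -> continue
```

The rows rejected under the threshold would then flow into the manual-handling process identified in the second question.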
Maintainability is a measure of how easy it is to correct defects in the software or make any changes to any given piece of code. Consider:
What is the maintenance/release cycle? The general practices of release cycles in the environment can be an indicator of the rate of new enhancements.
How frequently do source or target structures change? What is the probability of such change? The degree to which software can be configured affects other nonfunctional aspects, like performance. Many standard aspects of the batch can be made configurable; others require extra effort and add complexity. It is critical that aspects of the code be made configurable based on the frequency and probability of change.
Extensibility can be interpreted as the ease with which the given capability of the system can be extended. This will save the company time and money in future projects. It is important to assess the following points:
How many different/divergent sources (different source formats) are expected to be supported? The answer to this question will determine if there is any merit in going for source-side formatting or, alternatively, whether to use design patterns with built-in extensibility.
What kind of enhancements/changes to source formats are expected to come in? This information is used to understand if there is any need to define abstract transformations or reusable mappings.
What is the probability of getting new sources added? How frequently do the source formats change? How divergent will the new source formats be? This will again be used to determine if there is any need to define abstract transformations or reusable mappings.
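When new or divergent source formats are likely, a registry of pluggable parsers is one design pattern with built-in extensibility: adding a format becomes a one-line registration, with no change to the core pipeline. The format names and parsers below are illustrative assumptions.

```python
# Sketch: a parser registry so new source formats plug in without
# touching the core batch logic.

PARSERS = {}

def register(fmt):
    """Decorator that registers a parser under a format name."""
    def deco(fn):
        PARSERS[fmt] = fn
        return fn
    return deco

@register("csv")
def parse_csv(line: str) -> list:
    return line.split(",")

@register("pipe")
def parse_pipe(line: str) -> list:
    return line.split("|")

def parse(fmt: str, line: str) -> list:
    return PARSERS[fmt](line)   # core logic never changes per format

print(parse("pipe", "a|b|c"))   # ['a', 'b', 'c']
```

The same idea applies to reusable mappings or abstract transformations: invest in the abstraction only when the answers above suggest new sources are probable.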
Are there any special security requirements (such as data encryption or privacy) applicable? This information is useful for understanding whether data needs to be encrypted when stored and whether processes are needed for masking source data during performance testing of the batch cycle.
Are there any logging and auditing requirements? This information will be used to understand how much log maintenance is involved and to facilitate support personnel as well as to understand if there are any statutory or compliance requirements that need to be met by maintaining proper audit trails.
Is there an upper limit on CPU time or other processing resources that can be consumed?
Are there any limitations on the memory that can be consumed?
System availability is the time when the application must be available for use. When assessing availability, it is vital to keep time zones, schedules and users’ locations in mind.
- Does the target store/database/file need to be available 24x7? This determines how many nines of availability need to be budgeted for.
- Is any downtime allowed? This, too, feeds the availability (nines) budget.
- Are there any peak or off-peak hours during which loading can happen? This helps size the hardware (RAM, CPU, etc.) and allot a defined window for the batch cycle.
- Are there crucial SLAs that need to be met? This helps size the hardware, allot a defined window for the batch cycle and assign priorities accordingly.
- If SLAs are missed, is there any critical system/business impact? This helps in planning the jobs properly.
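The "how many nines" questions above translate directly into a downtime budget. A small sketch of that arithmetic (the 99.9% target is an example, not a recommendation):

```python
def allowed_downtime_hours_per_year(availability_pct: float) -> float:
    """Translate an availability target into budgeted downtime per year."""
    return (100 - availability_pct) / 100 * 365 * 24

# "Three nines" (99.9%) allows roughly 8.76 hours of downtime per year.
print(round(allowed_downtime_hours_per_year(99.9), 2))   # 8.76
```

Agreeing on this number early makes the downtime and SLA questions quantifiable rather than aspirational.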
The Big Picture
Effectively gathering NFRs is a key success factor for all data integration initiatives. Understanding the types of NFRs and following a systematic approach for capturing them can help identify quantifiable and measurable NFRs.
NFRs can often be conflicting. It is critical that, based on the requirements of the project, the most applicable NFRs be identified and tracked through the entire project lifecycle. While choosing NFRs, we have to be conscious of the practical aspects of implementing each NFR and the resulting benefits. The framework detailed here can help you understand NFRs better and, hence, achieve better business-IT alignment.
Sastry is a Principal Architect with the Banking and Capital Markets vertical at Infosys. He has around 15 years of technology consulting experience, during which he has been associated with many large banks and financial institutions across the globe. He has been part of many small and large initiatives related to application development, architecture definition and strategy definition. Sastry's focus areas in the recent past have been data integration, data quality and data modeling.
Shantaram is a Senior Data Architect with Banking and Capital Markets vertical at Infosys. He has more than 15 years of technology consulting experience in General (Non-Life) Insurance, Retail Banking, Capital Markets and Energy & Utilities domains. He has led and implemented many Data warehouse projects and has strong knowledge of Data Modeling, Data Integration, Data Federation, Data Architecture and Data Warehousing architectures.