Since such driving forces are inevitable, what does the business do with the existing data? Existing data should not be scrapped or forgotten, because the business has relied on it for years and it records the history of that business. Instead, the information must be massaged and tailored for the new system, thereby safeguarding the history and linking it with the new or enhanced system.
However, massaging and tailoring this massive amount of data and propagating it to the new system is not straightforward. Rather, it leads to a whole new world of data migration.
Let us look at a few real-life scenarios to understand the complexity and enormity of challenges in data migration:
- Database schemas are going to be different, business entities change to portray different functional meaning, and format and usage of data captured in a new system can be totally different.
- Data field lengths might change and pose severe data integrity issues.
- Other trouble points:
  - What is the size of the historical data?
  - How many source systems are involved?
  - How much processing power is available in the existing system?
  - Are the system's CPU and memory expandable?
  - Are there any production applications that may conflict with the migration?
  - What is the available network throughput?
  - What is the network bandwidth utilization during peak and off-peak hours?
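The field-length concern raised in the list above can be checked before any data is moved. The following is a minimal sketch, assuming the source rows are available as dictionaries of strings and the target column limits are already known. The function name `find_length_violations`, the `target_max_lengths` parameter and the sample data are hypothetical and are not taken from any particular migration tool.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def find_length_violations(
    rows: Iterable[Dict[str, Optional[str]]],
    target_max_lengths: Dict[str, int],
) -> List[Tuple[int, str, int]]:
    """Return (row_index, column, actual_length) for values too long for the target."""
    violations = []
    for index, row in enumerate(rows):
        for column, max_length in target_max_lengths.items():
            value = row.get(column)
            # Missing or null values cannot exceed a length limit.
            if value is not None and len(value) > max_length:
                violations.append((index, column, len(value)))
    return violations


# Hypothetical sample: the target system limits "name" to 10 characters.
source_rows = [
    {"id": "1", "name": "Acme Corp"},
    {"id": "2", "name": "Consolidated Widgets"},
]
violations = find_length_violations(source_rows, {"name": 10})
print(violations)
```

Running a check like this early shows which rows would be truncated or rejected by the new schema, so the decision to widen the field or cleanse the data can be made before the load.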
Fortunately, with best practices, a technology-driven focus and domain experience, data migration does not have to be so challenging. The process of migrating data can be broken down into a series of well-defined atomic-level tasks, control metrics and procedures that reduce both cost and time to completion.
Data Migration Approach
A data migration project is executed through a number of well-defined phases, each with its own considerations.
Phase 1 - Data Migration Planning. Develop migration strategy and approach, define scope, schedule, resource plan, technical requirements and detailed execution plan.
Phase 2 - Analysis and Design. Validate business requirements for historical data, perform data analysis (profiling), and develop migration routines, mappings, referential integrity checks and certification scripts.
Phase 3 - Mock Migration. Conduct dress rehearsals for each planned release. Mock migrations may be partial or complete end-to-end cycles to verify migration procedures and benchmark the cycle times for each migration task.
Phase 4 - Pilot Migration. Complete end-to-end migration in the pilot environment. Coordinate with business users on data validation, and verify and evaluate the control mechanisms and metrics.
Phase 5 - Live Migration. Execute full-scale migration into the production environment.
Phase 6 - Post-Migration Activities
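Benchmarking the cycle time of each migration task, as described for the mock migration phase, can be sketched as follows. This is an illustrative example only: the `timed_task` helper, the `cycle_times` dictionary and the task names are hypothetical, and the loop body is a placeholder for real extract or load work.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator

# Elapsed seconds per task name, collected during one mock run.
cycle_times: Dict[str, float] = {}


@contextmanager
def timed_task(name: str) -> Iterator[None]:
    """Record the wall-clock duration of the enclosed block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the task fails, so failed runs are still benchmarked.
        cycle_times[name] = time.perf_counter() - start


with timed_task("extract"):
    total = sum(range(100_000))  # placeholder for the real extract step

with timed_task("load"):
    squares = [n * n for n in range(10_000)]  # placeholder for the real load step

for task, seconds in cycle_times.items():
    print(f"{task}: {seconds:.4f} s")
```

Comparing these timings across successive mock runs gives a baseline for sizing the live migration window.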
Typical deliverables for the defined phases include:
- Data Migration Approach and Road Map
- Data Source Documents
- Infrastructure Planning and Metrics
- Technical Design Documents
- Failure Routines
- FMEA Document - Failure Mode and Effects Analysis
- Migration Status - Dashboards
- Data Migration Metrics and Control Charts
Diagnostics on the current environment should also be gathered for the following parameters:
- How much data will be moving from point to point (server to server)?
- How much processing power is available at each point covering both peak and off-peak hours?
- What is the estimation of the amount of transformation and cleansing needed?
- What are the data profiling and data validation rules/phases applicable to the data?
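The first two diagnostics above feed directly into a rough estimate of how long a point-to-point transfer will take. The sketch below is a back-of-the-envelope calculation under stated assumptions: decimal units (1 GB = 8,000 megabits), a steady link speed, and no protocol overhead. The function name and the sample figures are hypothetical.

```python
def estimate_transfer_hours(data_gb: float, link_mbps: float, utilization: float) -> float:
    """Estimate hours needed to move `data_gb` gigabytes over a shared link.

    link_mbps   -- nominal link speed in megabits per second
    utilization -- fraction of the link already used by other traffic, in [0.0, 1.0)
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0.0, 1.0)")
    available_mbps = link_mbps * (1.0 - utilization)
    data_megabits = data_gb * 8 * 1000  # assumes decimal units: 1 GB = 8000 megabits
    return data_megabits / available_mbps / 3600


# Hypothetical figures: 500 GB over a 1000 Mbps link.
peak_hours = estimate_transfer_hours(500, 1000, utilization=0.8)
off_peak_hours = estimate_transfer_hours(500, 1000, utilization=0.2)
print(f"peak: {peak_hours:.2f} h, off-peak: {off_peak_hours:.2f} h")
```

Even a crude estimate like this makes the gap between peak and off-peak windows visible, which helps when scheduling the migration tasks.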
Data Migration Phases
A data migration project also has defined phases.
Figure 1 depicts the six phases of a data migration project. The phases may happen concurrently or iteratively. Entry and exit criteria should be defined for each phase, and milestones should be set to trigger audits, reviews, and stakeholder expectation and communication processes.
Phase 1 - Data Assessment
- Identify data sources
- Run system extracts and queries
- Conduct user interviews and awareness programs on data migration process
- Review migration scope and validation strategy
- Create work plan and milestone dates
Key Participating Groups
- Data migration leads
- Business users
- Program sponsors
Deliverables
- Migration scope document
- Migration validation strategy document
- Work plan with milestone dates
Phase 2 - Data Cleansing
- Identify data cleansing needs and expectations
- Create data prep worksheets
- Clean up source data in current system
- Format unstructured data in other systems
- Run extracts and queries to determine data quality
- Create metrics to capture data volume, peak hours and off-peak hours
Key Participating Groups
- Data migration team
- Client IS team
Deliverables
- Modified source data that increases the success of automated data conversion
- Control metrics and dashboards
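Running extracts to determine data quality usually starts with simple per-column profiling. The following is a minimal sketch, assuming an extract is available as a list of dictionaries that all share the same columns. The function name `profile_columns` and the sample extract are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional


def profile_columns(rows: Iterable[Dict[str, Optional[str]]]) -> Dict[str, Dict[str, int]]:
    """Count total rows, blank values and distinct non-blank values per column."""
    blanks = defaultdict(int)
    distinct = defaultdict(set)
    columns = set()
    total = 0
    for row in rows:
        total += 1
        for column, value in row.items():
            columns.add(column)
            if value is None or value.strip() == "":
                blanks[column] += 1
            else:
                # Surrounding whitespace is ignored; case differences are kept,
                # so "US" and "us" show up as two distinct values.
                distinct[column].add(value.strip())
    return {
        column: {"rows": total, "blank": blanks[column], "distinct": len(distinct[column])}
        for column in sorted(columns)
    }


# Hypothetical extract with a duplicate key, a blank and inconsistent casing.
extract = [
    {"customer_id": "C1", "country": "US"},
    {"customer_id": "C2", "country": ""},
    {"customer_id": "C2", "country": "us "},
]
profile = profile_columns(extract)
for column, stats in profile.items():
    print(column, stats)
```

Counts like these can populate the control metrics and dashboards listed above and highlight which columns need cleansing in the source system.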
Phase 3 - Test Extract and Load
- Create/verify data element mappings
- Run data extracts from current system(s)
- Create tables, scripts, jobs to automate the extraction
- Address additional data clean-up issues
- Execute application specific customizations
- Run mock migrations
- Load extracts into the new system using ETL tools or SQL loader with bulk loading functions
- Conduct internal data validation checks including business rules and referential integrity checks
- Report exceptions to client team
- Perform data validation
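The mapping, load and referential integrity steps above can be sketched end to end with Python's built-in `sqlite3` module. This is an illustration under assumptions, not a prescribed implementation: the in-memory database stands in for the new system, and the table names, column names and `COLUMN_MAP` are hypothetical.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical mapping from source column names to target column names.
COLUMN_MAP = {"cust_no": "customer_id", "cust_nm": "name"}


def load_and_find_orphans(
    customers: List[Dict[str, str]],
    orders: List[Tuple[str, str]],
) -> List[str]:
    """Load mapped extracts into a target schema and return orphaned order ids."""
    conn = sqlite3.connect(":memory:")  # stand-in for the new system
    try:
        conn.execute("CREATE TABLE customer (customer_id TEXT PRIMARY KEY, name TEXT)")
        conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, customer_id TEXT)")

        # Apply the data element mapping, then bulk-insert with executemany.
        mapped = [{COLUMN_MAP[key]: value for key, value in row.items()} for row in customers]
        conn.executemany(
            "INSERT INTO customer (customer_id, name) VALUES (:customer_id, :name)", mapped
        )
        conn.executemany("INSERT INTO orders (order_id, customer_id) VALUES (?, ?)", orders)
        conn.commit()

        # Referential integrity check: orders whose customer was not migrated.
        rows = conn.execute(
            "SELECT o.order_id FROM orders AS o "
            "LEFT JOIN customer AS c ON o.customer_id = c.customer_id "
            "WHERE c.customer_id IS NULL ORDER BY o.order_id"
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()


# Hypothetical extracts: order O2 refers to a customer missing from the extract.
orphans = load_and_find_orphans(
    customers=[{"cust_no": "C1", "cust_nm": "Acme"}],
    orders=[("O1", "C1"), ("O2", "C9")],
)
print(orphans)
```

The list of orphaned keys is the kind of exception that would be reported back to the client team before the next mock run.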