Most firms strike a reasonable balance between internally developed applications and vendor systems, but occasionally you encounter a firm that sits at one extreme. The focus here is the set of unique challenges facing firms that have outsourced the majority of their systems. What observations can we reasonably make (in comparison to firms favoring internal development)?
- Change requests (regarding data feeds) are often developed and managed by the application vendors (and are prioritized and costed with other requests).
- IT staff is usually more geared toward managing coordination and support with the vendors (as opposed to serious application development).
- There is less-than-complete understanding of what happens to the data traveling between and transformed within the vendor systems.
- There are more variations of the same data, such as different names and formats of the same entities in different systems – i.e., many versions of the same customer.
The major IT work becomes gluing the systems together – the data starts in one system and usually travels to other systems, where it is enriched/analyzed and then travels again. The lowest common denominator is still CSV or XML files. Files fly around, usually on a batch schedule, either via FTP or dropped on a shared folder for pickup by the destination system.
Any reasonably sized organization is going to have a few major systems and a plethora of suppliers and consumers connected to them. The cumulative ETL code can turn into a monster to manage. Even for firms lucky enough to agree and standardize on one ETL tool, it becomes the “internal” application around which much of the IT work happens. (But if the ETL development and support itself is outsourced, then one can characterize it as a vendor app.)
ETL platforms are certainly maturing. Many come with graphical tools, rich libraries of functions and full-blown development languages. Sure, the ETL logic is generally centralized and organized when you use a tool. But these platforms are expensive and, as I have observed, require just as much time to develop a solution as good old Java or .NET. To be fair, these ETL tools are particularly good at providing infrastructure for different messaging methods. But some vendors upgrade their systems as frequently as quarterly, necessitating a review of, or changes to, the data formats of the files. An upgrade to the ETL logic requires the usual requirements, design, code, test and approval if you are following the waterfall approach. Change requests compete with other priorities and usually don’t make their way into production for quite some time. As a result, business users are tempted to apply a workaround to deal with the deficiencies. This scenario plays itself out over and over as the upgrade and compatibility cycle continues. And don’t discount that with the power of commercial ETL packages comes the need for experienced (expensive) practitioners.
One might resist the metadata solution I am about to describe because “once you set up interfaces, they never change.” In a perfect world, I’d agree, but in my experience, tweaks are common, whether it is a system upgrade, a missed requirement, an enhancement the business wants or even a defect that made it through to production.
So, if constant change is a given (just reflect on how often business requirements changed in your last project, even at the last minute), why wouldn’t we want to try a different approach? The remainder of this article describes how a firm, choosing to heavily outsource its systems, can deal with the challenges noted.
If we were faced with this situation, what, ideally, would we want?
- To insulate ourselves from version compatibility issues,
- To respond to and implement change requests as quickly as possible,
- To avoid needing specialized or hard-to-find expertise,
- To avoid the time and money it takes to install a large ETL platform, and
- To understand our own data better.
The solution I propose is based on a metadata repository. The repository holds information about the entities involved in a data feed: applications (i.e., vendor systems), data elements (fields), interfaces (a source application plus a destination application), element maps (one or more data elements from the source system connected to one or more data elements from the destination system) and value maps (value translations from source to destination). Along with this, we have a small app that processes the metadata – call it a metadata engine. There are a few more pieces we’ll need, but we’ll cover those later. By breaking these entities out and turning them into data, we begin to replace what is normally ETL code with data.
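To make the repository concrete, here is a minimal sketch of what its tables might look like, expressed as SQL DDL. The table and column names, and the simplification that each element map produces one destination field from a SQL expression over source fields, are my own assumptions for illustration, not a prescribed design.

```sql
-- Illustrative repository schema; names and columns are assumptions, not a prescribed design.
CREATE TABLE applications (
    app_id        INT PRIMARY KEY,
    app_name      VARCHAR(100) NOT NULL            -- e.g., a vendor system
);

CREATE TABLE data_elements (
    element_id    INT PRIMARY KEY,
    app_id        INT NOT NULL REFERENCES applications(app_id),
    field_name    VARCHAR(100) NOT NULL,
    data_type     VARCHAR(30)                      -- as exposed in the file or table
);

CREATE TABLE interfaces (
    interface_id  INT PRIMARY KEY,
    source_app_id INT NOT NULL REFERENCES applications(app_id),
    dest_app_id   INT NOT NULL REFERENCES applications(app_id)
);

CREATE TABLE element_maps (
    map_id          INT PRIMARY KEY,
    interface_id    INT NOT NULL REFERENCES interfaces(interface_id),
    dest_element_id INT NOT NULL REFERENCES data_elements(element_id),
    transform_expr  VARCHAR(400)                   -- SQL expression over source fields
);

CREATE TABLE value_maps (
    map_id        INT NOT NULL REFERENCES element_maps(map_id),
    source_value  VARCHAR(100),
    dest_value    VARCHAR(100)                     -- value-level translation
);
```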
At a high level, the metadata processor runs through all the element maps to transform a source file into another format (or into a SQL table). The transformation “language” is SQL, something most developers and many business systems analysts know. So, a simple transformation can draw upon SQL’s rich set of functions like dest_field=substring(src_fld,2,3). Although not mandatory, we may wish to put a graphical user interface on top of the repository, especially if non-IT types will be supporting/managing the metadata. Ideally, you’d want a data management group or even a technical data steward to have access and get comfortable enough to make straightforward changes, without getting into the IT work queue. But the solution can work just as well without a GUI - in this case, you’d need IT to perform updates as required.
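As a hypothetical illustration of how the engine might work (the staging tables src_stage and dest_stage and the field names below are assumptions, not part of the design): suppose two element maps for one interface specify UPPER(cust_nm) for the destination field CUST_NAME and substring(src_fld, 2, 3) for REGION_CODE. Having staged the source file, the engine could assemble and run a statement like:

```sql
-- Hypothetical statement a metadata engine could generate from two element maps.
-- src_stage holds the staged source file; dest_stage feeds the destination format.
INSERT INTO dest_stage (CUST_NAME, REGION_CODE)
SELECT UPPER(cust_nm),                 -- transform_expr from the first element map
       SUBSTRING(src_fld, 2, 3)        -- transform_expr from the second element map
FROM   src_stage;
```

Because each transformation is just a SQL expression stored as data, responding to a change request becomes an update to an element map row rather than a code change, build and deployment.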