Back in the mid-1970s, Saturday Night Live (SNL) arrived on late night TV with a radically different approach to live entertainment. The first few seasons were a true experiment in network television, departing from the cookie-cutter prime-time variety shows of the era and introducing a new brand of raw, energetic and irreverent comedy. Political and social satire with unapologetic impersonations created a new breed of stars. The venue was New York, not Hollywood. The "Not Ready for Prime-Time Players" cast consisted mostly of unknown comedians. The band appeared in blue jeans, not tuxedos. Fast forward 30 years and SNL is a permanent fixture on late night TV: a reliable source of fame and fortune for its cast members and a well-worn springboard to both. Are there lessons in this for the BI technology world?
In the past few years, a similar upstart has emerged to confront the mature, staid proprietary software market. Open source (OS) technology has shown the same experimental success as SNL, finding a bold way to create and deliver high-quality software to the contemporary commercial enterprise. The OS model has been a disruptive experiment in the software marketplace, diverging radically from the commercial paradigm. The traditional value propositions of proprietary software vendors are under attack, with volunteer-based, coordinated, global development communities supplanting small, corporate-owned enclaves of highly paid developers. New OS players are emerging, and software powers such as Microsoft, IBM and Oracle are making serious accommodations to the growing threat. Many of the early "unknown" developers of the open source world have parlayed their early contributions into technical celebrity and professional fortune.
Linus Torvalds' Linux operating system was the bellwether of early OS success, establishing a user beachhead and a credibility that could not be dampened, despite both frontal and rear assaults from existing market powers. A torrent of OS alternatives to established proprietary offerings quickly followed, each building on the last and progressing up the software hierarchy. The Apache Web server, MySQL database and Perl/Python scripting languages have combined with Linux to create the "LAMP" stack for Web applications, which has a significant presence in the marketplace and has been adopted and touted by some of today's most important technology-savvy companies. Now the OS phenomenon has moved beyond these core infrastructure building blocks and offers compelling options for critical BI components that include ETL (extract, transform and load), reporting, OLAP and statistical modeling/graphics (see Figure 1). The alternatives in each of these categories merit exploration and consideration by today's value-minded CIO.

Figure 1: Open Source Alternatives for Proprietary BI Components
MySQL and PostgreSQL are the leading OSBI database technologies. Each has been used for years by commercial enterprises and leverages worldwide development and support communities. Both provide most, if not all, of the basic functionality expected of a relational database management system (RDBMS) for BI. A common fear, uncertainty and doubt (FUD) argument raised against OS databases is scalability; however, the OS community has been quick to respond. With its 8.1 release, PostgreSQL now supports relevant BI features that include dynamic bitmap index scans, table partitioning and high-performance bulk loading. Moreover, recent surveys suggest that the majority of data warehouses hold well below 1TB of data, which may mitigate scalability concerns for many users.
Kettle (now Pentaho Data Integration after its recent inclusion in the Pentaho project) is an intriguing option for ETL technology. A repository-based, GUI-driven ETL development and deployment technology, Kettle supports all core features expected from an ETL tool. Given its programming paradigm, anyone who has used Informatica or Oracle Warehouse Builder can quickly learn Kettle. It provides more than 40 prebuilt mapping objects that can be combined in modular ways to create complex transformations. Further, JavaScript integration is available for custom-developed mappings. Kettle is able to source from and write to leading RDBMSs and a wide variety of flat file types, including Excel, CSV, XML and fixed format. Plug-ins are available that enable connectivity to SAP. Kettle also provides the ability to create and execute jobs that sequence transformation execution and catch and respond to processing errors. When compared to commercial competitors, Kettle emerges as a highly functional ETL solution.
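The two ideas in that description, small mapping steps composed into a transformation and a job that sequences transformations and responds to failures, can be illustrated in a few lines of Python. This is only a conceptual sketch: it is not Kettle's API or file format, and every name in it (filter_rows, add_field, transformation, run_job, the sample order rows) is hypothetical.

```python
# Conceptual sketch only: NOT Kettle's API. All names here are hypothetical.

def filter_rows(predicate):
    """Mapping step: keep only the rows for which predicate(row) is true."""
    def step(rows):
        return (r for r in rows if predicate(r))
    return step

def add_field(name, fn):
    """Mapping step: add a derived field computed from each row."""
    def step(rows):
        return ({**r, name: fn(r)} for r in rows)
    return step

def transformation(*steps):
    """Chain mapping steps into one transformation."""
    def run(rows):
        for s in steps:
            rows = s(rows)
        return list(rows)  # force evaluation so errors surface here
    return run

def run_job(entries, rows):
    """Run named transformations in order; stop and record the first failure."""
    log = []
    for name, transform in entries:
        try:
            rows = transform(rows)
            log.append((name, "ok"))
        except Exception as exc:
            log.append((name, f"failed: {exc}"))
            break
    return rows, log

# Hypothetical sample data and job definition.
orders = [
    {"region": "east", "qty": 2, "price": 5.0},
    {"region": "west", "qty": 0, "price": 8.0},
]
clean = transformation(
    filter_rows(lambda r: r["qty"] > 0),
    add_field("revenue", lambda r: r["qty"] * r["price"]),
)
broken = transformation(add_field("margin", lambda r: r["cost"]))  # no "cost" field

result, log = run_job([("clean", clean)], orders)
_, failed_log = run_job([("clean", clean), ("broken", broken)], orders)
```

In this sketch the second job records the "broken" entry as failed instead of raising, which is the kind of catch-and-respond behavior a job scheduler in an ETL tool is expected to offer.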
There are several capable OS report-development options, most notably BIRT, JasperReports and JFreeReport, that provide functionality similar to proprietary cousins such as Crystal Reports and Actuate. Each connects to leading commercial and OS databases, provides a GUI-based editor with wizards and enables report bursting. One characteristic the OS tools share, and one that proprietary vendors have been slow to adopt, is an obsession with openness. These reporting technologies were built to be integrated within Java and J2EE frameworks and thus may be better suited for embedded, operational reporting needs. Indeed, OSBI's current strength directly addresses the growing demand for flexible BI application development in support of performance and process management initiatives.
An area of OS reporting that is still relatively immature is semantic-based, ad hoc query tooling similar to BusinessObjects with its Universe construct. The reporting options listed above enable rapid development of highly parameterized and modular managed query applications but lack a strong semantic layer with the dynamic SQL generation required to support completely ad hoc end-user query functionality. There are workarounds for this omission, however. Today, user-friendly OS ROLAP solutions are readily available. Mondrian, now also the analysis component of the Pentaho platform, is the leading OS ROLAP server, and JPivot is a popular user interface for it. Together they support access to all leading proprietary and OS databases for source data, consume both MDX and JOLAP queries, and enable the easy-to-use drilling and slice-and-dice functionality familiar to OLAP users.
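To make "a semantic layer with dynamic SQL generation" concrete, the sketch below maps business-friendly names to physical columns and builds a grouped query from whatever dimensions and measures a user picks. It is a minimal illustration under stated assumptions, not the way BusinessObjects, Mondrian or any tool named here works; the model, the table sales_fact, its columns and the function build_query are all hypothetical.

```python
# Minimal sketch of a semantic layer; all table, column and function names
# are hypothetical and chosen for illustration only.

SEMANTIC_MODEL = {
    "table": "sales_fact",
    "dimensions": {"Region": "region_name", "Year": "sale_year"},
    "measures": {"Revenue": "SUM(amount)", "Orders": "COUNT(*)"},
}

def build_query(model, dimensions, measures, filters=None):
    """Translate business terms into a parameterized SQL statement."""
    filters = filters or {}
    select = [f'{model["dimensions"][d]} AS "{d}"' for d in dimensions]
    select += [f'{model["measures"][m]} AS "{m}"' for m in measures]
    sql = f'SELECT {", ".join(select)} FROM {model["table"]}'
    params = []
    if filters:
        conditions = []
        for name, value in filters.items():
            # Filter values travel as bind parameters, never inline text.
            conditions.append(f'{model["dimensions"][name]} = ?')
            params.append(value)
        sql += " WHERE " + " AND ".join(conditions)
    if dimensions:
        group_cols = [model["dimensions"][d] for d in dimensions]
        sql += " GROUP BY " + ", ".join(group_cols)
    return sql, params

# An end user asks for "Revenue by Region for Year 2005" without knowing SQL.
sql, params = build_query(SEMANTIC_MODEL, ["Region"], ["Revenue"], {"Year": 2005})
```

The end user works only with the names Region, Year and Revenue; the mapping to tables, columns and aggregate expressions lives in one place, which is the role the Universe construct is described as playing above.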













