JUN 1, 2006 1:00am ET

Poor Man's BI: Getting Started with Open Source Tools for Analytic Intelligence

This is an article from the June 2006 issue of DM Review's Extended Edition.

In his award-winning book The World Is Flat, Thomas L. Friedman argues convincingly that open sourcing is one of 10 significant "flatteners" changing the world. Though much of that disruption is only now starting to play out in the market, it's safe to say that the open source software movement is no longer at the fringes of technology. The successes of Linux, Apache, JBoss, MySQL, Postgres, OpenOffice, Mozilla and Perl, to name just a few products, have changed the landscape of IT, probably forever. While open source communities are thriving, businesses building on the open source model are just starting to hit their stride, still working out how to make money from "free" products. The successes in open source are expanding its scope, particularly as Fortune 500 software consumers and producers commit.

Most open source technology use to date has been in commoditized areas, especially those serving ubiquitous infrastructure needs. But this is starting to change, and commercial open source business intelligence (OSBI) platforms are emerging. Still, the serious OSBI products are just now coming to market, and the business models driving the product companies are works in progress. Companies seeking BI answers need not wait while these companies, platforms and business models evolve. "Poor man's BI," the integration of open source stalwarts Python, PostgreSQL, OpenOffice and R, can provide a meaningful startup BI solution now.

Consider a minimal architecture for BI, an architecture that, while not complete, can certainly deliver valuable startup intelligence to an enterprise. Data integration consolidates data from disparate sources to a common warehouse environment through myriad processes, including ETL (extract, transform and load). The data warehouse is the repository of intelligence data, functioning as the foundation of all inquiry - serving the marts and "convenience stores" that source reporting and analytics. The query and reporting layer accesses the data warehouse and marts through a consistent interface that (hopefully) insulates users from much of the complexity of the database language. And analytics brings a more sophisticated level of statistical methods and procedures, including exploration, estimation, inference, predictive models, time-series analysis and multivariate analysis, to the solution of business problems.

I call it poor man's BI, but the combination of the open source language Python, the database PostgreSQL, the desktop suite OpenOffice and the analytics package R can deliver value at significant points of the BI lifecycle today, and can indeed function as a BI proof-of-concept platform while the market sorts out the disparate new players and models. Poor man's BI covers data integration and ETL (Python), data warehousing (PostgreSQL), query and reporting (OpenOffice) and analytics/graphics (R). One thing is certain: Python, PostgreSQL, OpenOffice and R are products with established open source communities that will not disappear.
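To make the data integration, warehouse and reporting stages concrete, here is a minimal sketch in Python of an extract, transform and load step followed by a reporting query. It is an illustration under stated assumptions, not a reference implementation: the standard library's sqlite3 module stands in for PostgreSQL so the script runs without a database server, and the sales table, its columns and the sample data are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would be a file or a feed.
RAW_SALES = """region,product,amount
east,widget,100.50
East ,gadget,200
west,widget,
west,gadget,75.25
"""

def extract(text):
    """Read CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Normalize text fields and drop rows with a missing amount."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        clean.append((row["region"].strip().lower(),
                      row["product"].strip().lower(),
                      float(amount)))
    return clean

def load(conn, records):
    """Create the (hypothetical) sales table and insert the cleaned records."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales "
                 "(region TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()

def sales_by_region(conn):
    """Reporting query: total sales per region, sorted by region."""
    cur = conn.execute("SELECT region, SUM(amount) FROM sales "
                       "GROUP BY region ORDER BY region")
    return cur.fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a PostgreSQL connection
    load(conn, transform(extract(RAW_SALES)))
    for region, total in sales_by_region(conn):
        print(region, total)
```

Against a real PostgreSQL warehouse, the same three functions would be called with a connection from a PostgreSQL driver instead; the parameter placeholder style in the INSERT statement may differ by driver.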

The Python/PostgreSQL/OpenOffice/R BI platform offers a maturity that will be comforting to users of poor man's BI. Like Linux, each product has reaped the benefits of especially strong early design and deployment leadership, either from an individual, a commercial venture or academia. Each tool is now supported by an enthusiastic community of developers and users. Each product is both stable - having survived the rigors of years of development and testing - and evolving to add significant new capabilities. A strong indicator of this maturity is the ease of installation: each product can now be readily installed from either binary or source code on Windows, Linux or UNIX. Compare that to the old days of source code, makefiles and ... prayer.

Python

The cornerstone of poor man's BI is the power of Python, a language developed by Guido van Rossum and named after Monty Python's Flying Circus. Python is similar in functionality to Perl, but was designed as object-oriented from the ground up. Like Perl and, later, Ruby, open source Python has grown to enjoy enormous success, with an estimated 1,000,000 users today. Python differs from languages such as C and Java, which take low-level instructions and require programmers to closely manage computer resources. For its role in BI, the three most important features of Python are its power and breadth of capabilities, its ease of use and productivity, and the expandability and scale that come from its open source model. Some refer to Python (and Perl/Ruby) as a dynamic scripting language. Python can indeed produce very succinct and powerful scripts, and is dynamic in that variables are not declared and the traditional compile/link/execute cycle reduces to one-step compile/execute, but the moniker "agile" is probably more appropriate. Python is general purpose and readily addresses a variety of problems - text, file, systems, database, GUI and Web management - and makes programming simple and fun. Python is designed to handle very small tasks but can also be used for large-scale programming. To borrow a phrase, Python makes the routine simple and the difficult possible.
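A short script may help illustrate what "succinct" and "dynamic" mean here. The sketch below is only an example: the function name top_words and the sample sentence are hypothetical. It counts word frequencies in a few lines and shows a name being rebound to values of different types with no declarations.

```python
from collections import Counter

def top_words(text, n=3):
    """Return the n most common words in text, ignoring case and punctuation."""
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    return Counter(w for w in words if w).most_common(n)

# Names are bound at run time without declarations; the same name may
# refer to values of different types over its lifetime.
value = 42           # an int
value = "forty-two"  # now a str
value = [4, 2]       # now a list

if __name__ == "__main__":
    sample = "the data warehouse serves the marts and the reports"
    print(top_words(sample, 1))
```

Saved to a file, the script runs directly with the Python interpreter; there is no separate compile or link step.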

As important as the foundation capabilities and ease of use are to the Python community, it is the scale of the open source model that drives much of Python's success. Programmers worldwide can develop modules to share with the Python community, earning reputation and status for their good work. The volume of Python modules available for download is impressive; many have special applicability to BI. Some are products of research inquiry from staff at top universities and research centers. On more than one occasion, I have started down the path of new development only to find that I was reinventing code already published in open source. And, despite the cries of quality risk from some, I've found the modules I've downloaded generally to be of high quality. Democratization of the open source model, where the community can add to an established, high-quality foundation, is Python's biggest boon for poor man's BI.
