Wouldn't it be nice if business intelligence could generally distinguish common cause/market risk from skill components when assessing business performance?
I'm developing a return on investment calculator for data warehousing appliances, using the Forrester Total Economic Impact methodology
2010 has already delivered a list of top acquisitions; drivers include maturity, customer acquisition and cross-selling, but the confusion now falls to buyers
IBM has indicated their focus on using Initiate as a lever into healthcare and government industries and clearly will need to rationalize their technology portfolio
The rumors of IBM's acquisition of MDM vendor Initiate Systems were confirmed in an IBM press release. Herewith, Jill pays homage to a class act.
You wouldn't know it following this week's activity, but there are more than two MDM vendors in this ever-changing market. What does this massive disruption mean?
With CIA profilers reportedly moonlighting to monitor earnings calls for "deception detection," speech analytics vendors are likely licking their chops for what comes next
The year is underway and the optimism about using technology for business has a new decade mentality
Performance measurement is a key driver of business intelligence. I'd guess that most BI deployments today are at least in part supporting PM
Now, Informatica can advance the use of MDM within many areas of their business from data governance, cloud computing and business-to-business processes
I usually root for the little guy. So when I heard the rumor that there were multiple bidders for Siperian, I figured SAP would take it home and that it really was a three-vendor world
Some pros and cons of implementing a vendor provided analytical logical data model at the start of any business intelligence or data warehousing initiative
Look out IBM, Oracle and SAP: you're about to lose a bit of your dominance in the master data management market to Informatica
Although I do not see a stampede to replace traditional BI applications with SaaS alternatives in the near future, BI SaaS does have a few legitimate use cases even today
Just like iTunes, granular Web data services are just beginning to sprout into a new transactional data economy
The market for information applications is heating up and the demand is coming from line-of-business and vertical industries
After writing two blogs on the delights of the agile language Ruby, I decided to eat my own dog food and challenge myself to convert a significant piece of my portfolio returns simulation program from the statistical package R to Ruby
Let's do a deep dive and discuss the current business and market environment and impact to IBM efforts in collaboration
Jill advocates behavior changes, starting at the top. But not for you. No, you're fine. Just for other people
Today's blog embellishes on Ruby's suitability for basic data management tasks that precede business intelligence and analytics
This industry activity validates an automated approach to discovering and matching data as part of an organization's information management processes
We need to be more pragmatic about technology predictions and focus on what we need to do to improve in iterative time cycles
Deconstruct your daily tech briefing, find your self-interest and use last week's HP/Microsoft news as a learning experience
The intelligence value of any individual tweet in isolation is negligible. Then again, what's the intrinsic value of a single attribute-value pair in a relational database?
I came across an article with the headline "Mining the Cloud." The cynic in me immediately issued a silent scoff: How is that different from crawling the Web?
You never know what's coming at you next, which is why process agility is so important
Maybe IT should establish a functional team that's responsible for data packaging and distribution, just like the movie industry
Price-performance is everything in data warehousing (DW), and it's become the leading battleground for competitive differentiation
Parents want to see their kids with a happy and successful career, but these days there's not a lot of useful advice
Dusting off a favorite open source tool, the programming language Ruby
The more complex business processes are the more difficult it is to find the magic blend that distinguishes a well-tuned process from a miserable flop
Let's see if 2010 is the year of the workforce and if Peopleclick can help organizations truly invest in their most valuable asset
Before we shift our focus completely to 2010, it's worth a last look back to consider all that was learned in 2009
Silver Creek is the only data quality vendor at this point that we feel has sufficiently addressed the unique and complex challenges associated with product data quality
Your customers are spending more time and venting more of their true feelings in social networks, so it only makes sense to move customer-facing operations into this brave new world
Social networks have always been with us, of course, but now they've gained concrete reality in the online fabric of modern life
Not that it will change overnight, but an annual budgeting cycle for IT spending might not stand the test of time
The core problem with today's CEP offerings is that many of them are power tools, not solutions suitable for the mass business market
In data warehousing, the most likely casualty of the Teens will be the very notion of a data warehouse
This searching-planning compromise lens now colors my assessment of business strategy and intelligence
Weta Digital's Paul Gunn runs a state-of-the-art facility but doesn't play by all the new rules
I expect this new category of technology to be like business intelligence in the '90s, where the range of vendors and opportunity will grow with significant demand
My writing retreat is completed for 2009. I'm not sure whether I'm comforted or humbled
The unique element of Lombardi was their ability to help organizations gain visibility through their process models that can easily measure activities and business flows across the enterprise
Everyone's talking about single version of the truth, but how often are our reports reviewed for accuracy?
Amazon's 12/14 launch of Spot Instances for EC2 is something we might want to mark on our calendar as a minor historic date in computing
I recently downloaded the latest stock portfolio returns from the website of Dartmouth professor Ken French
It is usual (and customary), during the holiday season, to give thanks for the blessings we have. I thought a short IT list would be sufficient
Advancements in business applications for performance management do not come very often in the industry, but Actuate has one that we should all take notice of
Here are some quick thoughts on the trends that will shape advanced analytics in the year to come
If you're going to tackle a new center of excellence or competency center, at some point you're going to have to throw out the analyst playbook
About five years ago, a Newsweek review of the then new iPod raised a concern that the shuffle feature might not be random
SAP made it clear that they would like to eliminate the three letter acronym soup of the past and focus on the line of business needs by vertical industry
Don't sweat data governance, make it a measure of political maturity to challenge your peers and look for tools to help your people processes
From an enterprise perspective, social networks are the buzz that can spell the difference between success and failure in a reputation-driven online economy
Sales opportunities are continuously updated but the impact of those changes to the sales forecast and overall financial health of the organization is not well understood
Because a truly randomized tournament is both impractical and unattractive, officials must grapple with quasi-experimental methods and designs that attempt to control bias, all within a limited budget
Yes, Virginia, there is a Santa Claus and we are rapidly finding that this is getting the information into the hands of those who can use it
The main challenge and opportunity for Plateau is to demonstrate the value of their integrated suite of applications and convince not just HR and training professionals, but also finance and operations management, of the value of adopting their approach
When it comes to bad data, a lot of the problem stems from companies letting their developers off the hook
A technology vendor named Splunk has come to market with a solution to index volumes of log files and provide search and display capabilities along with a new generation of interactivity and application capabilities
Beyond simple delivery of alerts, URLs or actual reports via email here are a few newer approaches to deliver business intelligence on a mobile device
I like the concept of discovery-driven planning a lot. It seems a reasonable compromise between pure planning and searching
It's probably happenstance that statistical juggernaut SAS has been in the news several times over the last week following my recent blog. But like me, the press hasn't exactly been favorable, adding credence to the claim that SAS is now facing serious threats on several competitive fronts
Business is all about placing bets and knowing if the odds are in your favor
Salesforce steps into the business and social collaboration market where most would not expect
Building a service-oriented architecture implies the understanding of the business and the use of patterns - because SOA is really about architecting for reuse
The point here is whether you choose to build or buy your MDM solution, know what you need, and what you don't need. I know that sounds obvious. But obviously, it's not
Informatica 9 continues to evolve PowerCenter and PowerExchange data integration and access technologies into a much more comprehensive data management platform going well beyond the scope of traditional, batch-oriented ETL that remains Informatica's bread and butter
Gaining better consistency and quality of information is what IT needs to be addressing more than consolidating BI as the underlying ingredient as your organization's most valuable asset
With my inbox groaning under overload, two themes have emerged: some will watch the establishment evolve (Goldman Sachs has picked the winners already); and others will simply go for it
Business intelligence practitioners can analyze, critique and debunk the works of popular management gurus till the cows come home, but the demand for expert wisdom grows unabated
A key point in the "what differentiates one business intelligence software as a service vendor from another" discussion is what really constitutes multitenant architecture. Here are some initial thoughts to stimulate the discussion
I picked these 20 next generation business intelligence trends, not because the market labels them as trendy, but because I have tons of evidence that the buy-side of the BI market can use these technologies today
Start looking at advanced features to select a data visualization vendor/solution
Informatica has advanced their efforts with support for business and IT collaboration, pervasive data quality and SOA-based data services
It might have been software or an act of providence, but it appears that Windows 7 saved my job
Kronos introduced the next generation of workforce management that will dramatically improve the usability of its new releases next year
You know that you don't have an enterprise BI strategy if ...
Statistical and predictive analytics software is fast becoming a next big competitive landscape in the business intelligence market. Heightened competition will be a boon for statistical consumers of BI
The SAP enterprise performance management and governance, risk and compliance solution is a refreshing approach that organizations should review if they have not
I downloaded a 30 day trial of the World Programming System and started constructing simple scripts. It was like seeing an ex after eight years of separation
Information on demand depends on a growing base of content assets and services and the cut-to-chase script looks increasingly agnostic
IBM announced a new version of IBM Mashup Center that will simplify the assembly and publishing of information across a workforce and across the Internet for consumers, constituents and customers
SOA is not a technology but an architectural paradigm and it is on the move again
IBM's largest challenge with their software investments will be to continue to simplify them and connect with those in line of business and ensure they can get smarter solutions in the right time frame at the right price
At IBM Information on Demand in Las Vegas, the conference started today to introduce the latest insight to the fastest growing components of their business with analytics
The business expects high quality data, but hasn't taken much accountability in delivering it
I revisited MIT OpenCourseWare a few months ago in search of a gentle computer programming course I could recommend to those interested in BI without a strong technology background
With all the press releases emitted from OpenWorld, Larry Ellison's $10 million IBM challenge stands out
Teradata announced their advancement into cloud computing called Teradata Enterprise Analytics Cloud, providing a range of options for organizations to use the database technology
Since 2004, when the Red Sox rallied from three games down to beat the Yankees for the American League championship and swept the Cardinals in the World Series, the Red Sox have been New England's academic darlings, champions of the data-driven, predictive analytics approach to running a baseball team
In this year's keynote Ellison briefly discussed Linux and their efforts to drive neutrality in operating systems though they will potentially own Sun Solaris and open source version of Solaris by their acquisition of Sun Microsystems
The growing glut of Web services will increasingly flow not through browsers, but through application programming interfaces
When it comes to planning and budgeting, the gap between what's possible and what companies actually do is still wide
At Oracle OpenWorld, educational sessions quietly demonstrated how Oracle Complex Event Processing, a dedicated technology, is helping organizations
Better than the massive defeats suffered by the San Francisco 49ers and Oakland Raiders in the Bay Area on Sunday was the opening keynote at Oracle OpenWorld. Silicon Valley legends Larry Ellison and Scott McNealy opened up the evening
The demand for new college grads is still soft. This cycle reminds me a lot of 2000, when the high-flying technology consulting career choice crashed and burned, no doubt mirroring the same fate of the Internet boom
To become truly future-focused, organizations must build out their predictive muscles through deepening commitment to these and other advanced analytics technologies
The information function in the organization is a network of the people who really know what's going on; they're not organized or rewarded, yet they're running the damn place
It is not often that there are technologies that make you rethink your current methods of how you work and the tasks that you do constantly
For every operational system, a company can save hundreds or even thousands of hours every week in development and processing time
At this year's HR Tech in Chicago a broad range of new software products for human resources professionals to address their people-related processes across a range of activities were available for HR and associated IT professionals
The October 2009 Harvard Business Review spotlights Risk Management and Performance Measurement, both significant consumers of BI
It is not uncommon to expect with the downturn in the economy, that CIOs would be challenged to do more with less. IT executives are focusing on ensuring that business is conducted efficiently to get more mileage out of their budgets
As data and projects become collaborative, a metadata editor sounds like someone you'd like to have, and maybe wouldn't have to hire
Be forewarned that as you read analyst reports on GRC that they are a small component of what you will need to address a range of needs in the enterprise and across business and IT
The other day, I came across an article in the Wall Street Journal noting that movie rental company Netflix had announced a winner of the $1 million contest to improve the accuracy of its film recommendation engine
It is up to SuccessFactors to see if they can make something of this new category and focus in business execution and what other vendors might join them
The six words above are misleading, I'm really trying to get your attention to tell you about something about business intelligence, so c'mon, click it
There is a need to define a standard quickly for data interchange with regards to the electronic health record and the electronic medical record
The majority of data issues within an operational system are data entry-related. The challenge in business analysis is to establish standard business processes to automate
Having IBM seriously promote an analytics product line will help all statistics/mining vendors and provide a much-desired commercial jolt for academically-focused R
I am, slowly but surely, beginning to believe there couldn't be a better case for demand for business intelligence software as a service -- especially after findings from a project that I am currently conducting
Experts say search, information extraction and text analytics can close the structured/unstructured divide; can it really be that easy?
The growing volumes of data from the Internet and enterprise placing pressure to gain better insights on a more frequent basis is an issue that continues to be for business and IT
I received a very nice note from Eric Siegel, PhD., Conference Chair, Predictive Analytics World last week in response to my recent blog that mentioned an upcoming PAW
It's an age old question: Which came first: the data or the process? Okay, not an age old question, but an interesting one to ponder nonetheless
Blue-sky challenges and benefits of health care data management will touch individuals and economies
Synygy's new set of capabilities will help organizations address the sales and revenue performance priorities and drive new levels of operational efficiency in sales
This relationship long predates this announcement, but it could be the start of a relationship of considerable strategic importance to both partners
I've assembled my BI-Searchers toolkit
While Forrester does not have a formal description for a head of business intelligence, if I map requirements for BI best practices, here's what I come up with
Logically this acquisition expands Informatica beyond data integration into event integration
As a consultant I'm often asked about how roles and responsibilities should be delegated or identified within the IT organization to support data warehousing
It has been some time since I advanced the definition and focus of operational performance management over eight years ago
There's a randomness to company performance that, as deftly chronicled by Leonard Mlodinow in The Drunkard's Walk, is a much more powerful force than we'd like to believe
I believe the massive stimulus package signed by President Obama in early 2009 was a catalyst to a situation that was already fraught with issues: the integration of information in health care
Now that an entire business needs to operate on the Internet, the leadership and involvement of business and IT executives is essential for success
In any dynamic business environment, the last thing you want is to indulge in navel-gazing. If you're not adept at responding to breaking events or anticipating the future, you will find yourself marginalized in the new economy
We need to understand the variance between the data as it exists and its acceptability, not its perfection
The rapid development of technology and expansion of the Web provide the infrastructure for promoting ever-more-rapid innovation. But what's behind this technology enablement are two themes
Here's a question to ponder: Why do so many metadata initiatives deliver well below expectations, or outright fail?
The unique aspects of eThority are the usability and interactivity, which makes their analytics more intuitive and relevant than most business intelligence products
Consumer packaged goods companies spend enormous amounts of time and money to optimize their penetration of brands within their relevant product categories
The scary part for decision-making is that the errors are both deductive - from population to sample - and inductive, from sample to population
Weyerhaeuser keeps records up to the minute with business rules to address quality at the point of data entry
It's important to realize that mastering data isn't really necessary if you only have a single system that contains one copy of data
The business intelligence implications of the rumored SAP/TIBCO merger are huge
The technology for information applications will continue to evolve and become one of the fastest growing software categories
Most manufacturing and services organizations have driven their supply chain processes to be lean and are now looking at the means to find new methods to gain incremental efficiency
I read what is certain to be my favorite BI book of 2009: 'The Drunkard's Walk, How Randomness Rules Our Lives'
Cisco Systems aims at operational insight that looks at the customer experience across multiple transaction platforms
One of the more common phrases in corporate life is: 'Close the loop.' This mantra refers to completing an improvement cycle by putting into play the insights gained by analytical processes
Much as I love the behavioral economics gospel espoused by Dan Ariely, I'm uncomfortable with the driving methodology of controlled classroom or laboratory experiments that use MIT business graduate students as subjects
Polycom is hoping to use product information management to shorten sales cycles in a complex, regulated global marketplace
IBM's bold move has already sent shockwaves throughout the analytics market
Whoever says that the business intelligence market is commoditizing is smoking something funny
IBM now can empower new classes of analytic solutions with SPSS that go well beyond the traditional business intelligence applications that focus only on historical data
A discussion about the direction of business intelligence in several fields, including economics, statistical learning and the R platform
This acquisition will focus on two specific areas: high availability to support data replication and migration efforts and noninvasive real-time data integration capabilities
As business process and infrastructure outsourcing continues to trickle up in organizations, internal process owners could emerge to manage core business challenges
What really impacts our business/technology careers falls into a scope greater than the sum of our expertise and skills; for those not born to a single working destiny, follow the macro along with the micro
I think the planner/searcher dichotomy is quite pertinent for business intelligence
When I previously referred to architecture, I implied an engineered approach to solving a complex, multifaceted issue. One issue has to do with making complex decisions about inaccessible information
Actuate must take some credit for their work to date and be aggressive in what they can help organizations deploy easily with their existing development and technology skills
I've noticed lately that data warehouse vendors are dusting off the arguments and pitches of days gone by
Bet you didn't realize that IDS Scheer has ARIS solutions in the fast growing markets of BI, analytics and complex event processing
Many business and IT professionals will attest to the fact that project management is the key to successful data implementations, although few data analysts will agree
Just what is behavioral economics?
A weak job market and poorly identified roles are making life dodgy for IT information workers; don't expect human resources to help, find the competency center and align yourself to it
IT needs to fully engage in IT performance management and document their strategies and scenarios to determine the best effective alignment for business
Master Data Management and Data Governance have common goals
The complexity of IT systems has caused IT to become reactive rather than proactive
Do your existing CRM investments work well and, if you replace them, will they just be a legacy investment?
As information career paths disintegrate and reform, the convergence of business and technology finds people creating their own jobs and the 'smartest' among us may be the most challenged
What good are economists anyway?
Different business audiences require varying intelligence needs and focus
The recent acquisition of Sun by Oracle has raised a lot of speculative discussion about the latter vendor's strategic pursuits
At MDM Summit in Toronto, random pearls of wisdom from Ed Unrau at Canadian Tire Financial Services
I periodically revisit Bogle's wisdom not only to affirm my investment approach but also as a guide for BI
The market for BI might seem rosy for most in the industry. While some analyst firms predict significant growth, most of us who have been in the industry for decades know a different reality
A corporation loses money every time it delays getting information into the hands of decision-makers
BI can learn a lot from education and other not-for-profit programs for its mission to assess the performance of business
As I have dug into improving customer satisfaction, the question I ask myself is that although this seems like a laudable target, is it really?
The short answer is yes and it's surprising the titan of search hasn't done so already
At the Information Builders 2009 Summit in Nashville music and technology came together to present innovation and practical use of BI and information-enabling technologies
I was pretty distraught over a column I recently read in the Economist. The article, Light Work, challenges the legacy of the Hawthorne effect
With official release of free reporting suite, the BI vendor offers a clever reverse consolidation path for customers
Forrester has developed a maturity model for enterprise adoption of mashup-style, self-service development of business intelligence applications
The dirty little secret in most companies is that the BI reporting team has morphed into a de-facto enterprise reporting team
Let's start with my basic opinion: IT is not the business, it is an enabler
Quasi-experimental studies attempt to control potential confounding variables by clever design techniques and statistical adjustment
RFID-powered vending machines that let customers mix their own water, juices and sodas hoped to provide data for new product introductions
Behind a fairly generic name is SPSS's new version of their enterprise feedback management technology, which in my opinion is one of the key supporting technologies providing customer feedback management
After a day behind closed doors with executives I got a deeper appreciation for the portfolio of technologies today and in the future
I must admit I'm obsessed with BI designs for business performance measurement
Examples of transaction and performance management based systems to manage Eco Footprint are showing up in Europe; vendors will be ready to deliver products stateside as regulations call for
Virtualization is a venerable old computing concept that has achieved new life in recent years
The IT community has sensed a need for consistent terms. I propose we start a process to keep us all sane and volunteer to head the IEEE, ANSI or other committee
I came across an intriguing blog entry that raised issues pertinent to business and business intelligence
The tyranny of technology moves slowly and leaves the best bits behind for a longer run than people appreciate
This is yet another proof (and I'll never get tired of saying this) that the BI market is as vibrant, exciting and far from commoditization as ever
I find the intersection of the social and quantitative sciences of particular interest and pertinence for BI
Molson Coors has been testing SAP's Business Objects Explorer release and they really like it
Corporate restructuring isn't just a financial challenge. It includes realignment of marketing activities, sales and operational issues
The current ascendance of Bayesian analysis in the statistical world is, I believe, a boon for BI
A service-based architecture requires information abstraction using metadata, whereas Web services do not use common metadata
Have a closer look at how consolidation and product maturity are paying off now for businesses with stalled IT budgets; you may already be a winner
The ball's in IBM's rivals' courts regarding whether, when and how they plan to add automated source discovery to their BI portfolios
I received an email from Mike Driscoll, co-chair of the Bay Area R Users Group, announcing the formation of three new groups: Los Angeles, New York and Ottawa
Just another data quality failure from the annals of direct marketing, or am I missing the point?
We in IT struggle with understanding the business or is it the business that struggles with explaining themselves (and their needs or requirements) to IT
Corporate travel budgets have been slashed, but what are the long-term effects of less opportunity
There was no shortage of strong reaction at the MySQL conference to the acquisition of Sun. Open source purists were incensed
What's really going on with advances in solid-state disk drives and how far will it go?
Oracle is acquiring longtime partner Sun Microsystems, putting the software powerhouse fully into the hardware business - and hitting the DW industry like an earthquake
We've been talking for several years about the concept of a data supply chain. But IT executives are only now starting to catch on to its importance
There are few truly zero cost alternatives to BI tools, but there are some
The Elements of Statistical Learning: Data Mining, Inference and Prediction, Second Edition, by Trevor Hastie, Robert Tibshirani and Jerome Friedman is now available
Confusion about services-based architectures has been created by a number of industry elements
We need to consider enterprise architecture not as a hindrance, but as an asset to the IT organization and to the business that we are enabling
The R user community had just been provided access to a latest learning algorithm hot off the development presses from three world-renowned practitioners for free
We as a country should try to smooth over these economic dislocations so that they don't completely wipe some places off the map
On my way back home from www.himss.org show in Chicago, I have a creepy feeling of déjà vu. Even worse, it feels like the movie Groundhog Day where the main character keeps waking up on the same day, same date, never able to get to tomorrow
Why business rarely ties outcomes to job success for managers, and, if the New York Yankees are baseball's AIG, why don't we hate them?
I recently read an article in the Microsoft Architect Journal on so-called service-oriented business intelligence or, as the article's authors call it, SoBI
No sweat, I thought. We've honed our expertise on open source BI in the cloud. We've performed this drill several times already. Alas, not so fast ...
Bill Inmon treats data federation as mutually exclusive from enterprise data warehousing, when in fact they are highly complementary approaches
With hype contained, developments in cloud computing are on display at conference
A colleague recently asked me for a good introductory text on the R statistical computing platform. Though there are a seemingly endless number of published books on R, I recommended a personal favorite
Has everyone forgotten database development fundamentals?
Cynics might call Semantic Web a technology looking for a solution. And they might have a point
Lessons from the NCAA tournament coverage; don't drop the ball on Web content management
Nudge is a concept derived from behavioral economics. It denotes a gentle push or incentive to coax decision-makers to choose a preferred option from a series of alternatives
When the going gets tough, the tough get lean, focused, and flexible. To help organizations survive the bad times and thrive in all climates, their information management initiatives must remain agile and adaptable
I always predicted that Open Source BI has to reach critical mass before it becomes a viable alternative for large enterprise BI platforms
I've subscribed to the Harvard Business Review for years. It seems there are either several articles pertinent for BI or none at all. The February 2009 edition was the former
Reflecting on a conference, same as it ever was and still relevant
A couple of nights ago, I found my handwritten notes from 1997, summarizing a literature search I was doing at that time while authoring my first book, the IDG Books title Workflow Strategies. Reading the prettier handwriting of a 12-years-younger Jim Kobielus, I was struck by how little has changed since then
I recently talked to a client who was fixated on a hub-and-spoke solution to support his company's analytical applications. In the world of software and data, the one thing I've learned is that there are no absolutes. And there's no such thing as a universal architecture
The following are nuggets of wisdom from RMS for planning/executing modeling studies, along with a statistical blogger's commentary
You'll know content vendors at a glance with this clever subway map; also, Skittles turns Web site over to those crazy social networkers
I had an amazing client experience. I searched long and hard for a client with a flawless, 100 percent efficient and effective BI environment and applications. Imagine my utter and absolute amazement when I stumbled on one
As I mentioned in last week's blogs, I was pleasantly surprised by version one of Predictive Analytics World, finding it quite useful on a number of levels. Today, I offer a few final observations on the conference
Why we're bad at benchmarking BI failures
Steve Miller's thoughts on his trip to Predictive Analytics World
Nassim Nicholas Taleb was right. The world financial system was devastated by unpredicted catastrophes of grand proportion: a financial black swan
Some integration approaches developed into multibillion dollar standalone markets. But others, while valuable, haven't survived as well. EII is one example
No matter how carefully one words a report, there is always the potential for misunderstanding
Everyone thinks that SOA is an integration framework. In fact, it's a means of remotely accessing other systems and their related information without having to know the details
Goes around, comes around; back office meets its match in tech-savvy users
As with any other current buzzword, the world seems to be piling on and the meaning of operational BI seems to be evolving (or eroding)
22 Comments
Occam's razor, in Latin, is translated to something like: "Entities should not be unnecessarily multiplied." I think he preferred normalization, or at least discouraged denormalization.
Posted by: Steve R | January 4, 2010 1:13 PM
We've implemented our columnar database for some major companies in Brazil - credit cards, banks, telecom, credit bureau industries - with hundreds of columns and billions of rows, and we are getting outstanding performance for OLAP applications, running queries on a Wintel platform with response times of 1 to 5 seconds. We developed this technology using Codd and Date concepts, strictly as is, twenty years ago.
Posted by: Waldo | October 26, 2009 3:31 PM
In the years that I've been working in the technology industry (we won't get into how much gray hair or how many wrinkles I have) there have always been lively discussions regarding the effectiveness of a new technical paradigm or design. Some of my experiences include when...
* Most vendors and IT groups fought the effectiveness and benefits of relational database technology in the early 80's. The argument was the speed, maturity, and reliability of the incumbent technology.
* Most folks didn't believe parallel processing was practical in commercial application in the mid- and late-80's. The masses said it was too complex, and impractical.
* I myself battled (and lost) an argument in the late 80's about my belief that Windows was more likely to succeed than desktop Unix (Everyone thought that was settled...but they're back...)
I'd also be remiss if I didn't mention the number of "can't miss technologies" that delivered incremental (or possibly invisible) benefits -- the Apple Newton, Next, BeOS, PC-RT, AI for the masses, Apple Lisa, XT/370, etc.
The one thing that's become clear -- as processor speeds have increased, memory costs have dropped, and disk capacity has expanded, many "failed" technologies became practical and possible.
At the end of the day, the proof will be in functioning software with real world results. I wouldn't bet against columnar databases -- some of the numbers the early customers are seeing challenge the traditional paradigm.
It's important to consider that the focus of these newer companies isn't to replace existing applications, but to target the applications that simply aren't possible with "today's" proven technology.
Time will tell.
E.
Posted by: Evan L | June 11, 2009 12:58 AM
Several comments in this thread suggest that Columnar and MPP are mutually exclusive. That is not the case. The latest generation of columnar databases, including Vertica's Analytic Database, are based on MPP architectures. This combines the best of two different worlds: less I/O and the scalability/performance of MPP architectures. The I/O advantages of columnar databases are dramatic for appropriate workloads. (SELECT * is not typically one of them!) However, I disagree with David N's assertion that the number of columns required for "true analytics" is beyond the reach of columnar databases. We have customers who are regularly accessing dozens of columns in their queries and enjoy orders of magnitude better performance over row-based alternatives. I would also suggest that Ben Werther's comment about tuple-assembly cost does not apply to Vertica the way it may to other columnar databases, because tuple reassembly (and conversion of compressed data to its original expanded form) is deferred until result sets are materialized as part of the query results.
FULL DISCLOSURE: As you might have determined from my comments I work for Vertica.
Posted by: DMenninger | June 4, 2009 4:29 PM
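The deferred tuple reassembly described in the comment above can be sketched roughly as follows. This is a toy illustration in Python, not Vertica's implementation; the column names, the data, and the `late_materialize` helper are all hypothetical. It only shows the general idea: evaluate the predicate against one column, then assemble tuples for the surviving row positions alone.

```python
# Toy column store: each column is a separate list; position i is row i.
# Column names and values are invented for illustration.
columns = {
    "customer_id": [1, 2, 3, 4, 5],
    "region": ["N", "S", "N", "E", "N"],
    "revenue": [100, 250, 75, 300, 125],
}

def late_materialize(columns, filter_col, predicate, projected):
    """Scan only the filter column, then build tuples for matching positions."""
    # Step 1: evaluate the predicate against a single column.
    positions = [i for i, v in enumerate(columns[filter_col]) if predicate(v)]
    # Step 2: assemble tuples only for the qualifying positions,
    # and only from the projected columns.
    return [tuple(columns[c][i] for c in projected) for i in positions]

result = late_materialize(columns, "region", lambda r: r == "N",
                          ["customer_id", "revenue"])
print(result)
```

In this sketch the rows that fail the filter are never assembled at all, which is where the saving would come from when the predicate is selective.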
Columnar databases are great if: 1) Your table has lots of columns (100s) 2) You only want to query a small number of them (low 10s or less)
Problem is, if you deviate from these, the tuple re-assembly costs kill you. Products like Sybase IQ and Vertica do good business targeting the niche data mining workloads that have these characteristics. But good luck trying to find either of these products in use in a general-purpose EDW or analytical data mart.
'Columnar-oriented storage' is really just one of many possible features of a database. Far more important is the degree of parallelism -- hence the dominance of MPP (massively-parallel-processing) databases like Teradata, Greenplum and Netezza for high-end data warehousing and analysis. These guys squash the columnar guys on most data warehouse usage scenarios and handle any schema and arbitrary ad-hoc queries without the fragility of a columnar approach.
Posted by: Ben Werther | May 27, 2009 6:09 PM
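The "many columns, few queried" condition in the comment above can be put into rough numbers. The following Python sketch is a deliberately crude model, under the assumption that scan cost is proportional to the number of values read; it ignores compression, indexes, and the tuple re-assembly cost the commenter mentions. The function names and the table dimensions are made up for illustration.

```python
def values_read_row_store(n_rows, n_cols_total):
    """A full scan of a row store reads every column of every row."""
    return n_rows * n_cols_total

def values_read_column_store(n_rows, n_cols_queried):
    """A column store scan reads only the columns the query touches."""
    return n_rows * n_cols_queried

n_rows = 1_000_000
total_cols = 200
for queried in (5, 50, 200):
    row = values_read_row_store(n_rows, total_cols)
    col = values_read_column_store(n_rows, queried)
    print(f"{queried:>3} of {total_cols} columns: column store reads "
          f"{col / row:.1%} of what the row store reads")
```

Under these assumptions the advantage shrinks linearly as more columns are pulled into the query and disappears entirely for a SELECT * style scan, before any reassembly cost is counted.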
Michael M: Yes, that's the right quote, and the right spelling of (William of) Ockham, even though Occam has become the prevalent way to spell it, as in Occam's Razor. However, the implication is still to "favor" the simpler solution, but not to assume it is correct. During his life, theologians and rational philosophers worked very hard to catch up myth with science, creating extremely complex cosmologies, for example, to explain the motion of the heavenly bodies that still conformed to the geocentric approach rather than the simpler heliocentric model.
So my original point was, if a database approach seems simpler, you may favor it, but not accept it until proven.
-Neil Raden
Posted by: Neil R | May 27, 2009 1:32 PM
Leslie S: I didn't get acquainted with APL until 1977, but what you describe, matrix inversion, was a primitive of APL. It's a stretch calling APL a database, though. It was just a cryptic interpretive language for data and text manipulation. IBM did, however, develop a columnar database based on APL called ADRS (A Departmental Reporting System) which used APL to extract and transform record-type data into ADRS' columnar structure using a sort of proto-ETL tool called APLDI (APL-Data Interchange).
In the early 80's, I worked with a guy (Clark French) who invented a columnar database called Expressway, which he sold to Sybase and was renamed SybaseIQ. And as someone else has pointed out, Model204 has been around for decades and still endures.
So columnar has been around for a long time, but that doesn't mean that newer or mature implementations of it aren't far better suited to our technology today. For example, none of these early tools could handle parallelism or large memory models, or operate with standards-based APIs and SDKs.
As far as performance is concerned, the case can be made that reading whole pages of data to access a few bits of it is wasteful, but it isn't yet clear to me that it's such a case of black and white. Row-based databases can index and compress too, and once the data you need is in memory, how much difference does it make? We'll have to see.
-Neil Raden
Posted by: Neil R | May 27, 2009 1:22 PM
Good article and fair portrayal of column databases. However, column databases are but one take on the challenges posed by traditional single-processor RDBMS architectures. MPP is another approach, and has had far more success in the short time that the approach has been around (most vendors, except Teradata, are this decade's startups).
Column databases are challenged by true analytics - their effectiveness wanes as an increasing number of columns are pulled into the query. Requesting client information is not analytics. True real-time analytics requires far more complexity and chips away at the benefits of column databases.
Posted by: David N | May 27, 2009 7:48 AM
In "Summa Totius Logicae", Ockham stated "Frustra fit per plura quod potest fieri per pauciora" which translates to "It is futile to do with more things that which can be done with fewer."
Posted by: Michael M | May 27, 2009 6:12 AM
It would be nice to see more support for the claims made. In my opinion, the hardcore of analytics lies on aggregated data at various grains. It would be a strong advantage if columnar databases could outperform relational databases in this competition.
It was an interesting point about asking questions. As a part of my PhD I was trying to find a balance between natural languages and the strict formalism of artificial languages. Eventually, any question applies an attribute to a concept. An answer to a question is always a functional dependency that tries to obtain a value.
You can find more about this model in my book at Amazon ("Operational Product Performance ACCEL: Analytical Approach Towards Learning How to Design Successful Products")
Maxim 4suc6
Posted by: Maxim I | May 27, 2009 5:31 AM
I think an even more radical approach to database architecture and storage is needed.
I have 25 years' experience with Oracle, starting with Oracle 5.
I am presently developing applications with LazySoftware's Sentences Associational DBMS (lazysoft.com). It is far easier to use than relational technology and provides excellent performance.
Posted by: Grant Morgan C | May 27, 2009 3:25 AM
Way, way back in 1970, APL was a popular (geeky) language for us mathematicians.
One of the workspaces dealt with table transformations, where a table of i rows by j columns was converted to a table of j columns with i row entries.
Research into this table, using integer conversion of ASCII data, was phenomenal in terms of speed.
So there is nothing new here, except that inserts take a long time in the columnar arrays vs row inserts.
It was back in 1970, before Oracle came on the scene.
Posted by: lsatenstein | May 26, 2009 9:03 PM
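The row-to-column transformation described in the comment above is essentially a transpose. As a small sketch in Python (standing in for the original APL workspace, which is not shown in the thread), with hypothetical sample data:

```python
# Rows as they might arrive from a record-oriented source (hypothetical data).
rows = [
    (1, "Ann", 30),
    (2, "Bob", 25),
    (3, "Cho", 41),
]

# Transpose i rows of j fields into j columns of i entries.
columns = [list(col) for col in zip(*rows)]
print(columns)

# Appending one record touches every column array, which is one reason
# inserts tend to cost more in a columnar layout than in a row layout.
new_row = (4, "Dee", 37)
for col, value in zip(columns, new_row):
    col.append(value)
print(columns)
```

A single row insert becomes j separate appends here, one per column, which loosely mirrors the insert penalty the commenter points out.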
What about MonetDB? I've had it running four years for some specialty apps at a university and am not sure it has ever been rebooted or failed - I am sure it has never been tuned. My humble opinion is it belongs in the list of established CO databases - especially as IQ is still burdened with the biggest marketing blunder since New Coke* and Stonebraker (Vertica) is a really brilliant scientist who has contributed a lot but can't seem to market squat
*should never have been branded so strongly as Sybase - POS (plain ole Sybase) is a great product but was really on the corporate outs when IQ was purchased. IQ could have changed the world but instead started a cult.
Posted by: joe b | May 26, 2009 7:28 PM
I'd like to point those interested in Evan's column to a different point of view in David Raab's piece in the latest issue of Information Management. I found it interesting in that David is not so much concerned with the DBA aspect as much as he is in the needs of the marketing functions who in this example access it. Click on our June issue cover on the homepage to scroll to David's article. We've also had some lively DB discussions lately on DM Radio, this one with some very sharp people in the mix including Forrester's Boris Evelson: http://www.information-management.com/dmradio/-10015384-1.html -ed
Posted by: jericson | May 26, 2009 7:13 PM
Measuring an OLAP engine by OLTP criteria is misleading at best and may destroy inventive initiative at worst.
I have used a proprietary application from Hilbert Technologies, Inc., which employed the columnar db concept. The resulting array dbs (another name for columnar dbs) were made even more efficient by using text-to-integer transforms. This technique put the text data into the native (integer) language of the computer (ASCII data is intrinsically costly.)
The columns, being integer arrays, were manipulated via memory mapped files on a PC (could be other platforms by now) and resulted in retrievals, compares and merges of multiple databases (millions of records) thousands of times faster than any hierarchical or relational db engine. Skip analytics for a moment - did I hear someone say Data Cleansing?
The fact that we are having this friendly discussion proves that db industry momentum is a burden to progress.
But then I could be wrong...
Posted by: David O | May 26, 2009 6:37 PM
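The thread does not say how the proprietary text-to-integer transform mentioned above worked. One common technique in the same spirit is dictionary encoding, sketched below in Python as an assumption rather than a description of that product; the `dictionary_encode` helper and the sample city names are hypothetical.

```python
def dictionary_encode(values):
    """Map each distinct string to a small integer code."""
    codes = {}
    encoded = []
    for v in values:
        # setdefault assigns the next unused code the first time v is seen.
        encoded.append(codes.setdefault(v, len(codes)))
    return encoded, codes

cities = ["Boston", "Austin", "Boston", "Denver", "Austin", "Boston"]
encoded, codes = dictionary_encode(cities)
print(encoded)

# Comparisons now run on integers rather than on strings.
target = codes["Boston"]
matches = [i for i, c in enumerate(encoded) if c == target]
print(matches)
```

Once a column is stored as integer codes, compares and merges operate on fixed-width integers, and repeated text values are stored only once in the dictionary.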
It seems to me that the answer also depends on what you call "analytics". In the example above, just pulling a name and address doesn't seem like a good example of a complex analytic question, which would be both i/o and cpu intensive.
The relative performance of columnar/row stores would probably vary a lot depending on the type of question being asked (complexity of predicates, etc.), the analytic schema, etc., and in that sense, it seems hard to generalize about the "best" approach for "analytics".
Maybe exadata/db machine, which cuts i/o by performing much of the projection on the storage level, is the best of both?
See also: Read-Optimized Databases, In Depth, by Allison L. Holloway and David J. DeWitt:
http://pages.cs.wisc.edu/~ahollowa/paper377.pdf
Posted by: Kolin O | May 26, 2009 5:40 PM
I can verify this much: I did the comparison on my laptop with Oracle 10.2 and Sybase IQ 12.7, running the same query against an identical set of data of 1.2 M rows, where IQ was column-based and Oracle was row-based.
Results: Oracle: 42 seconds, Sybase IQ: 13 seconds.
I am a certified Oracle DBA having specialized in performance tuning both databases and SQL queries, so this kind of changed my thinking about 2 years ago.
Sybase IQ has made some significant strides since then, significantly lowering the entry price to IQ so that it makes a very compelling alternative to smaller organizations. If only their marketing was as good as their technology, they would be a much bigger player in the market.
Evan, who are the worthy open-source players in this market?
Posted by: Ferenc M | May 26, 2009 5:38 PM
You've missed a whole generation of column based databases, think M204, think Sand Technologies and their DOD funded Nucleus database...and if you want to think about marketing segmentation, the modelling principle is more akin to a star schema, here start with AT
Posted by: Chris Day Big Data - Done Right | May 26, 2009 4:33 PM
In answer to Peter, the columnar db should perform better on I/O because many values (rows) will be stored in the equivalent of a physical page, and in the analytics context you would normally be reading many rows at a time, not a single row via a primary key. However, I quite agree that the RDBMS didn't win on speed (although the latest TPC benchmarks have DB2 clocking a whopping 6 million transactions a minute, which isn't exactly slow). The concurrency and content-based referential integrity built into the RDBMS model make it enormously powerful for high-volume operational systems, and it can run analytical processes pretty fast as well. The columnar db might have some speed advantages in analytics but I don't think it has any OLTP capability at all (I shudder to think of the locking implications of a big update!), which makes the ROI much poorer than for the all-purpose RDBMS (although to be fair, big companies often have warehouses and analytic engines built on separate instances if not totally different products).
It is also not correct to say that the whole row has to be retrieved with every query from an RDBMS. Most of them will return data from an index-only read if the index covers all the columns required by the query (so no data page physical I/O is necessary). So if you create indexes for every column in a dimension (for example), you will end up with data structures very similar to those in a columnar db
Posted by: Caroline B | May 26, 2009 3:32 PM
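The index-only read described in the comment above can be observed in a small experiment. The sketch below uses SQLite through Python's standard `sqlite3` module purely as a convenient stand-in; it is not DB2 or Oracle, and the `customer` table, its columns, and the index name are hypothetical. It asks the query planner how it would answer a query whose only referenced column is covered by an index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer "
    "(id INTEGER PRIMARY KEY, name TEXT, city TEXT, balance REAL)"
)
conn.executemany(
    "INSERT INTO customer (name, city, balance) VALUES (?, ?, ?)",
    [("Ann", "Boston", 10.0), ("Bob", "Austin", 20.0), ("Cho", "Boston", 30.0)],
)
# A single-column index on city covers any query that touches only city.
conn.execute("CREATE INDEX idx_customer_city ON customer (city)")

# Ask the planner how it would run the query; the last field of each
# plan row is the human-readable description.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT city FROM customer WHERE city = 'Boston'"
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

The exact wording of the plan varies by SQLite version, but it should mention a covering index, meaning the query is answered from the index without visiting the table rows.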
How is the "columns" based database any different from CJ Date's description of tuples? The principle of getting/accessing only the data you need is indeed a core concept of an RDBMS. Projection is a core component of SQL. A table is a collection of related tuples (the common misconception of the word relations as being between tables is of course - a misconception).
I also fail to see where the speed increase from column only objects is found. With a traditional DB, all my columns are in one block. Accessing a single record by primary key is a matter of 2-4 IOs depending on how big my index tree is and regardless of getting 1 or 50 columns. With column based, you duplicate this access path per key per column? How can that ever be any faster?
If we want to improve the conceptual speed of databases, we should abandon the relational principle and just go back to the old network or hierarchical database designs. A lot less overhead and much faster to search. The point being that relational databases didn't win because of their performance speed but because of their dynamics.
Posted by: Peter L | May 26, 2009 3:08 PM
From what little I know about Occam's Razor, it says if you have two theories that predict the same results, the simpler theory is better.
I think Evan has extended it to say if you have two solutions that produce the same results, the simpler is better. I like the analogy.
So not sure about the notion of only one being correct if they're both predicting the same results?
On a different note, how does columnar stack up to MPP?
Posted by: Terri_Rylander | May 26, 2009 2:12 PM
I of course would never argue with Evan, but I would like to point out that his description of Occam's Razor is a common misconception. What Occam really said was to "favor" the simplest hypothesis, not that it was necessarily correct. This was actually a thinly veiled attack on the Church whose teachings were at variance with observation.
Not unlike our industry.
-Neil Raden
Posted by: Neil R | May 26, 2009 1:45 PM