It was shortly after I missed his sessions at a recent conference that I met and got to know Alan Pelz-Sharpe of CMS Watch, a nine-person analyst firm neutrally covering the unstructured data space. Alan had been presenting at TDWI’s show in Las Vegas along with his partner and principal at the same firm, Tony Byrne. They were on a road show that would include AIIM and DAMA as well. Since unstructured content is a reality or on the horizon for many BI professionals used to the structured data world, I thought it would be a good time to ping Alan on where things are meeting up, and what in the world he was doing at a conference mostly dedicated to data warehousing.


Jim Ericson: So Alan, what in the world were you doing at a conference mostly dedicated to data warehousing and how did it go?


Alan Pelz-Sharpe: It was very interesting. We didn’t get a huge turnout at our sessions, but I don’t think anyone expected that. What we did get was an incredibly engaged audience, and Tony and I came back saying it’s not about the crossover of the structured and unstructured worlds, it’s about the reality of the people who are working in these sectors and how those worlds touch. What I’m trying to say in a roundabout way is, I think what we’ve been focused on in the past was the technology overlap, and maybe there’s less of that than we thought. But in the day-to-day workings of a database administrator or IT manager, they don’t make those distinctions between structured and unstructured. They get asked to look at this project and that project and somehow pull them together. At a practical business level I think there’s an awful lot to talk about.


JE: We reporters and analysts tend to get ahead of the reality curve. BI Review held a conference last year with a session on unstructured data and I had a tough time even lining up speakers. But there was one fellow from a law firm who really captivated the audience, who had just assumed some of this stuff couldn’t be done.


APS: Yes, I think in the past analysts have gotten a bit too excited and run off on the path saying, it’s the death of BI, search is going to take over, or it’s the death of search, BI is going to take over. That’s not reality, but there is reality in what businesses are actually trying to do out there.


JE: So what about unstructured and structured being different animals?


APS: It isn’t that there isn’t any overlap. Some search products actually do touch on the world of BI and I’m sure that will happen more and more. The end user experience will probably become more merged and transparent. To them it will just be a search or query, if you want to put it that way. But at the back end they are very different animals. Where things are changing a little bit is that search engines, I think in fairness, have gone through a bit of a revolution. Early search engines really were just [about] key words, and if it hits on a word, there’s your result. They’ve morphed into really complex analytic engines that can be tuned constantly. That’s different; the early search engines either got your result or didn’t. That [new] element has come really from the business intelligence world, and there are crossover areas like Web analytics. That’s a new-ish topic at CMS Watch. We’ve only been covering it for about a year now, but I found it fascinating that the audience was so engaged on the topic in Vegas. I think what was so fascinating was Tony constantly stopping and trying to explain himself. Just when people in the audience thought they got it, they suddenly didn’t. There’s a lot of commonality but again…


JE: I’m really interested in Web analytics given my own media challenge and also the general corporate push online. Web analytics and BI analytics really are different animals in terms of scale, scope, parameters, etc.


APS: Yeah, exactly. Different parameters, and our conference session veered a bit into a very constructive corner where we were really explaining how a Web page is actually structured, the implications of delivering JavaScript and, frankly, how poor the data that’s coming back often is. Just some basics like page views versus the actual number of people visiting your site. I didn’t want to be rude to the audience, but clearly none of them had touched on this before.
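[Editor’s note: For readers coming from the structured data side, the distinction between page views and visitors can be sketched in a few lines of Python. This is only an illustration and not something presented in the session. The hit records, the visitor IDs and the `summarize` function are all made up for the example, and it assumes each visitor can be identified reliably, which in practice is exactly where the data tends to be poor.]

```python
# Hypothetical page-tag hits as (visitor_id, page) pairs.
# The IDs and pages are invented for illustration only.
hits = [
    ("v1", "/home"),
    ("v1", "/scores"),
    ("v2", "/home"),
    ("v1", "/home"),
    ("v3", "/scores"),
]


def summarize(hits):
    """Return (page_views, unique_visitors) for a list of (visitor_id, page) hits."""
    # Every recorded hit counts as one page view.
    page_views = len(hits)
    # A visitor is counted once, no matter how many pages they load.
    unique_visitors = len({visitor_id for visitor_id, _page in hits})
    return page_views, unique_visitors


page_views, unique_visitors = summarize(hits)
print(f"page views: {page_views}, unique visitors: {unique_visitors}")
```

[Here five page views come from only three visitors, so reporting the first number as "people visiting your site" would overstate the audience.]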


JE: I wrote a pretty simple case study on Fox Sports’ use of Web analytics and got more mail on that than on any story recently. A fellow at Yahoo wrote me to say, 'cool story, yeah, that’s what we’re trying to do here.' That was surprising coming from a company like that.


APS: But that’s the truth of it, even for a business like Yahoo, you can replicate that experience anywhere. If you’re in telco you know data warehousing inside out. You’ve been tracking usage and call data for years and you’re into the fine details of mathematics really. When you get to the Web it’s the Wild West.


JE: Just going back to the general unstructured data non-Web stuff, it’s a frontier for us at DM Review as well. It’s something we’re trying to address constructively. The data warehousing industry has been stop and go in this regard. I talked to Bill Inmon a couple of years ago, who was trying to stuff unstructured content into the data warehouse and accommodate it in the old model, and I just don’t think it’s the same animal, as you’re saying.


APS: It’s not the same, and that was the focus of our two workshops on Web analytics and on simple content technologies. I was explaining simple document management last week, really if you think of your old-fashioned filing clerk and the cabinets and drawers of files…


JE: I still have a stapler on my desk that is a crucial tool in my filing system. It’s not going anywhere until I’m sure I have something better.


APS: I am paper-centric myself. That’s the reality of the software, it mimics paper filing, which is actually a little more difficult than people give it credit for. When you get into engineering or health care, your number schemas and filing schemas get quite specialized. It’s not a database.


JE: Right, there are also digital assets, warranty, rights and contract assets, all sorts of stuff. Before I came to DM Review I covered the waterfront, including unstructured content, and was watching vendors like Interwoven, Vignette, FileNet or Documentum where the focus was on workflow. Has that changed?


APS: It’s changed but remains the same as well. That landscape still exists and it will never go away. The day lawyers stop dealing with complex cases that deal with hundreds of emails and documents will never arrive. There’s no reason for that to go away because it works pretty well. The other traditional landscape is the imaging market. High speed scanning of checks or insurance or medical claims forms is staggeringly boring but hugely profitable. It’s also quite difficult to do. So that landscape still exists but is a bit harder to do these days, in the sense that when you scan something you don’t just create a TIFF image. You actually have software intelligent enough to break down and read the page and extract data for various bits of workflows, so it’s sort of moved on a bit. But the imaging and the document management worlds are still there.


JE: The content management space back then aligned itself with the early enterprise portal companies to pursue collaboration and knowledge management as well as workflow.


APS: That’s coming back, but a little differently now. I hate to use the word collaboration, but look at Microsoft SharePoint for example. I think 80 million seats of SharePoint have been shipped. It’s hugely successful. It’s just basic sharing of office documents and has become a huge market in its own right.


JE: You just carve out a workspace, gather a few folks and have at it, maybe with some analytics to boot.


APS: Right, you build a mini-portal and you and your team work there. It’s easy to use, where less than 10 years ago that was a hugely expensive thing to do.


JE: Email has to be getting huge in the unstructured space.


APS: Yes, that’s the other world that’s coming up, and this actually touches on the structured data world more than people realize in email archiving and email management. That is just about to explode as a market and there really isn’t a large company in the world that isn’t looking at that. If you think about what an email archive is, well, a small one might be 40 terabytes, a large one might be three or four petabytes. It’s just one huge database at the end of the day, and it’s sort of falling into the enterprise content management [ECM] technology world but it probably shouldn’t. It’s really more of a back end data center task. It’s sort of hovering between the two.


JE: And people seem more concerned about compliance than being constructive with email.


APS: That’s right, so it splits into those two things. There’s email management, which is the compliance thing, and hence ECM vendors are all trying to get involved. But then you’ve got the simple archiving: I can’t operate with 20 terabytes of old emails on my Exchange server. It doesn’t work but I can’t get rid of it so I’ll back it up, but that doesn’t work, I’ve got the compliance thing, so I’m actually going to have to archive it and search it in an intelligent way with e-discovery tools at a later date. That’s an emerging market but it’s emerging at a heck of a pace.


(end of interview)


I was just about ready to jump into Web versus desktop applications when I realized I’d gone too far to stay fair to our readers. I’m going to carve out some space at to move down this road, and if you’ve read this column to the end, you’re someone who might be interested. If you are, drop me a line at
