Enterprise architects, are you mired in a tangled web of data marts while your business pursues customer engagement without you? If you think a Hadoop-centric architecture is going to save the day, you may need to rethink. Your customers expect you to create systems of insight to deliver win-win engagement in real time. I'm seeing a new class of digital predators leverage the cloud to do just this. For example, Netflix designs cover graphics for its series based on subscriber viewing habits. They know their customers that well.
I call their technology approach an Elastic Analytics Platform in my recently published report. I formally define it as:
"A combination of data storage and middleware technology that allows the creation and dissolution of analytics components on demand, while provisioning these with data from one, or a few, distributed, virtualized data sources."
That's a mouthful. So here's a rough picture:
Firms like Netflix, Stitch Fix (who? read the linked KDnuggets blog post), and LinkedIn are sourcing all their data, and I mean everything, into a few data stores in the cloud. Next, they are exploiting the cloud to create analytic workloads on demand. This gives them elasticity in two ways: first, they get scale-out storage; second, they get on-demand analytics components. For example, Netflix can spin up Hadoop, Spark or Kafka clusters as they need them and provision these from Kafka or S3. They also have Teradata on Amazon. This gives them enormous flexibility to create as much of what they need when they need it.
Where is the beef? It is in the middleware they are developing using a bunch of open source tools like Genie and Kragle. Their architecture features a data pipeline to S3 and Kafka; it then handles the creation of EMR, Hadoop, Spark, or vendor-specific analytics workloads on demand. While I was at Strata, I listened to Kurt Brown from Netflix talk about all the things this approach lets them do. It's mind-blowing. Read this article if you really want something to make you go, "Seriously?"
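To make the idea concrete, here is a toy sketch in Python of what "creation and dissolution of analytics components on demand" might look like. This is not Netflix's code, and it does not use Genie, Kragle, or any real cloud API. Every name in it (`SharedStore`, `Workload`, `ElasticPlatform`, `spin_up`, `tear_down`) is hypothetical and made up for illustration. The only point it tries to show is that the data lives in one shared place while the compute comes and goes.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStore:
    """Hypothetical stand-in for a central store such as S3 or Kafka."""
    name: str
    records: list = field(default_factory=list)


@dataclass
class Workload:
    """Hypothetical stand-in for an on-demand cluster (e.g. Hadoop or Spark)."""
    engine: str
    source: SharedStore

    def run(self, fn):
        # Compute reads from the shared store; it holds no data of its own.
        return fn(self.source.records)


class ElasticPlatform:
    """Toy middleware: creates and dissolves workloads on demand."""

    def __init__(self):
        self.active = {}

    def spin_up(self, job_id, engine, source):
        # Create an analytics component and point it at the shared data.
        self.active[job_id] = Workload(engine, source)
        return self.active[job_id]

    def tear_down(self, job_id):
        # Dissolving the compute leaves the shared data untouched.
        self.active.pop(job_id, None)


store = SharedStore("events", records=[3, 5, 8])
platform = ElasticPlatform()

job = platform.spin_up("nightly-sum", "spark", store)
total = job.run(sum)
platform.tear_down("nightly-sum")

print(total, len(platform.active), len(store.records))
```

After the job finishes, the workload is gone but the store still holds every record, so the next workload, on whatever engine, can be provisioned from the same data.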
Don't think this is only something digital upstarts can do, either. You don't get off the hook that easily. Take a look at what Cloudera is doing with EMC, and at how HP is working with its hardware division, to let enterprises do similar things soon. Plenty of innovative young vendors are jumping on this as well. For example, Snowflake is using this architecture to create “virtual data warehouses” on demand in the cloud.
What it means for enterprise architects is this: big data predictive analytics architectures are changing beyond just data lakes. The Elastic Analytics Platform will revolutionize data science and predictive analytics. Why? Because it will let you use all your data, streaming or in batch, while keeping things both affordable and flexible. The sticky problem right now is the middleware needed to glue the storage and analytics components together. That has a long way to go, but it's the subject of efforts like project Myriad. Expect a lot of progress over the next few years.
Are you planning on pursuing this? Have you built one already? Let me know, as I'm doing research on the details, especially how established enterprises can do, or are already doing, similar things. More reports coming soon.