A growing number of organizations are creating data lakes in order to give a greater number of employees access to the data they desire. But that is also creating more security concerns for data managers.

Reiner Kappenberger from HPE Security-Data Security spoke with Information Management about growing security concerns among attendees at the recent Strata & Hadoop World conference in San Jose, CA, and what they are based on. Reiner is the firm’s senior executive focused on big data and Hadoop.


Information Management: What are the most common themes that you heard among attendees?

Reiner Kappenberger: Securing the data lake is a very common theme. We conducted a survey at last year’s conference, and protecting sensitive data in Hadoop was a top-of-mind concern for over 70 percent of the survey participants.

From people I spoke with this year, that percentage has definitely not diminished; in fact it’s likely increased. It makes sense when you consider that most organizations use some form of sensitive data such as PCI (payment card information), PII (personally identifiable information) or PHI (protected health information).


IM: What are the most common data challenges that attendees are facing?

RK: Knowing how and when to start a big data security initiative is the biggest challenge. Whether you take advantage of commercially available security solutions or develop your own proprietary approach, we advocate five steps that will help you identify what needs protecting and apply the right techniques to protect it. The best time to start is before you put Hadoop into production.

Those steps are to audit and understand your Hadoop data; perform threat modeling on sensitive data; identify the business-critical values within sensitive data; apply tokenization and format-preserving encryption on data as it is ingested; and finally, provide data-at-rest encryption throughout the Hadoop cluster.
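As a rough illustration of the fourth step, protecting sensitive fields as data is ingested, here is a minimal Python sketch. It assumes a simple in-memory, vault-style tokenizer; the names `TokenVault`, `protect_record`, and the sample field names are hypothetical and do not describe any vendor's product or API.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory vault mapping sensitive values to random tokens."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same value always maps to the same token.
        if value in self._token_by_value:
            return self._token_by_value[value]
        token = "tok_" + secrets.token_hex(8)
        while token in self._value_by_token:  # extremely unlikely collision
            token = "tok_" + secrets.token_hex(8)
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # In a real deployment this call would be restricted to authorized users.
        return self._value_by_token[token]


def protect_record(record: dict, vault: TokenVault, sensitive_fields: set) -> dict:
    """Return a copy of the record with only the sensitive fields tokenized."""
    return {
        field: vault.tokenize(value) if field in sensitive_fields else value
        for field, value in record.items()
    }


if __name__ == "__main__":
    vault = TokenVault()
    raw = {"name": "Pat", "ssn": "123-45-6789", "city": "San Jose"}
    print(protect_record(raw, vault, {"ssn"}))
```

Only the fields identified during the audit and threat-modeling steps are touched; everything else passes through unchanged, which is why that pre-work matters.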

The perfect time to undertake this process is after you’ve done a pilot and before you’ve put anything into production. If you’ve done the pre-work, you’ll understand your queries, and adding the format-preserving encryption and tokenization to the relevant fields can be done very easily, taking just a few days to create a proof of concept.


IM: What are the most surprising things that you heard from attendees?

RK: The most surprising thing is that in the past, attendees were mostly focused on performance and the latest functionality, with security being an afterthought. Now the focus has shifted: decision-makers and practitioners actually understand that security is a major element of, and a requirement for, Hadoop.

This shift in focus shows that Hadoop customers have come to understand the challenges that Hadoop presents for data security, and they are very interested to understand how a data-centric security approach can help them create a secure environment for analytics.

The message is getting out that using NIST-based Format-Preserving Encryption, a form of AES encryption, together with Tokenization is the best way to maintain their infrastructure and add security without the typical encryption penalties. Because this approach does not impact performance or analytical capabilities, it allows them to support the business use without incurring additional overhead.


IM: What does your company view as the top data issues or challenges in 2016?

RK: Hadoop is ground zero for the battle between the business and security. The business needs the scalable, low-cost Hadoop infrastructure so it can take analytics to the next level—a prospect with myriad efficiency and revenue implications. Yet Hadoop includes few safeguards, leaving it to enterprises to add a security layer.

So implementing Hadoop without robust security in place takes risk to a whole new level. But armed with good information and a few best practices, security leaders can put an end to the standoff.


IM: How do these themes and challenges relate to your company’s market strategy this year?

RK: With a data-centric security strategy as you plan and implement big data projects or Hadoop deployments, you can neutralize the effects of damaging data breaches and help ensure attackers will glean nothing from attempts to breach Hadoop in the enterprise.

What do I mean by data-centric? Data exists in three basic states: at rest, in use, and in motion. The data-centric approach is in contrast to traditional network-based approaches to security, which haven’t responded directly to the emerging need for security that neutralizes the effects of a breach by protecting sensitive data at the field level.

With data-centric security, sensitive field-level data elements are replaced with usable, but de-identified, equivalents that retain their format, behavior and meaning. This means you modify only the sensitive data elements so they are no longer real values, and thus are no longer sensitive, but they still look like legitimate data.
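To make "retain their format" concrete, here is a toy Python sketch. It is not NIST format-preserving encryption and is not reversible; it only assumes a keyed HMAC to derive a surrogate in which digits stay digits, letters stay letters, and separators stay in place. The function names and the demo key are hypothetical, and because the mapping is not a true permutation, collisions between distinct inputs are possible.

```python
import hashlib
import hmac
import string


def _keystream(value: str, key: bytes):
    """Yield an endless stream of pseudo-random bytes derived from key and value."""
    counter = 0
    while True:
        msg = value.encode("utf-8") + counter.to_bytes(4, "big")
        yield from hmac.new(key, msg, hashlib.sha256).digest()
        counter += 1


def deidentify(value: str, key: bytes) -> str:
    """Replace digits with digits and letters with letters; keep everything else."""
    stream = _keystream(value, key)
    out = []
    for ch in value:
        if ch in string.digits:
            # byte % 10 is slightly biased; acceptable for an illustration only
            out.append(string.digits[next(stream) % 10])
        elif ch in string.ascii_lowercase:
            out.append(string.ascii_lowercase[next(stream) % 26])
        elif ch in string.ascii_uppercase:
            out.append(string.ascii_uppercase[next(stream) % 26])
        else:
            out.append(ch)  # separators such as "-" or "@" are preserved
    return "".join(out)


if __name__ == "__main__":
    key = b"demo-key-not-for-production"
    print(deidentify("4111-1111-1111-1111", key))
```

The surrogate has the same length and layout as the original, so downstream schemas and validation rules that expect, say, a 16-digit card number with dashes keep working.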

The format-preserving approach can be used with both structured and semi-structured data. It is also called “end-to-end data protection” and provides an enterprise-wide solution for data protection that extends into Hadoop and beyond the Hadoop environment. This protected form of the data can then be used in subsequent applications, analytic engines, data transfers, and data stores.

A major benefit is that a majority of analytics can be performed on de-identified data protected with data-centric techniques – data scientists do not need access to live payment card industry (PCI) data, protected health information (PHI), or personally identifiable information (PII) in order to achieve the needed business insights.
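A small Python sketch of why this works: if the same real value always maps to the same surrogate, grouping and counting give the same answers on protected data as on live data. The example assumes a keyed HMAC as the deterministic pseudonym (not format-preserving, and not any specific product's method); the function names and sample records are hypothetical.

```python
import hashlib
import hmac
from collections import Counter


def pseudonymize(value: str, key: bytes) -> str:
    """Deterministic keyed pseudonym: equal inputs give equal outputs."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def protect(records, key: bytes):
    """De-identify the customer field; leave the non-sensitive amount as-is."""
    return [(pseudonymize(customer, key), amount) for customer, amount in records]


def purchases_per_customer(records) -> Counter:
    """Count how many purchases each (real or pseudonymous) customer made."""
    return Counter(customer for customer, _amount in records)


if __name__ == "__main__":
    key = b"demo-key-not-for-production"
    raw = [("alice@example.com", 20), ("bob@example.com", 5), ("alice@example.com", 7)]
    protected = protect(raw, key)
    # The distribution of purchase counts is identical with or without real identities.
    print(sorted(purchases_per_customer(raw).values()))
    print(sorted(purchases_per_customer(protected).values()))
```

The analyst sees how many customers bought once or twice, and can join on the pseudonym across data sets, without ever seeing an email address.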
