The General Data Protection Regulation (GDPR) doesn't take effect until May 2018, but it is already having a profound impact on many organizations that process the personal data of EU citizens, as well as on those citizens directly.

Organizations should have started their compliance initiatives by now, and many that have are finding it a huge logistical task. Challenges include internal awareness and training, reviewing policies and procedures, defining and changing data flows, and auditing systems and data. Each of these compliance tasks is set against the backdrop of a huge increase in enterprise data and could involve assessing massive multi-jurisdictional databases or document collections that run into the hundreds of millions.

In my area of work, I’m seeing organizations respond by conducting information and security audits as a starting point for remedial action. Such action might involve consolidating data silos, changing from optimistic to pessimistic security models, introducing new information classifications and retention policies, cleaning up data ("minimization") and adding new security measures, especially for the sensitive data types defined in Article 9.

The GDPR also introduces the obligation that organizations adopt privacy by design and by default (Article 25). As a result, I've seen organizations look for ongoing management of personal information in data repositories (volumes, growth, age, redundancy, permissions), as well as monitoring for suspicious user behavior and data breaches.

The scale of these tasks is large, especially where personal data sits in huge repositories and unstructured documents. Data crawling tools and analytics can help: organizations are adopting solutions that crawl and index multiple data sources, and Optical Character Recognition (OCR) helps make dark data, such as text locked in scanned images, discoverable.

Once the data is in an appropriate form, data mining techniques such as Named Entity Recognition (NER), Natural Language Processing (NLP) and machine-learning classification can help find personal data within the relevant context: the aim is to find someone’s health data rather than just a reference to a health term. Solutions can then help allocate clean-up tasks to the right people, purge redundant data and secure sensitive data.
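
To make the distinction between someone's health data and a mere health term concrete, here is a minimal sketch in Python. It is not NER or a trained classifier; it is a keyword-and-pattern stand-in that only illustrates the idea of requiring a person identifier and a health term in the same line. The term list, the title-based name pattern and the function name `classify_line` are all assumptions made for illustration.

```python
import re

# Hypothetical, deliberately tiny vocabulary; a real system would rely on
# trained NER and classification models rather than a keyword list.
HEALTH_TERMS = {"diabetes", "asthma", "diagnosis", "prescription"}

# Simple email pattern used as one kind of person identifier.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Crude stand-in for a person-name recognizer: a title followed by a
# capitalized word, e.g. "Mrs Jones". This is an assumption, not real NER.
NAME_RE = re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\.? [A-Z][a-z]+")


def classify_line(line):
    """Label one line of text by whether a health term co-occurs with a person."""
    words = {w.lower() for w in re.findall(r"[A-Za-z]+", line)}
    has_health = bool(words & HEALTH_TERMS)
    has_person = bool(EMAIL_RE.search(line) or NAME_RE.search(line))
    if has_health and has_person:
        return "personal health data"
    if has_person:
        return "personal data"
    if has_health:
        return "health term only"
    return "none"


if __name__ == "__main__":
    for sample in (
        "Mrs Jones was given a new asthma prescription.",
        "Our canteen menu now lists asthma-friendly options.",
        "Contact jane.doe@example.com about the invoice.",
    ):
        print(classify_line(sample), "|", sample)
```

Only the first sample would be routed for review as sensitive data under this sketch; the canteen menu mentions a health term but no person, which is the kind of false positive that context-aware techniques aim to avoid.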

On the security side, some solutions can utilize machine learning to determine "normal vs. abnormal" user behavior and help to flag potential data breaches.
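
As a rough illustration of "normal vs. abnormal", the sketch below compares a user's file-access count for today against that user's own history using a z-score threshold. This is a simple statistical baseline standing in for the machine-learning models such solutions use, and the function name `is_abnormal`, the threshold of 3.0 and the sample counts are assumptions.

```python
from statistics import mean, stdev


def is_abnormal(history, today, threshold=3.0):
    """Flag today's count if it is far from the user's own historical mean.

    history: past daily counts for one user (hypothetical data).
    threshold: number of standard deviations treated as abnormal (assumed).
    """
    if len(history) < 2:
        # Not enough data to establish what "normal" looks like.
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly constant history: any change at all is a deviation.
        return today != mu
    return abs(today - mu) / sigma > threshold


if __name__ == "__main__":
    past_week = [20, 22, 19, 21, 23, 20, 22]  # files opened per day
    print(is_abnormal(past_week, 21))   # a typical day
    print(is_abnormal(past_week, 400))  # a sudden bulk download
```

A flag from a check like this is a prompt for investigation, not proof of a breach.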

On the other side of the processor/owner relationship is the citizen, who is being made aware of enhanced rights through the press and awareness campaigns run by EU governments.

Some will undoubtedly be thinking about how they might exercise these rights, such as through Data Subject Access Requests (DSARs) and the right to be forgotten. Organizations should therefore be fully prepared to respond to DSARs within one month (Article 12), potentially across tens of thousands of documents and emails, and in most cases they can't charge the person making the request.
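
Tracking the response deadline for each request is easy to automate. The sketch below assumes the deadline is counted as one calendar month from receipt, which is how Article 12(3) words it, with an optional extension of two further months for complex requests; if your policy counts a fixed 30 days instead, a plain `timedelta(days=30)` would replace the month arithmetic. The function names are hypothetical, and how the period is counted in practice should be confirmed with your legal team.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Return the same day-of-month `months` later, clamped to the month's end."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def dsar_deadline(received, extended=False):
    """One month from receipt, or three months if an extension applies (assumed reading)."""
    return add_months(received, 3 if extended else 1)


if __name__ == "__main__":
    print(dsar_deadline(date(2018, 5, 25)))
    print(dsar_deadline(date(2019, 1, 31)))  # clamps to the end of February
    print(dsar_deadline(date(2018, 5, 25), extended=True))
```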

I've regularly heard of DSARs costing organizations £5,000 to £50,000 each, because each one requires a small team to trawl through a huge amount of data, often with a fine-tooth comb. This is because personal information is very broadly defined and is often mixed with sensitive business information or the personal information of third parties within the same document. There are also legal exemptions to disclosure, such as litigation or legal privilege.

Organizations should devote plenty of effort to defining the workflow for each DSAR. Keep an inventory of systems to search, or adopt an enterprise search solution that can search across all of them and apply the same data mining techniques mentioned earlier. Ensure that dark data is searchable and have the right tools for expediting review, including a playbook for identifying personal data and tools for redaction. The data must also be transferred to the requester securely, so organizations are looking at secure online portals because email would not be appropriate.
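
As one small example of what a redaction tool does, the sketch below masks every email address in a document except the requester's own, so that third parties' details are not disclosed. It covers only email addresses matched by a simple pattern; names, phone numbers and other identifiers would need their own handling and human review. The function name and the `[REDACTED]` marker are assumptions.

```python
import re

# Simple email pattern; it will not catch every valid address format.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_third_party_emails(text, requester_email):
    """Replace every email address except the requester's with a marker."""
    def replace(match):
        if match.group(0).lower() == requester_email.lower():
            return match.group(0)  # the requester is entitled to their own data
        return "[REDACTED]"
    return EMAIL_RE.sub(replace, text)


if __name__ == "__main__":
    message = "From bob@corp.example to jane@example.com: see attached."
    print(redact_third_party_emails(message, "jane@example.com"))
```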

Unfortunately, there isn’t a one-size-fits-all GDPR solution for organizations, since each is different in how it interacts with personal data. But start by defining your data flows and thinking about the risks. Once you've identified a remedial action, think about how technology could assist in the process.
