Stuart McDonald works for Entity Group as a probabilistic matching expert, focussed on using the Probabilistic Matching Engine (PME) in the IBM Master Data Management (MDM) software suite.

A very simple, practical example could be: Are these two addresses the same?

· Entity House, 980 Cornforth Drive, Sittingbourne, United Kingdom. ME9 8PX

· 980 Cornforth Dr, S’bourne, UK

A human eye (with a bit of help from Google Maps) will probably judge them to be the same, but how can you reliably codify this judgement so an algorithm can reach the same conclusion? And how can you ensure this algorithm also works with any UK or worldwide address?
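To make the question concrete, here is a minimal sketch of one naive way to codify that judgement: normalise both addresses, then measure how many tokens of the shorter address appear in the longer one. This is not the PME algorithm or anything taken from Stuart's blog. The `EXPANSIONS` table, the `normalise` and `containment` functions and the example `other` address are all hypothetical and hand-picked for these two addresses.

```python
import re

# Hypothetical, hand-picked expansions for illustration only; a real system
# would need curated reference data for every country it covers.
EXPANSIONS = {
    "dr": "drive",
    "uk": "united kingdom",
    "s'bourne": "sittingbourne",
}


def normalise(address):
    """Lower-case the address, drop punctuation and expand known abbreviations."""
    text = address.lower().replace("’", "'")
    # Keep letters, digits and apostrophes; everything else becomes a space.
    text = re.sub(r"[^a-z0-9']+", " ", text)
    tokens = set()
    for word in text.split():
        # An expansion may produce several tokens, e.g. "uk" -> "united", "kingdom".
        tokens.update(EXPANSIONS.get(word, word).split())
    return tokens


def containment(a, b):
    """Fraction of the shorter address's tokens that also occur in the longer one."""
    smaller, larger = sorted((normalise(a), normalise(b)), key=len)
    if not smaller:
        return 0.0
    return len(smaller & larger) / len(smaller)


full = "Entity House, 980 Cornforth Drive, Sittingbourne, United Kingdom. ME9 8PX"
short = "980 Cornforth Dr, S’bourne, UK"
other = "12 High Street, Leeds, UK"  # made-up address for contrast

print(containment(full, short))
print(containment(other, short))
```

Under these assumptions the two example addresses score 1.0, while the made-up Leeds address scores well below 0.5. The weakness is just as visible: every abbreviation has to be listed by hand, and every token counts equally, so a shared "United Kingdom" earns as much credit as a shared house number. Probabilistic approaches generally address the second point by weighting each agreement according to how discriminating it is.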

Stuart’s blog discusses how much data scientists depend on data preparation and conformity and on Master Data Management, including advanced techniques such as entity resolution using probabilistic matching. As data scientists, we tend to refer to these activities as ‘Data Munging’, and we all know how challenging and time-consuming they can be! Stuart’s blog starts to lift the lid on elements of this.

Entity Group is classed as a Small/Medium Enterprise (SME) and frequently partners with Capgemini, since it provides niche expertise that fulfils a core function of any Data Science team.
