We are loading a data warehouse from a mainframe source, and some of the files contain millions of rows. I want to know how accurate the data is. Instead of checking every record, I would like to sample only a few hundred or so. Is there a formula or rule of thumb that would give me a 95 percent or 99 percent confidence level that the data is correct, assuming each record is either right or wrong?
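
Since each record is either right or wrong, one common way to frame this is as estimating a proportion from a simple random sample. The sketch below is illustrative only, not an answer taken from the original article. It assumes records are drawn at random, and it uses the standard normal-approximation formula n = z^2 * p * (1 - p) / e^2 with an optional finite population correction. The function names, the margin of error `e`, and the worst-case guess p = 0.5 are all assumptions introduced here. The second function covers the acceptance-style case: how many records must be checked, with zero errors found, to claim that the error rate is below a chosen limit.

```python
import math
from statistics import NormalDist


def sample_size(confidence, margin_of_error, population=None, expected_error_rate=0.5):
    """Records to sample to estimate the error rate within +/- margin_of_error.

    Hypothetical helper. Uses the normal approximation for a proportion;
    expected_error_rate=0.5 is the worst case and gives the largest sample.
    """
    # Two-sided z-score for the requested confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = expected_error_rate
    n = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    if population is not None:
        # Finite population correction; barely matters for millions of rows.
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)


def zero_defect_sample_size(confidence, max_error_rate):
    """Records to check, finding no errors, to claim the true error rate
    is below max_error_rate at the given confidence. Hypothetical helper.
    """
    # Solve (1 - max_error_rate) ** n <= 1 - confidence for n.
    return math.ceil(math.log(1 - confidence) / math.log(1 - max_error_rate))


if __name__ == "__main__":
    print(sample_size(0.95, 0.05))              # 385
    print(sample_size(0.99, 0.05))              # 664
    print(sample_size(0.95, 0.05, 5_000_000))   # 385
    print(zero_defect_sample_size(0.95, 0.01))  # 299
```

Under these assumptions, a few hundred records is in the right range: roughly 385 records give a 95 percent confidence level with a margin of error of plus or minus 5 percentage points, and the number barely changes with file size once the file is large. A tighter margin or a higher confidence level raises the count quickly, and none of this holds if the sample is not random (for example, taking only the first few hundred rows of a file).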
