Continue in 2 seconds

I'm in charge of our data warehouse and one of my colleagues suggested we look into purchasing a job scheduler to handle file transfers, job dependencies, etc.

By
  • Joe Oates, Chuck Kelley, Michael Jennings, Clay Rehm, Les Barbusinski
Published
  • April 02 2003, 1:00am EST

Q:

I'm in charge of our data warehouse and one of my colleagues suggested we look into purchasing a job scheduler to handle file transfers, job dependencies, etc. Not sure I understand the importance of job scheduling in the data warehouse realm. Could you give me any insights here?

A:

Les Barbusinski’s Answer: Simple data warehouses don’t need a job scheduler, but more complex implementations do. Most shops start out scheduling their batch DW jobs using the UNIX CRON or Windows AT commands. However, neither of these commands can handle:

  • Conditional dependencies (i.e., triggering a job based on an event such as the completion of a prior job or the existence of a flat file in a staging directory);
  • Notification (i.e., alerting on-call personnel of a job failure or overdue event via pager or cell phone); or
  • Load balancing (i.e., preventing two jobs that update the same table from running at the same time or preventing two resource-intensive jobs from running at the same time).

Here are some scenarios where a job scheduler can come in handy:

  • Delaying the start of your ETL scripts until after your database backup jobs have successfully completed.
  • Delaying the start of an extract job until after the source system has successfully completed its nightly batch processing.
  • Delaying the start of an ETL script that updates a fact table until all of the related dimension tables have been successfully loaded (in order to avoid RI problems).
  • Automatically triggering an ETL script only if a predecessor job ends with a certain return code (e.g., triggering the successor job only if the predecessor job actually processed some data).
  • Automatically triggering a BI report as soon as its source data mart has been fully loaded.
  • Automatically triggering an ETL script when a flat file has been successfully FTPd into a specified staging directory by a source system.
  • Automatically preventing two jobs that update the same table (with different dependencies) from running concurrently, thereby precluding deadlocks.
  • Automatically preventing two CPU- intensive jobs (with different dependencies) from running concurrently (thereby precluding resource contention).
  • Having the job scheduler page the on-call person when an expected event (e.g., the completion of a predecessor job or the appearance of a flat file) does not occur within a predicted time frame.

There are job schedulers for virtually every platform and environment. Some of the more well known include CA’s AutoSys, IBM’s Tivoli Workload Scheduler, Argent’s Job Scheduler and BMC’s CONTROL-M… among others.

Hope this helps.

Mike Jennings’ Answer: Job schedulers can typically be used in conjunction with a variety of data warehouse products and tasks to automatically initiate processing cycles for the data warehouse environment. This provides a hands-off approach to data warehouse processing cycles, error notification, exception processing, support paging and operational meta data statistics. Through the scheduler, data warehouse processing can be initiated once source system processing has completed. Data warehouse ETL processing, security updates, cube builds, report portal publishing, e-mail notification and scheduled reporting are some of the examples of tasks that can be initiated through the scheduler.

Chuck Kelley’s Answer: Job schedulers are important if you have files that have to be moved from system to system at a specific point in time or to run a lights-out environment. However before purchasing a job scheduler, check out what your extraction, transformation and load (ETL) can offer. Some of them may have what you need.

Joe Oates’ Answer: There are several areas where job scheduling is important. In an enterprise data warehouse or large data mart, there are many different systems from which data must be extracted, transformed and loaded (ETL). The jobs that perform these tasks must usually be done in a particular sequence. Additionally, there may be standard reports or other analyses that must be run when the ETL process has been completed. These jobs should be automated. I once had a client who did not initially want to use a job scheduler. After about two months, the client did install a job scheduler because the manual process resulted in so many errors that took up so much technical and end-user time, that it was far more expensive than investing in an automated job scheduling solution.

Clay Rehm’s Answer: The data warehouse environment needs job scheduling like the operational environment. Job scheduling automates the file transfers, dependencies, data loads and other process so this does not need to be done manually. If you want to look at a simple job scheduler, look at Microsoft Windows job scheduler on your computer (control panel – scheduled tasks).

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access