Free Site RegistrationFree Site Registration

Sign up today and access Information Management on the web!
Your FREE registration entitles you to:

FREE email newsletters

FREE access to all Information Management content

FREE access to web seminars, resource portals, our white paper library and more!

What are the pros and cons of horizontal database partitioning and please focus on the scenario when the database is part of a package (e.g., Siebel) which contains meta data.

Information Management Online, November 5, 2003

Chuck Kelley, Clay Rehm, Scott Howard

Q:

Advertisement

What are the pros and cons of horizontal database partitioning and please focus on the scenario when the database is part of a package (e.g., Siebel) which contains meta data.

A:

Scott Howard’s Answer: Horizontal partitioning can take several forms depending on how your relational database management system (RDBMS) vendor supports such and how you choose to implement. First let’s discuss the part that you can control – how you choose to implement. Regardless of the partitioning support offered by your vendor, your RDBMS vendor, not your package vendor, you can choose you own partitioning scheme. Horizontal partitioning is simple splitting a single large entity into several to many smaller entities. The goal and thus the pro is to reduce the size of the entity thus improving performance and manageability. The disadvantage or con is what you divide, you must also reacquaint. Think of the performance and manageability advantages of splitting a one billion row entity containing 165 columns into 3 or 4 one billion row entities, but each containing only just over 40 columns. On the flip side think of the large table joins required to reestablish the original content for those applications and queries that demand such.

What may be just as useful is manual vertical partitioning where you split the entity, respecting its original schema, into several smaller entities organized by some logical criteria such as time. You can take a large table and split it by year, for example. This can be very useful if your application or query criteria limits retrieval by time. It’s also very useful in data warehousing applications where once a partition is finalized, it remains the same for eternity eliminating the need for constant maintenance. For vertical partitioned objects, a Union is required to reestablish the entire content. Both approaches can be combined with very desirable results.

Now the database vendors may have implemented their own partitioning schemes in support of their own unique parallel database implementations. Oracle RAC and DB2 for z/OS and OS/390 have implemented a vertical partitioning scheme splitting data across a user- defined criteria, typically time. This helps the RDBMS provide parallel services particularly parallel I/O and some limited function parallelization. On the other hand, DB2 for Linux, UNIX and Windows; DB2 for iSeries; Microsoft SQL Server; and Informix IDS provide what is classified as horizontal partitioning across shared nothing nodes, not splitting on a fixed value like time but on a hashed value of any appropriate key. This approach leads to more even distribution between nodes or partitions and more opportunity for CPU parallelism through true function shipping. Be careful, because as you can see, what the vendors refer to as horizontal partitioning may not represent the kind of horizontal partitioning that you or your package vendor, i.e., Siebel, implement.

Chuck Kelley’s Answer: Horizontal partitioning is taking a table and breaking certain rows into one area (storage area, storage groups, tablespaces, etc.) and other rows into other areas. I believe that you should have horizontal partitioning by data value (date time key) and not just the luck of database. Not all databases provide horizontal partitioning.

If you have a small table, I wouldn’t partition it, unless it is read quite often and you can’t pin it in memory.

The pros are that you get multiple disk heads (hopefully) working in parallel to get the data, and you can break the data to more manageable pieces. The cons are you have more things to administer and depending on how you disks are set up, there may be no benefit for the added work. There will be some overhead in the database and operating system, but that is generally not a big deal.

The only question is whether Siebel supports you partitioning their tables. Certainly it will work since partitioning is transparent to the application (at least it should be!) and should not cause any trouble with meta data. But I would still ask if they support partitioning?

Clay Rehm’s Answer: Horizontal database partitioning is the act of moving subsets of data to either other tables, databases and/or servers. I don’t have any experience with Siebel but I understand you can access the underlying database within that package and can partition it like other data in your database.

Chuck Kelley is an internationally known expert in database and data warehousing technology. He has 30 years of experience in designing and implementing operational/production systems and data warehouses. Kelley has worked in some facet of the design and implementation phase of more than 50 data warehouses and data marts. He also teaches seminars, co-authored four books on data warehousing and has been published in many trade magazines on database technology, data warehousing and enterprise data strategies. He can be contacted at chuckkelley@usa.net.

Clay Rehm, CCP, PMP, is president of Rehm Technology (www.rehmtech.com), a consulting firm specializing in data integration solutions. Rehm provides hands-on expertise in project management, assessments, methodologies, data modeling, database design, metadata and systems analysis, design and development. He has worked in multiple platforms and his experience spans operational and data warehouse environments. Rehm is a technical book editor and is a co-author of the book, Impossible Data Warehouse Situations with Solutions from the Experts. In addition, he is a Certified Computing Professional (CCP), a certified Project Management Professional (PMP), holds a Bachelors of Science degree in Computer Science and a Masters Degree in Software Engineering from Carroll College. He can be reached at clay.rehm@rehmtech.com.

Scott Howard has been with IBM for more than 22 years. Howard’s experience includes staff and management assignments ranging from microapplications programming to mainframe and systems programming. He is an internationally recognized expert on business intelligence, data warehousing, DRDA, distributed databases and multivendor database integration, and an author and contributor to many publications. Scott is an IBM certified Advanced Technical Expert for DB2 UDB, an IBM Certified Business Intelligence Specialist and Certified Technical Trainer.

Howard is currently with Learning Services, IBM Global Services and is its business intelligence and data integration curricula worldwide leader. He has worked with IBM’s Silicon Valley, Toronto, Rochester and Austin development labs for the past twelve years, developing client/server database and data warehousing courses.

For more information on related topics, visit the following channels:

Advertisement

Advertisement