Last month we looked at the business drivers that have fueled increased concern about the availability of information assets and the underlying data, applications and computing systems. We defined managed availability as the ability of an organization to deliver consistent, predictable access to information for any user wherever, whenever and however they need it. We also introduced the concept of "progressive business requirements," reactive through proactive, by which an organization may chart its degree of data sensitivity versus application sensitivity. Finally, we introduced the "availability continuum" and examined disaster recovery, improved availability, high availability and continuous availability, contrasting and clarifying these terms through the lens of user or organizational priorities.
As we begin to apply these concepts to business issues, we need to understand that any managed availability game plan is determined by the priorities of the business as defined by the system users. Can they feel secure that critical data will not be lost, even if its recovery takes half a business day or more? Or, in the event of a system failure, how quickly must they regain access to applications before downtime cuts deeply into business operations, revenues and profitability? Such priorities, shaped by the requirements of the business, are fundamental to any solution choice.
Two specific measures set the stage for discussing these issues: recovery point objective (RPO) and recovery time objective (RTO). These are not new terms; they have been around since availability providers first started trying to quantify some of the more abstract principles of availability theory. But now that numerous availability solutions have come to market, businesses are challenged more than ever to match their requirements and specific objectives precisely to a solution that is a good fit for the organization. Consequently, it is incumbent on system managers and business executives to revisit and understand RPO and RTO in a new light.
Let's put a stake in the ground and define these terms:
Recovery point objective is the measurable target for recovering data up to a specific point in a transaction stream. It can be measured in time or in some actual quantity, and it expresses the amount of data an organization can tolerate losing.
Recovery time objective is the measurable target for recovering application functionality for the purpose of operational continuity. It is most often measured in time, and it expresses how long a business can tolerate the computing system (hardware, software, services) being offline.
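To make the two definitions concrete, here is a minimal Python sketch that compares a single outage against both objectives. It assumes RPO is expressed in time rather than in a quantity of data, and the names Objectives and assess_outage are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Objectives:
    rpo: timedelta  # longest span of lost data the business can tolerate
    rto: timedelta  # longest time the system may remain offline


def assess_outage(last_recovery_point: datetime,
                  failure: datetime,
                  service_restored: datetime,
                  objectives: Objectives):
    """Return (data_loss_window, downtime, rpo_met, rto_met) for one outage."""
    # Everything written after the last recovery point is at risk of loss.
    data_loss_window = failure - last_recovery_point
    # Downtime runs from the failure until applications are usable again.
    downtime = service_restored - failure
    return (
        data_loss_window,
        downtime,
        data_loss_window <= objectives.rpo,
        downtime <= objectives.rto,
    )
```

For example, with a four-hour RPO and a one-hour RTO, a failure at 2:30 p.m. following a 2 a.m. save, with service restored at 3:15 p.m., meets the RTO (45 minutes of downtime) but misses the RPO (12.5 hours of exposed data).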
A lot of system managers and business executives worry most about losing data. They know applications are important for operational continuity, but data integrity takes precedence over everything. They are data priority people, classified as reactive to business-continuity requirements and chiefly focused on RPO.
Let's look at the local bank down the block. The bank's executives know that the applications providing online account management offer customers many conveniences. Nonetheless, the executives are probably more concerned about data integrity. Lost data would greatly affect customers' accounts and create havoc as the bank rebuilds transaction paths that may yield only estimated balances. Long-term customer satisfaction and loyalty would very likely be damaged. If the ATM is offline for an hour or so, it is an inconvenience; if the bank loses your payroll deposit, it is a catastrophe!
Many organizations, maybe even our hypothetical bank, depend on tape backups to protect data. But backups, even if taken at multiple points throughout a business day, offer no way to recover data lost between saves. This data is referred to as "orphan data" and can often be a source of great concern.
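As a rough illustration of orphan data, the following Python sketch picks out the transactions that would be unrecoverable from backups alone after a failure. The function name orphan_transactions and the use of plain timestamps to stand in for transactions are assumptions made for this example.

```python
from datetime import datetime


def orphan_transactions(transaction_times, save_times, failure):
    """Return transactions made after the last completed save, up to the failure."""
    # Only saves that finished before the failure can be restored from.
    completed = sorted(t for t in save_times if t <= failure)
    last_save = completed[-1] if completed else None
    return [
        t for t in sorted(transaction_times)
        if t <= failure and (last_save is None or t > last_save)
    ]
```

If no save completed before the failure, every transaction up to the failure is treated as orphan data.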
In Figures 1 and 2, we see the relationship between RPO and potential orphan data in different computing environments. Figure 1 shows a typical one-server business dependent upon nightly tape backups, performed when the operation's transactional rate has slowed for the day. This backup regimen is probably a daily practice, with a de facto RPO equal to the time between saves. The business likely provides some means for a system rebuild and data recovery in the face of an abrupt server failure. However, in the event of a failure, all transactions since the last nightly save are subject to loss, leading to extended recovery and to costs associated with rebuilding the lost transactions.
Figure 2 shows a source/backup server environment employing technology that facilitates snapshot replication of transactions from the source to the backup at lower-activity intervals. These RPO intervals may be predetermined by a system manager based on transactional history, or the replication software may heuristically declare the RPOs through analytical logic. In this scenario, the potential for orphan data has been reduced to smaller risk windows, and the recovery time and costs would be reduced accordingly.
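The difference between the two scenarios can be put in numbers with a short Python sketch that computes the worst-case orphan-data window for a repeating daily save schedule. The function name worst_case_exposure and the sample schedules are hypothetical and are not taken from the figures.

```python
from datetime import timedelta


def worst_case_exposure(save_offsets):
    """Largest gap between consecutive saves in a repeating 24-hour schedule.

    save_offsets: timedelta offsets from midnight at which a save completes.
    """
    if not save_offsets:
        raise ValueError("at least one save per day is required")
    day = timedelta(hours=24)
    ordered = sorted(save_offsets)
    # Gaps between saves that fall within the same day.
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    # Wrap around midnight: last save of one day to the first save of the next.
    gaps.append(day - ordered[-1] + ordered[0])
    return max(gaps)
```

Under these assumptions, a single nightly save at 11 p.m. leaves a 24-hour window of potential orphan data, while snapshots at 6 a.m., noon, 6 p.m. and 11 p.m. cut the worst case to seven hours (the overnight gap).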
On the other side of the coin, some business executives and system managers acknowledge the importance of data but assume its integrity. They're concerned about application downtime, because operational continuity is vital to the health of their enterprises. They are applications priority people, more proactive to business-continuity requirements and chiefly focused on RTO.
Take, for example, a U.S.-based electronics parts manufacturer that has won business in European and South American markets. It has worked hard to build an enterprise that networks customers, sales reps and the supply chain into a browser-based e-business architecture. Sure, data is important to the enterprise, but round-the-clock manufacturing and shipping, including considerable automation, mean company officials worry most about maintaining operational continuity and having uninterrupted access to related applications. Any shutdown would hamper the company's ability to fulfill the needs of valued international customers.
Or consider a dairy that processes milk from regional farmers. The milk arrives early each morning and must be processed within a matter of hours because it is fresh and has a limited shelf life before spoilage. Although the dairy is not a 24x7 enterprise, the availability of automated processes and applications is critical during a relatively short window of time. Any downtime could lead to lost inventory, reduced sales and unhappy consumers who expect delivery of product to their local supermarkets. The long-term impact could be significant.
The specific RTO for each business becomes more obvious when you pinpoint when and for how long your business can afford to be down. Big enterprises, bonded by ERP solutions, will likely have an RTO of near zero. A smaller company whose operational plan and revenue depend on seamless continuity during specific periods may target a low RTO as well. Another smaller company, however, may accept a higher RTO than a larger enterprise would, because achieving the same low RTO costs the smaller company relatively more. Interestingly enough, a leading analyst group has reported that 75 percent of all small companies that experience an outage of any significance never recover and are out of business within a year.
There are a number of availability solutions designed to help organizations manage RPO and RTO. We will discuss the pros and cons of most of them in future articles. But when we begin to view managed availability less in abstract terms and more as a concrete business discipline balancing requirements, cost of protection, downtime risk and the possibilities for business growth, we find RPO and RTO tend to synthesize into one common objective. We also find that data priority and applications priority are more mutually inclusive than mutually exclusive, and that business requirements, risk management and ROI drive the selection of the ultimate solution.
In our next Managed Availability Memo, we will begin to discuss "downtime," both planned and unplanned. We will pinpoint the causes of downtime and begin to assess its far-reaching financial impact, setting the stage to understand all the factors that impact your business, and providing step-by-step guidelines for estimating your organization's annual cost of downtime.
DM Review Online readers who wish to study Managed Availability issues and technology in greater depth may subscribe to Vision Solutions' Business Continuity Series at http://www.visionsolutions.com/BCSS.