My previous article, "Getting Started with Database as a Service," discussed Database as a Service (DBaaS), an emerging DBMS (database management system) paradigm that delivers the efficiency of the cloud and provides a scalable framework for managing disparate databases. In that article, I introduced OpenStack Trove and explained how it manages different database types, SQL and NoSQL, efficiently while retaining the unique attributes of each. With DBaaS, routine tasks like provisioning and administrative tasks like clustering, replication, backup, and restore are handled in a simple, unified way.
One of the important principles of Trove, the OpenStack approach to DBaaS, is that it treats all DBMS technologies in the same manner, allowing developers to choose the best database for their application rather than being locked into a particular technology.
This article takes the next step and looks at setting up OpenStack Trove to run DBaaS in production. I will review some important practices for a private OpenStack setup and for configuring a Trove deployment, starting with the architecture of an OpenStack system and where Trove fits in it.
Trove as a Client of OpenStack Services
Figure 1 illustrates the architecture of an OpenStack system and where Trove fits into it. Trove is a consumer of services provided by other OpenStack projects: Neutron (networking), Keystone (identity), Cinder (block storage), Swift (object storage), and Nova (compute).
When a user requests an instance from Trove, Trove issues a number of requests to these underlying services. For example, it contacts Nova to provision a compute instance and Cinder to provision block storage.
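To make that fan-out concrete, here is a minimal Python sketch of how one "create instance" request turns into calls on other services. The CreateRequest type and the downstream_calls function are hypothetical and are not part of Trove's code or API; the ordering is only illustrative, and the real task manager performs many more steps.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CreateRequest:
    """Hypothetical, simplified view of a Trove 'create instance' call."""
    flavor: str
    volume_size_gb: int                      # 0 means no Cinder volume
    restore_backup_id: Optional[str] = None  # set when restoring a backup


def downstream_calls(req: CreateRequest) -> List[Tuple[str, str]]:
    """List the (service, purpose) pairs one request fans out into.

    Illustrative only: the ordering and the set of calls are simplified.
    """
    calls = [("keystone", "authenticate and look up service endpoints")]
    if req.volume_size_gb > 0:
        calls.append(("cinder", f"create a {req.volume_size_gb} GB volume"))
    calls.append(("nova", f"boot a guest with flavor {req.flavor}"))
    if req.restore_backup_id is not None:
        calls.append(("swift", f"retrieve backup {req.restore_backup_id}"))
    return calls


for service, purpose in downstream_calls(CreateRequest("m1.small", 5)):
    print(service, "-", purpose)
```

In this sketch, a 5 GB instance touches Keystone, Cinder, and Nova; restoring from a backup adds Swift, where Trove stores backups by default.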
Tip #1: In practice, running multiple Nova services and reserving a private Nova service for specific clients (like Trove) has several benefits. For example, this private Nova service could provision instances on specialized hardware, in an environment that end users cannot access directly.
Components of Trove
A properly functioning Trove deployment has several infrastructure components: the Trove controller(s), the infrastructure database, and the transport mechanism used by the Trove message bus.
There are three major controller-side components: the Trove API, the Trove task manager, and the Trove conductor. Each of them can be scaled independently. Like the rest of OpenStack, Trove is architected as a loosely coupled system, so individual components can be scaled based on workload. Scaling out components also improves service reliability through redundancy.
Here are some recommended practices.
Tip #2: If you expect a large number of concurrent connections to the API, scale out the API service.
Tip #3: The heavy lifting in Trove is done by the task manager, so scale it out along with the API service.
Tip #4: If you expect a large number of instances to be running, scale out the Trove conductor service.
All these individual service components reference the same underlying infrastructure database and message queue.
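Why adding service instances adds capacity can be shown with a toy model in Python: several task-manager workers drain the same shared queue, and each task is handled by exactly one of them. The standard library's queue.Queue stands in for the message broker here; this is not oslo.messaging, and run_task_managers is a hypothetical name, but it approximates how multiple instances listening on the same topic divide the work among themselves.

```python
import queue
import threading
from collections import Counter


def run_task_managers(tasks, num_workers):
    """Toy model: num_workers task managers drain one shared queue.

    Each task is taken by exactly one worker, so adding workers adds
    capacity without the workers coordinating with each other.
    """
    work = queue.Queue()  # stand-in for the shared message queue
    for task in tasks:
        work.put(task)

    done = []              # every task that was processed
    handled = Counter()    # worker name -> number of tasks it processed
    lock = threading.Lock()

    def worker(name):
        while True:
            try:
                task = work.get_nowait()
            except queue.Empty:
                return  # queue drained; this worker exits
            # A real task manager would call Nova, Cinder, etc. here.
            with lock:
                done.append(task)
                handled[name] += 1

    threads = [
        threading.Thread(target=worker, args=(f"taskmanager-{i}",))
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done, handled


done, handled = run_task_managers([f"create-{n}" for n in range(100)], 4)
print(len(done), dict(handled))
```

Every task is processed once no matter how many workers run; how the 100 tasks split across the four workers varies from run to run.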
Trove does not need to share its infrastructure database with the rest of the OpenStack services. This is true of every OpenStack service: each can be configured with its own private infrastructure database.
Similarly, like most OpenStack services, Trove communicates over a message bus, which it accesses through the Oslo messaging library (oslo.messaging).
Tip #5: Nothing requires Trove to share the message bus infrastructure used by the rest of OpenStack. In fact, because the Trove guest agents running on database instances also connect to this bus, it is highly advisable that Trove have its own message bus.
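One way to verify the private-database and private-message-bus recommendations on a running deployment is to compare configuration files. The Python sketch below assumes oslo.config-style INI files in which the broker is set by transport_url in [DEFAULT] and the database by connection in [database] (the standard oslo.messaging and oslo.db options). The host names and the helper names endpoints and shared_with are hypothetical, and the parser assumes host names or IPv4 addresses rather than bracketed IPv6 literals.

```python
import configparser
from urllib.parse import urlsplit


def endpoints(conf_text):
    """Return ((db_host, db_name), broker_hosts) from an INI config string."""
    cp = configparser.ConfigParser(interpolation=None)  # passwords may contain '%'
    cp.read_string(conf_text)

    db = urlsplit(cp.get("database", "connection"))
    db_endpoint = (db.hostname, db.path.lstrip("/"))

    # transport_url may list several brokers: rabbit://u:p@h1:5672,u:p@h2:5672/
    netloc = urlsplit(cp.get("DEFAULT", "transport_url")).netloc
    brokers = {
        part.rsplit("@", 1)[-1].split(":")[0]  # drop credentials, then port
        for part in netloc.split(",")
    }
    return db_endpoint, brokers


def shared_with(trove_conf, other_conf):
    """Report which infrastructure Trove shares with another service."""
    trove_db, trove_brokers = endpoints(trove_conf)
    other_db, other_brokers = endpoints(other_conf)
    return {
        "database": trove_db == other_db,
        "message_bus": bool(trove_brokers & other_brokers),
    }


TROVE_CONF = """
[DEFAULT]
transport_url = rabbit://trove:secret@trove-mq.example.internal:5672/

[database]
connection = mysql+pymysql://trove:secret@db.example.internal/trove
"""

NOVA_CONF = """
[DEFAULT]
transport_url = rabbit://nova:secret@mq1.example.internal:5672,nova:secret@mq2.example.internal:5672/

[database]
connection = mysql+pymysql://nova:secret@db.example.internal/nova
"""

print(shared_with(TROVE_CONF, NOVA_CONF))  # {'database': False, 'message_bus': False}
```

In this example both services use the same database server but separate databases, which the check treats as private; compare host names only if you want to require separate database servers as well.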
When operated at scale, individual OpenStack service components may require additional scalability and resiliency, and a deployment may adopt the horizontally scaled architecture shown in Figure 2. The load balancer can be round-robin DNS (Domain Name System) or a more sophisticated load balancer that routes traffic to the individual service instances based on rules. In either case, the service instances all share the same message queue and infrastructure database.
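The difference between the two load-balancing options can be sketched in a few lines of Python. RoundRobinBalancer is a hypothetical helper, not an OpenStack component: it hands out identical service instances in turn, as round-robin DNS does, and adds one simple "rule" of the kind a more sophisticated load balancer applies, skipping instances marked unhealthy. The backend addresses are illustrative; 8779 is the Trove API's default port.

```python
import itertools


class RoundRobinBalancer:
    """Round-robin selection over identical service instances.

    Unlike plain round-robin DNS, it skips instances marked unhealthy.
    """

    def __init__(self, backends):
        self._backends = list(backends)
        self._healthy = set(self._backends)
        self._cycle = itertools.cycle(self._backends)

    def mark_down(self, backend):
        self._healthy.discard(backend)

    def mark_up(self, backend):
        self._healthy.add(backend)

    def next_backend(self):
        # Try each backend at most once per call before giving up.
        for _ in range(len(self._backends)):
            candidate = next(self._cycle)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy backends")


lb = RoundRobinBalancer(["trove-api-1:8779", "trove-api-2:8779", "trove-api-3:8779"])
lb.mark_down("trove-api-2:8779")
print([lb.next_backend() for _ in range(4)])
```

With plain round-robin DNS, clients keep receiving the address of a failed instance until the DNS record changes; the health check is what the rule-based option adds. Either way, every instance behind the balancer still talks to the same message queue and infrastructure database.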
Trove uses oslo.messaging as its underlying RPC mechanism. AMQP-based brokers are often used as the underlying transport, with RabbitMQ being the most common choice.
Tip #6: It is strongly recommended that the AMQP server and the communications on the message queue be secured not only through Transport Layer Security (TLS), but also through physical isolation of the networks over which message queue traffic is sent.
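A simple deployment check for the isolation half of Tip #6 can be sketched in Python with the standard ipaddress module: confirm that every broker named in Trove's transport_url sits inside the network set aside for message queue traffic. The 10.20.0.0/24 range and the function names are hypothetical, and the parser assumes literal IPv4 addresses rather than host names. An address inside the right subnet does not by itself prove physical isolation, and the check says nothing about TLS, which is configured separately in the messaging driver.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical network reserved for message queue traffic only.
MQ_NETWORK = ipaddress.ip_network("10.20.0.0/24")


def broker_addresses(transport_url):
    """Extract broker addresses from a (possibly multi-host) transport_url."""
    netloc = urlsplit(transport_url).netloc
    # Drop "user:password@" and the ":port" suffix from each host entry.
    return [part.rsplit("@", 1)[-1].rsplit(":", 1)[0] for part in netloc.split(",")]


def brokers_isolated(transport_url, network=MQ_NETWORK):
    """True if every broker address lies inside the isolated network."""
    return all(
        ipaddress.ip_address(addr) in network
        for addr in broker_addresses(transport_url)
    )


print(brokers_isolated("rabbit://trove:pw@10.20.0.5:5672,trove:pw@10.20.0.6:5672/"))
print(brokers_isolated("rabbit://trove:pw@192.168.1.10:5672/"))
```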
The guest instance contains the database server that the user requested, along with the Trove guest agent. To protect the data on the server, and to ensure the proper operation of the database and the service, it is important to secure the guest and prevent unauthorized activity on it.
Tip #7: Always attach a security group that allows traffic only to the designated port(s) of the specified database. For example, on a MySQL instance, restrict access to TCP port 3306. Security groups are configured through Nova networking or Neutron and should be used whenever possible.
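Tip #7 can be sketched in Python as rule data plus a check. The rule dictionaries use the attribute names of Neutron's security group rule API (direction, ethertype, protocol, port_range_min, port_range_max, remote_ip_prefix); actually creating the rules, for example through openstacksdk, is not shown. The helper names, the port table of common defaults, and the 10.0.0.0/16 client network are assumptions for illustration, and is_allowed is a simplified default-deny model rather than Neutron's implementation.

```python
import ipaddress

# Common default ports; adjust if a datastore is configured differently.
DEFAULT_PORTS = {
    "mysql": 3306,
    "postgresql": 5432,
    "mongodb": 27017,
    "redis": 6379,
    "cassandra": 9042,
}


def ingress_rules(datastore, allowed_cidr):
    """Build a single ingress rule opening only the datastore's TCP port."""
    port = DEFAULT_PORTS[datastore]
    return [{
        "direction": "ingress",
        "ethertype": "IPv4",
        "protocol": "tcp",
        "port_range_min": port,
        "port_range_max": port,
        "remote_ip_prefix": allowed_cidr,
    }]


def is_allowed(rules, protocol, port, source_ip):
    """Default deny: traffic passes only if some ingress rule matches it."""
    src = ipaddress.ip_address(source_ip)
    for rule in rules:
        if (rule["direction"] == "ingress"
                and rule["protocol"] == protocol
                and rule["port_range_min"] <= port <= rule["port_range_max"]
                and src in ipaddress.ip_network(rule["remote_ip_prefix"])):
            return True
    return False


rules = ingress_rules("mysql", "10.0.0.0/16")
print(is_allowed(rules, "tcp", 3306, "10.0.5.7"))  # MySQL client: allowed
print(is_allowed(rules, "tcp", 22, "10.0.5.7"))    # SSH: denied
```

Opening exactly one port rather than a range makes it easy to confirm that nothing else on the guest, such as its shell, is reachable from clients.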
As cloud computing becomes more prevalent, we think database as a service will become the logical next step in the administration and management of databases, and an effective way to take advantage of the diverse database types available today.
The OpenStack DBaaS, Trove, is deployed in production at scale at a number of sites. Users should follow best practices for operating production deployments, which include properly configuring the underlying infrastructure that Trove relies on and scaling Trove and other services to handle large deployments with many users and instances.
Securing the message queue is also very important: enable transport-level security on the AMQP service and isolate the networks that carry its traffic. You must also secure the guest instance, because inadvertent actions by a user who is allowed to connect to a guest instance's shell could cause the database to fail, potentially resulting in data loss and an interruption of service.
You can find a step-by-step guide for installing and getting started with Trove here. The tips that are presented in this article will help you run OpenStack Trove efficiently in your private cloud.
Amrith Kumar is an active technical contributor to – and a member of the core review team for – the OpenStack Trove project, as well as the founder and CTO of Tesora Inc.