Enterprise IT is on the verge of a revolution, adopting hyper-scale and cloud methodologies such as micro-services, DevOps, and architectures developed specifically for cloud platforms (cloud-native).

Many organizations simply try to apply the same infrastructure, practices, and vendor solutions to this new world, but that approach may lead to failure. That's because many of those solutions and practices – SAN/VSAN and NAS among others – are becoming irrelevant.

In this new revolution, many of the methods employed by cloud vendors are being adopted for software development in IT organizations:

  • We assume everything can break
  • Services must be elastic
  • Features are constantly added in an agile way
  • There is no notion of downtime


In short, we want Nirvana. Achieving this Nirvana involves using small, stateless, elastic and versioned micro-services deployed in lightweight virtual machines (VMs) or Docker containers.

Need to scale? Simply add more micro-service instances. Need to upgrade? DevOps teams replace the micro-service version on the fly and declare its dependencies. If things break, the overall service is not interrupted. The data and state of the application services are stored in a set of “persistent” services with unique attributes such as atomicity, concurrency, elasticity, and more – all specifically targeting the new model.
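
As a minimal illustration of this pattern, the Python sketch below keeps its only state in a shared Redis service rather than inside the micro-service itself, so instances can be added, replaced, or killed freely. The host name and key are hypothetical, not taken from any particular setup.

    import redis

    # Shared persistent/cache service; every instance points at the same endpoint.
    r = redis.Redis(host="shared-cache.internal", port=6379)

    def handle_request(user_id: str) -> int:
        # No local state: the counter lives in the shared layer, and INCR
        # executes atomically on the server, so concurrent instances never
        # step on each other.
        return r.incr(f"requests:{user_id}")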

In contrast, the current enterprise IT model stores application state in virtual disks. This model requires complex and labor-intensive provisioning tools to build, snapshot, and back up data.

Since storage updates are not atomic, we invented the “consistent snapshot,” which doesn’t always work. Since the current model doesn’t distinguish between shared OS/application files and data, the overlaps have to be deduplicated. Today, the storage layer remains unaware of data semantics, so complex caching solutions are deployed, or enterprises opt for expensive all-flash or in-memory solutions – all to avoid being bothered with app-specific performance tuning.

Managing Data in a Stateless World

With the understanding that everything can and will break, it is important to adopt several key data storage paradigms:

  • All data updates must be atomic and stored in a shared persistence layer. Temporary dirty caches, local logs, partial updates to files, and local journals maintained inside the micro-service cannot be allowed. Micro-services are disposable.
  • Data access must be concurrent (asynchronous). Multiple micro-services can read and update the same data repository in parallel. Updates should be serialized, with no blocking, locking, or exclusivity allowed. This allows us to adjust the number of service instances according to demand.
  • The data layer must be elastic and durable, supporting constant data growth or model changes without any disruption to the service; failures of individual data nodes must not lead to data loss.
  • Everything must be versioned to detect and avoid inconsistencies (see the sketch after this list).

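One way to satisfy atomicity, concurrency, and versioning together is an optimistic, conditional write: every record carries a version number, and an update only succeeds if the version hasn't changed since it was read. The Python sketch below shows this against DynamoDB; the table name, attributes, and retry behavior are illustrative assumptions, not a prescribed design.

    import boto3
    from botocore.exceptions import ClientError

    table = boto3.resource("dynamodb").Table("app-state")  # hypothetical table

    def update_profile(user_id: str, new_profile: dict, expected_version: int) -> bool:
        try:
            table.put_item(
                Item={
                    "user_id": user_id,
                    "profile": new_profile,
                    "version": expected_version + 1,
                },
                # The write succeeds only if no other instance updated the
                # record since we read it; the check and write are atomic.
                ConditionExpression="#v = :v",
                ExpressionAttributeNames={"#v": "version"},
                ExpressionAttributeValues={":v": expected_version},
            )
            return True
        except ClientError as e:
            if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
                return False  # lost the race: re-read the record and retry
            raise

Because the losing writer simply re-reads and retries, no locks are ever held, and any number of service instances can run in parallel.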

Enterprise NAS, POSIX semantics, and SAN/VSAN solutions do not comply with the above requirements – specifically atomicity, concurrency, and versioning. That is why hyper-scale cloud vendors do not widely use SAN or NAS internally.

With cloud-native apps, data services such as object storage, key/value stores, message queues, and log streams are used to make the different types of data items persistent. In some cases, disk images still exist for storing small stateless application binaries (this is how Docker operates), but those are generated automatically by the build and CI/CD systems and do not require backup.
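
For example, two services can hand work to each other through a durable message queue instead of a shared disk. Below is a minimal Python sketch using AWS Kinesis via boto3; the stream name and payload are hypothetical.

    import json
    import boto3

    kinesis = boto3.client("kinesis")

    def submit_task(order_id: str, payload: dict) -> None:
        # Producer and consumer share nothing but the durable stream;
        # either side can be scaled or replaced independently.
        kinesis.put_record(
            StreamName="order-tasks",            # hypothetical stream
            Data=json.dumps(payload).encode(),
            PartitionKey=order_id,               # keeps one order's events ordered
        )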

Data items and files are backed up in the object storage itself, which has built-in versioning, cloud tiering, and extensible, searchable metadata. There is no need for separate backup tools and processes or complex integrations, and no need to decipher VMDK (virtual disk) image snapshots to get to a specific file version, since data is stored and indexed in its native and most granular form.
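
To make this concrete, the Python/boto3 sketch below writes an object and then fetches an earlier version of the same file directly by its version ID. The bucket and key names are hypothetical, and the bucket is assumed to have S3 versioning enabled.

    import boto3

    s3 = boto3.client("s3")

    # Every put creates a new, immutable version of the same key.
    s3.put_object(Bucket="app-data", Key="reports/q1.csv", Body=b"new contents")

    # List all stored versions of the file (newest first) ...
    versions = s3.list_object_versions(Bucket="app-data", Prefix="reports/q1.csv")

    # ... and retrieve an older one directly; no disk-image snapshot to
    # mount or decipher.
    oldest = versions["Versions"][-1]["VersionId"]
    obj = s3.get_object(Bucket="app-data", Key="reports/q1.csv", VersionId=oldest)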

Unlike traditional file storage, cloud-native data services have atomic and stateless semantics such as put (save an object/record), get (retrieve an object or record by key and version), list/select (retrieve the objects or records matching a query statement and the relevant version), and exec (execute a database-side procedure atomically).
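
As one hedged illustration, these verbs map naturally onto an existing store such as Redis; the key names and the Lua script below are illustrative, not part of any required schema.

    import redis

    r = redis.Redis(host="shared-cache.internal", port=6379)

    r.set("user:42", b"...")               # put: save a record under a key
    value = r.get("user:42")               # get: retrieve a record by key
    keys = list(r.scan_iter("user:*"))     # list: all records matching a pattern

    # exec: a server-side procedure, run atomically (Redis executes the
    # entire Lua script as one uninterruptible operation).
    script = "return redis.call('INCRBY', KEYS[1], ARGV[1])"
    r.eval(script, 1, "user:42:credits", 10)

Each call is self-contained: there is no open/close handle or cursor state for the service instance to hold, which is exactly what keeps the instance disposable.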

The table below describes some of the key persistent services by category:

Category | Amazon AWS Service Name | Open Source Alternatives | Focus
-------- | ----------------------- | ------------------------ | -----
Object Storage | S3 | OpenStack Swift | Store mid–large objects cost-effectively; extensible metadata and versioning; usually slow
NoSQL/NewSQL DB, Key/Value | DynamoDB, Aurora | HBase, Cassandra, MongoDB, etc. | Store small–mid size objects; data/column awareness; faster
Object Cache (in memory) | ElastiCache (Redis, Memcached) | Redis, Memcached | Store objects in memory (as a shared cache); no/partial durability
Durable Message Queue | Kinesis | Kafka | Store and route message and task objects between services; fast
Log Streams | CloudWatch Logs | Elasticsearch (ELK), Solr | Store, map, and query semi-structured log streams
Time Series Streams | CloudWatch Monitoring | Graphite, InfluxDB | Store, compact, and query semi-structured time-series data

While it is still possible to deploy those persistent services over a SAN or VSAN, it won't work well: the services must be atomic and keep data, metadata, and state consistent across multiple nodes, so they implement their own replication. An underlying storage RAID/virtualization layer is therefore not useful (and in many cases is even harmful).

The same applies to snapshots/versioning, which these tools handle at transaction boundaries rather than at non-consistent intervals. In most cases, the tools will simply use a series of local drives.

What to Expect in the Future?

The challenge is that each persistent service manages its own data pool, repeats similar functionality, is tied to local physical drives, and lacks data security, tiering, backups, or data reduction. One can also observe much overlap between the services, with the majority of the differences coming down to trade-offs between volume, velocity, and data awareness (variety).

In the future, many of these tools will use shared low-latency, atomic, and concurrent object storage APIs as an alternative (already supported by MongoDB, CouchDB, Redis, Hadoop, Spark, and others). The result will be centralized storage resources and management, disaggregating the services from the physical media, allowing better governance and greater efficiency, and simplifying deployment – all key to broader enterprise adoption.

Enterprises preparing to deploy a micro-services and agile IT architecture should not be tempted to reuse their existing IT practices. Learn how cloud and SaaS vendors do it, and internalize that it may require a complete paradigm shift. Don't be caught unaware: some of the brand-new SAN, VSAN, hyper-converged, AFA, and even scale-out NAS solutions may not play very well in this new world.

(About the author: Yaron Haviv is founder and CTO at Iguaz.io)
