C3.ai Inc.

04/30/2024 | News release | Distributed by Public on 04/30/2024 12:34

How C3 AI’s Model-Driven Architecture Supports a Zero-Downtime Cassandra Upgrade

The C3 AI Platform maintains data confidence and enables a seamless blue/green deployment

By Rohit Kalmankar, Lead Solution Architect, C3 AI

Downtime is an all-too-common result of upgrading digital systems, yet some upgrades are necessary even when downtime is unacceptable. To get around this problem, companies employ a strategy called blue/green deployment. This type of upgrade offers significant business advantages: it keeps services available throughout and mitigates the risks associated with system upgrades and deployments.

The C3 AI Platform's model-driven architecture, pre-built services, and tools make blue/green deployments seamless and simple to manage. To perform a blue/green deployment, companies maintain two identical production environments: the blue environment runs the current version and the green environment runs the updated version. Because traffic can be switched between these environments, downtime and potential disruptions are significantly reduced and customer service continues uninterrupted.

Moreover, the updated environment can be tested and validated comprehensively without affecting the live system, so any issues can be addressed before customers are affected. Ultimately, blue/green deployments minimize the impact of potential errors, improve the reliability and stability of business operations, and enhance customer satisfaction and the user experience.

This blog describes in more detail how the C3 AI Platform enables these upgrades. We'll walk you through how we upgraded one customer's deployment of Cassandra, a distributed, wide-column NoSQL database. We focus on considerations for Cassandra deployments with two characteristics that are essential to AI applications: large data volumes and high-velocity data streaming.

Efficiency Redefined: Overcoming Traditional Cassandra Upgrade Challenges

Cassandra's strength lies in its distributed architecture, in which data is spread across rings composed of nodes. Typically, when upgrading or expanding a traditional Cassandra ring, you add nodes or upgrade the version, re-stream data to neighboring nodes, run full repairs, and wait for pending compaction tasks to drain. This is referred to as an in-place Cassandra ring upgrade or expansion. However, this approach has several drawbacks.

First, performance degrades during the upgrade. Second, the process is slow and requires downtime while Cassandra nodes are in a "joining" state. Third, rollback is difficult and error-prone. Fourth, there is no opportunity to test the upgrade thoroughly in production. Finally, the combined effect is downtime and a degraded user experience.

Fortunately, C3 AI's model-driven architecture, auto-scaling, configuration management, MapReduce framework, and automation capabilities make it possible to perform blue/green deployment seamlessly.

Benefits of Blue/Green Upgrades

Seamless Validation Using Data Validation Framework
When upgrading Cassandra in a production environment, validating the upgrade can be challenging if done in-place. However, a blue/green approach can eliminate this issue by isolating the blue and green Cassandra rings, allowing us to independently validate them. Additionally, the C3 AI Platform's data validation feature can further enhance the validation process.

The data validation framework allows application subject matter experts to codify their knowledge by configuring and seeding data validation rules. These rules can be executed by application developers to identify any data integration issues that may exist in the data. By establishing such a framework, you can automate the data validation process.
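To make the idea of codified rules concrete, here is a minimal Python sketch, assuming rules can be expressed as named predicates over individual records. The `ValidationRule` class, the `run_rules` function, and the field names (`value`, `ts`) are hypothetical illustrations, not the C3 AI data validation API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class ValidationRule:
    """A named check that an SME can seed; hypothetical, not the C3 AI rule format."""
    name: str
    check: Callable[[dict], bool]  # returns True when a record passes


def run_rules(rules: Iterable[ValidationRule], records: Iterable[dict]) -> dict[str, int]:
    """Count, per rule, how many records violate it."""
    rules = list(rules)
    violations = {rule.name: 0 for rule in rules}
    for record in records:
        for rule in rules:
            if not rule.check(record):
                violations[rule.name] += 1
    return violations


# Example rules for sensor readings (field names are hypothetical).
RULES = [
    ValidationRule("value_present", lambda r: r.get("value") is not None),
    ValidationRule("value_non_negative", lambda r: r.get("value") is None or r["value"] >= 0),
    ValidationRule("has_timestamp", lambda r: "ts" in r),
]
```

Running the same rule set against both rings and comparing the violation counts turns expert knowledge into a repeatable, automated check.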

Data Integrity Assurance Using MapReduce and ExpressionEngineFunction
Once the green ring is up and running, the validation process takes center stage. This crucial step involves verifying that the green row counts align with the blue row counts and checking the data quality by calculating the sum or average of the actual values. The validation process is key to maintaining data integrity. If any issues arise during this process, there will be no impact on the blue ring, and we can rebuild the green ring as needed.
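The row-count and sum comparison can be sketched as follows, assuming per-table counts and sums have already been computed independently on each ring (in practice by a distributed job such as MapReduce; here `compute_stats` is an in-memory stand-in). `TableStats`, `compute_stats`, and `compare_rings` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TableStats:
    """Aggregates computed independently on one ring for one table."""
    row_count: int
    value_sum: float


def compute_stats(values: list[float]) -> TableStats:
    # In-memory stand-in for a distributed count/sum over one table.
    return TableStats(row_count=len(values), value_sum=math.fsum(values))


def compare_rings(blue: dict[str, TableStats], green: dict[str, TableStats],
                  rel_tol: float = 1e-9) -> list[str]:
    """List mismatches between the rings; an empty list means they agree.

    Equal row counts plus equal sums imply equal averages, so the average
    needs no separate check here.
    """
    problems = []
    for table in sorted(blue.keys() | green.keys()):
        b, g = blue.get(table), green.get(table)
        if b is None or g is None:
            side = "blue" if b is None else "green"
            problems.append(f"{table}: missing on {side} ring")
            continue
        if b.row_count != g.row_count:
            problems.append(f"{table}: row count {b.row_count} (blue) != {g.row_count} (green)")
        if not math.isclose(b.value_sum, g.value_sum, rel_tol=rel_tol):
            problems.append(f"{table}: sum {b.value_sum} (blue) != {g.value_sum} (green)")
    return problems
```

A small relative tolerance on the sums allows for floating-point differences in summation order across the two rings, while row counts must match exactly.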

The C3 AI Platform offers robust features that help ensure data integrity and quality throughout the blue/green deployment process. Its distributed computing capabilities execute tasks across the cluster, whether you need to create and provision new types or run MapReduce jobs and ad hoc jobs.

Moreover, the C3 AI Platform includes a comprehensive function library that can be applied to simple and compound metrics, stored calc expressions, evaluation projections, and fetch filters. This library, housed within the ExpressionEngineFunction Type, provides a wide range of functions for checking data accuracy and consistency throughout the deployment process.

Risk-Free Deployment
The key benefit of a blue/green deployment is that the blue ring remains available after the swap. Once the green ring has been validated, we swap the rings; if any issues arise afterward, we can roll back to the blue ring at any time. Service interruptions are therefore limited to the window between detecting an issue in the green environment and swapping back to the blue ring.

Blue/Green Deployment Methodology

Let's walk through how you can take advantage of C3 AI Platform features to perform a blue/green Cassandra upgrade:

To start, the blue environment serves the production traffic for your application. Meanwhile, the green environment is an identical copy of your application that will be used to upgrade the Cassandra version or expand the ring.

Once the green environment is ready and tested, the green Cassandra ring replaces the blue one, and the production application reads its data from the upgraded green ring. If any issues arise, you can always roll back by switching back to the blue Cassandra ring.

A blue/green deployment with the C3 AI Platform enables zero downtime for any NoSQL or key-value store database, such as Cassandra, shown in the figure above.
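One way to reason about this methodology is as a small state machine in which rollback stays possible only while the blue ring is kept in sync. The phase names and allowed transitions below are a hypothetical model of the process described in this section, not a C3 AI Platform feature.

```python
from enum import Enum, auto


class Phase(Enum):
    BLUE_LIVE = auto()        # blue ring serves production; green not yet built
    GREEN_SYNCING = auto()    # green restored from backup, replaying the backlog
    GREEN_VALIDATED = auto()  # green caught up and passed validation; blue still serves
    GREEN_LIVE = auto()       # green serves; both rings still receive streamed data
    BLUE_RETIRED = auto()     # upgrade accepted; blue ring decommissioned


# Allowed moves. Returning to BLUE_LIVE from GREEN_SYNCING means "discard and
# rebuild green"; returning to it from GREEN_LIVE is the rollback.
TRANSITIONS = {
    Phase.BLUE_LIVE: {Phase.GREEN_SYNCING},
    Phase.GREEN_SYNCING: {Phase.GREEN_VALIDATED, Phase.BLUE_LIVE},
    Phase.GREEN_VALIDATED: {Phase.GREEN_LIVE},
    Phase.GREEN_LIVE: {Phase.BLUE_LIVE, Phase.BLUE_RETIRED},
    Phase.BLUE_RETIRED: set(),
}


def advance(current: Phase, target: Phase) -> Phase:
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target


def serving_ring(phase: Phase) -> str:
    """Which ring production traffic reads from in a given phase."""
    return "green" if phase in (Phase.GREEN_LIVE, Phase.BLUE_RETIRED) else "blue"
```

Making BLUE_RETIRED a terminal state captures the point that rollback is only cheap until the blue ring is decommissioned.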

Implementation
A crucial element of a successful blue/green deployment is meticulous planning; there are several factors to take note of when planning a blue/green deployment:

  • Application Architecture: Identify all dependencies and configurations needed to execute the deployment properly.
  • Risk and Complexity: Perform a risk assessment that captures all potential effects if the deployment fails.
  • People: Determine the expertise of the teams involved and recruit people with additional skills if necessary.
  • Process: Ensure thorough testing and QA, and develop a rollback plan.
  • Cost: Map a cost to all aspects of the plan, especially if additional resources are needed.

To deploy an application successfully, you need to understand its architecture, dependencies, and configurations. This understanding helps you develop a deployment plan and identify and mitigate risks. Assemble a team of experts and define clear objectives for each team member. Document every process, including testing and rollback plans. Consider external dependencies, such as vendors or customers; communicate the execution plan and any additional resource costs; and hold daily meetings to keep the team aligned in the lead-up to the deployment.

A blue/green deployment plan can be divided into two parts: 1) green ring configuration and testing; and 2) blue/green deployment.

Part I: Green Ring Configuration and Testing

Cassandra Backup/Restore
The first step is always to restore the Cassandra backup to the green ring. Record the timestamp of that backup: the next step depends on it.
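To show why the backup timestamp matters, the following sketch selects the messages the green ring must replay after the restore. It assumes each message carries an event timestamp (`ts`, a hypothetical field) and that every message is an idempotent upsert, which holds for ordinary Cassandra inserts and updates but not for counter columns. The safety margin, which replays a few minutes before the backup timestamp to cover writes in flight during the snapshot, is an illustrative choice rather than part of the original procedure.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    key: str
    ts: float  # event time in epoch seconds (hypothetical field)
    payload: dict = field(default_factory=dict)


def messages_to_replay(queue: list[Message], backup_ts: float,
                       safety_margin_s: float = 300.0) -> list[Message]:
    """Select queued messages the green ring must ingest after restoring a backup.

    Messages from slightly before backup_ts are replayed too, so writes in flight
    while the snapshot was taken are not lost. This is only safe when replaying a
    message twice leaves the same result (idempotent upserts).
    """
    cutoff = backup_ts - safety_margin_s
    return sorted((m for m in queue if m.ts >= cutoff), key=lambda m: m.ts)
```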

Queue Messages After Backup
To stream data to both environments, create a queue that stores every message received since the last backup timestamp. The C3 AI Platform provides an easy way to integrate queuing and messaging services such as AWS SQS or Apache Kafka. Its comprehensive Kafka connector lets you connect your applications to Kafka clusters without extensive coding, and by handling the interaction between your C3 AI applications and Kafka, it enables smooth data streaming to both the blue and green clusters. The C3 AI Platform also includes the CloudMessageSourceSystem framework, which lets you consume and ingest messages from messaging systems such as Azure Event Hubs, AWS Kinesis, and Apache Kafka without writing any code. This streamlines integration and keeps data flowing efficiently with minimal effort when connecting and managing your messaging services.
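A retained log lets each ring consume the same stream at its own pace: blue stays at the live head while green replays from the offset that corresponds to the backup timestamp. The sketch below models this with an in-memory log and per-ring offsets, analogous to separate Kafka consumer groups; `RetainedLog` and `RingConsumer` are hypothetical stand-ins, not a Kafka client or C3 AI connector API.

```python
from dataclasses import dataclass, field


@dataclass
class RetainedLog:
    """In-memory stand-in for a retained topic; hypothetical, not a client API."""
    entries: list = field(default_factory=list)

    def append(self, msg) -> int:
        self.entries.append(msg)
        return len(self.entries) - 1  # offset of the new entry


@dataclass
class RingConsumer:
    """Each ring reads the shared log at its own offset, like a separate consumer group."""
    name: str
    offset: int
    applied: list = field(default_factory=list)

    def poll(self, log: RetainedLog, max_batch: int) -> int:
        """Apply up to max_batch new messages to this ring; return how many were applied."""
        batch = log.entries[self.offset:self.offset + max_batch]
        self.applied.extend(batch)  # stand-in for writing the batch to this ring
        self.offset += len(batch)
        return len(batch)

    def lag(self, log: RetainedLog) -> int:
        """Messages in the log that this ring has not yet applied."""
        return len(log.entries) - self.offset
```

Because the offsets are independent, green's replay never slows down blue's live ingestion, and green's lag is a direct measure of how far it is from catching up.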

Once the backup is restored to the green environment, data streaming can begin. Depending on the amount of data to be processed, it may take a few hours to catch up with the latest information.
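The catch-up time can be estimated with simple arithmetic. If B messages have accumulated since the backup, new messages arrive at rate r_in, and the green ring ingests at rate r_proc, the backlog shrinks by (r_proc - r_in) messages per second, so catching up takes B / (r_proc - r_in) seconds and never happens if r_proc <= r_in. The rates in the usage lines are illustrative assumptions, not measurements from the customer deployment.

```python
import math


def backlog_since_backup(ingest_rate: float, hours_since_backup: float) -> float:
    """Messages accumulated between the backup and the start of replay."""
    return ingest_rate * hours_since_backup * 3600


def catch_up_hours(backlog_msgs: float, ingest_rate: float, process_rate: float) -> float:
    """Hours for a replaying ring to reach the live head of the stream.

    The backlog shrinks at (process_rate - ingest_rate) msgs/s, so the ring only
    catches up if it processes faster than new data arrives.
    """
    net_rate = process_rate - ingest_rate
    if net_rate <= 0:
        return math.inf
    return backlog_msgs / net_rate / 3600


# Illustrative numbers: 12 hours of backlog at 5,000 msg/s, replayed at 25,000 msg/s.
backlog = backlog_since_backup(ingest_rate=5_000, hours_since_backup=12)
print(catch_up_hours(backlog, ingest_rate=5_000, process_rate=25_000))  # 3.0
```

The useful intuition is that only the surplus processing rate drains the backlog: doubling r_proc from 10,000 to 20,000 msg/s in this example would cut catch-up time by far more than half.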

C3 AI Auto-Scaling
The architecture of the C3 AI Platform is both horizontally scalable and elastic, allowing for efficient backlog processing, validation at scale, and automatic scaling based on needs.
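Scaling decisions for backlog processing can be grounded in simple arithmetic: to drain a backlog of B messages within a target time T while new data keeps arriving at r_in, the ring must ingest at least B / T + r_in messages per second. The sketch below turns that into a worker count, assuming each worker sustains a known, constant rate; the function, its parameters, and the clamping bounds are hypothetical and do not describe how C3 AI auto-scaling is configured.

```python
import math


def workers_needed(backlog_msgs: float, ingest_rate: float, per_worker_rate: float,
                   target_hours: float, min_workers: int = 1, max_workers: int = 100) -> int:
    """Hypothetical scaling policy: enough workers to drain the backlog in target_hours.

    Required throughput = backlog / target time + live ingest rate; the result is
    clamped to [min_workers, max_workers] so the policy scales down once caught up.
    """
    required_rate = backlog_msgs / (target_hours * 3600) + ingest_rate
    n = math.ceil(required_rate / per_worker_rate)
    return max(min_workers, min(max_workers, n))
```

With a backlog of 216,000,000 messages, a live rate of 5,000 msg/s, and 2,500 msg/s per worker, a 3-hour target calls for 10 workers; once the backlog is drained, the same policy drops to 2.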

Testing
Blue/green deployment lets you test without affecting the production environment. This testing can include performance validation as well as functional validation. For instance, if the Cassandra upgrade is needed to meet performance requirements, you can verify that the green ring meets them before it serves production traffic. It is also important to validate data quality. The C3 AI Platform's ExpressionEngine and MapReduce framework let you run quality checks across the full dataset rather than only a small subset. In addition, the C3 AI Platform's API lets you query both rings in parallel and compare the results. Thorough testing builds confidence in your upgraded ring, and any issues you find can be fixed without affecting the production ring or the user experience.
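Comparing both rings in parallel can be sketched with Python's standard `concurrent.futures` module. `diff_rings` is a hypothetical helper, and `query_blue` and `query_green` are hypothetical callables: in practice each might wrap a Cassandra driver session bound to one ring, but here they are plain functions so the example runs without a cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Hashable, Iterable

QueryFn = Callable[[Hashable], object]


def diff_rings(keys: Iterable[Hashable], query_blue: QueryFn, query_green: QueryFn,
               max_workers: int = 8) -> dict:
    """Run the same lookups against both rings concurrently.

    Returns {key: (blue_result, green_result)} for every key whose results differ.
    Both pool.map calls submit all of their work immediately, so blue and green
    queries overlap; results are then consumed in key order.
    """
    keys = list(keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        blue_results = pool.map(query_blue, keys)
        green_results = pool.map(query_green, keys)
        return {k: (b, g) for k, b, g in zip(keys, blue_results, green_results) if b != g}
```

An empty result means the sampled keys agree across rings; any exception raised by a query propagates when its result is consumed, so connectivity problems surface rather than being silently counted as matches.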

Part II: Blue/Green Deployment

Ring Swap
By this point, you should have confidence in your green ring after verifying it and testing its expected economic benefits. Now you can take advantage of the C3 AI Platform's model-driven architecture to swap the Cassandra rings, a quick and straightforward process that takes only a few minutes: modify the C3 AI server settings to direct traffic to the green ring, and then repeat the same process on the blue ring. The platform's configuration management framework maintains the current state of the environment and provides configuration management methods, such as setConfig and getConfig, out of the box.
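The swap-and-rollback pattern can be sketched with an in-memory configuration store. The method names echo the setConfig and getConfig methods mentioned above, but the `ConfigStore` class, its signatures, and the setting name `cassandra.contactPoints` are hypothetical; they illustrate the pattern, not the C3 AI configuration API.

```python
class ConfigStore:
    """In-memory stand-in for a configuration service (hypothetical signatures)."""

    def __init__(self) -> None:
        self._values: dict[str, str] = {}
        self._history: dict[str, list[str]] = {}

    def get_config(self, key: str) -> str:
        return self._values[key]

    def set_config(self, key: str, value: str) -> None:
        # Remember the previous value so the change can be undone.
        if key in self._values:
            self._history.setdefault(key, []).append(self._values[key])
        self._values[key] = value

    def rollback(self, key: str) -> str:
        """Restore the previous value of key and return it."""
        previous = self._history.get(key)
        if not previous:
            raise LookupError(f"no earlier value recorded for {key!r}")
        self._values[key] = previous.pop()
        return self._values[key]


RING_KEY = "cassandra.contactPoints"  # hypothetical setting name


def swap_ring(store: ConfigStore, new_contact_points: str) -> str:
    """Point the application at a new ring; return the old value for the runbook."""
    old = store.get_config(RING_KEY)
    store.set_config(RING_KEY, new_contact_points)
    return old
```

Because the swap is a single configuration change and the previous value is retained, rolling back is the same small operation in reverse rather than a data migration.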

Stream Data to Both Rings
After completing the swap, continue streaming data to both rings. Keeping the blue ring current is crucial for minimizing downtime and preserving the user experience if you need to roll back.
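Continuing to stream to both rings after the swap amounts to a dual write in which the serving ring must succeed and the standby ring may fall behind temporarily. The sketch below, a hypothetical helper rather than a C3 AI API, queues failed standby writes for retry and reports when the standby is fully in sync, which is the condition for a safe rollback. Retrying later can reorder writes for the same key; the sketch assumes each write carries its own client-side timestamp so that Cassandra's last-write-wins conflict resolution still converges.

```python
from typing import Callable

Writer = Callable[[dict], None]


class DualWriter:
    """Write each message to the serving ring, then to the standby ring (hypothetical helper).

    Standby failures are kept for retry instead of failing the request, so users
    are unaffected; assumes each message carries its own write timestamp, so a
    late retry cannot overwrite newer data under last-write-wins.
    """

    def __init__(self, primary: Writer, standby: Writer) -> None:
        self.primary = primary
        self.standby = standby
        self.pending_standby: list[dict] = []

    def write(self, msg: dict) -> None:
        self.primary(msg)  # errors propagate: the serving ring must succeed
        try:
            self.standby(msg)
        except Exception:
            self.pending_standby.append(msg)

    def retry_standby(self) -> int:
        """Re-attempt failed standby writes; return how many are still pending."""
        still_pending = []
        for msg in self.pending_standby:
            try:
                self.standby(msg)
            except Exception:
                still_pending.append(msg)
        self.pending_standby = still_pending
        return len(still_pending)

    def standby_in_sync(self) -> bool:
        """True when no standby writes are outstanding, i.e. rollback loses nothing."""
        return not self.pending_standby
```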

User Acceptance Testing
Once the deployment succeeds, you can proceed with user acceptance testing. If any significant issues are discovered during this phase, you can revert to the blue environment; this is why it is essential to keep streaming to both rings even after a successful initial deployment.

Build Message Backlogs
Because you stream data to both environments, you must build up a message queue that holds the messages accumulating after the last backup timestamp. After you restore the backup to the green environment, you'll begin streaming data to it. Depending on the size of the backlog, it can take a few hours to catch up to the live environment.

Auto-Scaling
The C3 AI Platform's architecture is horizontally scalable and elastic. C3 AI auto-scaling helps process the backlog efficiently, run validation at scale, and scale up and down automatically based on demand.

Testing
Be sure to complete thorough testing, as detailed in the first part of the blue/green deployment plan.

Ensure System Reliability with the C3 AI Platform

Blue/green deployment comes with certain risks and costs. However, the C3 AI Platform's unique features, including its model-driven architecture, auto-scaling, and pre-built functionality such as the ExpressionEngine, MapReduce framework, and configuration management, significantly mitigate those risks. The platform also enables you to conduct extensive testing, manage downtime, and roll back changes without impacting the production environment. By using a cohesive AI platform, you can implement blue/green deployments at scale and cost-effectively, increasing confidence in upgrades and overall system reliability.