One area that had grown into a hard-to-maintain mess for lack of focus was the deployment process of our first product (SmartBooks). Shortcuts taken early in the company’s history to gain speed, plus a later cloud provider migration, had left things somewhat slow and the release process complex and brittle. While we learned a great deal and improved many things when developing our other product (Workflows), we still had to adjust a few things for the merger because the two product architectures differ (monolith vs. event-driven microservices). This post walks through some of the considerations and technical details of our revised setup.
One goal we set for the platform merger was to get rid of the excessive manual testing that new releases required and that had slowed us down in the past. The slowdown stemmed from single gatekeepers in the process, tribal knowledge, and ever-increasing uncertainty about what to even test manually. As we were already doing continuous deployments (incl. canary deployments) for some Workflows services, we knew we wanted them to be a first-class citizen in the new deployment pipelines. Additionally, to move from our existing continuous delivery flow to actual continuous deployment, we had to invest in even more extensive testing, most notably contract and end-to-end testing, to remove any manual intervention.
Luckily for us, several existing open-source solutions focus on some of these aspects. At the time of our research, the deployment tools we considered included Jenkins X, GitLab Pipelines, Weaveworks’ Flux + Flagger, and Spinnaker. Ultimately we landed on Spinnaker due to its community, feature completeness, and overall maturity. The corporate backing by Netflix, Google, Microsoft, and Oracle is another positive aspect (Google now even offers Spinnaker to its cloud customers as Spinnaker for Google Cloud Platform). Other notable organizations using Spinnaker include Airbnb, Cloudera, GetYourGuide, Nest, Waze, and many more.
In the end, we deployed a fully configured Spinnaker installation in one of our Kubernetes clusters. For that, we use a Helm chart plus a Kubernetes ConfigMap that configures Halyard, Spinnaker’s configuration tool. The whole setup is stored as code using Terraform, which makes upgrades and configuration changes easy. Furthermore, we’re running Prometheus within our cluster to gather metrics for canary analysis and Istio to control traffic flow. While Spinnaker provides an excellent integration with Jenkins, we’ve been using CircleCI at CANDIS for quite a while and are overall very happy with it as a continuous integration tool. As such, we make extensive use of Spinnaker’s Webhook functionality to trigger jobs/workflows via CircleCI’s API. Because our dev, staging, and production environments run in fully isolated AWS accounts, in line with general best practices, we also had to set up managing accounts so that Spinnaker can deploy to these environments.
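To make the webhook step a bit more concrete, here is a minimal sketch of the kind of HTTP request a webhook could send to CircleCI’s v2 API to trigger a pipeline. It is not our actual configuration: the project slug, branch, token, and the `run_e2e` pipeline parameter are placeholders invented for illustration, and the function only builds the request without sending it.

```python
import json
import urllib.request
from typing import Optional

# Base URL of CircleCI's v2 REST API.
CIRCLECI_API = "https://circleci.com/api/v2"


def build_pipeline_trigger(
    project_slug: str,
    branch: str,
    token: str,
    parameters: Optional[dict] = None,
) -> urllib.request.Request:
    """Build (but do not send) the POST request that triggers a pipeline."""
    body = {"branch": branch}
    if parameters:
        # Pipeline parameters must also be declared in the project's
        # CircleCI config; the ones used below are hypothetical.
        body["parameters"] = parameters
    return urllib.request.Request(
        url=f"{CIRCLECI_API}/project/{project_slug}/pipeline",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Circle-Token": token,  # personal API token, placeholder here
        },
    )


# Hypothetical usage: every value below is made up for the example.
request = build_pipeline_trigger(
    project_slug="gh/example-org/example-repo",
    branch="main",
    token="dummy-token",
    parameters={"run_e2e": True},
)
print(request.get_method(), request.full_url)
```

In a real pipeline the token would come from a secret store rather than from code, and the webhook stage would typically also poll the triggered pipeline’s status before moving on.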
‘Now, what are the benefits of this approach?’ you might ask. For us, it means that we can finally release changes independently without any form of manual gatekeeping or hand-holding. While the exact number is hard to quantify at this point, we expect to move from a single release a day to on-demand releases multiple times a day. Furthermore, we expect the average lead time, i.e., the time from code commit to the change running in production, to be less than one hour. We believe this will let us deliver features and improvements to our users faster and less disruptively, and that the shorter feedback cycle for developers will speed up overall development velocity.
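As a small illustration of the lead time metric mentioned above, the following sketch averages the time between commit and production deployment over a list of releases. The timestamps are made up for the example and are not real measurements from our pipelines.

```python
from datetime import datetime, timedelta


def average_lead_time(releases):
    """Mean of (deployed_at - committed_at) over (committed_at, deployed_at) pairs."""
    if not releases:
        raise ValueError("need at least one release")
    total = sum(
        (deployed - committed for committed, deployed in releases),
        timedelta(),
    )
    return total / len(releases)


# Hypothetical sample data: two releases with 30 and 50 minutes of lead time.
sample_releases = [
    (datetime(2020, 1, 6, 9, 0), datetime(2020, 1, 6, 9, 30)),
    (datetime(2020, 1, 6, 13, 10), datetime(2020, 1, 6, 14, 0)),
]

# Check the sample against the one-hour target.
print(average_lead_time(sample_releases) < timedelta(hours=1))  # prints True
```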
While we’re quite happy with our current setup, we also expect to extend our deployment pipelines in the future as we plan to grow into different markets. One feature that initially attracted us to Spinnaker is the option to deploy to additional regions and cloud providers for better redundancy. Another future addition would be a load testing stage in the deployment pipeline to detect potential performance regressions before they reach users.
If you’d like to join us on the journey of optimizing outdated financial processes and bringing joy to our users with smart and intuitive solutions, have a look at our open roles here.