A leading international Software Defined Networks (SDN) and Data Analytics Provider wanted to upgrade their applications to utilise OpenStack and Container Orchestration, but ran into complications and needed extra help.
This customer had attempted to deploy OpenStack on their own several times without success. OpenStack is not their area of expertise, and their design was based on TripleO, which is quite complicated to deploy and operate. They required help with the platform design and configuration so they could containerise their applications. As they are located overseas, our Solutionauts operated across time zones and completed all work remotely.
The Aptira Solution
Aptira designed a containerised OpenStack solution utilising Ceph as the backend storage for images, volumes and compute. We then used kolla-ansible to deploy Ceph and OpenStack. We chose this configuration because it is relatively simple to customise configurations and change the deployment with kolla-ansible, which makes ongoing configuration changes easier for the customer once the project has been handed over.
The Ceph cluster used 4 replicas, with Ceph mons/mgr running on 3 rack servers and the object storage devices (OSDs) running across 8 blade servers in two chassis. Three regions hosted the customer's apps in different failure domains for redundancy. The OpenStack controllers were converged with the Ceph mons on the rack servers, and Compute was collocated with the OSDs on the blade servers.
We successfully resolved a number of issues that arose during the implementation. One was a memory leak bug in the OVS code which had not yet been fixed upstream. As a temporary workaround, the neutron agent services had to be restarted regularly to release memory until the bug is fixed upstream. To remove the need for manual intervention, we set up a cron job which restarts the neutron agent services outside business hours.
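The scheduled restart described above could be sketched as a small script like the one below. It is a minimal illustration under stated assumptions, not the script used on this project: the container names assume kolla's usual naming for the Neutron agents, the use of `docker restart` assumes the agents run as Docker containers (as in a kolla-ansible deployment), and the `--apply` flag is a hypothetical safety switch introduced for this example.

```python
#!/usr/bin/env python3
"""Restart Neutron agent containers; intended to be run from cron out of hours."""
import subprocess
import sys

# Assumed container names (kolla's usual naming); adjust to the actual deployment.
NEUTRON_AGENT_CONTAINERS = [
    "neutron_openvswitch_agent",
    "neutron_l3_agent",
    "neutron_dhcp_agent",
    "neutron_metadata_agent",
]


def build_restart_commands(containers):
    """Return one `docker restart` argument list per container."""
    return [["docker", "restart", name] for name in containers]


def restart_agents(containers, runner=subprocess.run):
    """Run each restart command and return the names of containers that failed."""
    failed = []
    for cmd in build_restart_commands(containers):
        result = runner(cmd)
        if result.returncode != 0:
            failed.append(cmd[-1])
    return failed


def main(argv):
    if "--apply" not in argv:
        # Dry run by default: only show what would be executed.
        for cmd in build_restart_commands(NEUTRON_AGENT_CONTAINERS):
            print(" ".join(cmd))
        return
    failures = restart_agents(NEUTRON_AGENT_CONTAINERS)
    if failures:
        # A non-zero exit lets cron's mail or log handling surface the failure.
        sys.exit("failed to restart: " + ", ".join(failures))


if __name__ == "__main__":
    main(sys.argv[1:])
```

A crontab entry such as `0 2 * * * /usr/bin/python3 /opt/scripts/restart_neutron_agents.py --apply` would then run it nightly at 02:00; both the schedule and the script path here are hypothetical.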
Another challenge was that the default haproxy maxconn was not large enough, resulting in instability. To resolve this, we increased the maxconn value in the haproxy config file, improving the stability of the platform.
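To illustrate the kind of change involved, the sketch below raises `maxconn` in the `global` section of an HAProxy configuration. It is an example under assumptions, not the change applied on this project: the helper name `set_global_maxconn`, the sample configuration and the value 40000 are all hypothetical, and the parsing assumes the common layout in which section headers are unindented and directives are indented. In a kolla-ansible deployment the change would more likely be made through the deployment's configuration overrides than by editing the file in place.

```python
import re

# Section headers assumed to start at column 0, e.g. "global", "frontend web".
SECTION_RE = re.compile(r"^(global|defaults|frontend|backend|listen)\b")
# An indented "maxconn <number>" directive.
MAXCONN_RE = re.compile(r"^(\s+)maxconn\s+\d+\s*$")


def set_global_maxconn(config_text, new_value):
    """Return config_text with maxconn in the global section set to new_value.

    If the global section has no maxconn line, one is inserted right after
    the section header. Other sections are left untouched.
    """
    out = []
    in_global = False
    replaced = False
    global_index = None
    for line in config_text.splitlines():
        header = SECTION_RE.match(line)
        if header:
            in_global = header.group(1) == "global"
            if in_global:
                global_index = len(out)
        elif in_global:
            directive = MAXCONN_RE.match(line)
            if directive:
                # Keep the original indentation, swap only the value.
                line = f"{directive.group(1)}maxconn {new_value}"
                replaced = True
        out.append(line)
    if not replaced:
        if global_index is None:
            raise ValueError("no global section found")
        out.insert(global_index + 1, f"    maxconn {new_value}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "global\n    maxconn 4000\n    daemon\n\ndefaults\n    maxconn 2000\n"
    print(set_global_maxconn(sample, 40000))
```

Only the `global` limit is changed here; per-section `maxconn` values in `defaults`, `frontend` or `listen` blocks are deliberately left alone, since the right values for those depend on the individual services.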
We delivered the Ceph-backed OpenStack Cloud on which their applications are now deployed. The configuration has passed their HA tests and is being used in production.
Note that we deployed OpenStack Rocky, which was the latest stable OpenStack release at the time of this project. Unfortunately, kolla-ansible is unable to complete upgrades and downgrades on this version, so future work will be required to simplify the upgrade/downgrade process. Stay tuned!