This final part of our Software Defined Networking (SDN) Controller comparison series includes an in-depth evaluation and product rating for each of the most popular open source SDN controllers in industry and academia: the Open Network Operating System (ONOS), OpenDaylight (ODL), OpenKilda, Ryu and Faucet.
It is important to understand the motivations behind the available platforms. Each design has different use cases as usage depends not only on the capability matrix, but also on the cultural fit of the organisation and the project.
As with most platforms, there are trade-offs to be considered when comparing a centralised, tightly coupled control plane to a decentralised, scalable and loosely coupled alternative SDN controller.
Centralised architectures such as ONOS and ODL tend to be easier to maintain and confer lower latency between the tightly coupled southbound API, PCE and northbound APIs. However, as scale increases, centralised controllers can become a bottleneck. In an SD-WAN context this can increase control plane latency, an effect that a distributed architecture can mitigate.
Distributed architectures such as OpenKilda and Faucet are generally more complex to maintain and deploy but can allow the platform to scale more effectively. By decoupling the processing of PCE, Telemetry and Southbound interface traffic, each function can be scaled independently to avoid performance bottlenecks. Additionally, specialised tools to handle big datasets, time series databases or path computation at scale become viable without adversely impacting southbound protocol performance.
Ryu differs from the other options. Although it has a core set of programs that run as a ‘platform’, it is better thought of as a toolbox with which SDN controller functionality can be built.
Modularity and Extensibility
The modularity of each controller is governed by the design focus and programming languages. Platforms such as ONOS and ODL have built-in mechanisms for connecting code modules, at the expense of centralising processing to each controller. These two Java-based controllers take advantage of OSGi containers for loading bundles at runtime, allowing a very flexible approach to adding functionality.
Python-based controllers such as Ryu provide a well-defined API for developers to change the way components are managed and configured.
Adding functionality to Faucet and OpenKilda is achieved by modifying the systems that make use of their northbound interfaces, such as the Apache Storm cluster or equivalent. This provides the added flexibility of using different tools and languages depending on the problem being solved. Additionally, increasing the complexity of northbound interactions does not directly affect the SDN itself.
Of the options being considered, only ONOS and ODL contain internal functionality for maintaining a cluster. Each of these platforms is backed by a distributed datastore that shares the current SDN state and allows controllers to fail over in the event of a cluster partition. This functionality continues to evolve as new releases of each controller emerge.
OpenKilda approaches cluster scalability in a modular way. While Floodlight is used as a southbound interface to the switch infrastructure, responsibility for PCE and telemetry processing is pushed northward into a completely separate Apache Storm based cluster. Each Floodlight instance is idempotent, with no requirement to share state. The Apache Storm cluster is by design horizontally scalable and allows throughput to be increased by adding nodes.
Neither Ryu nor Faucet contains any intrinsic clustering capability; both require external tools such as ZooKeeper to distribute a desired state. With both of these platforms, extra instances of the controller can be started independently as long as the backing configuration remains identical. PCE functionality for these controllers could be pushed down to the instance in the form of modules, or implemented in a similar manner to OpenKilda, backed by a processing cluster of choice.
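The requirement that independently started instances share an identical backing configuration can be checked mechanically. The sketch below is one possible approach rather than anything Ryu or Faucet ships: it fingerprints each instance's configuration and compares the digests. The function names, instance names and configuration contents are all hypothetical.

```python
import hashlib
import json


def config_fingerprint(config: dict) -> str:
    """Return a stable SHA-256 digest of a configuration dictionary."""
    # sort_keys makes the serialisation independent of key insertion order.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def instances_in_sync(configs: dict) -> bool:
    """True when every controller instance reports the same fingerprint."""
    fingerprints = {config_fingerprint(cfg) for cfg in configs.values()}
    return len(fingerprints) <= 1


if __name__ == "__main__":
    # Hypothetical instance names and config contents, for illustration only.
    base = {"dps": {"sw1": {"dp_id": 1, "hardware": "Open vSwitch"}}}
    drifted = {"dps": {"sw1": {"dp_id": 2, "hardware": "Open vSwitch"}}}
    print(instances_in_sync({"ctl-a": base, "ctl-b": dict(base)}))
    print(instances_in_sync({"ctl-a": base, "ctl-b": drifted}))
```

A check like this could run before an extra instance is admitted, so that configuration drift is caught before two controllers program the same switches differently.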
As the scale of the SDN grows, it becomes untenable for a single localised cluster to handle the load from every switch on the network. Leaving aside geographic distribution of the controllers, breaking the network into smaller logical islands decreases the need for a single southward looking cluster to be massively scalable. With this design, coordination between the islands becomes critical and while a centralised view of the network is still required, the absence of PCE and telemetry processing should not affect data plane stability once flows are configured.
Ryu, Faucet, ODL and ONOS all look to scale in this way by including native BGP routing capabilities to coordinate traffic flows between the SDN islands. Universal PCE and telemetry processing will need to be developed for each of these cases, with OpenKilda providing a working reference architecture for achieving this. Given the current state of the OpenKilda documentation, BGP support for OpenKilda would itself need to be developed.
Considering future compatibility requirements for southbound control, ONOS, ODL and Ryu include protocols beyond just OpenFlow. P4, NETCONF and OF-Config could enable additional switch hardware options in future, should they be required.
The northbound API turns out to be one of the key differentiators between the platforms on offer. ONOS and ODL offer the largest set of northbound interfaces, with gRPC and RESTful APIs (among others) available, making them the easiest to integrate. Ryu and OpenKilda offer more limited RESTful APIs than ONOS and ODL. Faucet takes a completely different approach to applying changes, relying on configuration files to track intended system state instead of instantaneous API calls. This approach requires external tools for dynamically applying configuration but does open the SDN to administration by well-understood CI/CD pipelines and testing apparatus.
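To illustrate the configuration-file approach, the following sketch shows the kind of check a CI/CD pipeline might run before a new configuration is applied: it compares the current and desired state and reports what would change. It is a generic illustration, not Faucet's own tooling, and the function name and the port-to-VLAN mapping are hypothetical.

```python
def plan_changes(current: dict, desired: dict) -> dict:
    """Compare two flat {key: value} configurations and list the changes."""
    added = {k: desired[k] for k in desired.keys() - current.keys()}
    removed = sorted(current.keys() - desired.keys())
    changed = {
        k: (current[k], desired[k])
        for k in current.keys() & desired.keys()
        if current[k] != desired[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


if __name__ == "__main__":
    # Hypothetical port-to-VLAN assignments, for illustration only.
    current = {"port1": "office", "port2": "guest", "port3": "office"}
    desired = {"port1": "office", "port2": "office", "port4": "guest"}
    plan = plan_changes(current, desired)
    for section, detail in plan.items():
        print(section, detail)
```

Because the plan is computed from files alone, it can be reviewed and tested in a pipeline before any switch is touched.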
One of the primary problems with maintaining an SDN is extracting and using any available telemetry to infer system state and help remediate issues. On this front, ODL lacks functionality, with telemetry still being an experimental module in the latest upstream version. ONOS has modules available to allow telemetry to be used through Grafana or InfluxDB.
Faucet can export telemetry into InfluxDB, Prometheus or flat text log files. While Prometheus saves data locally, it can also be federated, allowing centralised event aggregation and processing, while maintaining a local cache to handle upstream processing outages and maintenance.
OpenKilda uses Storm, which provides a computation system that can be used for real-time analytics. Storm passes the time-series data to OpenTSDB for storage and analysis. Neo4j, a graph analysis and visualisation platform, initially provided the PCE functionality.
Ryu doesn’t provide any telemetry functionality. This needs to be provided via external tools.
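Path computation over a topology graph, the job a PCE performs, can be sketched with a textbook shortest-path search. The example below uses Dijkstra's algorithm over an in-memory link list; it is a stand-in for illustration and not how OpenKilda or Neo4j implements path computation. The switch names and link costs are hypothetical.

```python
import heapq


def shortest_path(links, source, target):
    """Return (cost, path) for the cheapest path, or (None, []) if unreachable."""
    # Build an adjacency map from undirected (a, b, cost) link tuples.
    graph = {}
    for a, b, cost in links:
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    # Each queue entry carries the cost so far, the node and the path taken.
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, weight in graph.get(node, []):
            if neighbour not in visited:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return None, []


if __name__ == "__main__":
    # Hypothetical topology: the direct s1-s3 link is more expensive than
    # going via s2.
    links = [("s1", "s2", 1), ("s2", "s3", 1), ("s1", "s3", 5)]
    print(shortest_path(links, "s1", "s3"))
```

Running this kind of computation outside the controller process is what allows it to be scaled, or swapped for a graph database, without affecting southbound protocol handling.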
Resilience and Fault Tolerance
The ONOS and ODL platforms implement native clustering as part of their respective offerings. Both provide fault tolerance when run with an odd number of SDN controllers. In the event of a master node failure, a new leader is selected to take control of the network. The mechanism for choosing a leader differs slightly between the two controllers: ONOS focuses on eventual consistency, while ODL focuses on high availability.
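The preference for an odd number of controllers follows from simple arithmetic, assuming the cluster needs a strict majority of members to agree before electing a leader (the usual arrangement for quorum-based clustering; the exact mechanism in each controller is not described here). The function names below are illustrative.

```python
def quorum(cluster_size: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """Members that can fail while a majority can still be formed."""
    return cluster_size - quorum(cluster_size)


if __name__ == "__main__":
    for size in range(1, 8):
        print(f"nodes={size} quorum={quorum(size)} "
              f"tolerated_failures={tolerated_failures(size)}")
```

Under this assumption a four-node cluster tolerates no more failures than a three-node one, so the extra even-numbered node adds cost without adding resilience.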
The remaining controllers (OpenKilda, Ryu and Faucet) have no inbuilt clustering mechanism, instead relying on external tools to maintain availability. This simplifies the architecture of the controllers and releases them from the overhead of maintaining distributed databases for state information. High availability is achieved by running multiple, identically configured instances, or a single instance controlled by an external framework that detects and restarts failed nodes.
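The "external framework that detects and restarts failed nodes" can be as simple as a supervising process. The sketch below is a minimal, hypothetical supervisor written with the Python standard library; in practice an init system or container orchestrator would normally fill this role. The command used in the demo is a placeholder child process that exits immediately, not a real controller.

```python
import subprocess
import sys
import time


def supervise(command, max_restarts=3, poll_interval=0.1):
    """Run command and restart it whenever it exits.

    Gives up after max_restarts restarts and returns the number performed.
    """
    restarts = 0
    process = subprocess.Popen(command)
    while True:
        if process.poll() is None:
            # Still running: check again after a short pause.
            time.sleep(poll_interval)
            continue
        if restarts >= max_restarts:
            return restarts
        restarts += 1
        process = subprocess.Popen(command)


if __name__ == "__main__":
    # Placeholder child that fails straight away, standing in for a
    # controller process that has crashed.
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise(failing, max_restarts=2, poll_interval=0.05))
```

This works for controllers that hold no cluster state of their own, since a restarted instance only needs its configuration to resume control.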
For Ryu, fault tolerance can be provided by ZooKeeper, which monitors the controllers to detect controller failure and shards state between cluster members. For Faucet in particular, which is designed to sit in a distributed, shared SDN and be controlled by static configuration files, restarting a controller is a quick, stable exercise that has no reliance on upstream infrastructure once the configuration is written.
ONOS, ODL and OpenKilda are written in Java, for which development resources are abundant in the market, with good supporting documentation and libraries available. While using Java should not be seen as a negative, Java processes tend to be heavyweight and require resource and configuration management to keep them lean and responsive.
Ryu and Faucet are written in Python, a well-supported language with an active community developing the framework. The documentation is concise and technical, aimed at developers to maximise the utility of the system. Python is not a fast language and has inherent limitations due to both its dynamic typing and its limited multi-threading capabilities (when compared with Java, Golang or C++).
Both ODL and ONOS benefit from large developer and user communities under the Linux Foundation Networking banner. Many large international players are involved in the development and governance of these projects, which could add to the longevity and security over time. A possible downside is, as with any large project, there are many voices trying to be heard and stability can be impacted by feature velocity. This has occurred with similar projects such as OpenStack in the immediate past.
OpenKilda has a small but active community, which can limit the supportability, velocity and features of the platform. OpenKilda needs your support: chat with us to get involved.
Between these two extremes are Ryu and Faucet. Both are well-supported, targeted controllers. Due to the emerging nature of the field, both options look to have a bright future, with a simpler, streamlined approach to change submission and testing.
Evaluation Scoring Table
Based on the above criteria, we’ve scored each product against each weighted criterion. The results are below:
| Criterion | Weight | ONOS | ODL | OpenKilda | Ryu | Faucet |
|---|---|---|---|---|---|---|
| Northbound API support | 20.0 | 20.0 | 20.0 | 12.0 | 16.0 | 8.0 |
| Southbound API support | 10.0 | 10.0 | 10.0 | 6.0 | 8.0 | 8.0 |
| Core Components features / services | 5.0 | 4.5 | 4.5 | 3.5 | 2.0 | 3.5 |
| Native Clustering Capabilities | 10.0 | 9.0 | 7.0 | 10.0 | 2.0 | 5.0 |
| Community Size & Partnerships | 5.0 | 4.5 | 4.5 | 1.0 | 4.5 | 3.5 |
| Resilience and Fault Tolerance | 5.0 | 4.0 | 3.0 | 4.5 | 4.0 | 4.5 |
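The arithmetic behind a weighted scoring table can be sketched as follows. The column order is an assumption (the criterion weight first, then the products in the order the article introduces them), and the totals cover only the six criteria listed above, so they are partial sums and not the final ranking.

```python
# Assumed column order: weight, then products in the order the article
# introduces them. This mapping is an assumption, not stated in the table.
PRODUCTS = ["ONOS", "ODL", "OpenKilda", "Ryu", "Faucet"]

# criterion: (weight, [score per product]), copied from the rows above.
SCORES = {
    "Northbound API support": (20.0, [20.0, 20.0, 12.0, 16.0, 8.0]),
    "Southbound API support": (10.0, [10.0, 10.0, 6.0, 8.0, 8.0]),
    "Core Components features / services": (5.0, [4.5, 4.5, 3.5, 2.0, 3.5]),
    "Native Clustering Capabilities": (10.0, [9.0, 7.0, 10.0, 2.0, 5.0]),
    "Community Size & Partnerships": (5.0, [4.5, 4.5, 1.0, 4.5, 3.5]),
    "Resilience and Fault Tolerance": (5.0, [4.0, 3.0, 4.5, 4.0, 4.5]),
}


def partial_totals(scores, products):
    """Sum each product's scores, checking no score exceeds its weight."""
    totals = {name: 0.0 for name in products}
    for criterion, (weight, row) in scores.items():
        for name, value in zip(products, row):
            if value > weight:
                raise ValueError(f"{criterion}: score exceeds weight")
            totals[name] += value
    return totals


if __name__ == "__main__":
    totals = partial_totals(SCORES, PRODUCTS)
    for name, total in sorted(totals.items(), key=lambda item: -item[1]):
        print(f"{name}: {total}")
```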
Based on our weighted criteria-based scoring, the evaluation ranks the products as per the below table:
The effort spent investigating the current Software Defined Networking (SDN) controller platforms provides insight into the available open source SDN controllers. This should help users choose the SDN controller that best matches their network design and requirements.