OpenKilda is a production-proven SDN controller developed by Telstra to solve the unique challenges of managing a globally distributed network spanning multiple continents. With a score of 71.5% in our comprehensive evaluation, OpenKilda’s distributed architecture and built-in telemetry capabilities make it uniquely suited for networks where latency, geographic distribution, and operational visibility are critical concerns.
This is part of our comprehensive SDN Controller Comparison Guide, where we evaluate 6 leading open-source controllers across 13 technical criteria.
OpenKilda at a Glance
| Attribute | Details |
|---|---|
| Overall Score | 71.5% (Ranked #4) |
| Best For | Global distributed networks, production telemetry at scale |
| Architecture | Distributed, decentralized with Apache Storm |
| Programming Language | Java |
| Primary Protocols | OpenFlow |
| Clustering | Modular (Floodlight + Storm cluster) |
| Community | Small but active (Telstra-developed) |
| Key Differentiator | End-to-end flow telemetry with global-scale distribution |
| Deployment Complexity | High (multiple distributed components) |
| Production Status | Battle-tested in Telstra’s Pacnet infrastructure |
What OpenKilda Does Best:
- Distributed control plane for global networks
- Built-in end-to-end telemetry and monitoring
- Independent horizontal scaling of components
- Production-proven at scale
- Solves latency challenges in geographically distributed networks
The OpenKilda Story: Built for Global Scale
Why OpenKilda Exists:
OpenKilda is a Telstra-developed, OpenFlow-based SDN controller currently used in production to control the large Pacnet infrastructure. It has proven successful in a distributed production environment.
OpenKilda was designed to implement a distributed SDN control plane for a network that spans the globe. It addresses controller-to-switch latency while providing a scalable SDN control and data plane and end-to-end flow telemetry.
The Global Network Challenge:
Traditional centralized SDN controllers face critical limitations when managing networks that span continents:
Latency Issues:
- Centralized controller in one region creates latency for distant switches
- Flow setup times increase with geographic distance
- Network convergence slows in distributed topologies
- Real-time decision-making becomes impractical
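To put the latency points above in numbers, the sketch below estimates the minimum round-trip time between a switch and its controller from fibre propagation delay alone. It assumes signals travel through optical fibre at roughly 200,000 km/s (about two-thirds of the speed of light in vacuum). The distances are illustrative round figures rather than measurements of any real cable route, and the function names are made up for this example.

```python
# Rough lower bound on switch-to-controller round-trip time (RTT).
# Assumption: signals propagate through optical fibre at ~200,000 km/s.
FIBRE_SPEED_KM_PER_S = 200_000


def min_rtt_ms(distance_km: float) -> float:
    """Minimum RTT in milliseconds over a fibre path of the given length."""
    one_way_s = distance_km / FIBRE_SPEED_KM_PER_S
    return 2 * one_way_s * 1000


def flow_setup_delay_ms(distance_km: float, round_trips: int = 1) -> float:
    """Propagation delay alone for a flow setup needing `round_trips` exchanges."""
    return round_trips * min_rtt_ms(distance_km)


# Illustrative distances (hypothetical, not real cable lengths).
for label, km in [("same metro", 50), ("same continent", 3_000), ("trans-ocean", 12_000)]:
    print(f"{label:>15}: {min_rtt_ms(km):6.1f} ms RTT")
```

Under these assumptions, a controller 12,000 km from a switch adds at least 120 ms to every exchange before any processing or queuing time is counted, which is the motivation for placing controllers close to the switches they manage.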
Scale Bottlenecks:
- Single controller cluster can’t scale indefinitely
- Processing-intensive operations (path computation, telemetry) compete with the control plane for resources
- Coordination overhead grows rapidly as cluster size increases
Operational Visibility:
- Extracting meaningful telemetry from global infrastructure
- Correlating events across time zones and regions
- Capacity planning for intercontinental links
- Troubleshooting multi-region issues
Telstra’s Solution:
OpenKilda was purpose-built to address these challenges for the Pacnet infrastructure—a global submarine cable network connecting Asia, Australia, and the Americas.
Design Principles:
- Distribute Control: Place controllers near switches to minimize latency
- Decouple Processing: Separate control plane from compute-intensive operations
- Telemetry First: Build comprehensive monitoring into the architecture
- Horizontal Scale: Enable independent scaling of each component
- Production Ready: Design for 24/7 operation from day one
Architecture Deep Dive
The architecture of OpenKilda is shown in the figure below:
- Structurally, OpenKilda uses Floodlight to interact with switches over OpenFlow, but pushes decision-making functionality into other parts of the stack.
- Kafka is used as a message bus for telemetry from Floodlight and feeds that information into an Apache Storm-based cluster of agents for processing.
- Storm passes the time-series data to OpenTSDB for storage and analysis.
- Neo4j serves as the graph analysis and visualisation platform.
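The data flow described above can be sketched with in-memory stand-ins. This is not OpenKilda code and does not use the Kafka, Storm, or OpenTSDB client libraries: a `queue.Queue` plays the role of the message bus, a plain function plays the role of a processing agent, and a dictionary plays the role of the time-series store. All names and message fields are hypothetical.

```python
import queue
from collections import defaultdict

# In-memory stand-ins for the real components (all names are hypothetical).
bus = queue.Queue()               # plays the role of a Kafka topic
time_series = defaultdict(list)   # plays the role of the time-series store


def speaker_publish(switch_id: str, port: int, rx_bytes: int, ts: int) -> None:
    """Southbound side: publish a port counter read from a switch."""
    bus.put({"switch": switch_id, "port": port, "rx_bytes": rx_bytes, "ts": ts})


def process_pending() -> int:
    """Processing side: drain the bus and store each sample as a series point."""
    handled = 0
    while not bus.empty():
        msg = bus.get()
        key = (msg["switch"], msg["port"])
        time_series[key].append((msg["ts"], msg["rx_bytes"]))
        handled += 1
    return handled


speaker_publish("sw-1", 1, 1_000, ts=100)
speaker_publish("sw-1", 1, 4_000, ts=110)
speaker_publish("sw-2", 3, 500, ts=110)
print(process_pending(), "messages processed")
print(time_series[("sw-1", 1)])
```

The point of the shape is that the southbound component only publishes messages and never waits on the processing side, so either side can be scaled or restarted without the other.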
Modularity and Extensibility
OpenKilda is built on several well-supported open-source components to implement a decentralised, distributed control plane, backed by a well-designed cluster of agents that drives network updates as required. The modular architecture makes it reasonably easy to add new features.
Scalability
OpenKilda can scale processing-intensive profiling and decision-making functionality horizontally and independently of the control plane.
- OpenKilda approaches cluster scalability in a modular way. Floodlight serves as the southbound interface to the switch infrastructure, while responsibility for path computation (PCE) and telemetry processing is pushed northward into a completely separate Apache Storm-based cluster. Each Floodlight instance is independent, with no requirement to share state with the others. The Apache Storm cluster is horizontally scalable by design, so throughput can be increased by adding nodes.
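One way to see why workers that share no state scale by adding nodes is hash partitioning: each work item is routed to a worker by a stable hash of its key, so no coordination between workers is needed. The sketch below is a generic illustration of that idea, not OpenKilda's or Storm's actual routing code. The flow IDs and function names are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Iterable


def assign_worker(key: str, worker_count: int) -> int:
    """Map a work item to a worker index using a stable hash of its key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % worker_count


def load_per_worker(keys: Iterable[str], worker_count: int) -> Counter:
    """Count how many work items each worker would handle."""
    return Counter(assign_worker(k, worker_count) for k in keys)


flows = [f"flow-{i}" for i in range(10_000)]  # hypothetical flow IDs
for workers in (2, 4, 8):
    load = load_per_worker(flows, workers)
    print(workers, "workers -> busiest handles", max(load.values()), "flows")
```

Because the assignment depends only on the key and the worker count, the same item always lands on the same worker, and doubling the workers roughly halves the load on each.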
Architectural Scalability
- BGP is not currently implemented and would need to be developed where it is required.
Interfaces
- Southbound: OpenFlow
- Northbound: RESTful APIs only, which are limited compared with those of ONOS and ODL
Telemetry
Extracting usable telemetry from the infrastructure was a core design principle of OpenKilda, so one output of the Storm agents is a set of time-series data streams, collected in a Hadoop-backed OpenTSDB data store. This data can be used operationally in many ways, from problem management to capacity planning.
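For a sense of what a time-series sample looks like on its way into OpenTSDB, the sketch below builds a data point in the JSON shape accepted by OpenTSDB's HTTP `/api/put` endpoint (`metric`, `timestamp`, `value`, and at least one tag). It only constructs the payload and does not send it anywhere. The metric and tag names are hypothetical and are not taken from OpenKilda.

```python
import json
import time


def make_datapoint(metric: str, value: float, tags: dict, timestamp=None) -> dict:
    """Build one data point in the JSON shape accepted by OpenTSDB's /api/put."""
    if not tags:
        raise ValueError("OpenTSDB requires at least one tag per data point")
    return {
        "metric": metric,
        "timestamp": int(timestamp if timestamp is not None else time.time()),
        "value": value,
        "tags": tags,
    }


# Metric and tag names below are hypothetical, chosen only for illustration.
point = make_datapoint(
    "example.switch.rx_bytes",
    123456,
    {"switchid": "sw-1", "port": "1"},
    timestamp=1_700_000_000,
)
payload = json.dumps([point])  # /api/put accepts a single point or a list
print(payload)
```

Tagging each sample with identifiers such as switch and port is what later allows the data to be sliced per device or per link for troubleshooting and capacity planning.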
Resilience and Fault Tolerance
OpenKilda has no inbuilt clustering mechanism, instead relying on external tools to maintain availability. High availability is achieved by running multiple, identically configured instances, or a single instance controlled by an external framework that detects and restarts failed nodes.
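The "external framework that detects and restarts failed nodes" pattern can be illustrated with a toy supervisor. This is a simulation under stated assumptions, not a real process manager: the health check is a boolean flag, a "restart" just flips it back, and the instance names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """A hypothetical controller instance tracked by an external supervisor."""
    name: str
    healthy: bool = True
    restarts: int = 0


def supervise_once(instances):
    """One supervision pass: restart every instance that fails its health check."""
    restarted = []
    for inst in instances:
        if not inst.healthy:
            inst.healthy = True   # stands in for actually restarting the process
            inst.restarts += 1
            restarted.append(inst.name)
    return restarted


pool = [Instance("speaker-a"), Instance("speaker-b")]
pool[1].healthy = False           # simulate a crash
print("restarted:", supervise_once(pool))
print("all healthy:", all(i.healthy for i in pool))
```

This approach works only because the instances are identically configured and share no state, so a restarted instance can rejoin without any recovery handshake with its peers.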
Programming Language
OpenKilda is written in Java.
Community
While the functionality of OpenKilda in its intended space is promising, community support is still developing. Much of the development and maintenance burden therefore falls on current users, and feature velocity is slow. OpenKilda needs your support: chat with us to get involved.






