In a world where most scalable technologies have learned through hard experience that distributing functionality and data processing is the only viable way to handle large footprints, it is difficult to imagine why legacy SDN Controllers such as OpenDaylight would concentrate both control and telemetry processing in small clusters.
In light of the rapid advances in the SDN Controller market over the last five years, it is important to take a step back and look at how the landscape has changed. Platforms initially designed for Data Centre-centric SD-LAN use cases are regularly being rolled out as solutions in the SD-WAN space, without bringing in complementary tools from fields such as Big Data and Distributed Processing to help with issues of scale.
When these controllers are deployed to manage smaller Enterprise networks or Data Centres, the relatively few flows needed, and the resulting stream of status updates over low-latency control backplanes, can be handled much as they would be on any other monolithic platform.
Problems begin to present themselves where SD-WAN is implemented over broad areas with less reliable control networks. Many assumptions around network health in a Data Centre are simply not valid when networks are distributed across countries over lower-bandwidth, high-latency interconnections. Not receiving a status message from a local switch for 2 seconds may indicate a failure in a Data Centre, but it indicates nothing if there is congestion across the Atlantic.
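To make the 2-second example concrete, here is a minimal sketch in Python. It is a toy model, not code from OpenDaylight, OpenKilda or any other controller: the function names and sample gap values are hypothetical, and the latency-aware threshold (mean plus a multiple of the standard deviation of recent message gaps, similar in spirit to how TCP derives its retransmission timeout) is simply one well-known way to show why a fixed timeout misfires on a congested long-haul link.

```python
from statistics import mean, pstdev

FIXED_TIMEOUT_S = 2.0  # the 2-second figure used in the text


def fixed_timeout_failed(silence_s: float) -> bool:
    """Data Centre style check: any silence beyond a fixed limit is a failure."""
    return silence_s > FIXED_TIMEOUT_S


def adaptive_timeout_s(recent_gaps_s, k: float = 4.0,
                       floor_s: float = FIXED_TIMEOUT_S) -> float:
    """Hypothetical latency-aware limit: mean + k * stddev of recent gaps
    between status messages, never lower than the fixed floor."""
    return max(floor_s, mean(recent_gaps_s) + k * pstdev(recent_gaps_s))


def adaptive_failed(silence_s: float, recent_gaps_s) -> bool:
    """Flag a failure only if the silence is unusual for this particular link."""
    return silence_s > adaptive_timeout_s(recent_gaps_s)


# Illustrative (made-up) gaps between status messages, in seconds.
dc_gaps = [1.0, 1.0, 1.1, 0.9, 1.0]    # steady Data Centre backplane
wan_gaps = [1.0, 2.5, 1.2, 3.0, 1.8]   # jittery, congested long-haul link

silence = 2.5  # seconds since the last status message

print("fixed rule:      ", fixed_timeout_failed(silence))
print("adaptive, DC:    ", adaptive_failed(silence, dc_gaps))
print("adaptive, WAN:   ", adaptive_failed(silence, wan_gaps))
```

With these sample values the fixed rule flags both links as failed, while the latency-aware rule still flags the Data Centre switch but treats the same silence on the long-haul link as normal jitter.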
Additionally, in a network of tens or hundreds of switches, the flow of telemetry and its subsequent processing quickly overwhelm clusters where a single member is required to parse, interpret and respond to changes in topology and status.
There are also issues caused by the Model-Based Configuration paradigm in OpenDaylight and similar controllers, where a switch diverging from the in-memory model causes all flows to be dropped and reprogrammed. In our experience this can result in cascading network outages: as switches scramble to catch up with the controller, the reprogramming itself creates a feedback loop that has to be handled on top of the initial and subsequent changes.
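A rough sense of the cost can be sketched in a few lines of Python. This is a toy model under stated assumptions, not OpenDaylight's actual reconciliation logic: flows are represented as plain strings, the function names are hypothetical, and we assume one operation per flow removed and one per flow installed (a wildcard delete would shrink the removal side, but not the reinstall side).

```python
def full_resync_ops(model_flows: set, switch_flows: set) -> int:
    """Operations issued if ANY divergence triggers 'drop everything on the
    switch, then reprogram everything from the model'.
    Assumption: one operation per flow removed and per flow installed."""
    if model_flows == switch_flows:
        return 0
    return len(switch_flows) + len(model_flows)


def divergence_size(model_flows: set, switch_flows: set) -> int:
    """Number of flows that actually differ between model and switch."""
    return len(model_flows ^ switch_flows)


# Hypothetical switch carrying 1000 flows, one of which has gone missing.
model = {f"flow-{i}" for i in range(1000)}
switch = model - {"flow-42"}

print("flows that actually differ:", divergence_size(model, switch))
print("operations for full resync:", full_resync_ops(model, switch))
```

In this example a single missing flow leads to 1999 operations, and every one of them is a further change on the control network that the controller then has to observe and process.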
Fortunately, a relatively new SDN Controller, OpenKilda, has been developed and successfully deployed at scale to solve these problems.
How does OpenKilda solve these problems? Stay tuned: we’ll cover that in next week’s post.