Machine Learning

Monitoring and Anomaly Detection

Typical error detection and reporting is slow, linear and old-school (but not in a fashionable way). So you’d be right to be paranoid about problems in your application slipping through the cracks. But fear not. Our refined Machine Learning can swiftly detect even the slightest bug before it sneaks up on you like a windshield. We teach machines to do what isn’t humanly possible, which is possibly the greatest thing that has ever happened to data.

Monitoring

Detecting Anomalies within Complex Systems

Monitoring your application can be difficult. If only there were a way to make a machine do all the hard work for you. 

Well, now there is. But it’s not as straightforward as it may seem. You need to determine which algorithm suits the problem you are trying to solve, then train the system to understand the data correctly, react to it, and even make predictions based on data trends. You also need to know which problems simply cannot be solved by machine learning, and when human intervention would be a better alternative.

With Machine Learning, Aptira can provide better insight into your application’s performance, and more finely tuned alerts on anomalies as they occur.

Quickly Detect Anomalies
Analyze in near real time to raise alerts, initiate automated responses and provide insight
Accurate Alerting
Train the models on real system data to reduce false negatives and positives
Integrate with Support Tools
Ingest data from existing telemetry sources prior to analysis
Tailored for Individual Systems
Techniques can be applied to any feed of time series telemetry data

Monitoring

Finely Tuned for Modern Applications

Unfortunately, it is common for support teams to be inundated with false-positive alerts, causing valuable hours to be spent investigating issues that aren’t a problem. By training Machine Learning models with data from live systems, anomaly detection can be finely tuned, reducing the noise and allowing teams to focus where it really counts.

Additionally, if the platform being monitored can return more detailed status on request, machine learning can be used to categorize flagged anomalies and act automatically where possible. 
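As a rough illustration of categorizing flagged anomalies, here is a minimal sketch that uses a nearest-centroid classifier. It is not a description of any particular production model. The feature names, category labels, sample history and suggested actions are all hypothetical, and the features are assumed to be already scaled to a 0 to 1 range.

```python
from math import dist
from statistics import fmean

# Hypothetical labelled history of past anomalies.
# Features (assumed pre-scaled to 0..1): (cpu_load, queue_fill_ratio, error_rate)
HISTORY = [
    ((0.95, 0.10, 0.01), "cpu_saturation"),
    ((0.90, 0.12, 0.02), "cpu_saturation"),
    ((0.30, 0.90, 0.01), "queue_backlog"),
    ((0.35, 0.95, 0.03), "queue_backlog"),
]

# Hypothetical automated responses, one per category.
ACTIONS = {
    "cpu_saturation": "scale_out_workers",
    "queue_backlog": "restart_stalled_consumer",
}


def train_centroids(labelled):
    """Average the feature vectors of each category into a single centroid."""
    grouped = {}
    for features, category in labelled:
        grouped.setdefault(category, []).append(features)
    return {
        category: tuple(fmean(column) for column in zip(*rows))
        for category, rows in grouped.items()
    }


def categorize(features, centroids):
    """Return the category whose centroid is closest to the new anomaly."""
    return min(centroids, key=lambda category: dist(features, centroids[category]))


centroids = train_centroids(HISTORY)
category = categorize((0.88, 0.15, 0.02), centroids)
action = ACTIONS[category]  # the response that would be triggered automatically
```

A real deployment would likely use richer features and a more capable model, and would fall back to a human when the closest centroid is still far away.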

Detection & Prevention

Making Complex Systems Simple

Detecting problems in complex systems can be difficult when there is so much status information constantly streaming in to be analyzed. With tens to hundreds of data points often arriving every second, static rules-based systems can be tricky to implement, difficult to debug, and brittle as systems change over time. People are good at many things. Unfortunately, picking trends from vast amounts of never-ending data is not one of them.

Monitoring at Scale

Putting Excess Data to Good Use

In today’s world, applications demand high speed, high quality and no downtime. With the sheer volume that many environments are managing – more devices, more traffic, more users and more data – it is an unfortunate reality that they will become prone to errors and performance problems as they grow. While traditional monitoring is helpful, it often can only highlight a problem once it has already occurred. It also requires manual intervention to resolve and can be difficult to scale as your application grows.  

Using Machine Learning, we can identify the root cause of performance and availability problems, then automatically respond to and fix them in real time, regardless of the size of your application. By training the models on real system data, Machine Learning based anomaly detection can reduce the false negatives and false positives flagged to support teams.

This is especially useful for complex systems like Software Defined Networks and OpenStack, where telemetry is abundant but often too great in quantity for easy use, or too complex to capture in simple threshold-based monitoring rules. We also understand that every environment is different, so the Machine Learning techniques can be applied to any feed of time series telemetry data.
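To make the contrast with fixed thresholds concrete, here is a minimal sketch of one simple technique, a rolling z-score, applied to a generic time series feed. It is an illustrative stand-in rather than the models described above. The class name, window size, threshold and sample values are all assumptions.

```python
from collections import deque
from statistics import fmean, pstdev


class RollingZScoreDetector:
    """Flag values that sit far from the mean of a recent sliding window."""

    def __init__(self, window=30, threshold=3.0, min_samples=10):
        self.values = deque(maxlen=window)  # most recent observations only
        self.threshold = threshold          # allowed distance in standard deviations
        self.min_samples = min_samples      # warm-up period before flagging anything

    def observe(self, value):
        """Return True if `value` is anomalous relative to the current window."""
        is_anomaly = False
        if len(self.values) >= self.min_samples:
            mean = fmean(self.values)
            spread = pstdev(self.values)
            if spread > 0:
                is_anomaly = abs(value - mean) / spread > self.threshold
            else:
                # A perfectly flat window: any change at all is unusual.
                is_anomaly = value != mean
        self.values.append(value)  # the baseline keeps adapting to the feed
        return is_anomaly


# Hypothetical latency samples in milliseconds, with one obvious spike.
feed = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 99, 102, 250, 100]
detector = RollingZScoreDetector()
flags = [detector.observe(sample) for sample in feed]
```

Because the baseline is learned from the feed itself, the same detector can be pointed at any numeric metric without hand-picking a threshold for each one. The trade-off is that a spike briefly inflates the window's spread, so follow-up anomalies right after it are harder to catch.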

Critical Systems

Early Detection without Complications

At the heart of many modern platforms there exists a message queue or collection of agents responsible for communication between internal components, external systems or end users. In many circumstances, constantly examining the existing telemetry from these critical components can be an efficient way to gauge the overall health of the system. 

By applying ML to these critical components, anomalies and issues can be detected early, avoiding outages or user impact, all without implementing complex hand-crafted rulesets. 

As an example, OpenStack clouds typically run RabbitMQ as the centralized message bus for communication between the plethora of platform agents. Experience has shown that proper interrogation of the queue state can indicate problems before they show themselves on compute or controller nodes. By concentrating on the message bus, we can examine the health of all the components without directly impacting the performance of the compute or controller nodes.

RabbitMQ:

  • Supports multiple messaging protocols, message queuing, delivery acknowledgement, flexible routing to queues and multiple exchange types.
  • Deploys as clusters for high availability and throughput.
  • Federates across multiple availability zones and regions.
  • Offers pluggable authentication and authorization, with support for TLS and LDAP.
  • Is lightweight and easy to deploy in public and private clouds.
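As a hedged sketch of what interrogating the queue state might look like, the example below inspects a queue listing of the kind the RabbitMQ management plugin returns from its HTTP API (`GET /api/queues`). The field names `name`, `messages_ready` and `consumers` follow that API as we understand it, so check them against your RabbitMQ version. The queue names, the sample payload and the `max_ready` limit are hypothetical, and the fixed limit is only a placeholder for a learned baseline.

```python
import json

# Hypothetical sample of a management API queue listing (trimmed to a few fields).
SAMPLE_RESPONSE = """
[
  {"name": "conductor", "messages_ready": 4200, "messages_unacknowledged": 15, "consumers": 3},
  {"name": "scheduler", "messages_ready": 0, "messages_unacknowledged": 0, "consumers": 2},
  {"name": "compute.node-1", "messages_ready": 12, "messages_unacknowledged": 0, "consumers": 0}
]
"""


def unhealthy_queues(queues, max_ready=1000):
    """Return (queue_name, reason) pairs for queues that look stuck or overloaded."""
    findings = []
    for q in queues:
        name = q.get("name", "<unnamed>")
        ready = q.get("messages_ready", 0)
        consumers = q.get("consumers", 0)
        if ready > 0 and consumers == 0:
            # Messages are piling up and nothing is listening: an agent is likely down.
            findings.append((name, "messages waiting but no consumers"))
        elif ready > max_ready:
            # Consumers exist but cannot keep up with the publish rate.
            findings.append((name, f"backlog of {ready} ready messages"))
    return findings


queues = json.loads(SAMPLE_RESPONSE)
findings = unhealthy_queues(queues)
```

In practice the per-queue counts would be sampled on a schedule and fed into an anomaly model as time series, rather than compared against a single hand-picked limit.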

Monitoring Tools

ELK, TICK Stack, Prometheus & Zenoss

Stable IT platforms rely on using the plethora of data from the infrastructure in modern systems to monitor, assess and react to events as efficiently as possible.

From visualisation-focused support platforms like ELK (Elasticsearch, Logstash, Kibana), to more proactive alternatives like the TICK Stack, Prometheus and Zenoss, we can support the operational toolsets you need to gather, visualize and react to changes in your systems rapidly and with confidence.

Message Queuing

Respond & React

Increased monitoring = increased data. Using tools such as RabbitMQ and Telemetry, we can put this excess data volume to good use. Message queueing allows applications to respond and react to alerts automatically, faster than is possible with human intervention. This means fewer resources, lower overheads, and better overall performance.
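The idea of reacting to alerts straight off a queue can be sketched as follows. To keep the example self-contained it uses Python's in-process `queue.Queue` as a stand-in for a real message broker such as RabbitMQ. The alert kinds, handler functions and returned action strings are all hypothetical.

```python
import queue


def restart_service(alert):
    """Hypothetical automated fix for a stalled component."""
    return f"restart {alert['source']}"


def page_on_call(alert):
    """Fallback: hand anything without an automated fix to a human."""
    return f"page on-call about {alert['kind']} on {alert['source']}"


# Alert kinds that are considered safe to handle without human intervention.
HANDLERS = {"consumer_stalled": restart_service}


def drain(alerts, handlers, fallback):
    """Process every queued alert and return the actions that were taken."""
    actions = []
    while True:
        try:
            alert = alerts.get_nowait()
        except queue.Empty:
            break  # nothing left to process
        handler = handlers.get(alert["kind"], fallback)
        actions.append(handler(alert))
    return actions


alerts = queue.Queue()
alerts.put({"kind": "consumer_stalled", "source": "compute.node-1"})
alerts.put({"kind": "disk_latency", "source": "controller-2"})
actions = drain(alerts, HANDLERS, page_on_call)
```

Keeping the mapping from alert kind to handler in one table makes it easy to see which alerts are automated and which still go to a person.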

The modular nature of Machine Learning allows integrations with operational support systems and messaging applications without interrupting the analysis. Messages can be sent across languages (such as Java, .NET, PHP, Python, JavaScript, Ruby, Go and others), platforms and operating systems. RabbitMQ can also be easily deployed with BOSH, Chef, Docker and Puppet, creating a highly scalable and efficient system.

Case Study

Real World Insights.

Don’t just take our word for it. Check out our case studies to see how we have designed, deployed and managed Cloud solutions to meet the most demanding applications.

Training

Learn from Instructors with Real World Expertise.

Prefer to do it yourself? Training from Aptira lets you learn about the latest technologies from instructors with real world expertise. Our trainers are our engineers, with experience deploying and operating some of the largest and most complex deployments.

We offer a range of online courses including OpenStack, Docker, Ceph, Puppet, Linux KVM and Linux Xen. We also offer customised and on-site group courses. If your organisation needs to focus on particular technologies, or needs unusual learning outcomes (e.g. sales/presales enablement, development techniques for cloud native applications), then Aptira can provide you with an unbiased understanding.

View Courses

Learn More about Monitoring & Machine Learning

Blog posts, how-tos, case studies, training courses, workshops and more.
When it comes to Monitoring and Machine Learning, our Solutionauts know it all.

Let a Machine do it for you

Monitor your Application better than Humanly Possible