How to tune applications with DatArcs Optimizer

Application developers seldom know exactly how users will use their applications. To maximize performance, developers tune their applications against synthetic loads and anticipated use cases.

Some applications, such as databases, have dynamic tuning capabilities that let their behavior change in response to the applied load. While this technique is beneficial, it is time-consuming to develop and may not capture all the important use cases.

Starting with version 0.10 of DatArcs Optimizer, application developers can perform dynamic tuning automatically.

In this post, we’ll show how this can be done using an example application written in bash.

Tuning a simple application – one knob and one metric

Below is an example application that has one knob that controls its behavior and one metric that describes its performance. The knob integer value is communicated via a file named “knob”, and the metric integer value is communicated via a file named “metric”.

#!/bin/bash
# Toy application: one integer knob (read from the file "knob") and one
# integer metric (written to the file "metric"). The metric peaks when knob=7.
while true ; do
        knob=$(cat knob)
        metric=$(( 50 - (knob-7) * (knob-7) ))
        echo "$metric" | tee metric.tmp
        mv metric.tmp metric    # rename so readers never see a half-written file
        sleep 0.2
done
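
To see the loop in action, you can run it in the background and read the metric file. The script name example_app.sh below is just an assumed name for the listing above:

$ echo 0 > knob
$ ./example_app.sh > /dev/null &
$ cat metric    # after the first iteration (about 0.2s)
1

With the knob set to 0, the metric is 50 - (0-7)^2 = 1.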

Passing knob and metric values via files as we did above is not recommended unless proper locking mechanisms are used. We did it here only to keep the example simple.
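
If you do need file-based exchange in a real deployment, a minimal sketch of writer and reader sides guarded by flock(1) might look like the following; the lock file name metric.lock is an assumption and not part of the example above:

# Writer side (sketch): hold an exclusive lock while replacing the metric file.
(
        flock -x 200
        echo "$metric" > metric
) 200>metric.lock

# Reader side (sketch): hold a shared lock while reading.
(
        flock -s 200
        cat metric
) 200>metric.lock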

We’ll now write a plugin for DatArcs Optimizer that samples “metric” and feeds it into Optimizer as a metric named “example_app.my_target_metric”. To do this, we’ll write a new C++ class that inherits from MetricsPlugin, which is defined in metricsPlugin.h provided with Optimizer. The constructor defines the metrics, and sample_system() samples them and feeds them to Optimizer.

#include <fstream>
#include "metricsPlugin.h"

// Plugin that reports the example application's metric to Optimizer
// under the name "example_app.my_target_metric".
class examplePlugin : public MetricsPlugin {
public:
        examplePlugin();
        ~examplePlugin() {}
        void sample_system(MetricsPluginData *current_values);
        std::string get_name() {return "example_app";}
private:
        MetricsPlugin::Metric m_target_metric;
};

examplePlugin::examplePlugin() {
        // Declare the metric; combined with get_name(), Optimizer sees it
        // as "example_app.my_target_metric".
        m_target_metric.name="my_target_metric";
        m_target_metric.aggregated=false;
}

void examplePlugin::sample_system(MetricsPluginData *current_values) {
        // Read the latest value written by the application and report it.
        std::ifstream f("metric");
        int metric_val;
        f >> metric_val;
        current_values->insert(m_target_metric,metric_val);
}

// Factory functions so Optimizer can load the plugin from a shared object.
extern "C" MetricsPlugin* create_object() {
        return new examplePlugin;
}

extern "C" void destroy_object( MetricsPlugin* object ) {
        delete (examplePlugin *) object;
}

We then compile the plugin as follows:

$ g++ -c -fPIC example_app_plugin.cpp
$ g++ -shared -o libexampleplugin.so example_app_plugin.o
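
A quick sanity check is to confirm that the factory functions are exported from the shared object:

$ nm -D libexampleplugin.so | grep -E 'create_object|destroy_object'

Both create_object and destroy_object should appear as defined (T) symbols.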

Putting everything together via the knobs.yaml configuration file:

domain:
  common:
    knobs:
      my_knob:
        options: [0,4,8,12]
        get_script: "cat knob"
        set_script: "echo $value > knob"
    metrics:
      include: [example_app.*]
      target: example_app.my_target_metric
      plugins: [libexampleplugin.so]
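
Before handing the file to Optimizer, it can help to exercise the two scripts by hand. The sketch below assumes Optimizer exposes the selected option to set_script through a shell variable named value, as the configuration above suggests:

$ value=8 bash -c 'echo $value > knob'   # roughly what set_script does for option 8
$ cat knob                               # what get_script returns
8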

We’ll start with a knob value of 0:

$ echo 0 > knob

We’re now ready to start tuning:

$ datarcs-optimizer --knobs=no-embedded --knobs=knobs.yaml

Optimizer will now attempt to find the optimal knob value from the possible options 0,4,8,12. The target metric as a function of time is plotted below:

It took Optimizer around 90 seconds to properly tune our simple application and give it a 49x boost in its target metric.
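
The 49x figure follows directly from the metric formula; evaluating it over the configured options shows where each knob value lands:

$ for knob in 0 4 8 12; do echo "knob=$knob -> metric=$(( 50 - (knob-7) * (knob-7) ))"; done
knob=0 -> metric=1
knob=4 -> metric=41
knob=8 -> metric=49
knob=12 -> metric=25

Starting from knob 0 (metric 1), the best available option is 8 (metric 49), hence the 49x improvement.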

Applications with phases

The example above is very simplistic. In reality, applications have phases, and the application’s behavior may differ in each phase. We’ll add phases to our simple bash application:

#!/bin/bash
# Toy application with two phases. The phase flips every 150 iterations
# (about 30 seconds at 0.2s per iteration), and the knob value that
# maximizes the metric depends on the current phase.
phase=0
counter=0
while true ; do
        ((counter++))
        if [ $counter -ge 150 ] ; then
                phase=$(( (phase + 1) % 2 ))
                counter=0
        fi
        max=$((5+phase*2))    # metric peaks at knob=5 in phase 0, knob=7 in phase 1
        knob=$(cat knob)
        metric=$(( 50 - (knob-max) * (knob-max) ))
        echo -e "${metric}\n${phase}" | tee metric.tmp
        mv metric.tmp metric
        sleep 0.2
done

In the revised application, we have two phases, as follows:

Phase   Target metric (x = knob value)
0       50 - (x-5)^2
1       50 - (x-7)^2
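
Enumerating the configured options under both formulas makes the difference concrete; this repeats the arithmetic from the listing above:

for phase in 0 1; do
        max=$((5 + phase*2))
        for knob in 0 4 8 12; do
                echo "phase=$phase knob=$knob -> metric=$(( 50 - (knob-max) * (knob-max) ))"
        done
done

Among the available options, phase 0 peaks at knob 4 (metric 49) while phase 1 peaks at knob 8 (metric 49), so no single static setting is optimal for both phases.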

The revised application spends 30 seconds in each phase and then switches. In each phase the target metric responds differently to the knob setting, which resembles real applications with distinct phases of execution. To feed the new phase metric to Optimizer, we’ll alter the plugin:

class examplePlugin : public MetricsPlugin {
public:
        examplePlugin();
        ~examplePlugin() {}
        void sample_system(MetricsPluginData *current_values);
        std::string get_name() {return "example_app";}
private:
        MetricsPlugin::Metric m_target_metric, m_phase_metric;
};

examplePlugin::examplePlugin() {
        m_target_metric.name="my_target_metric";
        m_target_metric.aggregated=false;
        // New: expose the application's phase as "example_app.phase".
        m_phase_metric.name="phase";
        m_phase_metric.aggregated=false;
}

void examplePlugin::sample_system(MetricsPluginData *current_values) {
        std::ifstream f("metric");
        int metric_val;
        f >> metric_val;    // first line: target metric
        current_values->insert(m_target_metric,metric_val);
        f >> metric_val;    // second line: phase
        current_values->insert(m_phase_metric,metric_val);
}
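
The metric file now carries two lines, the target metric followed by the phase, which is exactly what the revised sample_system() reads in order. The values below assume the knob is set to 8 and the application is currently in phase 1:

$ cat metric
49
1

After this change, recompile the plugin as before and restart the tuning run.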

We’ll now tune the application again:

In the first 3 minutes, Optimizer learned about the workload, which switched phases every 30 seconds. When enough data was gathered, Optimizer attempted to tune each phase separately. Approximately 6 minutes into the run, Optimizer found the best knob setting for each of the two phases, and switched to the correct setting after each phase change.

Optimizer can work with hundreds of application metrics in order to determine the phase of the application and to tune for each specific phase. Moreover, new knobs can be easily added in knobs.yaml.

The big picture

Application tuning is a complex task, involving many moving parts and use of advanced algorithms. DatArcs Optimizer can be used to automate application tuning, freeing application developers to focus on the business logic.

Application tuning can be employed in conjunction with Optimizer’s CPU and OS tuning features for even greater gains.

Stay tuned for announcements of support for specific applications in the coming months!

DatArcs Turns Up Performance Heat for Networking Giant Mellanox

When we think about top datacenter network performance, no name comes to mind faster than Mellanox—the undisputed leader in HPC and large-scale datacenter connectivity.

With so much sophistication built into its network cards to help end users realize blazing fast speed and reliability, Mellanox customers want to take the fastest path to optimization. As we know, fine-tuning system components takes time and expertise, and an advanced Mellanox network card is no exception.

This is why a technical partnership between Mellanox and DatArcs, a fast-growing provider of automatic tuning and optimization software for a wide range of datacenter applications, makes practical, time-saving, performance-boosting sense: for Mellanox, and for the company’s diverse range of customer applications that require the best performance in the form of the lowest latency, maximum bandwidth, and optimized CPU usage.

Under the collaboration, the DatArcs development team will take its own expertise in creating auto-tuning and optimization software for CPU performance and adapt that to Mellanox networking gear. This means that instead of sending legions of performance engineers to help users get the most out of their applications by manually tuning the many available knobs and dials, Mellanox can get their end users up and running at the top of their game—out of the gate, and without long hours spent looping through options and optimizations.

“This partnership will allow Mellanox to ship highly-tuned products with lower effort. With their large number of customers running an incredible range of applications that require high network performance, we think this is a win-win for both Mellanox and their user base,” says Dr. Tomer Morad, CEO of DatArcs.

“Given the wide range of applications our customers are running and Mellanox’s highly configurable and flexible networking gear, using the automatic tuning tool DatArcs Optimizer to reduce or even eliminate manual tuning time makes a lot of sense,” says Dror Goldenberg, VP of Software Architecture at Mellanox. “With DatArcs technology we can enable optimal tuning for different networking workloads, varying from HPC to cloud, storage, NFV and artificial intelligence.”

DatArcs Optimizer is automatic, tuning without any user input; it is adaptive, detecting the phase of the application and the workload characteristics; it is extensible, allowing Mellanox and its end users to add their own tunable parameters and knobs based on application needs; and, perhaps most important, it is flexible. This means Optimizer can tune for speed, energy, and power consumption, and now, on Mellanox network cards, for maximum network throughput, performance, and efficiency.

The two companies have been answering questions and presenting their automatic optimization work across the country as well as listening to user stories about the long wait times to get to final optimal configurations manually.

Ultimately, the partnership means ditching the manual tuning and freeing up those network and datacenter experts to focus more on the systems and codes that keep their organizations on top—and in front of the competition.

Why Leading Bare Metal Cloud Provider Packet Partnered with DatArcs to Help Users Maximize Hardware Infrastructure

Despite all the abstractions, servers are complex. Add to that raw hardware complexity the multitude of ways a system can be manually tuned and adapted to even a single application, and the complexity grows by orders of magnitude.

For some, manual tuning is an art and a science, but even for the rare few who take great pleasure in endless knob-tuning, it is ultimately a massive drain on time and energy. Even with expert tuning that accounts for all the vagaries of the CPU, operating system, framework, application, and other parameters, we are only human, and there are limits to what we can see.

This is the case even for small node-count deployments; now imagine tuning across much larger fleets of machines, each acquired at a different time and each requiring its own knob settings for every aspect of performance.

The tunability spins quickly out of control, leaving progressively larger gaps in performance and efficiency—and eating into costs. Human insight is at its limits, but as customers like Packet (a bare metal cloud based in NYC) are quickly realizing, there is huge value in enabling end users to maximize performance without tedious manual tuning.

Extracting More Value from Hardware
As a provider of high-performance cloud resources, Packet is constantly looking for new ways to deliver more value from the physical servers provisioned on its platform. No matter what applications or technologies users end up putting on top of their Packet servers, extracting extra performance and efficiency from the hardware had been an important but missing part of the company’s story.

Packet understands that to stand out as a unique provider in a crowded (and mainly virtualized) cloud industry, it needs to invest in tools like automatic tuning and any “easy” resource that helps customers extract more value from their infrastructure. This constant drive for value and optimization is what brought the company to DatArcs.

Cloud Portability Makes Manual Tuning Impractical
While Packet’s Dell, SuperMicro, Quanta and Foxconn servers are designed to perform well across a broad spectrum of applications, at the end of the day Packet’s nearly 8,000 users take that commodity hardware and do all kinds of things with it.

From databases and serverless functions, to big data and storage, the use cases are endless. Additionally, due to Packet’s focus on automation and “cloud native” workloads, the environments are constantly being refreshed as users swap out servers, add new hardware, or auto-scale to handle peaks and valleys.

This makes the long, slow process of manual hardware tuning impractical, requiring a move beyond human level adjustments toward software-driven optimization that can see the application, recognize its needs relative to the hardware, quickly learn its ins and outs, and implement the ideal setup for that environment and workload automatically.

How the Magic Happens – And Why it Needs to Feel Like Magic!
These are all high-level features of DatArcs, but under the hood, a complex set of learning algorithms snaps together a complete profile of the application and sets it on its best path to energy efficient performance.

With no manual tuning required of either Packet or its customers, and with Optimizer available at provisioning time for pennies per hour, the choice was clear: dynamic, adaptive, automatic tuning without the headache or the budget hit.

DatArcs understands that Packet is just one of many companies, researchers, and enterprise data analysts that need results as fast as possible and within a power-aware envelope. They also know that time-constrained professionals have no time to waste endlessly changing knobs, only to find that one subtle change cascades into a whole new round of re-tuning.

What smart developers, engineers, scientists, and data-driven analysts need is a tool that outsmarts even the smartest system administrator with a deep inside view into the system, application, and parameters for adaptive tuning.