How to Cheat on Your Performance Exam
Every bit of processing costs performance. So we disable as many Threat Defender features as possible to minimize the number of operations it performs on the traffic. To this end, we use a policy set and system configuration in which the following features are not used: Data Leakage Protection (DLP), Intrusion Detection System (IDS), Threat Intelligence (TI), and SSL Interception. We also disable all policy rules and correlation scenarios.
But some features cannot be deactivated because we consider them core features needed for all applications of Threat Defender. We want these components to affect the performance measurements as little as possible, so we choose our test traffic such that they finish their evaluation as fast as possible. We also optimize each test for the respective performance indicator we want to measure.
The following measurements were all taken with Threat Defender version 20181206.0 on a single-socket reference system:
Intel Xeon E5-2690v4; 14 cores / 28 threads
128 GB ECC RAM
Network adapters Intel 82599ES and X710 (two times 2×10 Gbit/s, 40 Gbit/s connectivity in total)
960 GB Enterprise SSD
One important performance indicator is the throughput, i.e. the amount of data processed in a given time. To achieve good throughput values, we want to reduce the overhead per flow and the overhead per packet. We therefore use long-lived HTTP connections with many packets in a single flow. The detection of the layer 7 protocol and the extraction of attributes take place on the very first packets of an HTTP flow, as soon as the request and response headers have been transferred. For all subsequent packets, none of the components in the processing engine are involved, so these packets are forwarded as fast as possible. We also establish only a small number of connections so that the processing cores can handle them easily and distribute the tasks evenly. For this measurement we use the maximum standard Ethernet packet size of 1518 bytes.
With this setup, we achieve a maximum throughput of 33.4 Gbit/s on the reference system, which is a good value and currently hits the limit of the NICs' input buffers.
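For context, the standard Ethernet framing arithmetic shows what the 40 Gbit/s connectivity can theoretically carry with 1518-byte frames (a back-of-the-envelope sketch, not Threat Defender code; the 20-byte preamble and inter-frame-gap overhead is standard Ethernet, the other numbers are from the test setup above):

```python
# Rough Ethernet arithmetic for the 1518-byte throughput test.

FRAME = 1518          # maximum standard Ethernet frame size in bytes
OVERHEAD = 20         # preamble (8 bytes) + inter-frame gap (12 bytes)
LINK_BPS = 40e9       # total connectivity of the reference system

wire_bits = (FRAME + OVERHEAD) * 8          # bits one frame occupies on the wire
line_rate_pps = LINK_BPS / wire_bits        # packets/s at full line rate
frame_rate_bps = line_rate_pps * FRAME * 8  # frame-level throughput at line rate

# ~3.25 Mpps and ~39.5 Gbit/s of frame data at line rate
print(f"{line_rate_pps:,.0f} pps, {frame_rate_bps / 1e9:.1f} Gbit/s")
```

So the measured 33.4 Gbit/s sits reasonably close to the roughly 39.5 Gbit/s that 40 Gbit/s of links can deliver at this frame size.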
Optimizing the Packets per Second
We use the same HTTP measurements to determine an optimized number of processed packets per second (PPS). But instead of large packets we use the smallest feasible packet size, down to packets carrying a single byte of payload. This yields lots of packets with minimal payload to transfer, resulting in a high number of processed packets per second. And since these tiny packets contain hardly any content, no packet analysis takes place, which saves even more time.
With this test we measure 4,940,000 packets per second. Beyond that, the NICs seem to reach their limits.
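As a reference point, the classic line-rate PPS limit for minimum-size frames can be computed with the same framing arithmetic (a generic Ethernet calculation, not vendor code; a 1-byte payload is still padded to the 64-byte minimum Ethernet frame):

```python
# Classic line-rate packet limit for minimum-size Ethernet frames.

MIN_FRAME = 64    # minimum Ethernet frame size in bytes
OVERHEAD = 20     # preamble + inter-frame gap in bytes

def line_rate_pps(link_bps: float) -> float:
    """Packets per second a link can carry at full line rate."""
    return link_bps / ((MIN_FRAME + OVERHEAD) * 8)

print(f"{line_rate_pps(10e9):,.0f} pps per 10 Gbit/s port")  # ~14.88 Mpps
```

The wire limit per 10 Gbit/s port is far above the measured 4,940,000 pps, which is what the full processing path sustains.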
Getting Low Latency
We proceed in the same way to achieve low latency values. Of course, to measure latency, some packets need to be transferred, but neither the physical link nor any of the software components should be saturated. So we keep the throughput constantly low while measuring the latency. We also tailor the traffic to our software: Threat Defender processes packets in stacks of 16. We therefore run the tests with packet counts that are multiples of 16, so the processing unit never has to wait for a stack to be completed. This way we avoid any unnecessary idle time.
Under these optimized conditions we managed to reduce latency to 4.125 μs.
Processing a Multitude of New Sessions per Second
For the first three measurements, we used the same or similar traffic with a low number of flows and a high number of packets per flow to reduce the per-flow overhead. To achieve a large number of new sessions per second, the opposite kind of traffic is needed: short flows with a minimal number of packets. Since this measurement is about the fast instantiation of flows, we also want the flows to be closed down properly as fast as possible, so that the memory holding their state is freed again. To get a reliable measurement, the initiated flows need a proper ramp-up and stabilization phase. We then require the maximum number of new sessions per second to be stable for at least a minute, but we have also verified this for up to an hour.
With this process, we measured 410,000 new sessions per second, which took the CPU close to its limit. We might get even higher peak values, but that would be cheating ;-)
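To get a feel for the flow-state memory involved, Little's law relates the session rate to the number of concurrent flows (a hypothetical sketch; only the 410,000 sessions/s figure is measured, while the flow lifetime and per-flow state size are assumed purely for illustration):

```python
# Little's law applied to flow-state memory. Only NEW_SESSIONS_PER_S is a
# measured value; the lifetime and state size below are assumptions.

NEW_SESSIONS_PER_S = 410_000
AVG_FLOW_LIFETIME_S = 0.1   # assumed: flows open and close within ~100 ms
STATE_BYTES_PER_FLOW = 512  # assumed per-flow state size

concurrent_flows = NEW_SESSIONS_PER_S * AVG_FLOW_LIFETIME_S  # Little's law
state_mb = concurrent_flows * STATE_BYTES_PER_FLOW / 1e6

print(f"~{concurrent_flows:,.0f} concurrent flows, ~{state_mb:.0f} MB of state")
```

This is why closing flows down properly matters: the faster finished flows release their state, the fewer concurrent entries the engine has to keep in memory.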
Summary of the Ideal Test Measurements
In this blog post, we explained how we create “ideal test conditions” and what kind of optimized traffic we use to achieve the following performance values:
Throughput: 33.4 Gbit/s
Packets per second: 4,940,000
Latency: 4.125 μs
New sessions per second (TCP): 410,000
As we have shown, these values are optimized. One reason to measure them is to obtain high values fit for marketing purposes. But another reason is to be comparable to other vendors' performance figures, which are obtained under equally ideal test conditions. And as with any other vendor, you may wonder how close these numbers come to reality in your network, with the usual non-optimized traffic of day-to-day operations.
In our upcoming blog post we will show you how we obtain realistic performance measurements that reflect the use of Threat Defender in practice.