Network testing tools such as netperf can run latency tests as well as throughput tests and more. In netperf, the TCP_RR and UDP_RR (RR = request/response) tests report round-trip latency, and the test-specific -o flag lets you select exactly which output metrics to display.
Google has extensive practical experience with latency benchmarking; following its blog post using-netperf-and-ping-to-measure-network-latency, we built our own latency benchmarks to compare workloads before and after migrating them to the OCI cloud.
Which tools and why
All the tools in this area do roughly the same thing: measure the round trip time (RTT) of transactions. Ping does this using ICMP packets.
ping <ip.address> -c 100
This ping command sends one ICMP echo request per second to the specified IP address until it has sent 100 packets.
netperf -H <ip.address> -t TCP_RR -- -o min_latency,max_latency,mean_latency
Here, -H specifies the remote host and -t the test name; after the -- separator, the test-specific -o option selects the output metrics.
For latency tests in a cloud environment, our tool of choice is PerfKit Benchmarker (PKB). This open-source tool lets you run benchmarks on various cloud providers while automatically setting up and tearing down the virtual infrastructure those benchmarks require.
Once PerfKit Benchmarker is set up, it's simple to run ping and netperf benchmarks:
./pkb.py --benchmarks=ping --cloud=OCI --zone=us-ashburn-1
./pkb.py --benchmarks=netperf --cloud=OCI --zone=us-ashburn-1 --netperf_benchmarks=TCP_RR
These commands run intra-zone latency benchmarks between two machines in a single zone in a single region. Intra-zone benchmarks like this are useful for showing very low latencies, in microseconds, between machines that work together closely.
Latency discrepancies
We've set up two VM.Standard.E4.Flex machines running Ubuntu 22.04 in us-ashburn-1, and we'll use private IP addresses to get the best results.
If we run a ping test with default settings and set the packet count to 100, we get the following results:
ping -c 100 <IP Address>
PING 172.16.60.168 (172.16.60.168) 56(84) bytes of data.
64 bytes from 172.16.60.168: icmp_seq=1 ttl=64 time=0.202 ms
64 bytes from 172.16.60.168: icmp_seq=2 ttl=64 time=0.205 ms
…
64 bytes from 172.16.60.168: icmp_seq=99 ttl=64 time=0.329 ms
64 bytes from 172.16.60.168: icmp_seq=100 ttl=64 time=0.365 ms
--- 172.16.60.168 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 101353ms
rtt min/avg/max/mdev = 0.371/0.450/0.691/0.040 ms
By default, ping sends out one request each second. After 100 packets, the summary reports that we observed an average latency of 0.450 milliseconds, or 450 microseconds.
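If you need that average programmatically, the summary line is easy to parse. A minimal sketch (the summary string below is copied from the output above):

```python
import re

# Final summary line from the ping run above
summary = "rtt min/avg/max/mdev = 0.371/0.450/0.691/0.040 ms"

# Capture the four slash-separated values: min/avg/max/mdev
m = re.search(r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms", summary)

# Average RTT, converted from milliseconds to microseconds
avg_us = round(float(m.group(2)) * 1000, 3)

print(avg_us)  # 450.0
```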
For comparison, let's run netperf TCP_RR with default settings for the same number of packets.
netperf-2.7.0/src/netperf -p {command_port} -j -v2 -t TCP_RR -H 132.145.132.29 -l 60 -- -P ,{data_port} -o THROUGHPUT,THROUGHPUT_UNITS,P50_LATENCY,P90_LATENCY,P99_LATENCY,STDDEV_LATENCY,MIN_LATENCY,MEAN_LATENCY,MAX_LATENCY --num_streams=1 --port_start=20000 --timeout 360
Netperf Results: {'Throughput': '4245.34', 'Throughput Units': 'Trans/s', '50th Percentile Latency Microseconds': '228', '90th Percentile Latency Microseconds': '239', '99th Percentile Latency Microseconds': '372', 'Stddev Latency Microseconds': '92.06', 'Minimum Latency Microseconds': '215', 'Mean Latency Microseconds': '235.08', 'Maximum Latency Microseconds': '21059'}
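As a quick sanity check on those numbers: in a single-stream request/response test, each transaction must complete before the next one begins, so the mean latency should be roughly the reciprocal of the transaction rate. A small illustration using the values reported above:

```python
throughput_trans_per_s = 4245.34  # 'Throughput' from the netperf output above
mean_latency_us = 235.08          # 'Mean Latency Microseconds' from the same run

# One transaction in flight at a time: mean latency ~= 1 / transaction rate
implied_latency_us = 1e6 / throughput_trans_per_s

print(round(implied_latency_us, 1))  # 235.6, close to the reported 235.08
```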
Which test can we trust?
To explain: this is largely an artefact of the different intervals the two tools use by default. Ping issues one transaction per second, while netperf issues the next transaction immediately after the previous one completes.
Fortunately, both of these tools allow you to set the interval between transactions manually.
For ping, use the -i flag to set the interval, given in seconds or fractions of a second. On Linux systems, this has a granularity of 1 ms and rounds down.
$ ping <ip.address> -c 100 -i 0.010
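That 1 ms granularity matters when you ask for sub-millisecond intervals. A small model of the rounding-down behavior described above (this mimics the documented behavior for illustration; it is not ping's actual implementation):

```python
def effective_ping_interval(requested_s: float) -> float:
    """Model of Linux ping's interval handling: 1 ms granularity, rounded down."""
    return int(requested_s * 1000) / 1000

print(effective_ping_interval(0.010))   # 0.01 -> honored exactly
print(effective_ping_interval(0.0109))  # 0.01 -> the extra 0.9 ms is dropped
```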
For netperf TCP_RR, we can use a few options: the
--enable-spin flag, to compile netperf with support for fine-grained intervals, the
-w flag, to set the interval time, and the
-b flag, to set the number of transactions sent per interval.
This approach allows intervals to be set with much finer granularity: instead of waiting on a timer, netperf spins in a tight loop until the next interval, which keeps the CPU fully awake. Of course, this precision comes at the cost of much higher CPU utilization, since the CPU is busy spinning while it waits.
*Note: Alternatively, you can compile with the --enable-intervals flag for less fine-grained intervals.*
Use of the -w and -b options requires building netperf with either the --enable-intervals or --enable-spin flag set.
The tests here are performed with the --enable-spin flag set.
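To see why spinning gives finer-grained intervals than sleeping, here is a minimal Python sketch of the idea (illustrative only; netperf's actual implementation is in C):

```python
import time

def spin_until(deadline: float) -> None:
    """Busy-wait until the deadline: burns CPU, but wakes with microsecond precision."""
    while time.perf_counter() < deadline:
        pass  # keep the CPU awake instead of yielding to a timer

start = time.perf_counter()
spin_until(start + 0.010)  # a 10 ms interval, like netperf -w
elapsed = time.perf_counter() - start
print(f"overshoot: {(elapsed - 0.010) * 1e6:.1f} microseconds")
```

A timer-based sleep would typically overshoot by the scheduler's timeslice; the spin loop exits almost exactly at the deadline.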
We can run netperf with an interval of 10 milliseconds using:
$ netperf -H <ip.address> -t TCP_RR -w 10ms -b 1 -- -o min_latency,max_latency,mean_latency
Now, after aligning the interval time for both ping and netperf to 10 milliseconds, the effects are apparent:
The ping results:
--- 172.16.92.253 ping statistics ---
1000 packets transmitted, 1000 received, 0% packet loss, time 15981ms
rtt min/avg/max/mdev = 0.252/0.306/0.577/0.025 ms
The netperf results:
Minimum Latency Microseconds,Maximum Latency Microseconds,Mean Latency Microseconds
215,21059,235.08
We have integrated OCI as a provider in PerfKit Benchmarker, which we use to carry out this testing.
Here are the results of the inter-region ping benchmark for A1.Flex.2, E4.Flex.1, and S3.Flex.1 VMs.
A1.Flex.2 — latency in milliseconds (rows: sending_region, columns: receiving_region)

| sending_region | us-ashburn-1 | us-phoenix-1 | us-sanjose-1 | eu-frankfurt-1 |
| --- | --- | --- | --- | --- |
| us-ashburn-1 | -- | 55 | 81 | 96 |
| us-phoenix-1 | 55 | -- | 20 | 147 |
| us-sanjose-1 | 76 | 20 | -- | 166 |
| eu-frankfurt-1 | 95 | 146 | 165 | -- |

E4.Flex.1 — latency in milliseconds (rows: sending_region, columns: receiving_region)

| sending_region | us-ashburn-1 | us-phoenix-1 | us-sanjose-1 | eu-frankfurt-1 |
| --- | --- | --- | --- | --- |
| us-ashburn-1 | -- | 49 | 81 | 98 |
| us-phoenix-1 | 49 | -- | 20 | 144 |
| us-sanjose-1 | 86 | 20 | -- | 184 |
| eu-frankfurt-1 | 96 | 144 | 180 | -- |

S3.Flex.1 — latency in milliseconds (rows: sending_region, columns: receiving_region)

| sending_region | us-ashburn-1 | us-phoenix-1 | us-sanjose-1 | eu-frankfurt-1 |
| --- | --- | --- | --- | --- |
| us-ashburn-1 | -- | 55 | 81 | 95 |
| us-phoenix-1 | 61 | -- | 20 | 144 |
| us-sanjose-1 | 82 | 20 | -- | 175 |
| eu-frankfurt-1 | 98 | 149 | 174 | -- |
We also tested netperf intra-region, using the us-ashburn-1 region.
Generally, netperf is recommended over ping for latency tests. This isn't because it reports lower latency at default settings, though. Netperf offers greater flexibility through its options, and we prefer TCP over ICMP: TCP is the more common use case and thus tends to be more representative of real-world applications. That said, the difference between similarly configured runs of these tools shrinks considerably over longer path lengths.
Also, remember to record and report the interval time and other tool settings when performing latency tests, especially at lower latencies, because these intervals make a material difference.