Additional latency approach¶
A quick overview of what ocp-network-split does to introduce latency among nodes of different cluster zones.
Network latency script¶
Latency between nodes from different zones is introduced by setting up a netem qdisc on the egress traffic queue of each node of the cluster, so that packets targeted at nodes in other zones flow through a netem qdisc which adds the given delay. This means that for incoming packets to be delayed as well, it is necessary to set up the netem latency on all nodes of the cluster(s), and the total RTT (round-trip time) added this way equals two times the specified delay.
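To illustrate the underlying netem mechanism, here is a minimal sketch. This is not what the script actually does (it applies a flat delay to all egress traffic rather than per-zone queues), and it assumes ens192 as the interface name and one of the zone addresses used in the examples below:
# illustration only: add a flat 15 ms egress delay on this node with netem
tc qdisc add dev ens192 root netem delay 15ms
# when the remote node applies the same delay, each direction of a round
# trip is delayed by 15 ms, so the observed RTT grows by roughly 30 ms
ping -c 3 198.51.100.96
# remove the illustration again
tc qdisc del dev ens192 root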
This setup is implemented in the network-latency.sh script, which takes the requested delay in milliseconds (half of the RTT) as a command line argument. Zone configuration and detection are handled in the same way as in the network split script (in fact, both scripts share this part).
One can see what changes will be introduced via the -d option, which makes the script report what it would do instead of performing the setup:
$ export ZONE_A="198.51.100.199"
$ export ZONE_B="198.51.100.109 198.51.100.96 198.51.100.97 198.51.100.99"
$ export ZONE_C="198.51.100.103 198.51.100.84 198.51.100.87 198.51.100.98"
$ ./network-latency.sh -d 15
ZONE_A="198.51.100.199"
ZONE_B="198.51.100.109 198.51.100.96 198.51.100.97 198.51.100.99"
ZONE_C="198.51.100.103 198.51.100.84 198.51.100.87 198.51.100.98"
current zone: ZONE_B
network interface: ens192
tc qdisc del dev ens192 root
tc qdisc add dev ens192 root handle 1: prio bands 4
tc qdisc add dev ens192 parent 1:4 handle 40: netem delay 15ms
tc filter add dev ens192 parent 1: protocol ip prio 1 u32 match ip dst 198.51.100.199/32 flowid 1:4
tc filter add dev ens192 parent 1: protocol ip prio 1 u32 match ip dst 198.51.100.103/32 flowid 1:4
tc filter add dev ens192 parent 1: protocol ip prio 1 u32 match ip dst 198.51.100.84/32 flowid 1:4
tc filter add dev ens192 parent 1: protocol ip prio 1 u32 match ip dst 198.51.100.87/32 flowid 1:4
tc filter add dev ens192 parent 1: protocol ip prio 1 u32 match ip dst 198.51.100.98/32 flowid 1:4
tc qdisc show dev ens192
tc class show dev ens192
It’s also possible to specify different latencies between particular zones. For example, the command network-latency.sh -l ab=25 -l ac=35 5 will set up 25 ms (50 ms RTT) latency between zones a and b, 35 ms (70 ms RTT) between zones a and c, and 5 ms (10 ms RTT) between the rest of the zones (which in this particular case means between b and c).
$ ./network-latency.sh -d -l ab=25 -l ac=35 5
ZONE_A="198.51.100.199"
ZONE_B="198.51.100.109 198.51.100.96 198.51.100.97 198.51.100.99"
ZONE_C="198.51.100.103 198.51.100.84 198.51.100.87 198.51.100.98"
current zone: ZONE_B
network interface: ens192
tc qdisc del dev ens192 root
tc qdisc add dev ens192 root handle 1: prio bands 6
tc qdisc add dev ens192 parent 1:4 handle 40: netem delay 5ms
tc qdisc add dev ens192 parent 1:6 handle 60: netem delay 35ms
tc qdisc add dev ens192 parent 1:5 handle 50: netem delay 25ms
tc filter add dev ens192 parent 1: protocol ip prio 1 u32 match ip dst 198.51.100.199/32 flowid 1:5
tc filter add dev ens192 parent 1: protocol ip prio 1 u32 match ip dst 198.51.100.103/32 flowid 1:4
tc filter add dev ens192 parent 1: protocol ip prio 1 u32 match ip dst 198.51.100.84/32 flowid 1:4
tc filter add dev ens192 parent 1: protocol ip prio 1 u32 match ip dst 198.51.100.87/32 flowid 1:4
tc filter add dev ens192 parent 1: protocol ip prio 1 u32 match ip dst 198.51.100.98/32 flowid 1:4
tc qdisc show dev ens192
tc class show dev ens192
As you can see, the script removes the existing root qdisc and creates new traffic queues, filtering packets destined for particular zones into qdiscs with netem-introduced latency. This is obviously not optimal from a production perspective, but it’s a good trade-off for testing purposes.
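The resulting configuration can be inspected on a node with the usual tc show commands; a sketch, assuming ens192 is the interface picked by the script (the script itself already prints the qdisc and class listings at the end of its run):
# list the qdisc hierarchy, including the netem delay values
tc qdisc show dev ens192
# list the prio bands and the netem leaves attached to them
tc class show dev ens192
# list the u32 filters that steer per-zone destination IPs into the bands
tc filter show dev ens192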
The script can remove the extra latency via its teardown command: network-latency.sh teardown. Note, however, that it does so by removing the root qdisc, relying on the fact that the default qdisc will be recreated. The script doesn’t provide a way to revert to the original traffic queue configuration that was in place before the latency was set up (as noted above, the original configuration gets deleted).
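In other words, the teardown is roughly equivalent to the following (a sketch, again assuming ens192):
# delete the whole prio/netem hierarchy installed by the script;
# the kernel then falls back to the default qdisc for the interface
tc qdisc del dev ens192 root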
See also:
Description of PRIO qdisc
Description of netem qdisc network delay and loss emulator
Systemd Unit¶
The latency script described above is not used directly, but via the network-latency.service systemd unit. Starting the service configures the latency, while stopping it removes the latency setup (via the teardown command described above). This means that checking the status of this service on a given node reveals whether the additional latency is currently in effect. When deployed via MachineConfig or Ansible Playbook as explained below, the latency service is started during boot.
[root@example-0 ~]# systemctl status network-latency
● network-latency.service - Linux Traffic Control enforced network latency setup
Loaded: loaded (/etc/systemd/system/network-latency.service; enabled; vendor preset: disabled)
Active: active (exited) since Fri 2023-02-03 15:31:54 UTC; 17s ago
Process: 20864 ExecStop=/usr/bin/bash -c /etc/network-latency.sh teardown (code=exited, status=0/SUCCESS)
Process: 20882 ExecStart=/usr/bin/bash -c /etc/network-latency.sh -l ab=11 -l ac=7 5 (code=exited, status=0/SUCCESS)
Main PID: 20882 (code=exited, status=0/SUCCESS)
Feb 03 15:31:54 osd-0 bash[20917]: qdisc netem 60: parent 1:6 limit 1000 delay 11ms
Feb 03 15:31:54 osd-0 bash[20917]: qdisc netem 40: parent 1:4 limit 1000 delay 5ms
Feb 03 15:31:54 osd-0 bash[20917]: qdisc netem 50: parent 1:5 limit 1000 delay 7ms
Feb 03 15:31:54 osd-0 bash[20918]: class prio 1:1 parent 1:
Feb 03 15:31:54 osd-0 bash[20918]: class prio 1:2 parent 1:
Feb 03 15:31:54 osd-0 bash[20918]: class prio 1:3 parent 1:
Feb 03 15:31:54 osd-0 bash[20918]: class prio 1:4 parent 1: leaf 40:
Feb 03 15:31:54 osd-0 bash[20918]: class prio 1:5 parent 1: leaf 50:
Feb 03 15:31:54 osd-0 bash[20918]: class prio 1:6 parent 1: leaf 60:
Feb 03 15:31:54 osd-0 systemd[1]: Started Linux Traffic Control enforced network latency setup.
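Based on the ExecStart and ExecStop lines visible in the status output, the service is a oneshot-style wrapper around the script. A minimal sketch of what such a unit could look like follows; the exact unit file deployed on the nodes may differ, and the After/Wants ordering here is an assumption:
[Unit]
Description=Linux Traffic Control enforced network latency setup
# assumption: wait for the network before configuring tc
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
# keep the unit "active (exited)" so that stopping it triggers ExecStop
RemainAfterExit=yes
ExecStart=/usr/bin/bash -c "/etc/network-latency.sh -l ab=11 -l ac=7 5"
ExecStop=/usr/bin/bash -c "/etc/network-latency.sh teardown"

[Install]
WantedBy=multi-user.target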
MachineConfig¶
A MachineConfig resource is used to deploy both the script and the systemd service unit file on each node of an OpenShift cluster.
Using the OpenShift interface has the advantage of better visibility of such changes, which can be easily inspected via the Machine Config Operator (MCO) API. Moreover, the latency setup survives a node reboot (assuming the IP addresses of the nodes don’t change).
Both the ocp-network-split-setup (single cluster mode) and ocp-network-split-multisetup tools, which generate MachineConfig resources, can include the latency setup there when a latency configuration is specified via the --latency and --latency-spec options.
Example of passing latency values to the ocp-network-split-multisetup tool:
$ ocp-network-split-multisetup zone.ini --mc example.mc.yaml --env example.env --latency 5 --latency-spec ab=50 ac=50
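The generated MachineConfig file can then be applied to the cluster in the usual way, e.g. (a sketch, assuming an active oc session against the given cluster; the machine config operator then rolls the change out to the nodes):
$ oc apply -f example.mc.yaml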
Ansible Playbook¶
In multi cluster mode, the ansible playbook multisetup-latency.yml is used to deploy the latency script and the systemd service to RHEL machines which are part of a zone but outside of any OpenShift cluster. The playbook receives the latency values via the following variables:

| Variable name | Meaning | Example |
|---|---|---|
| latency | default latency between zones (in ms) | 5 |
| latency_spec | dictionary with zone specific latencies | {"ab": "50", "ac": "50"} |

Example of passing the values via --extra-vars:
$ ansible-playbook -i ceph.hosts --extra-vars '{"latency":"5","latency_spec":{"ab":"50","ac":"50"}}' multisetup-latency.yml
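For readability, the same values can also be kept in a small vars file and passed via the @file form of --extra-vars; a sketch, assuming a file named latency-vars.yml with the following content:
# latency-vars.yml (hypothetical file name)
latency: "5"
latency_spec:
  ab: "50"
  ac: "50"
$ ansible-playbook -i ceph.hosts --extra-vars @latency-vars.yml multisetup-latency.yml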
If multi cluster zones contain both OpenShift nodes and classic RHEL machines outside of any OpenShift cluster, one needs to use both the MachineConfig and the Ansible playbook setup, so that the latency service is deployed and running on all nodes of all zones.
Single Cluster Example¶
This example assumes we deployed the network latency MachineConfig, and that the OpenShift cluster has already applied the configuration on all its nodes.
For demonstration purposes, we connect to one of the cluster nodes via oc debug and check the status of the network-latency service there:
sh-4.4# systemctl status network-latency
● network-latency.service - Linux Traffic Control enforced network latency setup
Loaded: loaded (/etc/systemd/system/network-latency.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Tue 2021-09-28 00:32:15 UTC; 4min 59s ago
Process: 1614 ExecStart=/usr/bin/bash -c /etc/network-latency.sh 106 (code=exited, status=0/SUCCESS)
Main PID: 1614 (code=exited, status=0/SUCCESS)
CPU: 46ms
Sep 28 00:32:15 compute-5 systemd[1]: Starting Linux Traffic Control enforced network latency setup...
Sep 28 00:32:15 compute-5 bash[1614]: ZONE_A="198.51.100.94"
Sep 28 00:32:15 compute-5 bash[1614]: ZONE_B="198.51.100.109 198.51.100.96 198.51.100.97 198.51.100.99"
Sep 28 00:32:15 compute-5 bash[1614]: ZONE_C="198.51.100.103 198.51.100.84 198.51.100.87 198.51.100.98"
Sep 28 00:32:15 compute-5 bash[1614]: current zone: ZONE_C
Sep 28 00:32:15 compute-5 bash[1614]: Error: Cannot delete qdisc with handle of zero.
Sep 28 00:32:15 compute-5 systemd[1]: network-latency.service: Succeeded.
Sep 28 00:32:15 compute-5 systemd[1]: Started Linux Traffic Control enforced network latency setup.
Sep 28 00:32:15 compute-5 systemd[1]: network-latency.service: Consumed 46ms CPU time
There we can see that the introduced delay is 106 ms, along with the zone configuration and the detected zone of the node, and that the setup succeeded. Now, when we ping some node from zone A or B, we observe that the RTT is two times the delay, 212 ms:
sh-4.4# ping 198.51.100.96
PING 198.51.100.96 (198.51.100.96) 56(84) bytes of data.
64 bytes from 198.51.100.96: icmp_seq=1 ttl=64 time=212 ms
64 bytes from 198.51.100.96: icmp_seq=2 ttl=64 time=212 ms
64 bytes from 198.51.100.96: icmp_seq=3 ttl=64 time=212 ms
64 bytes from 198.51.100.96: icmp_seq=4 ttl=64 time=212 ms
^C
--- 198.51.100.96 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3004ms
rtt min/avg/max/mdev = 212.292/212.326/212.347/0.564 ms
But when we try to ping a node from the same zone C, we see that there is no additional delay:
sh-4.4# ping 198.51.100.84
PING 198.51.100.84 (198.51.100.84) 56(84) bytes of data.
64 bytes from 198.51.100.84: icmp_seq=1 ttl=64 time=0.086 ms
64 bytes from 198.51.100.84: icmp_seq=2 ttl=64 time=0.059 ms
64 bytes from 198.51.100.84: icmp_seq=3 ttl=64 time=0.060 ms
^C
--- 198.51.100.84 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2053ms
rtt min/avg/max/mdev = 0.059/0.068/0.086/0.014 ms
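Besides pinging other nodes, one can also confirm directly on the node that cross-zone traffic actually flows through the netem band by looking at the qdisc statistics (a sketch, assuming ens192 is the interface used):
# the -s flag prints per-qdisc statistics; the netem leaf should show
# growing packet counters while cross-zone traffic is flowing
tc -s qdisc show dev ens192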
Verifying latency via a testing script¶
To make sure that the latency configuration works as expected, both the MachineConfig and the Ansible Playbook deploy a simple testing script /etc/network-pingtest.sh on all machines where the latency scripts are installed.
See an example of its usage from a machine in zone b:
# /etc/network-pingtest.sh
===============================================================================
ZONE_A
===============================================================================
PING 198.51.100.43 rtt min/avg/max/mdev = 10.300/10.377/10.510/0.125 ms
===============================================================================
ZONE_B
===============================================================================
PING 198.51.100.131 rtt min/avg/max/mdev = 0.202/0.223/0.243/0.016 ms
PING 198.51.100.159 rtt min/avg/max/mdev = 0.035/0.041/0.052/0.007 ms
PING 198.51.100.160 rtt min/avg/max/mdev = 0.172/0.200/0.218/0.026 ms
===============================================================================
ZONE_C
===============================================================================
PING 198.51.100.109 rtt min/avg/max/mdev = 10.213/10.242/10.296/0.122 ms
PING 198.51.100.140 rtt min/avg/max/mdev = 10.171/10.196/10.214/0.118 ms
PING 198.51.100.176 rtt min/avg/max/mdev = 10.223/10.254/10.286/0.086 ms
===============================================================================
ZONE_X
===============================================================================