COMPLEX KIE SERVER TESTS AUTOMATION PART 3: SMART ROUTER NETWORK ISSUES

Following Murphy’s Law, whatever you do not test against will happen (M. Nygard)

TL;DR

  • KIE Smart Router resilience against network issues needs to be tested in an automated way
  • Testcontainers and Toxiproxy are perfect tools for easing these complex tests
  • Build a temporary KIE Smart Router image from sources by means of a Dockerfile
  • Parameterize tests and do not use sleep-before-check (polling is faster and more reliable)
  • Adopt this new motto: “whatever you do not test against, will happen”

MOTIVATION: NETWORK WILL FAIL SOONER THAN LATER

Modern systems are mainly distributed, and connectivity is a key part of their design and operation. However, as quality enthusiasts, we realize that it is fairly complex to test connection issues in a deterministic and automated manner, because:

  • it involves multiple components and their interactions.
  • networking is built on layers of abstraction.
  • connection failures are unpredictable by nature.

In this article, the third of a series about complex test automation with KIE Server (see I and II), we will focus on the KIE Smart Router to see how we can assure its reliability and robustness against one of the most important resilience killers: network issues.

KIE Smart Router is a component that acts as a gateway:

  • hides the topology of the different KIE Servers from their clients, forwarding requests and aggregating the responses into a single one before returning it.
  • is really useful in dynamic deployment environments where the client is agnostic about KIE Servers distribution.
  • handles multiple connections and has implemented stability patterns (like circuit-breaker) to cope with connection error scenarios. 

Let’s face it: network issues are inevitable. Sooner or later they will happen, and waiting for a critical outage in production to find out how the system will perform is, without a shadow of a doubt, a recipe for pain. This is the main motivation of the present work: anticipate the disaster by writing automated tests that simulate common kinds of network failures to prove the KIE Smart Router's resilience.

CIRCUIT BREAKER PATTERN FOR NETWORK ISSUES

KIE Smart Router implements the well-known Circuit Breaker Pattern when there is a connection loss. The goal of this mechanism is to fail fast in order to prevent further consequences. Let’s see briefly how it works because later we will test it thoroughly. 

The routing table is the brain of the Smart Router. It contains the relationships among

  • containers (identified by an alias as well as group-artifact-version) and
  • server-ids (logical names in the network) with
  • server locations needed for redirection.

There are two ways to populate this table:

  • By manual operation
  • Automatically: at startup, each KIE Server self-registers with the Smart Router by passing its own id and location.

When a connection issue happens, the system immediately updates this routing table by removing the location of the failing KIE Server. Open circuit! The Smart Router won’t forward any request to that point of failure, which guards the system against cascading errors and slow responses. Meanwhile, new requests are balanced to another working server for that container (if provisioned).

Next, the KIE Smart Router spawns a separate thread that periodically pings the failing KIE Server until it reaches the configured maximum number of retries. When one of the pings succeeds (the server connection is back), the location is added to the routing table again, ready for routine operation. Closed circuit!
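
The open/close cycle described above can be sketched in plain Java. This is only an illustration of the pattern, not the KIE Smart Router implementation: the class RoutingTableSketch, its method names, and the BooleanSupplier standing in for the ping are hypothetical, and the retries run synchronously here instead of periodically in a separate thread.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a routing table with circuit-breaker behaviour.
class RoutingTableSketch {

    // container id -> locations of the servers currently able to serve it
    private final Map<String, List<String>> locations = new ConcurrentHashMap<>();

    // Adds a location for a container (what self-registration would do).
    void register(String containerId, String location) {
        List<String> servers =
                locations.computeIfAbsent(containerId, k -> new CopyOnWriteArrayList<>());
        if (!servers.contains(location)) {
            servers.add(location);
        }
    }

    // Open circuit: stop routing to a failing location right away.
    void markFailed(String containerId, String location) {
        List<String> servers = locations.get(containerId);
        if (servers != null) {
            servers.remove(location);
        }
    }

    // Read-only snapshot of the locations still available for a container.
    List<String> locationsFor(String containerId) {
        return List.copyOf(locations.getOrDefault(containerId, List.of()));
    }

    // Pings the failed location up to maxRetries times (no delay in this sketch).
    // On the first successful ping the location is registered again: closed circuit.
    boolean tryReconnect(String containerId, String location,
                         BooleanSupplier ping, int maxRetries) {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            if (ping.getAsBoolean()) {
                register(containerId, location);
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        RoutingTableSketch table = new RoutingTableSketch();
        table.register("container-a", "http://node1:8080");
        table.register("container-a", "http://node2:8080");

        table.markFailed("container-a", "http://node1:8080");
        System.out.println("open circuit:   " + table.locationsFor("container-a"));

        // Simulated ping that only succeeds on the third attempt.
        int[] pings = {0};
        boolean back = table.tryReconnect(
                "container-a", "http://node1:8080", () -> ++pings[0] >= 3, 10);
        System.out.println("closed circuit: " + back + " " + table.locationsFor("container-a"));
    }
}
```

While the circuit is open, requests for the container can still be served by the remaining location, which is the behavior the tests later verify against the real routing table.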

NETWORK ISSUES TESTING SETUP

Once we have introduced our testing scenario, let’s see the setup for putting in place the resilience test cases.

As in the other examples in this series, we will take advantage of containerized applications and the Testcontainers library and its utilities.

The following figure depicts the initial configuration. Firstly, we create a network containing several KIE servers (one of them can act as a controller) connected to the KIE Smart Router with a secure/non-secure connection. Each box represents a Linux container that exposes a port to the network:

  • Secured KIE Servers: port 8443
  • Non-secured KIE Servers: port 8080
  • KIE Smart Router: port 9000

Moreover, each KIE Server deploys a different business application (kjar) in its own KIE container. Be aware that the word “container” has a different meaning here: a KIE container is the self-contained environment in which a business application runs. KIE containers are identified by a Group-Artifact-Version and/or an alias.

All of these elements comprise the System-Under-Test (SUT). In front of it, there’s our test suite acting as a client application. 

DO TESTING BY PROXY

Now, we want to provoke network issues in this setup in a controlled and deterministic manner from the client. The way we’ve chosen to do it is by means of “Toxiproxy” containers. Toxiproxy is an open-source tool for simulating abnormal network conditions (called toxics).

These toxics cause connection failures that emulate real network issues such as connection loss, poor bandwidth, timeouts, connection reset by peer, high latency and jitter, and data sliced into multiple smaller packets, among others.

As you can see in the figure, the Toxiproxy container is a proxy that intercepts all the traffic between the KIE Smart Router and the KIE server (upstream and downstream). It exposes port 8666 to the network.

It can simulate java.net exceptions like these (which fail immediately, so great for not making the tests too long):

Java exception | Toxic
java.net.SocketException: Unexpected end of file from server | timeout, limitdata
java.net.SocketException: Connection reset | resetPeer
javax.net.ssl.SSLHandshakeException: Remote host terminated the handshake | timeout, limitdata
javax.net.ssl.SSLException: Connection reset | resetPeer

TOXIPROXIES IN THE MIDDLE OF THE WIRE

We can initialize Toxiproxy in the code as shown below. Along with the out-of-the-box Shopify image, we will provide a shared network, network alias, and the log consumer to print out its logs:

@Container
public static ToxiproxyContainer toxiproxy = new ToxiproxyContainer(
        DockerImageName.parse("ghcr.io/shopify/toxiproxy:2.4.0")
                .asCompatibleSubstituteFor("shopify/toxiproxy"))
        .withNetwork(network)
        .withNetworkAliases(TOXIPROXY_NETWORK_ALIAS)
        .withLogConsumer(new Slf4jLogConsumer(logger).withPrefix("TOXIPROXY-1"));

Toxiproxy will proxy the target container by invoking:

proxy1 = toxiproxy.getProxy(kieServer1, KIE_HTTPS_PORT);
proxy3 = toxiproxy3.getProxy(kieServer3, KIE_PORT);

At this point, you might be wondering "ok, that’s the way Toxiproxy reaches the KIE server, but how should I configure the Smart Router to get to the Toxiproxy?" 

Indeed, that’s a very good question. In this case, with self-registering, it’s the KIE server that sends its location (KIE_SERVER_LOCATION) to be populated into the routing table.  

When creating the KIE Server, we pass this environment variable:

withEnv("KIE_SERVER_LOCATION",
        args.get("KIE_SERVER_LOCATION_" + nodeName) + "/kie-server/services/rest/server");

Where the KIE_SERVER_LOCATION_node1/2/3 values come from system properties (defined in the pom.xml, and overridable at launch time):

org.kie.samples.server.location.node1 = https://toxiproxy:8666
org.kie.samples.server.location.node2 = https://kie-server-node2:8443
org.kie.samples.server.location.node3 = http://toxiproxy3:8666

To sum up, those KIE servers behind the Toxiproxy will pass their proxy network addresses.
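
As a rough illustration of how those system properties could end up in the args map used above, here is a minimal sketch. The class ServerLocationSketch and its helper methods are hypothetical; only the property names, the KIE_SERVER_LOCATION_ key prefix, and the REST path come from the snippets in this article.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: maps per-node system properties to the args used for the containers.
class ServerLocationSketch {

    static final String PROPERTY_PREFIX = "org.kie.samples.server.location.";
    static final String REST_PATH = "/kie-server/services/rest/server";

    // Copies org.kie.samples.server.location.<node> into KIE_SERVER_LOCATION_<node>.
    static Map<String, String> locationArgs(String... nodeNames) {
        Map<String, String> args = new HashMap<>();
        for (String node : nodeNames) {
            String location = System.getProperty(PROPERTY_PREFIX + node);
            if (location != null) {
                args.put("KIE_SERVER_LOCATION_" + node, location);
            }
        }
        return args;
    }

    // Value that the KIE Server would announce when self-registering.
    static String kieServerLocation(Map<String, String> args, String nodeName) {
        return args.get("KIE_SERVER_LOCATION_" + nodeName) + REST_PATH;
    }

    public static void main(String[] unused) {
        // Normally set in the pom.xml or with -D at launch time.
        System.setProperty(PROPERTY_PREFIX + "node1", "https://toxiproxy:8666");
        System.setProperty(PROPERTY_PREFIX + "node2", "https://kie-server-node2:8443");

        Map<String, String> args = locationArgs("node1", "node2");
        System.out.println(kieServerLocation(args, "node1")); // behind the proxy
        System.out.println(kieServerLocation(args, "node2")); // direct connection
    }
}
```

With this mapping, node1 announces the proxy address, so the Smart Router sends its traffic for that server through Toxiproxy, while node2 is reached directly.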

CONTAINERS ALL AROUND

So, these are the containers in place and their origin:

The “jbpm-server-full” image contains KIE Server, Controller, and Business Central, while “kie-server-showcase” is a lighter image with just the KIE Server. Both are available to download from Quay.

After that, we will create temporary images from them just for testing (a.k.a. images-on-the-fly) including business applications and the rest of the needed configuration.

The same goes for the KIE Smart Router image, but in this case we do have to create it from scratch (there are no community binaries for it). Do not panic: KIE is an open-source initiative and we can generate all that we need by writing a Dockerfile.

GENERATING THE KIE SMART ROUTER IMAGE

A Dockerfile is like the instruction manual for building an image. It’s flexible enough to work in layers and skip repeated steps if nothing forces it to execute them again.

Starting from a JDK base image (in this case one from JBoss, which already contains the jboss user), it installs the Git and Maven tools. Next, it clones the repository (branch and URL are configurable; by default they are main and the droolsjbpm-integration repo), compiles the sources, and packages them.

Then, it adds some files (properties for configuring the KIE Smart Router and logging, plus the certificate for TLS communication) and executes this command to import the certificate into a truststore (it is a self-signed certificate, created ad hoc for testing purposes):

keytool -importcert -noprompt -trustcacerts -alias toxiproxy-full-ks -file $ROUTER_HOME/kieks.crt -keystore /etc/pki/java/cacerts -storepass changeit

An aside about certificates and TLS communication:

Certificate generation

To generate, on your local machine, a self-signed certificate valid for multiple hostnames with keytool, you must include the DNS names (network aliases) as Subject Alternative Names (SAN) if you don’t want to get a “no name matching” exception:

keytool -genkeypair -alias toxiproxy-full-ks -keyalg RSA -keysize 2048 -validity 365 -keystore serverks.pkcs12 -storetype PKCS12 -dname "cn=Kie Server,o=jbpm,c=ES" -keypass secret -storepass secret -ext san=dns:full-node1,dns:toxiproxy,dns:localhost

Next, we export it into a .crt file, for example kie.crt:

keytool -export -alias toxiproxy-full-ks -file kie.crt -keystore serverks.pkcs12

Enter keystore password: 
Certificate stored in file <kie.crt>

The KIE Smart Router image will use this certificate.

In the KIE Server, for enabling secure connections, we must execute this jboss-cli command as part of the initialization: 

security enable-ssl-http-server 
--key-store-path=$JBOSS_HOME/standalone/configuration/serverks.pkcs12
--key-store-password=secret

From Dockerfile to image-on-the-fly

Finally, the entrypoint of the container will be the standard “java -jar …” command including the $ROUTER_PROPS to enable the file configuration and its watcher.

withEnv("ROUTER_PROPS",
        "-Djava.util.logging.config.file=$ROUTER_HOME/logging.properties"
        + " -Dorg.kie.server.router.config.watcher.enabled=true"
        + " -Dorg.kie.server.router.config.file=$ROUTER_HOME/smart_router.properties");

From this Dockerfile, the Testcontainers utility “ImageFromDockerfile” builds the KIE Smart Router image. The container also gets the network configuration and the LogConsumer and, to avoid sleep calls, waits for the expected log message before considering the component “up and running”:

withNetwork(network);
withNetworkAliases(SMARTROUTER_ALIAS);
withExposedPorts(SMARTROUTER_PORT);
withLogConsumer(new Slf4jLogConsumer(logger).withPrefix("SMART-ROUTER"));
waitingFor(Wait.forLogMessage(".*KieServerRouter started on.*", 1)
        .withStartupTimeout(Duration.ofMinutes(2L)));

NETWORK ISSUES TEST INSIGHT

You can find the code and configuration for this example here. Let’s give some hints on how to parameterize and structure the tests so that they scale easily.

The test cases aim to validate that the component fulfills the circuit breaker pattern, which keeps it stable and usable under hard network conditions. The routing table (kie-server-router.json file) has to be consistently updated to open and close the circuit, and a polling mechanism is launched to check when the connections are recovered.

COMPLEX AS SYSTEM TESTS, STRAIGHTFORWARD AS UNIT TESTS

JUnit 5 parameterized tests with @MethodSource allow us to define the different toxics to apply to each proxy, so that the same tests run in different contexts.

The static method “provideToxics” returns a Stream of Arguments that will be passed to each test. We can combine the toxics as we want in our testing matrix without interfering with the implementation of the tests.

Notice that we can even set up the properties of these toxics with random values within a range (such as the waiting time before a timeout):

ToxicSupplier<Toxic, IOException> timeout3 = () ->
        proxy3.toxics().timeout("timeout", DOWNSTREAM, getRandomTimeout(2000, 5000));

When a toxic is applied to a proxy (ToxicSupplier is a functional interface that defines a get method, like Supplier), the abnormal behavior of the network begins over that path.

On the other hand, when the removeAllToxics method is invoked, the toxics are completely wiped out. As a result, the network comes back to a healthy condition.
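
To make the apply/remove flow concrete, here is a minimal, self-contained sketch. It does not use the real Toxiproxy client. The ToxicSupplier definition is an assumption based on the description above, the inclusive range of getRandomTimeout is a guess, and FakeProxy is a hypothetical stand-in that only records which toxics are active.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

class ToxicSupplierSketch {

    // Assumed shape: like java.util.function.Supplier, but get() may throw a checked exception.
    @FunctionalInterface
    interface ToxicSupplier<T, E extends Exception> {
        T get() throws E;
    }

    // Random timeout in milliseconds within [min, max], both inclusive (an assumption).
    static long getRandomTimeout(long min, long max) {
        return ThreadLocalRandom.current().nextLong(min, max + 1);
    }

    // Hypothetical stand-in for a proxy: it only records the active toxics.
    static class FakeProxy {
        final List<String> activeToxics = new ArrayList<>();

        String addTimeout(String name, long millis) throws IOException {
            String toxic = name + "=" + millis + "ms";
            activeToxics.add(toxic);
            return toxic;
        }

        void removeAllToxics() {
            activeToxics.clear();
        }
    }

    public static void main(String[] args) throws IOException {
        FakeProxy proxy = new FakeProxy();

        // Arrange: defining the supplier does not touch the proxy yet.
        ToxicSupplier<String, IOException> timeout =
                () -> proxy.addTimeout("timeout", getRandomTimeout(2000, 5000));
        System.out.println("before: " + proxy.activeToxics);

        // Act: the toxic becomes active only when get() is called.
        timeout.get();
        System.out.println("during: " + proxy.activeToxics);

        // Cleanup: the path is healthy again.
        proxy.removeAllToxics();
        System.out.println("after:  " + proxy.activeToxics);
    }
}
```

Because the supplier is lazy, the testing matrix can list toxics up front and each test decides exactly when the network starts to misbehave.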

This control over the impediments across the arrange-act-assert steps is what makes the tests powerful. The test suite is completely self-contained, managing the resources easily and predictably, like in a unit test. Here, the unit is our SUT, comprising several components and connections.

Finally, the tests don’t rely on sleep functions to wait for the expected behavior of the SUT; instead, they actively poll the routing table or the logs to check whether the desired state has been reached before a timeout. This approach is not only faster but also less error-prone in CI environments.
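
A polling helper of this kind can be as small as the following sketch. The name waitUntil and its signature are hypothetical, not taken from the sample project. The short sleep between checks only paces the polling: the helper returns as soon as the condition holds instead of always waiting a fixed time.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

class PollingSketch {

    // Checks the condition every `interval` until it holds or `timeout` elapses.
    // Returns true as soon as the condition is met, false on timeout.
    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration interval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(interval.toMillis());
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();

        // Simulated state change: the condition becomes true about 300 ms after start.
        boolean reached = waitUntil(
                () -> System.currentTimeMillis() - start >= 300,
                Duration.ofSeconds(5),
                Duration.ofMillis(50));

        System.out.println("state reached: " + reached
                + " after ~" + (System.currentTimeMillis() - start) + " ms");
    }
}
```

In a real test, the condition would read the routing table file or the captured logs; the timeout only bounds the worst case, so a passing run finishes as quickly as the SUT allows.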

CONCLUSION: AUTOMATE NETWORK ISSUES TESTING

The Beyoncé Rule, followed by Googlers and other major players in the industry, states that “if you liked it, then you shoulda put a test on it”. A test here means an automated test. For some critical aspects, like how the system handles network failures, these tests may have some complexity, but with new containerized tools (like Testcontainers and Toxiproxy) the effort is really worth it.

Network issues can never be completely prevented, and modern architectures (KIE Smart Router is a good example) follow resilience and stability patterns to minimize their effects. But you will only be confident that the system exhibits the desired behavior when you write an automated test for it and that test becomes part of the CI to catch regressions.

Investing in automated testing for a great variety of network issues is, without a shadow of a doubt, a recipe for success.
