In this blog post I’ll tell the story of enabling the project to use new JUnit 5 features, including parallel execution, parametrized tests and extensions, and how they’re going to help us improve our test codebase.
In Flink we have every kind of test you can think of. In most of the codebase, we refer to integration tests as tests that define and run a streaming job but use a mocked sink and source, while end-to-end tests are like integration tests but use real external systems, such as Kafka, deployed with Testcontainers.
In Flink SQL in particular, unsurprisingly, we have a lot of integration tests, because every single feature needs to be “understood” by all the different moving parts of our stack. For example, take the built-in function COALESCE: it has a runtime implementation, a Table API expression [1], custom logic for argument type inference and return type inference, and an optimizer rule that removes the call when possible. Each of these pieces needs to work in harmony, and integration tests usually give us the guarantee that everything fits together.
Another aspect of our tests is that we have a lot of test bases and parametrized test bases, à la JUnit 4. This is due to the organic growth of the project and the effort to standardize certain test aspects, like starting and stopping a Flink MiniCluster, the embedded Flink cluster used to run test jobs.
In porting to JUnit 5, we want to:
The last point in particular is a hot topic, as today a CI run usually takes between one and a half and two hours, which has a significant impact on the development loop of the project.
Parallel tests are, in my opinion, the killer feature of JUnit 5 and the real incentive to port JUnit 4 tests to JUnit 5.
With JUnit 4 you can parallelize test execution using the build tool, for example with the maven-surefire-plugin fork-JVM feature. It runs tests in parallel by spawning several JVM processes, each of which gets assigned a split of the overall list of tests to run. JUnit 5, on the other hand, runs all the tests within the same JVM: the test runner manages a thread pool and takes care of assigning test cases to threads.

I think the JUnit 5 approach fits our use case best, as spawning several JVMs is very resource intensive on constrained machines such as CI runners, and thus a constant source of issues. The same holds for contributors’ machines, as these days just running a browser with 20+ tabs open, Spotify, Slack and the IDE can easily eat up 16 GB of RAM. Plus, JUnit 5 parallel tests work in any IDE without additional configuration, they can help you uncover thread-safety bugs, and the granularity of the execution is easy and flexible to configure.
To start using JUnit 5 parallel tests, we just had to create a file called junit-platform.properties in our test resources and add the following:

```properties
junit.jupiter.execution.parallel.enabled = true
```
This configuration lets us opt in specific tests/classes to run in parallel. To flag a test class to run in parallel, we annotate it with @Execution(CONCURRENT). Thanks to this configuration, we can gradually enable parallel execution only for tests we know are safe.
JUnit 5 offers the ability to configure the granularity of the parallel test execution, e.g. run all test cases from all classes in parallel, run all test cases from a class sequentially but run the test classes in parallel, etc. Check out all the available options in the JUnit 5 documentation.
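To make the opt-in setup concrete, here is a sketch of a junit-platform.properties combining those options (property names are from the JUnit 5 user guide; which values fit best depends on your suite):

```properties
# Enable the parallel infrastructure; with same_thread as the default
# mode, classes still run sequentially unless annotated.
junit.jupiter.execution.parallel.enabled = true
junit.jupiter.execution.parallel.mode.default = same_thread
# Alternative to per-class @Execution(CONCURRENT): flip everything on.
# junit.jupiter.execution.parallel.mode.classes.default = concurrent
# Size the worker pool from the number of available cores.
junit.jupiter.execution.parallel.config.strategy = dynamic
```

With `same_thread` as the default, only classes explicitly annotated with @Execution(CONCURRENT) take part in parallel execution, which matches the gradual opt-in approach described above.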
So I decided to try this new feature on some complex parametrized test base. I ended up choosing CastRulesTest, a parametrized unit test base checking the runtime implementation of the CAST logic. Because there is no shared state whatsoever, this test suite is embarrassingly parallelizable. Just adding the @Execution(CONCURRENT) annotation gave me a 3x faster execution time for the whole suite.
That 3x totally got my attention, so I wanted to try applying the same annotation to integration tests as well. As my next target, I chose BuiltInFunctionTestBase. As the name implies, this is a parametrized integration test base we use to test the correct behaviour of our built-in functions. We have around 10 classes using this test base, for a total of 1200+ integration test cases.
The first thing I had to do was port the base class and its inheritors to JUnit 5. My initial thought was to use @TestFactory, a new JUnit 5 feature to spawn dynamic tests, which also allows grouping test cases. Think of it as a more powerful @ParametrizedTest.
This would have allowed me to have a nice nested view of the tests in the reports. For example, look at MathFunctionsITCase: by mixing DynamicTest and DynamicContainer I could have achieved a report like:

```
MathFunctionsITCase
├─ PLUS
│  ├─ f0 + 6 = 20: Success
│  └─ f0 + 6 = 10: Failure
└─ MINUS
   └─ […]
```

Because the test base itself already had some models to define test cases, including input/output data, query expressions and configuration (see TestSpec), all I had to do was convert these models to DynamicTest/DynamicContainer. A little refactor of the testFunction method to wrap the logic into DynamicTest did it. Now every concrete test class just had to implement the abstract method getTestSpecs to return the test cases defined with my TestSpec class, and the @TestFactory method took care of converting them into dynamic tests.
Every implementation class provided a DynamicContainer containing a set of tests for a specific built-in function, like PLUS, and each container had a set of DynamicTests with the specific test cases for that built-in function, like f0 + 6 = 20.
Last but not least, to let the query run, I needed the extension to set up a MiniCluster once per class. This is already available in our flink-test-utils, so with some copy-paste I enabled it:

```java
public static final MiniClusterWithClientExtension MINI_CLUSTER_RESOURCE =
        // …
```
I tried a first run without parallel tests, and everything ran fine. Then I added @Execution(CONCURRENT), and IntelliJ IDEA welcomed my idea with a long report full of red crosses.

Fixing the MiniCluster extension first

Looking at the logs, it became evident that the problem was MiniClusterWithClientExtension, given that several tests were trying to submit jobs to a MiniCluster that had already been shut down.
The MiniClusterWithClientExtension was developed by wrapping the JUnit 4 MiniClusterWithClientResource rule in a custom interface defining before and after. Then, to register it, you could use either AllCallbackWrapper or EachCallbackWrapper to define whether to have one MiniCluster per test class or per single test.
In JUnit 4, rules have before and after extension points, and when you register them, depending on whether you use @Rule or @ClassRule, the rule is executed for each test or once per class. In JUnit 5, the user cannot pick whether the extension is applied globally or per method: it’s the extension itself that defines where it hooks into the lifecycle.
An example of this shift of concept between JUnit 4 and 5 is provided by the JUnit 5 Testcontainers integration which, depending on whether the container field is static or not, decides whether to share the container between test methods [2].
Our AllCallbackWrapper and EachCallbackWrapper were circumventing the new JUnit 5 extension paradigm, bringing back the same semantics as @Rule and @ClassRule. And this worked fine, until I tried to use parallel execution.
The MiniClusterWithClientExtension#before method was starting the MiniCluster, creating a ClusterClient, and then setting up a ThreadLocal for the environment configuration lookup.
The interaction between our ThreadLocal configuration and AllCallbackWrapper didn’t sound right, so I tried a little experiment. Take a simple extension, TestExtension, that prints the thread where the various hooks are executed, plus the name of the single test in beforeEach/afterEach. Running a test class with that extension registered, here is the result:

```
before all: ForkJoinPool-1-worker-1
…
```
As I was expecting, beforeEach and afterEach run in the same thread as the test, as effectively they just “wrap” the test method, as described here.
In other words:

- there is no guarantee about which thread executes beforeAll and afterAll, but
- beforeEach and afterEach are guaranteed to be executed within the same thread as the test

So here was the solution: we just had to set/unset that ThreadLocal every time in beforeEach/afterEach, no matter whether the MiniCluster instance was meant to be per test class or per test method.
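The thread-local pitfall is easy to reproduce outside JUnit with a plain thread pool. This is a self-contained sketch (the names are mine, not Flink’s) of the two failure modes the extension had to deal with:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalPitfall {
    static final ThreadLocal<String> ENV = new ThreadLocal<>();
    static final Callable<String> READ = ENV::get;

    // A value set on the caller's thread is invisible on a pool thread,
    // which is what happens when the value is set once per class but the
    // test body runs on a ForkJoinPool worker.
    static String valueSeenOnWorker() throws Exception {
        ENV.set("configured-on-caller-thread");
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            return pool.submit(READ).get(); // null: the worker never saw the set
        } finally {
            pool.shutdown();
        }
    }

    // A value set on a pool thread and never cleaned up leaks into the
    // next task reusing that same thread: hence "unset it in afterEach".
    static String leakedValue() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            pool.submit(() -> ENV.set("stale")).get();
            return pool.submit(READ).get(); // "stale"
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(valueSeenOnWorker()); // null
        System.out.println(leakedValue());       // stale
    }
}
```

The first method mirrors a ThreadLocal set once per class but read from a worker thread; the second mirrors a stale value surviving into the next test that happens to reuse the same pool thread.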
I was ready to declare victory, so I did some refactoring of MiniClusterWithClientExtension to always create one MiniCluster per test class, removed the wrapping AllCallbackWrapper, and ran my tests again: still everything was red.
@ParametrizedTest
After some investigation, I found out that the per-test lifecycle hooks don’t run per DynamicTest instance, while parallelization does apply to them. Running a @TestFactory test class with the same printing extension, you get this output:

```
before all: ForkJoinPool-1-worker-1
…
```
As you can see, each DynamicTest is executed in parallel, as you would expect, but beforeEach and afterEach are executed only once per @TestFactory. Because of this behaviour, the tests were not picking up the correct ThreadLocal, hence failing because they could not find the MiniCluster.
It seems there is no solution to this problem, and there is an open issue in the JUnit 5 issue tracker about whether this should be supported or not.
So I had to fall back to the good old @ParametrizedTest: I just removed the nested DynamicContainer, and I created my own DynamicTest-like data structure to wrap a test Executable and its display name:

```java
/** Single test case. */
// …
```

And then I kept the same code as before in the test base, but I had to add a method to flatten the previously used DynamicContainers:

```java
abstract Stream<TestSpec> getTestSpecs();
// …
```
The report doesn’t look as nice as with @TestFactory, but it finally works fine with parallel execution.

Before parallelization, this test suite ran in around 50 seconds from IntelliJ IDEA on my i7 11th Gen machine packed with 16 GB of RAM. After parallelization, the suite runs in around 20 seconds [3]. Not bad.
I think JUnit 5 is a very powerful tool, but you need to be careful when developing extensions that are supposed to work with parallel tests. These are my gotchas from this experience:

- ThreadLocals should be set in beforeEach, so they’re set in the right thread, and cleaned up in afterEach, to avoid the next tests reusing that thread picking up the old ThreadLocal values
- beforeEach could be invoked in parallel on the same instance of the extension, so you need to make sure the access to extension fields is thread safe, or…
- …use ExtensionContext.Store, as it’s thread safe, it’s hierarchically organized per hook context, and it’s particularly useful to auto clean up objects built in the before[..] methods
- Use ParameterResolver together with the ExtensionContext.Store. This simplifies the implementation when storing your extension state in ExtensionContext.Store.

And last but not least, don’t use @TestFactory in conjunction with parallel execution if your test requires a ThreadLocal or any other thread-dependent thing to be correctly configured.
We implemented the high-level ABI to start watchers on Kubernetes resources, as outlined in the Next steps paragraph of the previous post. This allows us to identify watch requests sent by controllers and to deduplicate them. Then we worked on making our ABI asynchronous, in order to invoke the controllers only on demand: we no longer spin up a thread for each module, but wake controllers up asynchronously when there is a new task to process.
Thanks to these changes, we’re now able to use Rust’s async/await inside the module. This allowed us to realign our fork of kube-rs, bringing back all the original interfaces, and to run kube-runtime inside the modules.
The kube-watch-abi ABI

The kube-watch-abi ABI was originally composed of an import, watch, and an export, on_event. The watch_req argument passed to watch is a description of the watch to register:

```rust
pub(crate) struct WatchRequest {
    // …
}
```
When the module invokes the watch import, it registers a new watch and gets back a watch identifier. This identifier is stored together with a reference to the Stream&lt;WatchEvent&gt; in a global map.
In a similar fashion to the request ABI discussed in the previous post, we serialize the WatchRequest data structure in order to pass it to the host.
Then, every time the host has a new WatchEvent that the controller needs to handle, it invokes on_event with the serialized WatchEvent. Using the watch identifier, the controller gets the associated Stream from the global map and appends the deserialized WatchEvent to it.
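The module-side bookkeeping can be sketched in a language-neutral way (Java here, with hypothetical names): a global map from watch identifier to the pending events of that watch, appended to by the on_event equivalent:

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Sketch of the watch-id -> event-stream dispatch described above.
public class EventDispatcher {
    // Global map: watch identifier -> pending events for that watch.
    static final Map<Long, Queue<String>> STREAMS = new HashMap<>();

    // Called when `watch` returns a fresh identifier from the host.
    static void register(long watchId) {
        STREAMS.put(watchId, new ArrayDeque<>());
    }

    // The `on_event` equivalent: append the (deserialized) event
    // to the stream associated with the identifier.
    static void onEvent(long watchId, String event) {
        STREAMS.get(watchId).add(event);
    }

    public static void main(String[] args) {
        register(1L);
        onEvent(1L, "ADDED pod-a");
        onEvent(1L, "MODIFIED pod-a");
        System.out.println(STREAMS.get(1L)); // [ADDED pod-a, MODIFIED pod-a]
    }
}
```

In the real implementation the consumer side of each queue is the Stream implementation the controller awaits on.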
When the controller invokes watch, the host checks if there is already a registered watch for that resource. If there is, it just registers the invoker as interested in that watch; otherwise it starts a new watch.
Every time a watch receives a new event from Kubernetes, the host resolves all the modules interested in that particular event. For each module, it allocates some module memory to pass the event, and finally wakes the controller up by invoking on_event.
The kube-watch-abi is de facto an asynchronous API: watch starts the asynchronous operation, and on_event notifies the completion of the operation.
In the initial implementation of the watch, on the module side, I just modified the kube client watch to accept a callback:

```rust
pub fn watch<F: 'static + Fn(WatchEvent<K>) + Send>(&self, lp: &ListParams, version: &str, callback: F) {
    // …
}
```
Every time on_event was invoked, the module resolved the callback from the watch identifier and invoked it. This was fine as an initial approach, but we soon hit a problem: porting all of the existing Kubernetes client and runtime code from async/await to the more primitive callbacks could have caused a lot of issues, making the fork diverge too much from the upstream code.
How async/await works

Here’s a little refresher on async/await from the Asynchronous Programming in Rust book:

> async/.await is Rust’s built-in tool for writing asynchronous functions that look like synchronous code. async transforms a block of code into a state machine that implements a trait called Future. Whereas calling a blocking function in a synchronous method would block the whole thread, blocked Futures will yield control of the thread, allowing other Futures to run.

Although there are a lot of details in how Rust implements the async/await feature, these are the relevant concepts for this post:
- Future is the type that represents an asynchronous result, e.g. an async fn returns a Future. You can wait for the result of a Future using .await.
- Stream is the same as Future, but it returns several elements before None, which notifies the end of the stream.
- To use .await, you must be in an async code block.
- To run an async code block, you need to use a task executor.

For example:

```rust
use futures::executor::LocalPool;
// …
```
In our async/await implementation we implemented the Future and Stream traits, reused LocalPool from the futures crate as a task executor, and generalized on_event to notify the completion of asynchronous operations.
We first analyzed the controller lifecycle:

- On run, the controller starts a bunch of watchers and then waits for new events
- Every time there is a new event to handle, the host wakes the controller up through on_event

This means that during the run phase, the controller starts a bunch of asynchronous tasks, one or more of them waiting for asynchronous events. After the run phase completes, there is no need to keep the module running. When we wake the controller up again, we need to check for any tasks that can continue and run them up to the point where all tasks are again waiting for an external event.
To implement this lifecycle, we need to execute, both on run and on on_event, the executor method run_until_stalled(), which runs all tasks in the executor pool and returns when no more progress can be made on any task.
Future/Stream

In order to encapsulate the pending asynchronous operation, we implemented our own Future. The implementation is straightforward and pretty much the same as explained in the Rust async book:

```rust
/// Shared state between the future and the waiting thread
// …
```

A similar implementation exists for the Stream trait. To create a Future, we use this method:

```rust
pub fn start_future(async_operation_id: u64) -> AbiFuture {
    // …
}
```
From on_event to wakeup_future/wakeup_stream

At this point, we took the concept of on_event and generalized it to async/await, introducing the wakeup_future/wakeup_stream ABI exports.
Every time the controller invokes an asynchronous ABI import (like watch), it gets back the identifier that we use to instantiate our Future/Stream implementation.
When the host completes the asynchronous operation, it invokes wakeup_future/wakeup_stream. The controller marks the Future/Stream as completed, including the result value, and invokes LocalPool::run_until_stalled() to wake up the tasks waiting for that future/stream to complete.
This is the complete flow of an asynchronous ABI method:

```mermaid
sequenceDiagram
    participant C as Controller module
    participant H as Host
    activate C
    C ->> H: do_async()
    activate H
    H ->> C: Returns async operation identifier
    C ->> C: run_until_stalled()
    deactivate C
    Note over H: Waiting for the async result
    H ->> C: wakeup_future()
    deactivate H
    activate C
    C ->> C: run_until_stalled()
    deactivate C
```
You can find the complete code for the async/await support here: executor.rs
After we implemented async/await in our Wasm modules, we refactored the request ABI discussed in our previous post into an asynchronous ABI method.
Now the returned value is the asynchronous operation identifier, and, to signal the completion of the request, the host invokes wakeup_future.
We also added a new ABI method, delay, to suspend the execution of the module for a while. This is necessary to run kube-runtime, which performs some sleeps before synchronizing the internal cache again.
The first implementation of the new kube-watch-abi ABI was a little rough: a lot of blocking threads, shared memory across threads, and some unsafe sprinkled here and there to make the code compile.

Because of that, we redesigned the host into a fully asynchronous application made of channels and message handlers. For every asynchronous ABI method there is a channel that delivers the request to a message handler, which processes the request, computes one or more responses and sends them back on another channel. This last channel delivers messages to the AsyncResultDispatcher, the owner of the module instances, which invokes the wakeup_future/wakeup_stream of the interested controller.
Today we have 3 different message handlers, one for each async ABI method:

- kube_watch::Watchers, which controls the watch operations; this message handler is also able to deduplicate the watch operations
- http::start_request_executor, to execute HTTP requests
- delay::start_delay_executor, to execute delay requests

When the host loads all the modules, it executes the ABI method run for each module, then it transfers the ownership of the module instances to AsyncResultDispatcher, which starts listening for new AsyncResult messages on its ingress channel.
Because all the message handlers and channels are async/await based, if all the handlers are idle, virtually no resources are wasted on waiting threads.

Since AsyncResultDispatcher controls all the different module instances, it avoids invoking the same controller in parallel: LocalPool is a single-threaded async task executor, hence a module cannot process multiple async results in parallel.
Thanks to all the async changes, we managed to realign most of the APIs of kube-rs with the original ones. This allowed us to port kube-runtime to our Wasm controllers. From the kube-runtime docs:

> The kube_runtime crate contains sets of higher level abstractions on top of the Api and Resource types so that you don’t have to do all the watch/resourceVersion/storage book-keeping yourself.
The problem we experienced when compiling kube-runtime to Wasm is that it depends on tokio::time::DelayQueue, a queue that yields its elements once a specified deadline has been reached. DelayQueue uses a Future type called Delay to implement the delays. The problem with this Delay is that it’s implemented using the internal ticker of the Tokio async task executor Runtime, which we don’t use inside Wasm modules.
In order to fix this issue, we forked the implementation of tokio::time::DelayQueue and reimplemented the Delay type using the delay ABI shown previously:

```rust
pub struct Delay {
    // …
}
```
With the custom implementation of DelayQueue, kube-runtime compiled successfully to Wasm, and we managed to port our controllers to use it!

```rust
async fn main() {
    // …
}
```

You can check out the complete code of the controllers today here: simple-pod controller
If you want to look at all the different changes the project went through, look at these PRs:

- watch ABI
- async/await + watch returns Stream
- delay ABI

Now the controller module looks like a real Kubernetes controller: the difference from a controller targeting the usual deployment style is minimal. We also opened the door for important optimizations, thanks to the watch ABI method. The host refactoring and the async ABI methods should also simplify the future interaction with Golang Wasm controllers, because our ABI now resembles the asynchronous semantics of their Wasm ABI.
Our next goals are:

- ServiceAccounts per controller

Stay tuned!
jdeps and jlink

The application is the data plane of Knative Eventing Kafka Broker, an implementation of the Knative Broker tailored on Kafka.

We didn’t handwrite nor generate a module-info.java for our application; we just ship a fat-jar. To generate a fat-jar, configure the maven-shade-plugin:
```xml
<plugin>
  <!-- … -->
</plugin>
```
jdeps is the tool to use to figure out which JDK modules you depend on. If you run jdeps with just the fat-jar as argument, you’ll get a list of all the packages in your jar and the JDK modules they depend on:

```shell
jdeps receiver/target/receiver-1.0-SNAPSHOT.jar
```
Note that jdeps analyzes only the imports, so if you perform some reflection on JDK classes at runtime, those dependencies won’t be found by the tool.

To get an output that you can directly pass to jlink:

```shell
jdeps -q --print-module-deps --ignore-missing-deps receiver/target/receiver-1.0-SNAPSHOT.jar
```
This is the list of JDK modules we depend on. Doing a quick check, I found that:

- java.base is the module that contains all the core features of the JDK
- java.compiler contains the compiler types. It’s brought in by Guava and the Vert.x CLI feature.
- java.naming contains some JNDI types, used to perform name lookups. This is required by Vert.x to perform DNS queries.
- java.security.jgss and java.security.sasl contain some security protocol implementations. They’re used by the Java Kafka client.
- java.sql contains JDBC. Jackson Databind and Google Gson (which we import transitively via Protobuf JSON) depend on it because they provide marshallers/unmarshallers for JDBC types.
- jdk.management contains some interfaces to manage the JDK. It’s used by Micrometer to instrument the JVM and collect metrics.
- jdk.unsupported contains sun.misc.Unsafe. Netty and Protobuf use it to perform off-heap allocations.

Because some reflection happens behind the hood to choose the Vert.x DNS resolver, jdeps doesn’t discover the module jdk.naming.dns, which you need in your Vert.x application to enable DNS resolution. To resolve the deps and add the DNS module:

```shell
MODS=$(jdeps -q --print-module-deps --ignore-missing-deps receiver/target/receiver-1.0-SNAPSHOT.jar)
# Manually append the module jdeps cannot detect
MODS="$MODS,jdk.naming.dns"
```
Now you just need to invoke jlink to generate your custom JDK:

```shell
# The exact flag set may differ; these match the description below
jlink --verbose \
  --no-man-pages --no-header-files --strip-debug \
  --add-modules "$MODS" \
  --output jdk
```

This command will grab the modules you provided from your local machine and create a JDK without man pages, header files and debug symbols:

```
jdk
…
```
This is the complete (simplified) Dockerfile that builds the project and generates the JDK:

```dockerfile
ARG JAVA_IMAGE=docker.io/adoptopenjdk:14-jdk-hotspot
# …
```

And the generate_jdk.sh script:

```shell
#!/usr/bin/env sh
# …
```
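As a sanity check, once the custom runtime is built you can ask it which modules it actually contains with a tiny program using the standard ModuleLayer API (nothing project-specific):

```java
import java.util.stream.Collectors;

// Prints the modules present in the runtime's boot layer; running this
// with a jlink-produced image shows exactly which modules made it in.
public class ListModules {
    static String modules() {
        return ModuleLayer.boot().modules().stream()
                .map(Module::getName)
                .sorted()
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        System.out.println(modules());
    }
}
```

Run it with `jdk/bin/java ListModules.java` against the generated image to verify the module list matches what you passed to --add-modules.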
You can run the built container with:

```shell
docker run imagename java -jar /app/app.jar
```
Using jdeps and jlink, our container image is 2.5x smaller, which is a great achievement for us. We also plan to reduce it even further; in fact we want to:

- figure out whether we can drop some more modules: java.compiler and java.sql are two good candidates
- build the custom JDK on alpine, which uses musl as libc. Look at Project Portola for more details.

Check out the full PR: https://github.com/knative-sandbox/eventing-kafka-broker/pull/265
Kubernetes is the de-facto standard container orchestration engine these days. As with every orchestrator of “something” (bare metal machines, VMs, containers, etc.), one day you need to expand the available abstractions the system provides to you. For Kubernetes, that day came around 3 years ago with the introduction of the concept of Custom Resource.
To put it simply, everything you create/read/update/delete in Kubernetes is an API resource. A CRD (Custom Resource Definition) defines a new resource you can create/read/update/delete in your cluster, one not included in the built-in resources, while a CR (Custom Resource) is an instance of a CRD. For example, a Pod is a Kubernetes resource from the core API group, while a Broker is a CRD introduced when you install Knative Eventing on your Kubernetes cluster.
With the definition of a new resource, you need to implement the business logic to orchestrate it, that is, to handle create/read/update/delete operations on the resource. For example, you might want to spawn a new etcd pod every time somebody creates a new EtcdCluster CR. In order to do that, you need to implement a Kubernetes controller: an application that listens for events on one or more API resources and reacts by performing some operations on the cluster.
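Conceptually, the event-driven core of any controller is tiny. Here is a deliberately simplified sketch (hypothetical types and names, Java for illustration) of the listen-and-react loop just described:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical event type: a change notification for one resource.
record Event(String kind, String name) {}

public class ControllerSketch {
    // The controller consumes events and "reconciles" the cluster for each.
    static String reconcile(Event e) {
        // A real controller would compare desired vs. actual state here
        // and create/update/delete resources accordingly.
        return "reconciled " + e.kind() + "/" + e.name();
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<Event> queue = new LinkedBlockingQueue<>();
        queue.put(new Event("EtcdCluster", "my-cluster"));
        // Event loop: in a real controller this runs forever,
        // fed by a watch on the Kubernetes API server.
        Event e = queue.take();
        System.out.println(reconcile(e)); // reconciled EtcdCluster/my-cluster
    }
}
```

Everything discussed below — informers, caches, work queues — exists to feed and scale this loop.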
Since the introduction of CRDs, we’ve seen a huge expansion of extensions to the Kubernetes APIs. Kubernetes vendors use CRDs to provide distributions well integrated with their underlying systems, software vendors develop controllers to simplify operations in Kubernetes environments (for example Strimzi, to deploy Kafka on Kubernetes), and new projects are born to provide middleware and tooling for developing applications on top of Kubernetes (Knative, to create serverless applications). Marketing around Kubernetes even forged a new name for the last use case: Kubernetes native applications, aka applications whose API surface relies on CRDs. In OpenShift there is even a system, based on a bunch of CRDs, to install other CRDs with their respective controllers.
Although there is a distinction between controllers and operators, for the sake of simplicity in this post I’ll always refer to the entity that implements the business logic of a CRD as a controller.

This concept works pretty well for Kubernetes, although there are still some open questions about the controller itself. In this post I’m going to introduce a solution Markus Thömmes and I designed that may revolutionize how we package, deploy and operate Kubernetes controllers.
Today a Kubernetes controller consists, more or less, of:

After we implement the controller, we need to configure and apply the Custom Resource Definition, and we need to set up a Service Account, that is, an account used to connect to our Kubernetes cluster in order to perform the operations the reconciler does.

Over the years a lot of tooling has been created, mostly in Golang, to implement these controllers, depending on the users’ needs. To mention some of them:
Other projects, like Knative, opt for creating and maintaining their own tooling to implement controllers.
These tools usually provide:
In essence, all of these tools work the same, in that ultimately they generate a binary containing the controller. Each binary can in theory contain a number of controllers; for example, Knative’s framework actively encourages that with a main interface to start multiple controllers at the same time. Crucially though, collapsing more than one controller into one binary, and thus one process in the Kubernetes cluster, needs to happen at compile time.

With the systems commonly used, we get at least one process (aka Pod) per project that extends Kubernetes (i.e. at least one Pod for Knative CRDs, at least one Pod for Strimzi CRDs, etc.). Each of those processes is mostly idle for most of its lifetime, but steadily consumes resources nonetheless.

In short: we’re flooding our Kubernetes clusters with applications that suck up resources and remain idle for most of their time!
The idea behind our solution is to create a runtime plugin system where every plugin is a Kubernetes controller.
This plugin system should have certain traits:
This plugin system can be seen as a mega controller. It can run as its own process, or it could even be merged with the kube-controller-manager, which is the process with all the controllers that manage the Kubernetes built-in resources:
The requirements of the plugin system are not trivial, but luckily for us WebAssembly (Wasm) comes to the rescue!
Wasm is an instruction set, targetable from every programming language, that runs in an isolated Virtual Machine.
From webassembly.org: “WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable compilation target for programming languages, enabling deployment on the web for client and server applications”.
That sounds like the perfect fit for our idea:
A list of supported languages is available here and here. Wasm is flexible enough that it can be executed with ahead-of-time (AOT) compilation or with a just-in-time (JIT) interpreter.
Although WebAssembly sounds like a web or browser technology, in the last few years several people have been experimenting with it outside the browser. Most notably, Cloudflare’s Function-as-a-Service offering, called Workers, uses Wasm to run C/Rust code inside Google V8 isolates.
With WebAssembly, our plugin system looks like:
When we develop a controller, the compiled artifact is a binary Wasm module. A Wasm engine loads this module, compiles it ahead of time in the AOT case, and executes it.

At this point, we need to define the interaction semantics between the host and the plugins. In order to do that, we define the ABI (Application Binary Interface) of our plugin system: the functions the host provides as imports to our plugins, and the functions the plugins export to our host.
First of all, an example of a Wasm ABI. Consider a Rust Wasm module that exports a function foo and imports a function println. Exporting foo means that, after the module has been loaded, the engine can invoke foo to execute it. On the other side, when we load the module in the engine, we must link some logic to the function println in order to run it, otherwise we’ll get a link error.
We can get all the imports and exports of a module using a tool called wasm-nm:

```shell
wasm-nm module.wasm -i # Get the imports
wasm-nm module.wasm -e # Get the exports
```
This is the most important design choice of the project: the ABI influences the capabilities the host needs to implement, the portability of existing controllers, the APIs on the plugin side, and so on. When we design the ABI we also need to take the differences between programming languages into account: as an example, some may require additional interfaces from the host to run asynchronous code. The async schedulers in Rust and the goroutine scheduler in Golang, for instance, both require a sched_yield syscall to yield a thread’s execution.
Luckily for us, part of the job covering the low-level primitives is done by a work-in-progress spec called WASI (WebAssembly System Interface). Its goal is to provide a reduced set of POSIX-like APIs: read/write files, get the system clock, get environment variables and program arguments, read/write a socket, and so on. However, as stated in their rationale document, it’s not a goal of WASI to include primitives to open sockets/files; this is left to the users. From their docs:
One of WebAssembly’s unique attributes is the ability to run sandboxed without relying on OS process boundaries. Requiring a 1-to-1 correspondence between wasm instances and heavyweight OS processes would take away this key advantage for many use cases. Fork/exec are the obvious example of an API that’s difficult to implement well if you don’t have POSIX-style processes, but a lot of other things in POSIX are tied to processes too. So it isn’t a simple matter to take POSIX, or even a simple subset of it, to WebAssembly.
A socket is a process-wide resource. This means that if module A opens a TCP socket, module B might have access to it if we don't implement proper security countermeasures.
Given our knowledge and studies of the current state of the art for Wasm ABIs, we found that there are 3 possible, non-mutually-exclusive approaches to the ABI design:

- Exposing low-level system primitives (epoll, read, write, bind, …)
- Proxying HTTP requests through the host
- Exposing high-level Kubernetes client APIs

We think that a good solution should mix all these 3 "levels". Some low-level APIs are always necessary for basic things like logging, getting configuration from the environment, setting timeouts and so on.
Proxying HTTP can be useful to invoke services outside Kubernetes (e.g. to trigger a cloud vendor API to enable a service).
Exposing the Kubernetes client APIs unlocks great potential to optimize the controllers beyond the resource usage improvements indicated above. Most prominently, the so-called informer infrastructure can be shared between all the controllers. An informer is the part of a controller that listens for events on given resources; it builds up local caches that effectively reflect the state of those resources in memory.
In our case, the host would set up these informers and watches just once per resource. If two controllers want to watch ConfigMaps, for example, the host would only need to set up one watch and keep one cache, drastically reducing both the network traffic due to event delivery and the memory consumption due to caching. That effect becomes more and more pronounced as more extensions are added.
Following the above approach, the host always has full control of process resources (open files, open sockets, etc.), we allow the modules to perform only certain operations and, most importantly, we open up huge optimization opportunities.
Markus and I built a prototype of the host and of two controllers: a simple pod spawner and the Memcached example from operator-sdk. You can find all the code here: https://github.com/slinkydeveloper/extending-kubernetes-api-in-process-poc
The host is implemented in Rust on top of Wasmer, a popular Wasm engine supporting different compilation backends and with several language bindings. We implemented the controllers in Rust using a hacked version of the kube-rs client.
The host logic is pretty simple: when it starts, it reads the contents of the specified directory, looking for .yaml files containing the module manifests. An example manifest:
```yaml
name: memcached
```
name is the operator name and abi is the ABI the host should use to interact with the module. This allows us to support different ABIs at the same time, mainly to overcome the differences between programming languages and to support old modules running on new host versions.
Then, for each module, the host compiles it and runs it in a separate thread, invoking the exported run function. The module, through the custom ABI we designed, can interact with Kubernetes to start watching resources and react to the events.
The prototype contains several simplifications. Some we can overcome easily (like watching the modules directory instead of loading all the modules just once at startup), while others require more engineering, as discussed later in this post.
In order to create a running prototype, we ended up with a pretty simple ABI that mixes the low-level WASI syscalls with a medium-level HTTP client proxy functionality:
```shell
wasm-nm memcached.wasm -i
```
The exports are run, which starts the controller, and allocate, which allocates memory inside the module. The implementer of the controller just implements the resource watch loop inside the run function.
In Wasm the module cannot access the host memory, but the host can access the module memory and copy bytes into it. Because the Wasm module might have a memory allocator, a GC or any other mechanism to manage memory, the module should export a function that allocates memory, to let the host copy bytes back into the module.
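A sketch of what such an allocate export can look like in Rust (the signature is an assumption based on the description above, not the exact code from the prototype):

```rust
// Export an `allocate` function the host can call to reserve a region of the
// module's linear memory before copying bytes into it (e.g. an HTTP response).
#[no_mangle]
pub extern "C" fn allocate(size: usize) -> *mut u8 {
    let mut buf: Vec<u8> = Vec::with_capacity(size);
    let ptr = buf.as_mut_ptr();
    // Leak the buffer: the host fills this region and hands the pointer back
    // to the module, which becomes responsible for reclaiming it.
    std::mem::forget(buf);
    ptr
}

fn main() {
    // The engine would normally call this through the Wasm export.
    let ptr = allocate(16);
    assert!(!ptr.is_null());
}
```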
request is the function that performs a blocking, buffered HTTP request, while all the other imports come from WASI. The user interacts with our hacked version of the Kubernetes client, which under the hood invokes request to perform HTTP requests using the reqwest crate. All our APIs are blocking because there is currently no out-of-the-box async/await support in Wasm modules; I'll cover this limitation later.
The request flow should give you an idea of what it takes to implement an ABI:
The host takes care of handling the authentication, so the controller doesn't need access to any files at all.
An example code for the controller looks like:
```rust
let client = Client::default();
```
You can look at the full controller code for the above sample here. The user potentially doesn't even know that their code runs inside an isolated sandbox: everything is hidden behind the usual Kubernetes client APIs. In other words, the programming model does not change.
There are some things we still haven't tested but that we're willing to, in order to get a clearer idea of how to grow this project. Our dream is to port existing controllers to this approach. Most of the Kubernetes world today lives in Golang, but up until this point I didn't talk about Golang at all. Why?
The reason is that we found some critical issues with the Wasm/Golang support; among them, Golang assumes that the Wasm engine runs inside a JS VM using wasm_exec.js. To understand the impact of this choice, let me go through the Golang standard library implementation for Wasm.
Today in Golang, when you compile to Wasm, all the standard library I/O operations go through the syscall/js module, which invokes a predefined set of imports in your module. From the docs ("Package js gives access to the WebAssembly host environment when using the js/wasm architecture. Its API is based on JavaScript semantics."):
- Global() allows you to get the global object
- Invoke() allows you to use a js.Value as a function and invoke it

Now let's analyze the net/http client implementation on Wasm: when the user invokes RoundTrip, the request is transformed into a Javascript object; then, using the syscalls Global and Call, the Javascript global function fetch is invoked; finally, on the returned promise, the two callbacks are set to handle the response and the error.
The problem with this approach, although it works well in browsers, is that the engine needs to implement all the JS functions used by the Golang standard library. If we're not using a Javascript-based runtime, we need to reimplement them. Further, it requires us to secure access to the global object's fields in order to retain the isolation properties.
To make things even worse, at the moment there is no way to define imports and exports within the module, so we cannot define and evolve an ABI that fits our needs. The only workable solution with the existing tools consists in defining "fake" Javascript functions on the host side for the imports and setting "fake" fields on the "fake" global object for the exports, but this approach makes the code far more complex and convoluted, both on the host and on the Wasm module side.
While the official Golang distribution doesn't support defining imports and exports, TinyGo supports it.
Markus managed to run the Memcached operator example from operator-sdk, using NodeJS as a host and hacking some polyfills to match the expectations of wasm_exec.js. The final module ABI is:
```shell
wasm-nm test.wasm -i
```
I was initially puzzled by the resume and getsp exports of the Wasm module, but digging into the wasm_exec.js codebase justified their existence: using resume and getsp, the host can yield the execution of the module, perform some async operations, and then resume the execution of the code. That's the very same solution I would love to implement in our Rust ABI prototype and, as far as I know, WASI is looking for a generalized solution to support async runtimes inside the modules.
This experiment proves that Golang/Wasm support can definitely work, and we think that, without the constraints of syscall/js and with the ability to define our own ABI, we could port all the existing controllers to compile as isolated Wasm modules.
Some interesting issues to follow about Golang and Wasm related to this discussion:
Our long-term goal is to improve the Kubernetes ecosystem, creating a production-ready plugin system with a low footprint. We encourage the Kubernetes community to give us feedback on our findings and our prototype, so we can work together on shaping the future of Kubernetes controllers!
Check out the second part of this blog post series: Kubernetes controllers: The Empire Strikes Back
Reproducibility means that different runs of the same benchmark, testing the same system, running in the same environment, should lead to similar results.
This is one of the most important traits that every benchmark should respect, because without it, the test can’t be trusted.
For example, let’s assume your boss gave you to optimize the most important system in production. You begin writing a benchmark to understand how it performs and, without caring about reproducibility, you jump to start searching where the performance hotspots are and operate to solve them. Now you run again the tests and results are better than the beginning but, while you’re already feeling the bonus on the next paycheck, a colleague comments your “40% performance boost” PR saying “I tried to run the benchmark and results look worse than the beginning!”. What the heck? Did your PR improves the performance or not?
You can’t really answer that question, because your benchmark is not reproducible! You can try to run it several times but you’ll continue to get deniable and not correlated results, that can answer positively or negatively to your question.
Making a test reproducible depends, for a good part, on the environment where you run it. Kubernetes is a virtualized environment designed to scale workloads up & down depending on resource demands, so it can arbitrarily schedule your application wherever it wants, imposing precise cpu/memory constraints. I'll show you the countermeasures I took in my methodology to prevent such problems when running benchmarks inside K8s.
A couple of months ago I started working on a project called Knative Eventing, an event mesh for Kubernetes. One of the goals of Knative Eventing is to enable message consuming & producing through HTTP, acting as a bridge between a "traditional" messaging system (such as Kafka) and an HTTP application. I won't cover all aspects of Knative Eventing; if you want to learn more about it, check out the Knative Eventing documentation.
Knative, among other things, provides the concept of a Channel, a flow of events from one or more producers to one or more subscribed consumers. To push events into the channel you interact with its HTTP interface, while to receive events from the channel you subscribe to it, specifying at which HTTP endpoint the channel should deliver the events. Under the hood, a pod called the dispatcher actually serves the HTTP interface for inbound events, managing the interaction with the messaging system and dispatching the events to the subscribers.
In this post I will use the test that calculates the KafkaChannel's throughput and latency.
All three test components run inside the cluster.
I won’t spend words in this post to explain how such components, designed together with Knative community, work in deep. If you want to know more about it, look at the README
.
The cluster where I'm running the tests is composed of three bare metal machines in the same rack, running only these tests. It's quite important to run on bare metal; otherwise you will need further steps to make your virtualization environment reproducible, depending on the VM system you use.
The question that arises is: what metric should be used to determine reproducibility? A wise answer: the standard deviation of the very metric you use to determine a performance improvement.
In my case I'm going to use the standard deviation of the percentiles of E2E latency (from sender to receiver) across several runs: the lower the standard deviation, the more reproducible the test.
To improve reproducibility, I’ll start by configuring and running the test 5 times, to calculate a baseline standard deviation. Then I’ll show you the tweaks I’ve made to reduce the standard deviation to an acceptable value:
The first step is to configure the test to generate a load that doesn't blow up the system: the system must be stressed, but in a way that doesn't lead to complete degradation, or even a crash.
I’ve configured my test to force 500 requests per second for 30 seconds, which I’ve found, experimentally, that is a good configuration the system can hold. Bear in mind that different “requests per second” configurations leads to different latencies!
I’ve collected the 99%, 99.9% and 99.99% percentiles but I’ll focus on 99% percentile because I’ve managed to do only few and short test runs, and in such situations outliers are more visible and not filtered out in higher percentiles. In a “production run” of the test, you should run it for more than 30 seconds, to understand if higher latencies happens frequently.
After a first pass, just configuring the test and running it 5 times, I got these results:
| | P99 | P99.9 | P99.99 |
|---|---|---|---|
| Run 1 | 266.266179 | 276.945500 | 284.709000 |
| Run 2 | 264.750750 | 278.127000 | 283.149000 |
| Run 3 | 250.629000 | 263.994500 | 271.937000 |
| Run 4 | 250.594875 | 261.605000 | 272.635000 |
| Run 5 | 266.224393 | 282.690500 | 290.529000 |
The SD of P99 is 8.312 and, in particular, the relative standard deviation is 3.2%.
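As a sanity check, these numbers can be reproduced with a few lines of code, using the P99 values from the table above and the sample standard deviation (dividing by n - 1):

```rust
// Sample standard deviation of a slice of measurements.
fn sample_sd(xs: &[f64]) -> f64 {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    // Sample variance divides by n - 1.
    (xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0)).sqrt()
}

fn main() {
    // P99 values of the five runs from the table above.
    let p99 = [266.266179, 264.750750, 250.629000, 250.594875, 266.224393];
    let sd = sample_sd(&p99);
    let mean = p99.iter().sum::<f64>() / p99.len() as f64;
    println!("SD = {:.3}", sd); // SD = 8.312
    println!("relative SD = {:.1}%", sd / mean * 100.0); // relative SD = 3.2%
}
```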
From experimental evidence I've found that the relative standard deviation is not linearly related to the test configuration: the more stress the load generator applies, the higher the relative standard deviation can be.
Let’s try to dig into why these numbers are so different and how I’ve lowered them.
The first thing you can notice is that the third and fourth runs performed with generally lower numbers than the others. Digging a bit with kubectl describe nodes, I found that Kubernetes was scheduling pods on different nodes on each run. Sometimes it scheduled the sender and receiver on the same node as the Kafka Channel dispatcher, letting them communicate with lower latencies!
To make Kubernetes always deploy the pods on the same nodes, I've configured the affinity of the sender, the receiver and all the SUTs (systems under test, which in my case means the Kafka Channel dispatcher and the Kafka cluster).
To do it, I've defined three labels:

- bench-role: kafka: where the Kafka cluster and Zookeeper are deployed
- bench-role: eventing: where the Kafka dispatcher is deployed
- bench-role: sender: where both sender and receiver are deployed

And then I set these labels in my cluster using:
```shell
kubectl label nodes node_name bench-role=eventing
```
Then I configured the affinity in each of my deployment/pod descriptors.
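As an illustration, a deployment fragment pinning pods to the "eventing" nodes might look like this (the label comes from the list above; the rest of the descriptor is assumed):

```yaml
spec:
  template:
    spec:
      nodeSelector:
        bench-role: eventing
```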
I deployed Kafka using Strimzi and, thanks to its Kafka CRD, I can easily configure the affinity too (I've omitted the irrelevant parts of this config):
```yaml
kafka:
```
For kafka-ch-dispatcher, I just modified the original dispatcher yaml, adding the nodeSelector (which is in fact a short version of nodeAffinity), and redeployed from source using ko:
```yaml
apiVersion: apps/v1
```
The sender and receiver deployments are configured with the same nodeSelector approach.
After pinning the workloads to their nodes, I ran the tests again:
| | P99 | P99.9 | P99.99 |
|---|---|---|---|
| Run 1 | 263.552250 | 268.646500 | 272.223000 |
| Run 2 | 266.060133 | 280.979000 | 285.983000 |
| Run 3 | 266.994500 | 282.858000 | 292.864000 |
| Run 4 | 268.234000 | 297.516000 | 326.862000 |
| Run 5 | 265.809929 | 281.717000 | 288.665000 |
As you may notice, the first four runs look incrementally worse. This happens because every run depends on the SUT state left over from the previous run: the Kafka cluster and/or the Kafka Channel dispatcher could be in a degraded state before a new run begins, and this obviously reduces the chances of getting the same results over multiple runs. All the systems involved on the road from sender to receiver must be reset, so that every run starts stressing the system under the same conditions, ensuring that the latency of a run doesn't depend on the previous runs.
In my case just deleting all the pods does the trick, since the Deployments spin up new ones:
```shell
kubectl delete pods -n knative-eventing --all
```
As explained at the beginning of this post, Kubernetes is designed to scale workloads up & down. What if the scheduler decides to scale our benchmark resources up or down while the test is running? The benchmark must be granted the resources it needs, and these should not change while it is running. To achieve this, resource requests & limits must be configured identically for every test component and SUT, like:
```yaml
resources:
```
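For instance, a fragment along these lines (the cpu/memory amounts here are illustrative, not the values I used):

```yaml
resources:
  requests:
    cpu: "4"
    memory: 4Gi
  limits:
    cpu: "4"      # identical to the request, so the pod gets the Guaranteed QoS class
    memory: 4Gi
```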
This leads Kubernetes to schedule the pods with the Guaranteed QoS class, so their resources can't be scaled up & down.
The nodes where I'm running the benchmarks are configured with an AMD EPYC 7401P 24-core CPU (48 logical cores) and 24Gb of RAM. I've tried to match these limits as follows:
The problem is that, even if containers are configured with Guaranteed QoS, there is no guarantee that the workload is pinned and has exclusive access to its cores. By default, even with Guaranteed QoS, Kubernetes can move the workload between cores depending on whether the pod is throttled and which CPU cores are available at scheduling time: it defines a CFS quota for the running container, asking the kernel scheduler to allocate a fixed amount of time to it.
Luckily there is a way to force CPU pinning: enabling the static CPU management policy. This can only be done by configuring the Kubelet config file on each node:

- Drain the node: kubectl drain <node name> --delete-local-data --force --ignore-daemonsets
- Add cpuManagerPolicy: static to the Kubelet config file, together with a systemReserved section that reserves resources for the system processes
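The resulting Kubelet config fragment might look like this (the reserved amounts are illustrative assumptions):

```yaml
cpuManagerPolicy: static
systemReserved:
  cpu: "1"
  memory: 1Gi
```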
I’ve tried to ran the tests after all these tweaks:
| | P99 | P99.9 | P99.99 |
|---|---|---|---|
| Run 1 | 265.955238 | 271.344500 | 276.415000 |
| Run 2 | 264.850000 | 271.462000 | 279.283000 |
| Run 3 | 266.283643 | 291.772500 | 335.116000 |
| Run 4 | 266.065179 | 272.497000 | 279.553000 |
| Run 5 | 264.828300 | 271.254500 | 278.362000 |
These results look far better! The relative SD of P99 is now down to 0.26% (absolute SD 0.7014) vs the initial 3.2%! I still have some outliers at the higher percentiles, but now the results look far more trustworthy than with the previous 3.2% relative SD.
To wrap up, I want to underline that these tweaks worked for me, but they might not be enough for every benchmark configuration.
Get in touch with me if you have more tweaks to show, and stay tuned for more updates!
If you are new to Vert.x, before going further I strongly suggest you read the vertx-core documentation and the vertx-web documentation.
Vert.x Web is a library built on top of Vert.x to create web applications. It provides high-level features like routing, request body handling, authorization, etc. The core concept of Vert.x Web is the Router, an object that can route requests to one or more Routes based on a set of rules like HTTP method, HTTP path, accepted content types, etc. On each Route you can define one or more Handlers that contain the logic to process the request. When a Router receives a request, it creates a RoutingContext object which has all the methods to read the request, write the response, call the next Handler and fail the context. Each Handler that you register consumes the request's RoutingContext. Vert.x Web also provides some common handlers, like the BodyHandler that parses the request body and the AuthHandler that manages the authN/Z of the request.
Vert.x Web API Contract generates a Router starting from an OpenAPI definition. Everything revolves around an object called RouterFactory: you create the contract, you tell the RouterFactory which handlers implement the defined operations, and it generates the Vert.x Web Router for you. The RouterFactory does some magic behind the hood to provide you a Router that mounts your operation handlers on the generated Route instances and validates incoming requests before they reach those Routes.
sVert.x Web API Service is a code generator based on concept of Vert.x Service Proxy. An event bus service is a Java interface that helps you to define in a more natural way an event bus message consumer. This approach leads to different benefits like the straightforward process to test a message consumer like any other Java class. As every EB event consumer, a service can inhabit inside the same verticle or it can be deployed in another application somewhere else in your microservices network. You can use the Vert.x Web API Service in order to mix Contract Driven capabilites provided by Vert.x WAC and these event bus service features.
When you use EB API Services, you don't put the business logic inside the Route handlers; instead, you distribute it across different services. Then, by using Vert.x WAC, you manage the linking between these services and the Router instance.
Now I need to define how to group the API Operations defined inside the contract into different API Services. An important thing to keep in mind about API Services is that, with a good design, you can turn each API Service into a microservice in a few minutes. Starting from this assumption, I want to group operations by different subdomains of my API. Debts Manager API handles users, transactions and status, so I’m going to organize operations to create a Users Service, a Transactions Service and a Status Service. Here is the mapping between services and operations:
| Transactions Service | Users Service | Status Service |
|---|---|---|
| getTransactions, getTransaction, createTransaction, updateTransaction, deleteTransaction | login, register, getConnectedUsers, connectUser, getUsers | getUserStatus |
I need to assign an event bus address to each service. The interesting fact is that you can deploy more than one service instance on an address, and the event bus will manage the load balancing between them. The event bus address can be any string in any format, although it makes sense to use a domain-like format to identify it. My choice is:

- TransactionsService is available at transactions.debts_manager
- UsersService is available at users.debts_manager
- StatusService is available at status.debts_manager
In order to make the RouterFactory able to correctly link the Router to the services, it must know the services' addresses. There are a couple of different ways to define these associations. I'm going to focus on the configuration-based method: inside the OpenAPI document, for each operation, I define the related service event bus address; then, just by calling mountServicesFromExtensions(), the RouterFactory will inspect the OpenAPI document to find all these associations. E.g. the getTransaction definition now looks like this:
```yaml
summary: 'Get a Transaction'
```
Look at the x-vertx-event-bus documentation for more details.
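Assuming the address chosen above, the relevant part of the getTransaction operation might be sketched like this (an illustration, not the exact spec file):

```yaml
summary: 'Get a Transaction'
operationId: getTransaction
x-vertx-event-bus: transactions.debts_manager
```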
vertx-starter contains a collection of different project templates called presets. I'm going to focus on the "OpenAPI Server with Event Bus" preset to scaffold debts manager.
To help the scaffolder generate the right service interfaces, you must define the service address → service name mapping. Just a couple of entries inside the Debts Manager API spec and you are ready to scaffold the project:
```yaml
components:
```
Now open the vertx-starter page and with just a couple of clicks you have a zip with the project scaffolded!
Let’s dig into the generated code:
The scaffolder created for you:
- The package-info.java files required to trigger Vert.x annotation processing
- A MainVerticle
- An ApiClient that can be used for testing purposes
- The build tool config (pom.xml in my case)

As with every scaffolder, it is necessary to make some adjustments for the project's needs.
I made a couple of changes to adapt the project skeleton to my needs. In particular, I configured in my pom:

- maven-compiler-plugin
- maven-surefire-plugin
- vertx-maven-plugin, to execute and package my Vert.x application
with my original debts_manager_api.yaml
to keep readibility of the spec document. The generated one is a bundled version with all $ref
s solved. If you need to bundle again your spec in future, I suggest you to use Speccy which is a very good tool that can also convert Json Schema to OpenAPI Schema while bundling the spec.
vertx-maven-plugin and application properties

Vert.x Maven plugin provides a couple of facilities to help you execute and package the Vert.x application. The usage is very simple: you must add it to your build plugins and then define the FQCN of the Verticle to run:
```xml
<properties>
```
```xml
<plugin>
```
test_config.json is a JSON file containing application properties like the PostgreSQL and Redis connection parameters. Vert.x Core includes basic support for JSON configuration files: you can load one into your verticle via the Vert.x command line, via DeploymentConfig or directly with vertx-maven-plugin. If you want a more advanced properties management, there is a package called vertx-config that enables you to load HOCON configuration files, load configuration from a remote server, etc.
MainVerticle

The generated MainVerticle contains two methods:

- startHttpServer(), to create the RouterFactory, define the various handlers, generate the Router instance and start the HTTP server
- startServices(), to instantiate and mount the event bus services

As I said before, the HTTP server and the corresponding Router don't depend on the event bus services, so you could move these two methods into two separate verticles. For simplicity, I'm going to keep everything inside one verticle. In the next chapters, we will spend time splitting the verticle.
To mount a service on the event bus, I use a helper object called ServiceBinder that looks up the generated message handler and binds the service instance to the event bus:
```java
TransactionsService transactionsService = TransactionsService.create(vertx);
```
As I previously said, the RouterFactory can look into the OpenAPI document for the associations between services and operations with mountServicesFromExtensions(). This makes the code of startHttpServer() quite simple for the moment:
```java
private Future<Void> startHttpServer() {
```
Vert.x Web already provides good support for JWT thanks to Vert.x Auth JWT, so I don't need to write a handler that manages the AuthN/Z.
To get JWT running you need an RSA key pair to sign your tokens. I opted for the JWK standard to store it and I generated the key pair and the key store using mkjwk.org.
Vert.x Auth JWT provides the JWTAuth auth provider, which is the object that can authenticate, authorize and generate tokens. Vert.x Web has a handler called JWTAuthHandler that uses this auth provider to validate incoming requests and extract the token payload. I modified the start() method of MainVerticle to load the JWK from the filesystem and create the JWTAuth:
```java
loadResource(jwkPath).setHandler(ar -> {
```
And of course I modified startHttpServer() to use the JWTAuthHandler:
```java
routerFactory.addSecurityHandler("loggedUserToken", JWTAuthHandler.create(auth));
```
In the next chapters you will see how to create a token during the login process.
The application is bootstrapped; now we are ready to dive into the application logic! In the next two chapters I'm going to show you how I implemented the persistence layer and the event sourcing layer.
Stay tuned for more updates!
In this second chapter of the Debts Manager tutorial I would like to show you how I designed the REST API of Debts Manager. I'm going to follow the API-first approach, documenting all aspects of the API design with OpenAPI 3.
This post doesn’t aim to provide you a full guide of how to design REST APIs: if you want more resources to learn it, look at the end of this post
REST APIs, in contrast with RPC, are driven by the data the service wants to expose. In the previous chapter I gave you an idea of the entities we must expose; now I tabulate them and the relative operations:
Entity | Create | Retrieve | Update | Delete |
---|---|---|---|---|
User | ✔ | ✔ | ❌ | ❌ |
User relationship | ✔ | ✔ | ❌ | ❌ |
Transaction | ✔ | ✔ | ✔ | ✔ |
Status | ❌ | ✔ | ❌ | ❌ |
This table is a pretty good starting point, but I must refine the analysis, enforcing our methods with policies and logic.
These policies are primarily based on who is making the request. I'm going to define a login phase together with JWT to provide authorization and authentication. Each endpoint, except login and register, is secured with JWT auth. My objective is to expose, for each user, only the subset of data relative to that user.
Before defining the endpoints, I must formally describe the data models representing the service entities. OpenAPI has its own Json Schema dialect to define models: OpenAPI Schema, an extended subset of Json Schema Draft 5. While I'm writing, there is a proposal to allow the usage of every version of Json Schema, including the newer ones, through an extension: https://github.com/OAI/OpenAPI-Specification/issues/1532.
I place these schemas in the main OpenAPI file under the components and schemas keywords. I can refer to them using Json Schema references (the $ref keyword).
The simplest model here is the user. I want to expose only the username, so I represent it with a simple string. This is the definition using OpenAPI Schema:
```yaml
Username:
```
The status is represented by a map with users as keys and total debts\credits as values. In OpenAPI Schema:
```yaml
Status:
```
In JSON, maps are usually represented either as a json array of key-value tuples or as a json object. The json object is the natural way to represent a map, but it has an important restriction: keys are strings. In my case I need to represent a map string → number, so the json object representation fits well. The schema of the map values is defined using additionalProperties and, only with Json Schema Draft 7 or newer, the schema of the keys can be defined using propertyNames.
The main transaction model is described below:
```yaml
Transaction:
```
The $ref keyword points to the Username schema I defined before.
This model doesn’t fit good for my usage, because for each Transaction
endpoint I want to apply some policies. A very common example is the id
field: when user inserts a new transaction I want to designate the database to fill the id
value. When the user creates a new transaction it shouldn’t add the id
field: that means that I can’t use the Transaction
model to describe the “create transaction” request body. Let’s look at all restrictions I want to apply on various transaction endpoints:
- id and at are filled by the backend when the user adds a new transaction, and they are immutable from the API perspective
- The user can't update the from (sender) and to (receiver) fields
- When creating a new transaction, the user doesn't provide the from field, because the backend fills it with the logged user
into 3 different models: UpdateTransaction
, NewTransaction
and Transaction
.
These new models lead to a new problem: duplication of model field definitions. Json Schema solves the duplication with the schema composition keywords allOf, anyOf and oneOf. In particular, I will use allOf to achieve inheritance of schemas.
This is the final result:
```yaml
UpdateTransaction:
```
The schema inheritance tree is UpdateTransaction ← NewTransaction ← Transaction.
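A sketch of what this allOf composition can look like (the field names follow the restrictions above; the exact properties and types are assumptions, not the spec's actual definition):

```yaml
UpdateTransaction:
  type: object
  properties:
    description:
      type: string
    amount:
      type: number
NewTransaction:
  allOf:
    - $ref: '#/components/schemas/UpdateTransaction'
    - type: object
      properties:
        to:
          $ref: '#/components/schemas/Username'
Transaction:
  allOf:
    - $ref: '#/components/schemas/NewTransaction'
    - type: object
      properties:
        id:
          type: integer
        from:
          $ref: '#/components/schemas/Username'
        at:
          type: string
          format: date-time
```

Each model extends the previous one via allOf, so the shared fields are defined only once.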
The OpenAPI document structures the endpoint definitions as follows:
1 | paths: |
OpenAPI path strings allow path parameters using `{paramName}` and don't require an explicit definition of query parameters.
In OpenAPI terminology an operation is an API endpoint identified by a path and an HTTP method. Every operation can be uniquely identified with an `operationId`. The OpenAPI Specification (OAS) documents this field as optional, but I strongly suggest specifying it if you don't want to see your tooling explode. Most code generation tooling asserts that `operationId` is present; if it's not, it tries to infer it from the path and HTTP method, producing unexpected results.
For each operation we are going to define:

- `operationId`
- `parameters` (if any): list of `header`, `path`, `query` and `cookie` parameters
- `requestBody` (if any): content type and content schema of request bodies
- `responses`: status codes with response content types and schemas

I also fill the `security` field for each operation to require a JWT token to execute it.
Let’s start with transaction CRUDs:
Operation | operationId | CRUD | Path | HTTP Method |
---|---|---|---|---|
Create a new transaction | createTransaction | Create | /transactions | POST |
Get a single transaction | getTransaction | Retrieve | /transactions/{transactionId} | GET |
Get user related transactions | getTransactions | Retrieve multiple | /transactions | GET |
Update a transaction | updateTransaction | Update | /transactions/{transactionId} | PUT |
Delete a transaction | deleteTransaction | Delete | /transactions/{transactionId} | DELETE |
In OpenAPI:
1 | /transactions: |
Note that for the operations under the `/transactions/{transactionId}` path I haven't redefined the `transactionId` parameter every time: I defined it once at the path level.
Status has only the retrieve operation, but I want to let the user customize the output based on the transactions' insertion datetime: clients can use the query parameter `till` to ask for the status up to the provided date time, excluding newer transactions. You can use it to throw back in your housemate's face that he hasn't paid the bills for quite a long time.
1 | /status: |
The service supports creation and retrieval of users and user relationships. For simplicity, I avoided including the Update and Delete operations for users and user relationships.
I want to expose an endpoint to retrieve all registered users and an endpoint to retrieve only the users that have a relationship with the logged user:
1 | /users: |
In `getUsers` we defined a very basic search functionality with the optional query parameter `filter`. In `getConnectedUser` I preferred to define the request schema directly inside the request body definition, because it's a schema strictly related to this operation and it isn't the parent of any other schema.
This is the endpoint to create a user connection (user relationship):
1 | /users/connected/{userToConnect}: |
When a user wants to start using this API, he must authenticate with his credentials following this process:

- the user calls the `/login` endpoint, passing his credentials in the request body

For each request the server must authorize the user: the user must include in each request the header `Authorization: Bearer <jwt token>`. When the backend receives the request it checks the signature validity and the token expiration time. If the token is valid, it parses the payload, where it can read the username of the logged user.
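As a minimal JDK-only sketch of the payload-reading step (assuming the username lives in a `sub` claim; signature and expiration checks are deliberately omitted here — real code should verify the token with a JWT library before trusting any claim):

```java
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JwtPayloadSketch {
    // Extract the username from a JWT's payload segment.
    // WARNING: no signature/expiration verification — illustration only.
    static String usernameOf(String jwt) {
        String[] segments = jwt.split("\\.");
        // the payload is the base64url-encoded middle segment
        String payloadJson = new String(Base64.getUrlDecoder().decode(segments[1]));
        // naive claim extraction, enough for the sketch; assumes a "sub" claim
        Matcher m = Pattern.compile("\"sub\"\\s*:\\s*\"([^\"]+)\"").matcher(payloadJson);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String payload = Base64.getUrlEncoder().withoutPadding()
                .encodeToString("{\"sub\":\"alice\",\"exp\":1700000000}".getBytes());
        String token = "header." + payload + ".signature";
        assert "alice".equals(usernameOf(token));
    }
}
```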
This is the `login` operation definition:
1 | /login: |
The `register` operation creates a new user and logs it in:
1 | /register: |
I don't cover the logout process in this tutorial, but I want to give you a tip: create a whitelist or blacklist of tokens.
As you already saw, each secured operation has the `security` field:
1 | security: |
The `security` field is called a security requirement, and it tells the user that he needs the `loggedUserToken` security scheme to access this endpoint. Security schemes must be defined under `#/components/securitySchemes`:
1 | securitySchemes: |
I give you a couple of useful links:
You can find the complete OpenAPI definition here: /src/main/resources/debts_manager_api.yaml
Once you have learnt how to design a REST API, approaching OpenAPI is very simple. The operation definition is very intuitive because of its 1:1 mapping with HTTP (methods, parameters, status codes, content types and so on). The tricky and magic part, for me, is defining and organizing the JSON Schemas. When you define simple models, you tend to put everything inside the same file; but when you raise the complexity using composed schemas, you get flooded by small and unclear schemas. My suggestion is to document the schemas with the `title` and `description` keywords and organize them in multiple files.
In the next chapter I'm going to bootstrap the project and start writing the first Vert.x code, stay tuned!
Some notes before starting: I'm going to make this guide as complete as possible, but keep in mind that this is a side project, and it could contain bugs and be incomplete. I will try to cover all the interesting aspects of API design, implementation and testing, and I will show you how I implemented Event Sourcing and CQRS. I don't plan to write a frontend for it (I don't want to hurt your eyes), but if you want to help me I'm glad to accept it!
The code is already available on GitHub, but it could change while I'm writing the guide.
The purpose of Debts Manager is to manage the debts between two users of the service. The idea is similar to Splitwise, but it supports only bills between two users. Every user must be registered to use the application. Then, if you want to receive bills from another user, you must connect to that user. When you are connected, you can bill him by creating a transaction. For example:
The final result is: user B now has a debt of 5 euros with user A. Debts Manager will show both users their status with the various debts/credits.
Connections between users are unidirectional, which means that if two users want to bill each other they must create two different connections. There is no group concept; I wanted to keep things as simple as possible.
Before going further I want to show you a couple of things about the overall design of the application. These are required to understand various aspects of the tutorial.
For persistence I chose PostgreSQL to store my data. The application stores into the database:
The DB access is provided by the blazing fast reactive-pg-client library.
The application stores the transactions between users (events). You can use them as a log of the various bills, but you also want to look at a summary of the various credits/debits between connected users. To build it, I aggregate the transactions into one single structure that I call status. Every user has a status, represented as a map with users as keys and total credits/debits as values. This map is built incrementally every time a user adds/modifies/removes a transaction, and it is stored in a Redis cache.
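A minimal sketch of that aggregation step (the `Transaction` shape and the sign convention below are my assumptions for illustration, not the project's actual code):

```java
import java.util.HashMap;
import java.util.Map;

public class StatusAggregation {
    // hypothetical transaction: `from` bills `to` for `value`
    record Transaction(String from, String to, double value) {}

    // fold transactions into the status map of user `me`:
    // other user -> net credit (positive) or debt (negative)
    static Map<String, Double> statusFor(String me, Iterable<Transaction> txs) {
        Map<String, Double> status = new HashMap<>();
        for (Transaction t : txs) {
            if (t.from().equals(me)) status.merge(t.to(), t.value(), Double::sum);
            else if (t.to().equals(me)) status.merge(t.from(), -t.value(), Double::sum);
        }
        return status;
    }
}
```

In the real application this map is rebuilt incrementally on each event and cached in Redis rather than recomputed from scratch.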
The application exposes a REST Web API you can interact with. It is documented with an OpenAPI 3 file and exposes most of the CRUD endpoints for users, user connections and transactions (some are missing to keep things simple). It also has an endpoint to access the status of users. The endpoints are protected with JWT tokens, so to use the application you must complete a login request, and you get a token to use for the following requests. The Web API is implemented using vertx-web, vertx-web-api-contract and vertx-web-api-service.
Okay I admit it, I'm lazy 😄 I tested only the minimum features! I built these tests primarily to show you how I faced and solved common async test problems. I used JUnit 5 together with vertx-junit5 and testcontainers to spin up Redis and PostgreSQL.
Stay tuned for the next chapter! And give me feedback about this tutorial!
This post is about the `vertx-junit5` library. I like the new async assertion APIs of `vertx-junit5`, but I feel very uncomfortable using `VertxTestContext.succeeding(Handler)` when I need to run different async tasks sequentially: with this method, your code rapidly grows into a big callback hell! Plus, the interfaces I wanted to test are all `Future`-style rather than callback-style. In this post I'm going to explain two methods I've added with a PR that simplify tests with `Future`s.
assertComplete() and assertFailure()

The PR adds these methods:

- `Future<T> assertComplete(Future<T> fut)`
- `Future<T> assertFailure(Future<T> fut)`

These methods take a future as a parameter and register on it a handler that asserts its completion/failure. They return a copy of the future you passed as a parameter.
For example, this callback-style assertion:
1 | methodThatReturnsAFuture().setHandler(testContext.succeding(result -> { |
Turns into:
1 | testContext.assertComplete(methodThatReturnsAFuture()).setHandler(asyncResult-> { |
Nothing revolutionary, right? To appreciate it, let's look at a more realistic use case.
Let's say that we want to test the update method of a class that manages some entities in a database. A common flow for this kind of test is:
Assuming that both the raw db client and the entity manager have futurized APIs, without these methods this test translates into 3 nested callbacks. Now you can simplify it like this:
1 | testContext.assertComplete( |
With just one `assertComplete()` we assert that the whole chain of async operations completes without errors. Then I set a handler that does the final assertions before completing the test.
Now, let's assume that you want to run the same test as before, but testing a failure of your method. To do that, you need to check every single step of the future chain:
1 | testContext.assertComplete(rawClient.create(someData)) |
The bad thing about future chains is passing values through the chain. Let's say that in the previous example the exception thrown by the `update()` method doesn't contain a super handy method like `getEntityId()`, but to get the data from the db you need the `id` of your data instance. How can you solve it?
You have two ways, which really depend on your code style.
If you are a bit more functional, use `CompositeFuture.join()` to transform a tuple of futures (one of them already completed with the value you want to pass through the chain) into a single future that encapsulates both the previous async operation result and the new result. This method works only when you are in a chain of completed handlers, because when a future inside `CompositeFuture.join()` fails, the "join future" is not an instance of `CompositeFuture` and doesn't return any information about the other joined futures. I prefer to avoid this method, but keep it in mind because you may find it useful sometimes.
If you don't care about functional stuff, just use the old but gold `AtomicReference`s:
1 | AtomicReference<String> entityId = new AtomicReference<>(); |
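As a self-contained illustration of the same trick using JDK `CompletableFuture`s instead of Vert.x `Future`s (all the entity-manager methods below are hypothetical stand-ins for the futurized DB calls):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

public class ChainValueExample {
    // hypothetical async steps, stand-ins for futurized DB client calls
    static CompletableFuture<String> create(String data) {
        return CompletableFuture.completedFuture("id-42");
    }
    static CompletableFuture<Void> update(String id) {
        return CompletableFuture.completedFuture(null);
    }
    static CompletableFuture<String> fetch(String id) {
        return CompletableFuture.completedFuture("data-for-" + id);
    }

    static String run() {
        // the AtomicReference smuggles the id past the Void step of the chain
        AtomicReference<String> entityId = new AtomicReference<>();
        return create("someData")
                .thenCompose(id -> { entityId.set(id); return update(id); })
                .thenCompose(v -> fetch(entityId.get()))
                .join();
    }

    public static void main(String[] args) {
        assert run().equals("data-for-id-42");
    }
}
```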
If you have any good tips don’t hesitate to contact me! Happy testing!
This article follows the previous one, where I explored how to improve the routing of Vert.x Web, so please check it out before reading this one: Tree vs SkipList routing.
Three days ago, looking at my Twitter wall, I found a tweet about Eclipse Collections. I found the performance of EC really interesting, so I decided to put it into my benchmark and test it on our use case. I obviously chose the tree as the data structure to rewrite the routing process of Vert.x Web, so I wrote two variants of my original `TreeRouter`:

- `ECTreeRouter`: a tree that internally uses List implementations of Eclipse Collections
- `ImmutableECTreeRouter`: a tree that internally uses immutable List implementations of Eclipse Collections; in this case the user can't change the routing tree after routing has started

The second option was a pure experiment: the user creates the router, and its internal tree doesn't change during the application execution. In this case you have a simpler implementation and an immutable list (in some cases faster than a mutable one). I've also refactored only the `SocialNetworkBenchmark`, because we have similar results on `ECommerceBenchmark`.
And, as in the previous articles, here come the graphs:
And in the end the "final test" graph (now it performs only random requests, not sequential ones):
I have some considerations about these results:
- `ECTreeRouter` is faster than the skip list for the `/feed` request in "without load" tests! In particular, it creates interesting deltas from `TreeRouter` when we request constant paths
- We have too few data points to assert whether at deeper levels `ECTreeRouter` drops its performance or aligns with `TreeRouter`, in particular in "without load" benchmarks
- `ECTreeRouter`, without any doubt, is faster than `TreeRouter` in the random requests test
- `ImmutableECTreeRouter` is a failed experiment 😥

Unlike the previous post, I'm a little hesitant to give a verdict, but we have promising results with Eclipse Collections, so I want to start with it. In case we experience "not so good" performance after the implementation, migrating back to the JDK's collections doesn't appear to be a complicated task.
Stay tuned for other updates!
Routing consists of calling the correct handler for the URL the user requested. Sometimes this can be a simple and fast process, but in modern scenarios this process often slows your application down, in particular when:
I'm writing this article because I want to implement a tree router inside the Vert.x Web framework, so I'm investigating what the best solution would be.
A Route is a combination of HTTP method and path. The path can be a simple constant path or a path with one or more parameters, managed via regular expressions.
The list routing uses a list to contain all the defined routes (in a precise order). When the server receives a request, the router iterates through the list and searches for the routes that match the received request. This process cannot be a simple list search, because a request can match multiple routes. For example: if we have a router that declares
GET "/"
GET "/users"
GET "/users/userA"
and we receive `/users/userA` as a request, the router has to run the handlers of all three routes.
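The loop described above can be sketched like this (a toy version with constant paths only, not the actual Vert.x Web implementation):

```java
import java.util.ArrayList;
import java.util.List;

public class ListRouterSketch {
    record Route(String path) {}

    // returns, in declaration order, every route whose path matches the request
    static List<String> route(List<Route> routes, String requested) {
        List<String> matched = new ArrayList<>();
        for (Route r : routes) {
            // a constant path matches if the request starts with it
            if (requested.startsWith(r.path())) matched.add(r.path());
        }
        return matched;
    }

    public static void main(String[] args) {
        List<Route> routes = List.of(
                new Route("/"), new Route("/users"), new Route("/users/userA"));
        // all three routes match "/users/userA", so all three handlers would run
        assert route(routes, "/users/userA")
                .equals(List.of("/", "/users", "/users/userA"));
    }
}
```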
The tree routing differs from the list routing in one simple thing: the routes are inside a tree. So when the router receives the request, it follows the tree searching for matching routes.
When you think about a website (or, in the same way, a web API) you think about a tree of web pages (operations) you can retrieve (perform). But most web frameworks don't implement the routing as a tree of resources, for multiple reasons:
But, setting these problems aside, the tree seems a better solution, right? This is the starting thesis; now I need to prove it.
Before starting, I want to underline that some frameworks have successfully implemented tree routing, for example Fastify, achieving really interesting performance.
The first step is creating sketches of these two routing mechanisms. I've tried to make the list routing similar to the Vert.x Web router, but of course these are only simplified examples: the router of a web framework is more complex than my 50 lines of code. The list router is implemented in the class `ListRouter` and the tree router in the class `TreeRouter`.
The list router has a simple loop that calls the function `route()` for every route; when this function returns true, the route matches perfectly and the routing process stops. Remember that when I check if a route matches (both in the tree and list scenarios) the router:

- first tests a partial match of the path (for regex paths it calls the `lookingAt()` method, while for string paths it calls the `startsWith()` method)
- then tests a total match of the path (with `matches()` and `equals()`); if the path matches totally, the routing stops

The tree routing is a simple recursive function that works as follows:
We test against path chunks for a simple reason: when we go deeper with the recursion we don't need to test against the previous path components (and we don't need to re-extract the parameters), so the router simply removes them from the requested URL. And of course, when the string is empty we have finished the routing. To get good performance inside tree nodes I used skip lists (I know, I've cheated) to contain the associated routes.
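The partial-match-plus-chunk-removal idea can be sketched with plain `java.util.regex` (a toy illustration, not the benchmark code):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrefixMatchExample {
    // Partial match as described above: lookingAt() accepts a prefix,
    // matches() would require the whole input. Returns the unmatched tail
    // that the router keeps routing on, or null if the node doesn't match.
    static String remainingAfter(Pattern node, String path) {
        Matcher m = node.matcher(path);
        return m.lookingAt() ? path.substring(m.end()) : null;
    }

    public static void main(String[] args) {
        Pattern userSegment = Pattern.compile("/users/([^/]+)");
        // the node matches a prefix of the request; "/feed" is left to route
        assert "/feed".equals(remainingAfter(userSegment, "/users/userA/feed"));
        // an empty remainder means routing is finished at this node
        assert "".equals(remainingAfter(userSegment, "/users/userA"));
        // no partial match at all: this subtree is skipped
        assert remainingAfter(userSegment, "/status") == null;
    }
}
```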
This is only one way to implement tree routing. Also remember that I haven't written the insertion algorithm for the tree router, so I did all the associations between nodes manually.
I've created two benchmarks: an example of an e-commerce API and a social network API. These examples are really similar; they only differ in the number of routes and how many regular expressions those routes contain. Below you can see how these "fake" routers are composed.
The first benchmarks I wrote are simple accesses to routes. I wrote one benchmark for every route (which I store in `compatiblePaths`) and every data structure. Below you can find the results of `ECommerceBenchmark`:
The first observation is that the constant paths are faster in the skip list than in the tree router. This is caused by the skip list optimization: when we get the same elements multiple times, the skip list optimizes its links to access its values more quickly. But the skip list's performance falls in favor of the tree when we use regular expressions, because of course we give a smaller string to the regular expression engine. With the `/health` path we have a small difference because in the tree we are at the first level, while with `/user/newUser` we are one level deeper than `/health`. These results are confirmed by the `SocialNetworkBenchmark` with the same configuration:
So maybe skip lists are so fast that trees are not competitive in this application field? I have two considerations:

- when the benchmark repeats the same request (like the `/feed` request), the skip list optimization helps a lot
- the skip list implementation used is the JDK's `ConcurrentSkipListSet`

To confuse the skip list I've created a more realistic scenario: the benchmark function does 10 random requests and then the assigned request. This complicates things a bit for the skip list, because it loses the optimization:
And of course it's a win for the tree. The fun fact is that the tree defeats the skip list also on the first paths.
For the social benchmark the random function that chooses the 10 requests is a little bit hacky: some paths (for example `/feed`) have more chances than others. But the results remain the same:
The results on `SocialNetworkBenchmark` are impressive, because with some paths we get 3x or more performance with the tree router, but we have an unstable situation at the same tree level.
There's also an important consideration to make: when we go deeper, tree performance drops, so to write a good tree router we need a good combination of access optimizations and an insertion algorithm that avoids creating uselessly deep nodes.
You can find below the final results with and without load (“with load” values conveniently scaled x11):
For the two test cases and data structures I also wrote a final benchmark that accesses `compatiblePaths` sequentially, and in both cases it's a huge win for the tree:
But this is not a very realistic situation, because usually we have something like the social network benchmark with load: there are more frequent requests and less frequent requests, but it's unusual to receive requests sequentially in the router's declaration order.
That's a hard question, because these examples don't prove a lot. But, according to this data, it makes sense to start developing a tree router, because we have good preconditions. In some situations with regular expressions we have seen up to 2x performance thanks to the tree router, but it's important to get good performance with constant paths too (remember that when we have query parameters like `/user?q=blabla`, the URL is split at the start of the routing and the router treats the request like a constant path).
The insertion algorithm is the most important challenge, for different reasons:
The idea of insertion is not splitting on every `/` (like I've done in my examples) but something more like this:
For example:
Path inserted | Tree update |
---|---|
Empty root node | |
/users/{user_id} | Root node with "/users/{user_id}" assigned |
/users/addUser | Root node assigned with "/users/" and with children "{user_id}" and "addUser" |
/users/addFacebookUser | "addUser" split into a new node "add" with children "User" and "FacebookUser" |
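The node-splitting step in the table above relies on finding the longest common prefix of two chunks. A minimal sketch (ignoring the regular-expression complications discussed in the text):

```java
public class PrefixSplitExample {
    // longest common prefix of two path chunks, used when splitting a node
    static String commonPrefix(String a, String b) {
        int i = 0;
        int max = Math.min(a.length(), b.length());
        while (i < max && a.charAt(i) == b.charAt(i)) i++;
        return a.substring(0, i);
    }

    public static void main(String[] args) {
        // "addUser" and "addFacebookUser" split under a new parent node "add",
        // with the remainders "User" and "FacebookUser" as its children
        String parent = commonPrefix("addUser", "addFacebookUser");
        assert parent.equals("add");
        assert "addUser".substring(parent.length()).equals("User");
        assert "addFacebookUser".substring(parent.length()).equals("FacebookUser");
    }
}
```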
The last task in particular is very tricky, because a simple char-to-char comparison is very limiting and can also generate broken regular expressions. For example: the path `/([a-b]{0, 9})` and the path `/([a-z]{0, 9})` cannot be split by creating a parent node with `/([a`, because of course that regular expression is invalid. I've got some ideas about it:
- split at pattern boundaries: for example with `/users/{user_id}/feed` and `/users/{user_id}/events` we split into `/users/{user_id}/` with children `feed` and `events`. This can be done with some regular expressions
- split only at `/` (not inside a group)

To do these things, maybe a library that helps "understand" regular expressions could come in handy.
I really have no idea how 😄. I want to start by creating a simple router that does only the minimal routing, and then add the conditions necessary to successfully pass the tests. Maybe working this way I can avoid creating useless code.
A router tree is possible and can give great performance to Vert.x Web. I can't wait to start working on it!
Stay tuned!
`slush-vertx` is a project created by Paulo Lopes, born to simplify build tool configurations for Vert.x. I totally refactored `slush-vertx` to create a multi-purpose code generator that simplifies various configurations of Vert.x powered projects. When I designed the new slush-vertx, I tried to create a Vert.x project generator for every configuration needed, not only an OpenAPI 3 server or an OpenAPI 3 client. Another important goal of my project was to create a generator that generates code for different languages and different build tools.
Now `slush-vertx` is like a "code generation hub": it contains a set of project generators, based on what type of Vert.x project you are going to scaffold. At the moment I'm writing this post, slush-vertx contains:
I hope it will grow in the future, becoming a tool that helps people connect with the Vert.x world.
slush-vertx is a metadata and template driven code generator. This means:
If you want a complete picture of the behaviour of slush-vertx, take a look at this wiki page.
But why all that complexity behind a code generator? I mean, it's only a code generator! Yes, it's only a code generator, but I wanted to create a tool that is simple to extend with new generation routines, giving Eclipse Vert.x a powerful tool.
So, if you have a powerful build tool that generates pretty much everything you want, why not take advantage of it to do the things you don't want to do by hand? And this is what I've done! Copy-pasting code from other generators I've created, I built a unit test generator for `vertx-web-api-contract-openapi`. This generator takes all the operations declared in an OAS 3 spec and generates a specific test to validate the correct parsing of parameters on the server side. This is the final result: `OpenAPI3ParametersUnitTest.java`. These unit tests helped me a lot to complete the `vertx-web-api-contract-openapi` module.
With some small changes this could become a complete compatibility test tool for OpenAPI 3 server libraries/frameworks.
Now use it! Follow the readme inside the GitHub repository to install it and start using it. You can also contribute to this project by adding new generators and updating existing ones with new languages.
Most of the OpenAPI 3 support lives inside the maven package `vertx-web-api-contract-openapi`, and most classes extend/subclass interfaces/classes inside the maven package `vertx-web-api-contract-common` (the package designed to contain the classes common to all API spec standards). The most important interfaces of `vertx-web-api-contract-openapi` are:

- the `OpenAPI3ValidationHandler` class, which fills the `BaseValidationHandler` maps
- `OpenAPI3RouterFactory`, the interface that enables users to create a router from an API spec

As I said in a previous blog post, OpenAPI 3 added a lot of new things, in particular about serialization styles and complex form bodies (url-encoded and multipart). So when I started working on OpenAPI 3 request validation, I had to add a lot of things to the validation framework that I hadn't anticipated.
`OpenAPI3ValidationHandler` is an interface extension of `HTTPOperationRequestValidationHandler` (located inside `vertx-web-api-contract-common`), which in turn is an interface extension of `ValidationHandler`. This class contains all the methods to elaborate the `Operation` object (the Java representation of the OAS 3 operation object) and the list of `Parameter` objects (the Java representation of the OAS 3 parameter object).
When constructed, it generates all the `ParameterValidationRule` and `ParameterTypeValidator` instances it needs: in fact, it doesn't elaborate the API spec nor work with the API spec Java models during validation. It does everything when it's constructed: it iterates through the various parameters and generates the objects needed for validation.
To give a quick explanation of how this class elaborates parameters: it iterates through them and builds the corresponding validators (parameters with `allowReserved: true` are not supported).
Behind the scenes all the validation work is done by the validation framework.
The router factory is intended to give the simplest possible user interface to generate a router based on an API spec. In fact, it provides these functionalities:

- mounting operations without handlers (configurable with `mountOperationsWithoutHandlers(boolean)`)
- a `ValidationException` failure handler (which can be enabled/disabled with `enableValidationFailureHandler()` and manually configured with `setValidationFailureHandler()`)
- parameter serialization styles (the `matrix` and `label` styles are unsupported natively by Vert.x)
- lazy loading: the actual construction of the `Router` is done only when you call `getRouter()`
GET /hello/{parameter}
GET /hello/world
With actual Vert.x Router
implementation, /hello/world
handler will never called, unless you explicitly call RoutingContext#next()
inside /hello/{parameter}
handler (that causes Router
to run the next route matching the pattern). With lazy methods It’s guaranteed that routes will be loaded with order declared inside API specification.
I choose lazy methods also for code style reasons, It helps a lot to manage the code of router factory.
With these tools, users can bring OpenAPI 3 power to their Vert.x server implementation as simply as:
1 | OpenAPI3RouterFactory.createRouterFactoryFromFile(this.vertx, "src/main/resources/petstore.yaml", ar -> { |
Next time I'm going to introduce `slush-vertx`, a new generator for Vert.x projects. Stay tuned!
The validation framework is located inside the maven module `vertx-web` and the package `io.vertx.ext.web.validation`. Following the Vert.x rules, there are Java interfaces for the polyglot vertx-web interface, and classes inside `io.vertx.ext.web.validation.impl` that implement the logic of the validation.
`HTTPRequestValidationHandler` and `OpenAPI3RequestValidationHandler` (the request validator for OAS 3) subclass `BaseValidationHandler`, the base class for validation. This class contains a map with parameter names as keys and `ParameterValidationRule` instances as values for every parameter location (query, path, header, cookie, form inside body). Every `ParameterValidationRule` contains a `ParameterTypeValidator`. To simplify things:
- `BaseValidationHandler` validates the request: it iterates through the parameters and calls the `ParameterValidationRule` methods
- `ParameterValidationRule` abstracts a parameter and validates whether the parameter exists, whether it can be empty, …
- `ParameterTypeValidator` abstracts the parameter type and validates the type

Every exception of the validation framework is encapsulated inside the `ValidationException` class.
The most important part of validation is type validation. Type validation takes a string or a list of strings as input and gives the correctly parsed parameter as output. I've built a rich set of type validators (mostly to support OpenAPI 3 parameter types):

- `NumericTypeValidator` to validate integers and floating point values
- `StringTypeValidator` to validate strings against a pattern
- `BooleanTypeValidator` to validate booleans
- `JsonTypeValidator` and `XMLTypeValidator` to validate json and xml against a schema
- `EnumTypeValidator` to validate enums
- `ObjectTypeValidator` and `ArrayTypeValidator` to validate objects and arrays
- `AnyOfTypeValidator` and `OneOfTypeValidator` to validate json schema keywords like `anyOf` and `oneOf`
. Of course, user can subclass ParameterTypeValidator
to create its custom type validator.
I’ve also created a set of prebuilt instances of this type validators inside ParameterType
enum, with some common patterns like hostname, email, …
After type validation the parameter is parsed and then encapsulated in an object called `RequestParameter`. Every object is mapped into the equivalent language type, for example: if we declare a parameter as integer, we receive (in Java) an `Integer` object.
When the user wants to handle parameters, he can retrieve the `RequestParameters` from the `RoutingContext`. `RequestParameters` encapsulates all the `RequestParameter` objects filtered by location. For example:
1 | router.get("/awesomePath") |
Users can declare arrays and objects as parameters. `ObjectTypeValidator`/`ArrayTypeValidator` provide the deserialization from string, the validation of object fields/array items with "nested" validators, and the encapsulation inside a map/list. For example, you can declare a query parameter as a comma separated array of integers like this one: `?q=1,2,3,4,5`, and you will receive a `List<Integer>` as the result.
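The deserialization step for that example boils down to something like this (a simplified sketch, not the actual `ArrayTypeValidator` code):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CsvArrayParam {
    // deserialize a comma-separated query value like "1,2,3,4,5"
    // into the equivalent Java type, List<Integer>
    static List<Integer> parse(String raw) {
        return Arrays.stream(raw.split(","))
                     .map(Integer::parseInt)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        assert parse("1,2,3,4,5").equals(List.of(1, 2, 3, 4, 5));
    }
}
```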
The serialization methods are implemented as subclasses of `ContainerDeserializer`, and there are some prebuilt instances in the `ContainerSerializationStyle` enum. Of course, users can use the static methods inside `ObjectTypeValidator.ObjectTypeValidatorFactory` and `ArrayTypeValidator.ArrayTypeValidatorFactory` to build these validators, define their serialization style and add the "nested" validators.
HTTPRequestValidationHandler

To start validating requests, developers can use the `HTTPRequestValidationHandler`. This class exposes methods to add validators without caring about `ParameterValidationRule`, because they are automatically generated. For every parameter location `HTTPRequestValidationHandler` exposes three methods:

- `add*Param`: to add a parameter with a type taken from the `ParameterType` enum
- `add*ParamWithPattern`: to add a string parameter with a pattern
- `add*ParamWithCustomTypeValidator`: to add a parameter with an instance of `ParameterTypeValidator`
Then there are methods for the body, like `addJsonBodySchema` or `addMultipartRequiredFile`.
Next time I'm going to introduce the OAS 3 Router Factory, stay tuned!
Mixing the two things I said, Google Summer of Code is perfect for me!
As my project page says, my objective is to implement the API design driven development techniques inside vertx-web. Actually, the first idea was to implement OpenAPI 2 (fka Swagger) and RAML; then this happened.
With OpenAPI 3 on the horizon, we decided to focus on it because, as with Swagger 2, OpenAPI 3 has converters from the old specification versions to the newest one. As my mentor says, we are pioneers of OAS 3 😄. For OpenAPI 3 parsing, we decided to use Kaizen-OpenAPI-Parser, and I often helped this new project with pull requests.
But I kept the idea of abstracting the router factory and the validation methods as much as possible, to enable future implementations of new API specification standards. I've also created an interface for users to validate HTTP requests without writing an API spec. Also, to complete my work, I wrote a lot of unit tests.
At the end of the implementation of validation and the router factory for OAS 3, I wrote a lot of documentation and also a blog post on the Eclipse Vert.x blog.
After the first evaluation phase, I refactored the code, splitting it into different maven packages. Then I started focusing on code generation: swagger-codegen doesn't support OAS 3 for now, and they don't know when they will release OAS 3 support, so we decided to create our own generator. Basing it on my mentor's project, I started creating the code generator. I've made huge changes from the original project, to enable adding different generators to the same project and to support different languages and package managers. I've also taken advantage of this work to generate new unit tests for the OAS 3 router factory and validation (with a tremendous first run 😢).
I’ve also opened other pull requests to vertx-web, complementary to my work. I added a method to get query parameters, and I enabled the `Route` object to contain multiple handlers (like Express middlewares).
Before the end of the summer, I will give Vert.x a set of classes for HTTP request validation, OAS 3 support, and a multi-purpose generator that is simple to extend with new templates.
In the next articles I’m going to discuss all the technical details of my project, stay tuned!
One of the major changes is that body parameters (forms, JSON, …) have moved to a new object called `RequestBody`. So now `Parameter` supports only request parameters with `in`:

- `header`
- `query`
- `path`
- `cookie`
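As a sketch, the split between `parameters` and `requestBody` looks like this in OAS 3 (the path and field names are made up for illustration):

```yaml
paths:
  /pets/{petId}:
    put:
      parameters:             # only in: header, query, path, cookie
        - name: petId
          in: path
          required: true
          schema:
            type: string
      requestBody:            # forms, JSON, … now live here
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                name:
                  type: string
```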
`schema` is better than `type`
In OpenAPI 2 a parameter is defined as:
```yaml
parameters:
  - name: id
    in: path
    required: true
    type: string
```
In OpenAPI 3 you can find the same parameter as:
```yaml
parameters:
  - name: id
    in: path
    required: true
    schema:
      type: string
```
The major difference is that now you have to define a schema
for every single parameter, even the most simple.
It seems annoying, but it opens some interesting opportunities. For example: a model identifier has a particular regular expression (`pattern`) that describes it, and you want to write CRUD methods for this model. This is what you have to do in OpenAPI 2:
```yaml
# get path
/models/{id}:
  get:
    parameters:
      - name: id
        in: path
        required: true
        type: string
        pattern: '[a-zA-Z0-9]{12}'
    # …and you repeat the same definition for put, delete, …
```
Now with OpenAPI 3 you can define this single string as a schema and reference to it where you want:
```yaml
# define your identifier schema in components/schema
components:
  schemas:
    ModelId:
      type: string
      pattern: '[a-zA-Z0-9]{12}'

# then reference it in every operation
parameters:
  - name: id
    in: path
    required: true
    schema:
      $ref: '#/components/schemas/ModelId'
```
You can also reuse it to define complete model:
```yaml
components:
  schemas:
    Model:
      type: object
      properties:
        id:
          $ref: '#/components/schemas/ModelId'
        name:
          type: string
```
This is tricky, but actually it’s possible. With Schema object support you can define an object as a query, header, cookie or path parameter. This is an example:
```yaml
parameters:
  - name: coordinates
    in: query
    required: true
    style: deepObject
    explode: true
    schema:
      type: object
      properties:
        lat:
          type: number
        lon:
          type: number
```
I will explain later how to submit this type of request. One interesting usage is when you have a multi-dimensional key for a model, for example a geolocation model.
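As a sketch of what such a request looks like on the wire, here is how an object parameter could be serialized with `style: deepObject` (the `coords` name and the helper function are illustrative, not a real library API):

```javascript
// Serialize an object as a single deepObject-style query parameter,
// e.g. a geolocation key: coords[lat]=45.07&coords[lon]=7.68
function deepObjectQuery(name, obj) {
  return Object.entries(obj)
    .map(([key, value]) => `${name}[${key}]=${encodeURIComponent(value)}`)
    .join('&');
}

console.log(deepObjectQuery('coords', { lat: 45.07, lon: 7.68 }));
// coords[lat]=45.07&coords[lon]=7.68
```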
`style` is the new name of the `collectionFormat` field. But it isn’t only a name change: `style` is backed by another field, `explode`. This is the comparison table with OpenAPI 2:
| style | explode | OpenAPI 2 `collectionFormat` |
| --- | --- | --- |
| matrix | false | not supported |
| matrix | true | not supported |
| label | false | not supported |
| label | true | not supported |
| form | false | csv |
| form | true | multi |
| simple | false | csv |
| simple | true | csv (not for objects) |
| spaceDelimited | false | ssv |
| pipeDelimited | false | pipes |
| deepObject | true | not supported |
{: .table}
For more information about how to use these two new fields, check out this table.
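To make the table concrete, here is a minimal sketch of how the same array of values serializes under a few `style`/`explode` combinations (the helper function is illustrative, not part of any real library):

```javascript
// Illustrative serializer for an array query parameter under
// a few OpenAPI 3 style/explode combinations.
function serializeQueryArray(name, values, style, explode) {
  if (style === 'form') {
    // explode=true is OpenAPI 2 "multi", explode=false is "csv"
    return explode
      ? values.map(v => `${name}=${v}`).join('&')
      : `${name}=${values.join(',')}`;
  }
  if (style === 'spaceDelimited') return `${name}=${values.join(' ')}`; // "ssv"
  if (style === 'pipeDelimited') return `${name}=${values.join('|')}`;  // "pipes"
  throw new Error(`unsupported style: ${style}`);
}

console.log(serializeQueryArray('id', [3, 4, 5], 'form', true));  // id=3&id=4&id=5
console.log(serializeQueryArray('id', [3, 4, 5], 'form', false)); // id=3,4,5
```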
`content`

If you think `schema` isn’t enough, check out the `content` field. I will explain it further when I cover `RequestBody`.
- Use `query` parameters if you need to pass arrays to an operation, together with the `form` style
- Use `RequestBody` for complex payloads, or split them into different primitive parameters
- Use `schema` inside the `Parameter` object; in particular, use it to define object identifiers

Design Driven Development is the name of the technique that consists in writing the Web API spec before writing any other code. Then, when you start writing code, you already have a “dictionary” of the server-client interface, so front-end and back-end developers can go straight to implementing the application logic. If you choose the right tools, you don’t have to care about:
The flexibility of Web API specifications enables their use in a lot of use cases:
Both client and server can be linked to the Web API spec through two different approaches:
Static code generation is the simplest to achieve; you can find a code generation library for every language/framework/API specification standard combination. Depending on your project and on the library you use, it can be a useful approach or not. It’s a really interesting approach on the client side, but it lacks flexibility on the server side. Most server-side code generators create a stub of server code, with all validation and security routines generated directly by the code generator. It can be really useful when you write a Web API spec that you assert won’t change during back-end development. One important feature of static code generation is performance, so I prefer it for a web service project.
Dynamic code generation is the most interesting for the server side of a small project. You can change your spec every time you want, and the server will generate the new validation flow. It’s interesting when you write a complete client-server stack and you change the spec during development.
Do you want to write a magic mobile/web application with a couple of your friends that does something astonishing? Use these tools:
"hello world" + name
.I’ve practiced hello world in different languages:
And others… And I want to practice it in a lot of other different languages because, as you probably know, in the IT world we never stop learning.
JavaScript is my favourite language. You can use it for everything you want: desktop apps, mobile apps, web apps, backends, … It has an almost infinite catalogue of libraries for every need! I’m a little bit experienced with the Node.js/Express stack and a lot of related libraries. I’m also experienced with the OpenAPI specification and all the related tools (Swagger Editor, swaggerize-express, …).
For me, Java is like the big brother of JavaScript. But if you love JavaScript, you love Java too. I know a little bit of Java EE and Android development, and I’m currently learning Eclipse Vert.x for GSoC 2017.
So, don’t get surprised if I’m going to talk a lot about Java and JavaScript!
I’ve got a little experience with C++ and Python, but I’m going to improve my skills. These languages are a must-have in a developer’s portfolio!
I’m writing a blog because I want to document my experiences and share my projects. And also, I think it will be fun!
Stay tuned!