Kubernetes controllers - The Empire Strikes Back
In this blog post I want to introduce you to some important steps forward we made in our Extending Kubernetes API In-Process project. If you don’t know what I’m talking about, check out the previous post: Kubernetes Controllers - A new hope.
We implemented the high-level ABI to start watchers on Kubernetes resources, as argued in the Next steps paragraph of that post. This allows us to identify watch requests sent by controllers and to deduplicate them.
Then we worked on making our ABI asynchronous, in order to invoke the controllers only on demand. Now we don’t spin up a thread for each module; instead, we wake controllers up asynchronously when there is a new task to process.
Thanks to these changes, we’re now able to use Rust’s `async`/`await` inside the module. This allowed us to realign our fork of `kube-rs`, bringing back all the original interfaces, and to run `kube-runtime` inside the modules.
The kube-watch-abi ABI
The `kube-watch-abi` ABI was originally composed of an import and an export:
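Roughly, the pair looks like this on the module side (a sketch: the exact parameter names and the pointer/length convention are illustrative, following the serialization approach described below):

```rust
// Import provided by the host: registers a watch described by a serialized
// WatchRequest and returns the identifier assigned to that watch.
extern "C" {
    fn watch(watch_req_ptr: *const u8, watch_req_len: usize) -> u64;
}

// Export implemented by the module: the host invokes it with a serialized
// WatchEvent every time the watch identified by `watch_id` has a new event.
#[no_mangle]
pub extern "C" fn on_event(watch_id: u64, event_ptr: *const u8, event_len: usize) {
    // deserialize the WatchEvent and append it to the Stream registered
    // under `watch_id`
}
```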
The `watch_req` is a description of the watch to register:
```rust
pub(crate) struct WatchRequest {
    // fields describing the resource to watch and its list parameters (elided)
}
```
Controller
When the module invokes the `watch` import, it registers a new watch and gets back a watch identifier. This identifier is stored, together with a reference to the `Stream<WatchEvent>`, in a global map.
In a similar fashion to the `request` ABI discussed in the previous post, we serialize the `WatchRequest` data structure in order to pass it to the host.
Then, every time the host has a new `WatchEvent` that the controller needs to handle, it will invoke `on_event` with the serialized `WatchEvent`. Using the watch identifier, the controller will get the associated `Stream` from the global map and append the deserialized `WatchEvent` to it.
The host
When the controller invokes `watch`, the host checks whether there is already a registered watch for that resource. If there is, it just registers the invoker as interested in that watch; otherwise, it starts a new watch.
Every time a watch receives a new event from Kubernetes, the host resolves all the modules interested in that particular event. For each module, it allocates some module memory to pass the event and finally wakes the controller up by invoking `on_event`.
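A rough sketch of the host-side bookkeeping behind this deduplication (all names and types here are illustrative, not the actual ones used by the project):

```rust
use std::collections::HashMap;

/// Identifies a watch by the resource it targets (illustrative key type).
#[derive(Hash, PartialEq, Eq)]
struct WatchKey {
    resource_path: String,
    list_params: String,
}

/// A running watch and the modules that must be notified on every event.
struct RegisteredWatch {
    interested_modules: Vec<u64>,
}

#[derive(Default)]
struct Watchers {
    watches: HashMap<WatchKey, RegisteredWatch>,
}

impl Watchers {
    /// Reuses an existing watch for the same resource, otherwise starts a new one.
    fn register(&mut self, key: WatchKey, module_id: u64) {
        self.watches
            .entry(key)
            .or_insert_with(|| {
                // No watch registered for this resource yet: this is where the
                // host starts a new watch against the Kubernetes API.
                RegisteredWatch { interested_modules: Vec::new() }
            })
            .interested_modules
            .push(module_id);
    }
}
```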
Rust module with async-await
Callbacks are bad!
The `kube-watch-abi` is de facto an asynchronous API: `watch` starts the asynchronous operation, and `on_event` notifies the completion of the operation.
In the initial implementation of `watch`, on the module side, I just modified the kube client `watch` to accept a callback:
```rust
pub fn watch<F: 'static + Fn(WatchEvent<K>) + Send>(&self, lp: &ListParams, version: &str, callback: F) {
    // Body elided: it serializes the WatchRequest, invokes the watch ABI import
    // and stores `callback` under the returned watch identifier, so that
    // on_event can later resolve and invoke it.
}
```
Every time `on_event` was invoked, the module resolved the callback from the watch identifier and invoked it.
This was fine as an initial approach, but we soon hit a problem: porting all of the existing Kubernetes client and runtime code from `async`/`await` to the more primitive callbacks could have caused a lot of issues, making the fork diverge too much from the upstream code.
How async/await works
Here’s a little refresher on `async`/`await` from the Asynchronous Programming in Rust book:
> async/.await is Rust’s built-in tool for writing asynchronous functions that look like synchronous code. async transforms a block of code into a state machine that implements a trait called Future. Whereas calling a blocking function in a synchronous method would block the whole thread, blocked Futures will yield control of the thread, allowing other Futures to run.
Although there are a lot of details about how Rust implements the `async`/`await` feature, these are the relevant concepts for this post:
- `Future` is the type that represents an asynchronous result, e.g. an `async fn` returns a `Future`. You can wait for the result of a `Future` using `.await`.
- `Stream` is the same as `Future`, but it returns several elements before `None`, which notifies the end of the stream.
- In order to use `.await`, you must be in an `async` code block.
- In order to run an `async` code block, you need to use a task executor.
For example:
```rust
use futures::executor::LocalPool;
use futures::task::LocalSpawnExt;

// Illustrative example: a trivial async function whose result we await inside
// an async block driven by a single-threaded task executor.
async fn do_something_async() -> u32 {
    42
}

fn main() {
    let mut pool = LocalPool::new();
    let spawner = pool.spawner();

    spawner
        .spawn_local(async {
            let result = do_something_async().await;
            println!("Result: {}", result);
        })
        .unwrap();

    // Run all the spawned tasks on the current thread until they complete.
    pool.run();
}
```
In our `async`/`await` implementation we implemented the `Future` and `Stream` traits, reused `LocalPool` from the `futures` crate as a task executor, and generalized `on_event` to notify the completion of asynchronous operations.
The controller lifecycle
We first analyzed the controller lifecycle:
- When the host invokes `run`, the controller starts a bunch of watchers and then waits for new events
- Every time a new event comes in, the host wakes the controller up by calling `on_event`
This means that during the `run` phase, the controller starts a bunch of asynchronous tasks, one or more of them waiting for asynchronous events. After the `run` phase completes, there is no need to keep the module running. When we wake the controller up again, we need to check for any tasks that can continue and run them up to the point where all the tasks are waiting for an external event again.
To implement this lifecycle, we need to execute the executor method `run_until_stalled()` both on `run` and on `on_event`: it runs all the tasks in the executor pool and returns when no more progress can be made on any task. This allows us to implement `run` as follows:
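A minimal sketch of this, assuming a thread-local `LocalPool` as the module executor and a hypothetical `controller_main()` async entry point (the real implementation lives in executor.rs):

```rust
use std::cell::RefCell;

use futures::executor::LocalPool;
use futures::task::LocalSpawnExt;

thread_local! {
    // Single-threaded task executor owned by the module.
    static EXECUTOR: RefCell<LocalPool> = RefCell::new(LocalPool::new());
}

#[no_mangle]
pub extern "C" fn run() {
    EXECUTOR.with(|executor| {
        let mut executor = executor.borrow_mut();
        // Spawn the controller's async entry point: it starts the watchers and
        // then awaits new events.
        executor.spawner().spawn_local(controller_main()).unwrap();
        // Run every task until no more progress can be made, i.e. until all
        // tasks are parked waiting for an external event.
        executor.run_until_stalled();
    });
}
```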
Our custom Future/Stream
In order to encapsulate the pending asynchronous operation, we implemented our own `Future`. The implementation is straightforward and pretty much the same as the one explained in the Rust async book:
```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Waker};

/// Shared state between the future and the waiting thread
struct SharedState {
    /// Serialized result of the async operation, filled in by wakeup_future
    /// (the exact payload type here is illustrative).
    value: Option<Vec<u8>>,
    /// Whether the host has completed the async operation.
    completed: bool,
    /// Waker of the task awaiting this future.
    waker: Option<Waker>,
}

pub struct AbiFuture {
    shared_state: Arc<Mutex<SharedState>>,
}

impl Future for AbiFuture {
    type Output = Option<Vec<u8>>;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let mut shared_state = self.shared_state.lock().unwrap();
        if shared_state.completed {
            Poll::Ready(shared_state.value.take())
        } else {
            // Not completed yet: store the waker so wakeup_future can wake us up.
            shared_state.waker = Some(cx.waker().clone());
            Poll::Pending
        }
    }
}
```
A similar implementation exists for the `Stream` trait. To create a `Future`, we use this method:
```rust
use once_cell::sync::Lazy;
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Global registry of pending futures, keyed by the async operation identifier
// (the name and exact shape of this map are illustrative).
static PENDING_FUTURES: Lazy<Mutex<HashMap<u64, Arc<Mutex<SharedState>>>>> =
    Lazy::new(|| Mutex::new(HashMap::new()));

pub fn start_future(async_operation_id: u64) -> AbiFuture {
    // Create the shared state for this pending operation and register it so
    // that wakeup_future can later resolve it (sketch of the elided body).
    let shared_state = Arc::new(Mutex::new(SharedState {
        value: None,
        completed: false,
        waker: None,
    }));
    PENDING_FUTURES
        .lock()
        .unwrap()
        .insert(async_operation_id, shared_state.clone());
    AbiFuture { shared_state }
}
```
Generalizing on_event to wakeup_future/wakeup_stream
At this point, we took the concept of `on_event` and generalized it to `async`/`await`, introducing the `wakeup_future`/`wakeup_stream` ABI exports.
Every time the controller invokes an asynchronous ABI import (like `watch`), it gets back the identifier that we use to instantiate our `Future`/`Stream` implementation.
When the host completes the asynchronous operation, it invokes `wakeup_future`/`wakeup_stream`. The controller marks the `Future`/`Stream` as completed, stores the result value, and invokes `LocalPool::run_until_stalled()` to wake up the tasks waiting for that future/stream to complete.
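Sketching the module side of `wakeup_future`, reusing the executor and the pending-futures registry from the snippets above (the exact parameters and the memory-passing convention are assumptions):

```rust
#[no_mangle]
pub extern "C" fn wakeup_future(async_operation_id: u64, value_ptr: *const u8, value_len: usize) {
    // Copy the serialized result out of the memory the host allocated for us.
    let value = unsafe { std::slice::from_raw_parts(value_ptr, value_len) }.to_vec();

    // Mark the pending future as completed, store the result value...
    if let Some(shared_state) = PENDING_FUTURES.lock().unwrap().remove(&async_operation_id) {
        let mut state = shared_state.lock().unwrap();
        state.value = Some(value);
        state.completed = true;
        // ...and wake the task awaiting it.
        if let Some(waker) = state.waker.take() {
            waker.wake();
        }
    }

    // Finally, run the executor again until all tasks stall on external events.
    EXECUTOR.with(|executor| executor.borrow_mut().run_until_stalled());
}
```

`wakeup_stream` works analogously, appending the value to the stream rather than completing a one-shot future.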
The complete flow
This is the complete flow of an asynchronous ABI method:
```mermaid
sequenceDiagram
    participant C as Controller module
    participant H as Host
    activate C
    C ->> H: do_async()
    activate H
    H ->> C: Returns async operation identifier
    C ->> C: run_until_stalled()
    deactivate C
    Note over H: Waiting for the async result
    H ->> C: wakeup_future()
    deactivate H
    activate C
    C ->> C: run_until_stalled()
    deactivate C
```
You can find the complete code regarding `async`/`await` support here: executor.rs
More async ABI methods!
After we implemented `async`/`await` in our WASM modules, we refactored the `request` ABI discussed in our previous post into an asynchronous ABI method:
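A sketch of the new shape of the import (the exact signature and the pointer/length convention are assumptions):

```rust
extern "C" {
    // Takes the serialized HTTP request and returns the identifier of the
    // asynchronous operation; the host later delivers the serialized response
    // through wakeup_future.
    fn request(request_ptr: *const u8, request_len: usize) -> u64;
}
```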
Now the returned value is the asynchronous operation identifier and, to signal the completion of the request, the host invokes `wakeup_future`.
We also included a new ABI method to put the module's execution to sleep:
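A sketch of the import (the exact name and time unit are assumptions):

```rust
extern "C" {
    // Asks the host to wake the module up after the given number of
    // milliseconds; returns the identifier of the asynchronous operation,
    // which the host completes through wakeup_future once the timeout expires.
    fn delay(millis: u64) -> u64;
}
```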
This is necessary to run `kube-runtime`, which performs some sleeps before synchronizing its internal cache again.
Redesigning the host as an event-driven application
The first implementation of the new `kube-watch-abi` ABI was a little rough: a lot of blocking threads, shared memory across threads, some `unsafe` sprinkled here and there to make the code compile.
Because of that, we redesigned the host, transforming it into a fully asynchronous application made of channels and message handlers. For every asynchronous ABI method there is a channel that delivers the request to a message handler, which processes the request, computes one or more responses, and sends them to another channel. This last channel delivers messages to the `AsyncResultDispatcher`, the owner of the module instances, which invokes the `wakeup_future`/`wakeup_stream` of the interested controller.
Today we have 3 different message handlers, one for each async ABI method:
- `kube_watch::Watchers`, which controls the watch operations. This message handler is also able to deduplicate the watch operations
- `http::start_request_executor`, to execute HTTP requests
- `delay::start_delay_executor`, to execute delay requests
When the host loads all the modules, it executes the ABI method `run` for each module, then it transfers the ownership of the module instances to the `AsyncResultDispatcher`, which starts listening for new `AsyncResult` messages on its ingress channel.
Because all the message handlers and channels are `async`/`await` based, virtually no resources are wasted on waiting threads when all the handlers are idle.
Since `AsyncResultDispatcher` controls all the different module instances, it avoids invoking the same controller in parallel: `LocalPool` is a single-threaded async task executor, hence a module cannot process multiple async results in parallel.
Compiling kube-runtime to WASM
Thanks to all the async changes, we managed to realign most of the APIs of `kube-rs` to the original ones. This allowed us to port `kube-runtime` to our WASM controllers.
> The kube_runtime crate contains sets of higher level abstractions on top of the Api and Resource types so that you don’t have to do all the watch/resourceVersion/storage book-keeping yourself.
The problem we experienced with compiling `kube-runtime` to WASM is that it depends on `tokio::time::DelayQueue`, a queue that yields its elements once the deadline they were enqueued with is reached. `DelayQueue` uses a `Future` type called `Delay` to implement the delays. The problem with this `Delay` is that it's implemented using the internal timer of the Tokio async task executor `Runtime`, which we don't use inside WASM modules.
In order to fix this issue, we had to fork the implementation of `tokio::time::DelayQueue` and reimplement the `Delay` type using the `delay` ABI shown previously:
```rust
pub struct Delay {
    // Fields elided: Delay wraps the asynchronous operation started through the
    // delay ABI import and completes when the host invokes wakeup_future after
    // the deadline has passed.
}
```
With the custom implementation of `DelayQueue`, `kube-runtime` compiled successfully to WASM, and we managed to port our controllers to use it!
```rust
async fn main() {
    // Body elided: it builds the kube client, creates the Api for the watched
    // resource and runs the kube-runtime machinery on top of it, just like a
    // controller deployed in the usual way would do.
}
```
Show me the code!
You can check out the complete code of the controllers today here: simple-pod controller
If you want to look at all the different changes the project went through, look at these PRs:
- Event system and implementation of `watch` ABI
- Refactor of the host as event-driven application
- `async`/`await` + `watch` returns `Stream`
- Non blocking HTTP proxy ABI
- Non blocking `delay` ABI
- Kube runtime port
Next steps
Now the controller module looks like a real Kubernetes controller: the difference from a controller targeting the usual deployment style is minimal. We also opened the door to important optimizations, thanks to the `watch` ABI method. The host refactoring and the async ABI methods should also simplify future interaction with Golang WASM controllers, because our ABI now resembles the asynchronous semantics of their WASM ABI.
Our next goals are:
- Hack the Golang compiler to fit our ABI (or a similar one). For more info check out the previous blog post
- Perform a comparison in terms of resource utilization between this deployment style using WASM modules and the usual one of 1 container per controller.
- Figure out how to handle the different `ServiceAccount`s per controller
Stay tuned!