“Captain's Log, Stardate 2022.11. The GISKARD is establishing a base on the Notebook planet. We are building a communication bridge between the Python and Java civilizations.”
During the past two months, we’ve been working on our new release, Giskard 1.3. This release addresses the main difficulties that our users reported. One of the most important was the onboarding process, which led to a long time to value.
In this article, we explain how we addressed this problem, which comes down to how AI / ML models are executed, by introducing a novel bridge architecture between Python and Java.
This article assumes some level of familiarity with gRPC and protocol buffers. Please refer to this documentation for a gentle introduction to these technologies.
🤔 How is Giskard dealing with code environments when executing models?
Since Giskard ensures the quality of machine learning models, it comes late in the model creation process - a data scientist should already have a model to test. We need to upload the model to an environment where Giskard can execute it. Ideally, this environment should be identical to the one data scientists used to create the model.
In the previous versions of Giskard, we introduced an “ML Worker” component. It’s a Python process that lives inside its own container and starts upon Giskard’s installation. Whenever the Giskard backend (in Java) needs to execute something ML-related, it contacts an ML Worker via gRPC to invoke methods like “predict” or “explain”, or uses bidirectional streaming to exchange data. Once the ML Worker executes a model, it sends the results back to the backend, which transfers them to the user’s frontend.
This setup allowed us to take control of the ML Worker creation, so once Giskard is installed, you can be sure that an ML Worker is running. However, the technical inconvenience is that a data scientist now has to configure two environments:
- The one where they create a model - often, a notebook
- The one inside the ML Worker container
What’s worse, whenever a data scientist needs to use a new library, they need to change the ML Worker Dockerfile and rebuild the image. This can take time and requires software engineering knowledge.
To solve this problem, we came up with the idea of an “External ML Worker”.
🌉 External ML Worker: a new bridge architecture based on gRPC
Instead of configuring a new environment on the Giskard side, why not just execute the model in the data scientist’s existing environment?
In this case, the Giskard server must connect to user environments. But unlike the Giskard server, a user machine is not necessarily accessible from the outside, so Giskard cannot talk to it directly.
Another problem is that gRPC is a client-server protocol, so the party that is a "server" can’t invoke methods on a "client", which is exactly what we’d need.
A solution to this problem is a reversed communication architecture, illustrated in the architecture diagram below.
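Before looking at the diagram, the core idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not Giskard's implementation: it uses plain TCP sockets and a made-up line-based protocol instead of gRPC, and the names `run_worker` and `call_through_reverse_link` are hypothetical. It shows that once the worker has dialed out to the server, the server can send requests back over that same connection.

```python
import socket
import threading


def run_worker(server_addr):
    """Worker side: dial OUT to the server, then answer requests on that link."""
    with socket.create_connection(server_addr) as conn, conn.makefile("rwb") as stream:
        for line in stream:
            method = line.decode().strip()
            # Placeholder for real work such as running a model.
            stream.write(f"result-of-{method}\n".encode())
            stream.flush()


def call_through_reverse_link(methods):
    """Server side: accept the worker's connection, then send it requests."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)

    # In a real deployment the worker runs on the user's machine;
    # here it is a thread so the sketch is self-contained.
    worker = threading.Thread(target=run_worker, args=(listener.getsockname(),))
    worker.start()

    conn, _ = listener.accept()
    results = []
    with conn, conn.makefile("rwb") as stream:
        for method in methods:
            stream.write(f"{method}\n".encode())
            stream.flush()
            results.append(stream.readline().decode().strip())

    listener.close()
    worker.join(timeout=5)
    return results


if __name__ == "__main__":
    print(call_through_reverse_link(["predict", "explain"]))
```

The worker never opens a listening port, which is why this direction works even when the user's machine is not reachable from the outside.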
The blue rectangle represents the internal worker on the Giskard server side, which is started automatically with docker-compose (what we already had in previous versions).
It’d be nice if, instead of connecting to the internal gRPC server (blue), the Giskard server could connect to a proxy that would transfer all of the requests to a real gRPC server located elsewhere - between orange and green rectangles.
On the right, we have a user machine where a user can execute the following command:
This command will start an internal gRPC server on the client machine (green) and a “bridge” between the Giskard server and this newly created gRPC server.
The bridge will connect to Giskard in two different ways:
- Via a “service” channel - for exchanging commands with the Giskard server, identified by an IP address
- Via a “proxy” channel - to proxy gRPC messages as-is between the gRPC proxy server (orange) and the internal gRPC server (green)
First, the bridge will connect to an exposed Giskard TCP socket and send a 1-byte request to create an “internal proxy” on the Giskard server side.
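A minimal sketch of this first step is shown below. The article does not specify the value of the request byte or whether the server acknowledges it, so `CREATE_PROXY`, the acknowledgement byte, and all function names are assumptions made for illustration only.

```python
import socket
import threading

CREATE_PROXY = b"\x01"  # hypothetical command byte; the real value is not given here
ACK = b"\x00"           # hypothetical acknowledgement, added only for this sketch


def serve_one_service_request(service_listener, created):
    """Server side: read a 1-byte command and create an internal proxy listener."""
    conn, _ = service_listener.accept()
    with conn:
        command = conn.recv(1)
        if command == CREATE_PROXY:
            proxy = socket.socket()
            proxy.bind(("127.0.0.1", 0))  # the backend will later connect here
            proxy.listen()
            created.append(proxy)
            conn.sendall(ACK)


def bridge_request_proxy(service_addr):
    """Bridge side: dial out to the exposed socket and send the 1-byte request."""
    with socket.create_connection(service_addr) as conn:
        conn.sendall(CREATE_PROXY)
        return conn.recv(1)


def demo():
    service_listener = socket.socket()
    service_listener.bind(("127.0.0.1", 0))
    service_listener.listen()
    created = []
    server = threading.Thread(
        target=serve_one_service_request, args=(service_listener, created)
    )
    server.start()

    ack = bridge_request_proxy(service_listener.getsockname())
    server.join(timeout=5)
    service_listener.close()

    proxy_port = created[0].getsockname()[1]
    created[0].close()
    return ack, proxy_port


if __name__ == "__main__":
    print(demo())
```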
Now, whenever a model has to be executed with an external ML Worker, the Giskard server will connect to the internal proxy as if it were an actual gRPC server.
Once a new client connects, the proxy will send a client connection request, carrying a unique client id, to the gRPC bridge.
After the client connection request is received, the bridge will do two things:
- Establish a connection to the internal gRPC server.
- Establish a new “proxy” connection with the Giskard server and send a message containing a client id to bind this bridge connection with an appropriate internal proxy.
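The binding step can be pictured as a small lookup table on the Giskard server side. The sketch below is an assumption-laden illustration: the class `ProxyBinder`, its method names, and the 4-byte big-endian encoding of the client id are all hypothetical, and plain strings stand in for real connections.

```python
import itertools


class ProxyBinder:
    """Pairs a waiting backend connection with the bridge connection for the same id."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._waiting = {}  # client id -> connection opened by the Giskard backend
        self.bound = {}     # client id -> (backend connection, bridge connection)

    def on_backend_connected(self, backend_conn):
        """Called when the backend connects to the internal proxy."""
        client_id = next(self._ids)
        self._waiting[client_id] = backend_conn
        return client_id  # this id is sent to the bridge over the service channel

    def on_bridge_proxy_connected(self, bridge_conn, first_message):
        """Called when the bridge opens a new proxy connection and sends the id."""
        client_id = int.from_bytes(first_message, "big")
        backend_conn = self._waiting.pop(client_id)  # raises KeyError for unknown ids
        self.bound[client_id] = (backend_conn, bridge_conn)
        return backend_conn


if __name__ == "__main__":
    binder = ProxyBinder()
    cid = binder.on_backend_connected("backend-conn")
    binder.on_bridge_proxy_connected("bridge-conn", cid.to_bytes(4, "big"))
    print(binder.bound)
```

Keying the table by client id lets several backend connections wait at once, each matched to its own bridge connection.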
After completing this binding, the bridge and the internal proxy exchange raw gRPC data, allowing the Giskard server to call methods on the internal gRPC server located on the client machine.
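Relaying data "as-is" means the proxy never parses the gRPC messages; it only copies bytes in both directions. Here is a minimal sketch of such a relay, assuming plain blocking sockets and one thread per direction. The helper names are hypothetical, the payloads are placeholder bytes rather than real gRPC frames, and each side half-closes its connection only so that this short demo terminates.

```python
import socket
import threading


def pump(src, dst):
    """Copy raw bytes from src to dst until src reaches end-of-stream."""
    while True:
        data = src.recv(4096)
        if not data:
            break
        dst.sendall(data)
    dst.shutdown(socket.SHUT_WR)  # propagate end-of-stream to the other side


def relay(a, b):
    """Start one pump per direction and return the two threads."""
    threads = [
        threading.Thread(target=pump, args=(a, b)),
        threading.Thread(target=pump, args=(b, a)),
    ]
    for thread in threads:
        thread.start()
    return threads


def read_all(sock):
    """Read from sock until the peer signals end-of-stream."""
    parts = []
    while True:
        data = sock.recv(4096)
        if not data:
            return b"".join(parts)
        parts.append(data)


def demo():
    # backend <-> proxy_in  ...relay...  proxy_out <-> grpc_server
    backend, proxy_in = socket.socketpair()
    proxy_out, grpc_server = socket.socketpair()
    threads = relay(proxy_in, proxy_out)

    backend.sendall(b"raw request bytes")
    backend.shutdown(socket.SHUT_WR)
    request = read_all(grpc_server)

    grpc_server.sendall(b"raw response bytes")
    grpc_server.shutdown(socket.SHUT_WR)
    response = read_all(backend)

    for thread in threads:
        thread.join(timeout=5)
    for sock in (backend, proxy_in, proxy_out, grpc_server):
        sock.close()
    return request, response


if __name__ == "__main__":
    print(demo())
```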
In addition to this External ML Worker feature, we’ve also worked on improving the performance of Giskard. When we invoke the model, it’s cached on the client machine to avoid transferring the model every time. It's now also possible to execute large models because they're sent to the client in chunks.
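The two optimizations can be sketched together as follows. This is an illustration only: `iter_chunks`, `ModelCache`, the chunk size, and the use of a model id as the cache key are assumptions, not details taken from Giskard's code.

```python
CHUNK_SIZE = 4  # tiny on purpose so the demo is readable; a real value would be much larger


def iter_chunks(payload, chunk_size=CHUNK_SIZE):
    """Yield successive slices of a serialized model."""
    for start in range(0, len(payload), chunk_size):
        yield payload[start:start + chunk_size]


class ModelCache:
    """Client-side cache: a model is reassembled from chunks only on first use."""

    def __init__(self):
        self._models = {}
        self.transfers = 0  # counts how many times a model was actually transferred

    def get_or_receive(self, model_id, chunk_source):
        if model_id not in self._models:
            # chunk_source is only called on a cache miss
            self._models[model_id] = b"".join(chunk_source())
            self.transfers += 1
        return self._models[model_id]


if __name__ == "__main__":
    serialized = b"serialized-model"
    cache = ModelCache()
    for _ in range(3):
        model = cache.get_or_receive("model-1", lambda: iter_chunks(serialized))
    print(model, cache.transfers)
```

Sending fixed-size chunks keeps each message small regardless of model size, and the cache means repeated invocations of the same model cost one transfer.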
We have also merged new ML tests from our first contributor - Citibeats! In this article, you can read all about our partnership with Citibeats to understand the behavior of NLP models.
🗺 More to come
We’re going to keep improving the model execution mechanism in future releases. The main features in our roadmap related to model execution are:
- Model lifecycle management to avoid re-loading the same model into memory for each inference
- ML Worker security enhancement
- Multiple worker management for users to connect several workers simultaneously
Stay tuned for more updates, and join our Discord community to leave your feedback.