Visiting the topology for running containerized workloads and AI Workbench

Data science and AI projects are typically run on rented CPU/GPU resources or in self-managed data centers. By using containerization, we can separate our code and configuration from the underlying hardware, making it easier to switch between different systems for cost or performance reasons.

However, containerization and remote execution introduce new networking and connectivity challenges. In this talk, we will explore how to manage connectivity for both the control plane and the user experience plane, including requirements like web access.


AI transcription of the talk

This section contains Gemini's rewrite of the YouTube video's transcript. The raw transcript appears further down.

The Role of Containerization in Local and Remote Data Science

When designing a GPU-bound data exploration or analytics environment, the key to maintaining consistency—whether you're working locally or remotely—is a containerized workflow. For quick prototyping, we often leverage a few local GPUs without incurring rental costs, using a Docker runtime to host our stack. This stack typically includes API service containers for the heavy lifting and a Web container (like a Jupyter Notebook or a custom UI such as those from NVIDIA AI Workbench) for user interaction. Accessing this environment simply requires a web browser to connect to the Web container. When it's time to scale up to remote, more powerful GPUs, the architecture remains nearly identical: we still run the multi-container environment, often managed by Docker Compose, with the critical addition of an ingress proxy. This proxy secures and manages external access to the Web container, allowing us to seamlessly switch from working on a local machine to connecting to multiple remote hosts without changing the core application structure.
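
As a concrete illustration, here is a minimal Docker Compose sketch of that local stack: a hypothetical API service container with the local GPUs exposed into it, plus a Jupyter web container the browser can reach. The image names and ports are placeholders, not something taken from the talk.

# Minimal sketch of the local two-container stack described above.
# The API image name is hypothetical; the Jupyter image and port are just examples.
cat > compose.yaml <<'EOF'
services:
  api:
    image: my-org/inference-api:latest        # placeholder API service container
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]             # expose the local GPUs into this container
  web:
    image: quay.io/jupyter/minimal-notebook:latest
    ports:
      - "8888:8888"                           # the browser talks to this web container
    depends_on:
      - api
EOF
docker compose up -d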




Anatomy of a Remote Environment

When you leverage cloud-based platforms or managed services (like those offered by AWS, Google, or specialized providers like Brev), the core architecture relies on an efficient, web-first approach. Instead of the cumbersome process of manually installing software on a remote server, we use a containerized environment—the smart, modern solution for custom analytics. Your session starts by spinning up this environment on a powerful remote server. From your perspective, your local web browser connects to a cloud vendor proxy. This proxy securely tunnels through to an ingress proxy managing the remote compute instance. This grants you seamless access to the various containers within the environment, which could include databases, compute services, and your main web application (like a Jupyter session). Crucially, the server's GPUs are directly exposed into this containerized environment, allowing your analytic code to achieve high-performance acceleration. This clean separation allows you to run concurrent experiments on different servers, accessing each via a separate browser tab, all while the vendor's proxy handles the complex, secure tunneling transparently.
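
A quick way to see this GPU exposure in action, assuming the NVIDIA Container Toolkit is installed on the host, is to run nvidia-smi inside a throwaway container (the CUDA image tag here is only an example):

# Should print the same GPU list you would see on the host itself.
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi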




Deep Dive: NVIDIA AI Workbench and Secure Remote Execution

Understanding how a system like NVIDIA AI Workbench manages remote GPU resources provides a clear blueprint for secure, user-friendly remote execution. The architecture cleverly minimizes the need for complex cloud infrastructure by leveraging SSH. When you decide to run a project remotely, the local AI Workbench command-line interface (CLI) initiates an SSH connection to the server and installs the Workbench service, establishing a control plane on the remote machine. This control plane then acts as the remote execution engine. For a specific project (e.g., Project One), the engine uses commands over SSH to clone the project repository (e.g., from GitHub) and execute the container commands (like docker-compose) right on the remote server, effectively launching the required containerized environment. To provide secure access, two key elements are deployed: a Traffic Proxy container (which serves as the ingress point) and a temporary SSH tunnel/bridge created between your local machine and the remote service. This networking "magic" allows your local web browser to securely tunnel through to the Traffic Proxy, which then routes your session directly to the Jupyter Notebook server running inside the remote container. The result is a seamless local web experience backed by the power of remote GPUs.
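
The same pattern can be approximated by hand with plain SSH and Docker. This is a sketch of the idea, not the actual AI Workbench CLI; the host name, repository URL, and proxy port are all hypothetical.

REMOTE=ubuntu@gpu-box-1

# Control plane over SSH: clone the project and launch its containers remotely.
ssh "$REMOTE" 'git clone https://github.com/example/project-one.git && cd project-one && docker compose up -d'

# User-experience plane: forward a local port to the remote traffic/ingress proxy
# (assumed here to listen on 10000), then browse to http://localhost:10000.
ssh -N -L 10000:localhost:10000 "$REMOTE"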




Maximizing Utilization: Running Multiple Projects and Services

The beauty of this containerized, SSH-based approach is that it easily supports multi-tenancy on a single remote machine, significantly improving your hardware utilization. Once the Workbench service is installed on your rented server, the CLI can run the necessary Docker commands to deploy multiple, isolated projects on that same box. For instance, on a machine with four GPUs, you could run a long-running model training experiment in one project while using a second project for data cleaning or editing. To manage access, a single Traffic Proxy container runs on the machine and acts as the central ingress point for all project containers. Moreover, each project can be complex; running docker-compose within a project container allows you to spin up additional services like a relational database or a vector store, all managed under that project's umbrella. This powerful structure means you can have multiple sessions open simultaneously—perhaps two separate Jupyter Notebooks, one for each project, and several browser tabs dedicated to custom web applications running within those projects. This extends to powerful development tools: by spinning up a VS Code Server inside a project container and setting up an SSH tunnel, you can run your local VS Code application, connecting directly to the remote server's file system and compute, making remote development feel truly local.
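
For the tunneling side, one rough sketch (port numbers are purely illustrative) is a single SSH session that forwards a local port per service: the two projects' Jupyter servers plus a browser-based VS Code server. In the actual Workbench setup these all sit behind the one traffic proxy, so in practice you may only need to forward the proxy's port.

# 8888/8889: Jupyter for project one and project two; 8080: a code-server
# (VS Code in the browser) instance running inside project two.
ssh -N \
  -L 8888:localhost:8888 \
  -L 8889:localhost:8889 \
  -L 8080:localhost:8080 \
  ubuntu@gpu-box-1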




Scaling Development: Managing Multiple Remote Servers

The true power of this standardized, containerized approach reveals itself when you need to scale beyond a single host. Instead of having one server running multiple projects, we can expand to managing multiple remote servers, each potentially running its own set of applications. For example, a developer might rent two remote servers: one dedicated to a single, high-priority model training task, and the second running two smaller, concurrent data exploration applications. This architecture is unified by the consistent use of SSH tunneling. For every remote machine running a Workbench service, we establish a secure SSH bridge from the local developer machine. Each server’s containerized environment runs its own Traffic Proxy container (the ingress proxy), which is responsible for securely fronting all the web applications and services running on that specific machine. This setup allows the developer to seamlessly open local browser sessions to all applications across all remote servers. While solutions like NVIDIA AI Workbench automate this entire process, any custom, scalable GPU environment—whether using cloud proxies or direct data center network bridges—will employ a variation of this fundamental SSH-tunneling and ingress-proxy pattern to ensure secure, ubiquitous access.
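
One convenient way to keep the per-server tunnels straight is an SSH client config entry per rented machine, with each host's traffic proxy mapped to a different local port. The host names, addresses, and ports below are hypothetical.

# Append example entries to the SSH client config (hypothetical hosts/ports).
cat >> ~/.ssh/config <<'EOF'
Host gpu-box-1
    HostName 203.0.113.10
    User ubuntu
    LocalForward 10001 localhost:10000

Host gpu-box-2
    HostName 203.0.113.11
    User ubuntu
    LocalForward 10002 localhost:10000
EOF

# One background tunnel per server; then browse localhost:10001 and localhost:10002.
ssh -N -f gpu-box-1
ssh -N -f gpu-box-2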




Summary: A Unified Approach to GPU Compute

In summary, we've outlined an architectural blueprint for running GPU-bound data science and analytics environments. The core of this strategy is a consistent, containerized environment (often managed with Docker Compose) that ensures code portability. Whether you're prototyping quickly on your local machine or scaling up to powerful remote servers, the user experience remains seamless. This is made possible by securely accessing the environment's services—including Jupyter Notebooks, custom web applications, and remote development tools like VS Code Server—through the use of SSH tunneling and dedicated Ingress Proxy containers. By adopting this unified approach, you can efficiently leverage vast remote GPU resources while maintaining the local feel and isolation necessary for cutting-edge development.




Video






The actual transcription from the talk minus the "ums"

This will be a moderately buzzword-compliant talk. So I'm going to say things like AI, ML, artificial intelligence, large language model, containerized environment, remote servers, GPU, VRAM. All right, buzzwords over. 

All right, so what we're going to talk about today is how this goes together when you want to run a data exploration or even an analytic environment that is GPU-bound. And we're either going to run it locally or we're going to run remotely. What I'm going to assume is, you know, sometimes we might run stuff on our local machine. I have a couple of GPUs in my machine. I want to play with it or get a feel for how this is going to work, without renting time. And in that case, what we're going to talk about and assume is that we're going to run locally like we run remote. We're going to have a Docker runtime or a containerized environment. And that will have containers of different types. Some will be API service containers, some of them will be web containers. And what we're assuming in this case is that a web browser can be used through some mechanism to get access to the web container. Think Jupyter notebook, right? Or it could just be an app, right? If you run any of the NVIDIA AI Workbench ones, a lot of times they'll have some custom UI that runs in a web container, and then we have some way to get access to that web container from our browser, right? And then if we want to set up the same thing remotely, what do we end up with? We end up with more remote GPUs than we had local, probably because it's expensive. And then we run a containerized environment. Probably the only thing that's a little different, although we may actually do that on our side too, is that we may have an ingress proxy. But basically, the idea is we've got some API containers and a web container that talks to those API containers. It may have been managed with Docker Compose. And then we've got an ingress proxy that lets us get access to the web container and potentially some other stuff. Right? So in this case, I've got a local machine and then I've got two remote machines that I might want to talk to. 

Let's talk about what that looks like. Typically, if you do one of the studio products from AWS or Google, or you do the Brev thing, and there are others that I haven't used, what we're going to do is bring up an all-web environment. You spin up, you get access to some server, you run time on the server, and that actually spins up a containerized environment if you're smart, because it makes it easy to create a custom environment. The other option would be to install software on the remote server all the time, and that's painful. So in this case, we're going to assume that we did a bunch of this in a containerized environment. And what happens is the web browser goes through a cloud proxy created by the vendor that goes to the ingress proxy, and that gives us access to the containers that are in the environment. Those containers in the environment can talk to each other, and they might be database or compute containers. We get access to the GPUs on that server because the GPUs are exposed into the containerized environment. So in this case, I actually have two web browsers open to two different servers that I could be running two different sets of experiments on. And that's all transparent to me, sort of, through this cloud vendor tunnel proxy. Any questions? No questions, because nobody's here but me. 

So if we look at how NVIDIA does this for AI Workbench, it gives you a feel for a little bit of the complexity to make this work. What they assume is that you have access to a remote server with SSH. They bootstrap the control plane of this on the remote server. So when I run NVIDIA AI Workbench locally, that installs the CLI. When I decide I want to spin up or run something on a remote server, it actually SSHes into the remote server and installs the workbench service. Now we actually have a control plane running on the remote server. And if that server's got containers running, then we can sort of tunnel through that SSH to that remote service and start issuing Docker commands on that box, right? So that way we don't need really complicated infrastructure in this space. This is sort of like, hey, I want to set up my own environment. Because of the way the GPUs work, you can do the vGPU thing, but the reality is, in a lot of cases, you're going to buy the whole server and use all the GPUs on it, or at least it'll look that way to you because it's a VM. So that's what they do. They run this CLI, it installs a workbench service, and workbench now becomes the remote execution engine. When I spin up a project in AI Workbench, you can actually tell it, over the command line through SSH, to clone down the project from GitHub or somewhere else, and then it will actually run the Docker commands, like docker compose, on that machine and basically run the containers on that box. So in this case I then spin up a Jupyter notebook in that containerized environment. I ran the UI on my local machine, it did the SSH to the remote, and the remote spun up a containerized environment for me, for my workbench project. I'm going to call it project one here. When it did that, it also loaded a traffic proxy container that does forwarding, and that is actually the ingress container we talked about. So in this case the project container runs a Jupyter notebook. There's a traffic proxy running, and what they do is they actually set up an SSH tunnel. I called it a bridge here. I don't know what I was thinking. So it's a bridge tunnel, and basically what they do is they set up an SSH tunnel from your local machine, and this calls for a bit of networking magic to happen, so it works in specific environments. Although if you're running it in your personal home lab like me, then it's no problem because the networking is super easy to go back and forth. But in the cloud you've got to do routing to get into these private networks. In my home network, everybody's up here, right? So in this case, if I was going to a remote cloud data center, we would set up an SSH bridge, and actually NVIDIA does this for you. And now I can open a browser based on a path, and when I do that I actually end up pointed at the traffic proxy. The traffic proxy actually points at the Jupyter notebook server. Now I'm running a local web browser that has a session open on a Jupyter notebook in a containerized environment that we set up remote. Right? Pretty straightforward once you think about it. Okay. 

Well, it turns out you can actually do this for multiple projects, right? So again, in this case, I've actually got two projects running on my rented machine. So I rented one with four GPUs, and maybe I'm running some experiments in one, and in another project I'm doing editing and stuff, and then when the one's done, I'll run the other, you know. So in this case, I've got two applications, two projects running on my remote server, and I'll show you in a minute this could be N servers. We already have the workbench service installed on that box, right, when we spun it up. That CLI can run the Docker commands to deploy multiple projects onto that box. So once I have a box, I can increase the utilization on that thing. We end up with a traffic proxy container running on that machine, and that acts as the proxy for all the project containers. And what else? That's it. Right? If you look at what I've got here, we've got two project containers, and then one of those project containers actually spun up other container images. Right? So if I needed a database, a vector database or a relational database, or I needed some other kind of compute environment, when we bring up the project through NVIDIA AI Workbench, you can actually tell it to go run docker compose, and that'll spin up all the other containers. So in this case I actually ran two different projects. One of those projects actually spun up a couple extra containers. In this case I've got a Jupyter notebook into each of those, and each of those actually had a web app, and one of them's got a couple web apps. And really those are just web servers running there, but I can get access to those because we're proxied through the traffic proxy container. In this case, I can have like three or four browsers open. One of those would be to the Jupyter notebook, and then the others are to different apps, or I could have two Jupyter notebooks open, one to each of those project environments. The only other thing I added in here is VS Code. Because VS Code can run in a dev container, right? They actually have a VS Code server. When we spin up the project, we could spin up a VS Code server, and then if we can get an SSH tunnel set up, we can run VS Code locally and it will tunnel down into that VS Code server. Because that project container's actually got a disk drive with all my stuff that got cloned into it, I can then do stuff on that VS Code server. So that's what that looks like, right? So this is a little more complicated example where I rented a machine, but I'm actually running two separate projects and an arbitrary number of web apps. 

It gets way more complicated than that, and this is deliberately small because I don't want to explain it in depth. In this case, I've actually got a developer machine, and I rented two remote servers. Right? So before I had one server running two apps. Now I've got two remote servers. One of them is running one app and one of them's running two apps. And the bridging, this type of tunneling, gets set up the same way: you set up the SSH tunnel. What that means is we can actually open browsers locally to all of those. In this case, you can't read it, but you can assume those are traffic proxies running on each of those containerized environments that actually front for all of the web apps that are running on those machines. So, we set this up with SSH. We have the SSH ability into every remote machine that we run a workbench on, and then we have the ability, through the ingress proxy, to be able to hook up to the web apps. NVIDIA AI Workbench does it this way. If you run anybody else like this, it would be a version of this, either using the cloud proxies or into your own data center where you have direct network connectivity or a wider bridge, and that's it. 

So that's what I got, if you wanted to know how you can use Jupyter notebooks, other web apps, and VS Code Remote into remote servers to do your data science and get access to GPUs. Have a great day.


Revision History

Transcribed 2025/10
