NVIDIA AI Workbench is a containerized ML playground

NVIDIA AI Workbench creates and manages containerized ML environments that isolate ML projects on local and remote machines. You no longer have to switch environments or remember which version of Python or Anaconda is installed globally on your machine. NVIDIA simplifies the initial configuration by providing predefined image definitions containing Python, PyTorch, and other tools that can be used with or without NVIDIA graphics cards.

The actual development is done via browser-based tools like JupyterLab notebooks. Workbench spins up local proxies that port-forward into the development container.



See videos below

NVIDIA Workbench runs in a WSL instance

On Windows, NVIDIA Workbench runs in its own WSL instance. Each project runs in its own Docker container. You can inspect the main Workbench WSL instance by opening a shell into it. Run the following command in a Windows terminal window:
  • wsl -d NVIDIA-Workbench
Workbench projects live in the WSL instance under:
  • /home/workbench/nvidia-workbench/
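Once you have a shell inside the WSL instance, a small helper can list what is there. This is only a sketch: the function name list_projects is made up for illustration, and it assumes each project is a subdirectory of the path above, which I have not confirmed for every Workbench version.

```shell
# Print the name of every subdirectory under the Workbench projects root.
# list_projects is a hypothetical helper, not part of AI Workbench.
list_projects() {
    # $1 = projects root; defaults to the path mentioned in the text
    root="${1:-/home/workbench/nvidia-workbench}"
    for dir in "$root"/*/; do
        # If the glob matched nothing it stays literal, so skip non-directories
        [ -d "$dir" ] || continue
        basename "$dir"
    done
}

# Usage, from a shell opened with: wsl -d NVIDIA-Workbench
#   list_projects
```

Passing a different directory as the first argument lets you point the same helper at another location.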

Creating A New Project

The project creation wizard gathers basic information and then asks which preconfigured environment should be used. You select an environment definition and a GitHub or local repository when you create a workspace. Workbench builds the container image and populates the source tree from Git or creates a new empty source tree. The NGC catalog contains environment definitions for Python, PyTorch, and RAPIDS, all with CUDA support.

This will be an empty project with no code. 

There is no way to add AI Workbench support to an existing project at this time (5/2024).

Local Projects in Local Containers

AI Workbench can manage multiple projects and their associated environments. Each project environment has its own container image. When a project is created, AI Workbench creates a project definition that describes the resources required to build the image. It then builds the image, which can be reused. The image is built on every machine the project is run or developed on.


A project file browser opens whenever you double-click a project. This is just a view into the local WSL Workbench file system, or the remote Linux file system if you are using a remote server.








You can see the status of the containerized project in the status bar at the bottom of the screen.  The project image may be built or it may be built and running.

Clicking the green button starts a container from the image. The container is deployed into Docker on the target machine, local or remote.





NVidia Workbench opens a browser when the container is running.  The browser is pointed at a local proxy that forwards to the AI Workbench container's exposed port.
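If you want to see the project container and its exposed port for yourself, you can ask the container runtime directly on the machine that hosts it. A minimal sketch, assuming the runtime CLI is on your PATH; list_containers is a hypothetical wrapper name, while ps and --format are standard docker and podman options.

```shell
# Show name, image, and port mappings of running containers.
# list_containers is a hypothetical helper, not part of AI Workbench.
list_containers() {
    # $1 = container runtime CLI: docker by default, podman if that was installed
    runtime="${1:-docker}"
    "$runtime" ps --format '{{.Names}}\t{{.Image}}\t{{.Ports}}'
}

# Usage:
#   list_containers          # uses docker
#   list_containers podman   # uses podman
```

The Ports column is where the container's exposed port shows up. The local proxy that the browser talks to is a separate piece and does not appear in this listing.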

Workbench Examples

NVidia hosts workbench examples in the public GitHub repository. Most of these examples will fit on standard video cards. 

Local and Remote Control

The local Windows (in my case) installation of AI Workbench manages containerized development environments on local or remote machines. The local installation process installs Docker or Podman. It can only communicate with a local development environment if the Workbench server component is installed. Each location can have multiple projects, and each project runs in its own container.


Linux Remote Server

A local AI Workbench instance can manage an AI Workbench instance running remotely on a Linux server. The local instance communicates with a remote service. You can select local or remote projects using the host dropdown in the GUI. This capability relies on SSH:
  • You need to have SSH enabled on the remote server.
  • Actual software development is done using a web browser hitting a local endpoint that is port-forwarded to the remote server.
You need to install AI Workbench locally and on the remote Linux machine. The actual development environment is always containerized. The installer sets up Docker on the remote server if it is not already installed. The remote server installation is command-line and text-based:

mkdir -p $HOME/.nvwb/bin && \
curl -L https://workbench.download.nvidia.com/stable/workbench-cli/$(curl -L -s https://workbench.download.nvidia.com/stable/workbench-cli/LATEST)/nvwb-cli-$(uname)-$(uname -m) --output $HOME/.nvwb/bin/nvwb-cli && \
chmod +x $HOME/.nvwb/bin/nvwb-cli && \
sudo -E $HOME/.nvwb/bin/nvwb-cli install

You can watch the installation progress in the text-based UI.
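The curl line in the install command is dense, so here is the same URL assembled step by step. This is a sketch that only restates the pattern from the one-liner above; the function name nvwb_cli_url is mine, and the version value in the usage comment is whatever the LATEST file returns at the time.

```shell
# Build the download URL used by the install one-liner.
# nvwb_cli_url is a hypothetical helper name; the URL layout is copied from the command above.
nvwb_cli_url() {
    # $1 = version string (contents of .../workbench-cli/LATEST)
    # $2 = OS name as printed by uname
    # $3 = machine type as printed by uname -m
    printf 'https://workbench.download.nvidia.com/stable/workbench-cli/%s/nvwb-cli-%s-%s\n' "$1" "$2" "$3"
}

# Usage on the remote server:
#   version=$(curl -L -s https://workbench.download.nvidia.com/stable/workbench-cli/LATEST)
#   nvwb_cli_url "$version" "$(uname)" "$(uname -m)"
```

Printing the URL before downloading is a quick way to check which version and architecture the installer is about to fetch.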

Linux Remote References

  • https://docs.nvidia.com/ai-workbench/user-guide/latest/installation/ubuntu-remote.html
  • https://docs.nvidia.com/ai-workbench/user-guide/latest/reference/locations/remote.html
  • https://docs.docker.com/build/buildkit/#getting-started (describes the daemon.json changes that enable BuildKit)

If you get an error building a remote container related to buildx

AI Workbench builds a Docker image every time you create a new environment from a template. The container started from that image is your project. The build process requires buildx, which may not be installed on your remote Linux system. My Ubuntu 22.04 had Docker installed but not buildx.

sudo apt install docker-buildx 
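Before installing anything, you can check whether buildx is already present. A small sketch, assuming the docker CLI is on the PATH of the remote server; check_buildx is a hypothetical helper name, while docker buildx version is a standard Docker command.

```shell
# Report whether the Docker buildx plugin is installed.
# check_buildx is a hypothetical helper, not part of AI Workbench.
check_buildx() {
    if docker buildx version >/dev/null 2>&1; then
        echo "buildx is available"
    else
        echo "buildx is missing; on Ubuntu try: sudo apt install docker-buildx"
        return 1
    fi
}

# Usage on the remote server:
#   check_buildx
```

The non-zero return code in the missing case makes the helper usable in a setup script as well as interactively.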

Version alignment between local and server

The AI Workbench versions on the local machine and on the remote Linux server should be the same. You may need to update the local installation if you download the latest version to your remote machine.

Opening a remote project

The server-side Linux AI Workbench creates a new container for each project environment you spin up. Every Jupyter environment is its own container. This is true for local or remote projects. The following image shows the logs for a successful container build.









That container is then run when you decide to open the project. The screen to the right shows a containerized project.   You can click on it to see the details.

Developers start working on the project by clicking the green Connect button. This opens a browser notebook session.

Configuration issues when adding remote locations at home

AI Workbench expects to reach the remote server by a fully qualified domain name (FQDN). A bare hostname like hp-z820 won't work. If your home network has no DNS domain, you will need your server's IP address.

I tried to use .local as a domain but that didn't work, so I used the IP address instead.
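The rule of thumb from my setup can be written down as a tiny classifier. This is a sketch of my own experience, not Workbench's actual validation logic: classify_host is a hypothetical name, the IPv4 check only looks at the rough shape of the string, and the sample names in the usage comment are placeholders.

```shell
# Classify what you are about to type into the remote location dialog.
# classify_host is a hypothetical helper based only on what worked for me.
classify_host() {
    case "$1" in
        "") echo "empty" ;;
        *[!0-9.]*)
            # Contains something other than digits and dots, so it is a name
            case "$1" in
                *.local) echo "mdns" ;;   # .local did not work for me
                *.*)     echo "fqdn" ;;   # has a domain part
                *)       echo "bare" ;;   # bare hostname, did not work for me
            esac ;;
        *.*.*.*) echo "ipv4" ;;           # rough dotted-quad shape, worked for me
        *) echo "bare" ;;
    esac
}

# Usage:
#   classify_host hp-z820
#   classify_host 192.168.1.50
```

On the Linux server, hostname -I prints the addresses you can use when the result is "bare" or "mdns".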


Videos




Nerd Details with Studio and SSH

The Workbench GUI executes remote commands via SSH. It talks to the remote server, creates workspaces, and starts containers over SSH. The following output illustrates this communication. The local laptop is Powerspec-g708 and the remote Linux server is hp-z820.

Idle state on the server after an ssh login to the remote machine:

(base) joe@hp-z820:~$ sudo netstat -atp | grep ssh
tcp     0   0 0.0.0.0:ssh     0.0.0.0:*      LISTEN      1094/sshd: /usr/sbi 
tcp     0   0 hp-z820:60436   hp-z820:ssh    ESTABLISHED 16648/ssh           
tcp     0   0 hp-z820:ssh     hp-z820:60436  ESTABLISHED 16649/sshd: joe [pr 
tcp6    0   0 [::]:ssh        [::]:*         LISTEN      1094/sshd: /usr/sbi 

Inbound ssh on the server after opening an AI Workbench remote location:

(base) joe@hp-z820:~$ sudo netstat -atp | grep ssh
tcp  0  0 0.0.0.0:ssh       0.0.0.0:*               LISTEN      1094/sshd: /usr/sbi 
tcp  0  0 localhost:47610   localhost:10001         ESTABLISHED 33830/sshd: joe     
tcp  0  0 localhost:40398   localhost:10001         ESTABLISHED 33830/sshd: joe    
tcp  0  0 hp-z820:ssh       Powerspec-g708:61068    ESTABLISHED 33724/sshd: joe [pr 
tcp  0  0 hp-z820:60436     hp-z820:ssh             ESTABLISHED 16648/ssh        
tcp  0  0 hp-z820:ssh       hp-z820:60436           ESTABLISHED 16649/sshd: joe [pr 
tcp  0  0 localhost:47650   localhost:10001         ESTABLISHED 33830/sshd: joe     
tcp  0  0 localhost:47626   localhost:10001         ESTABLISHED 33830/sshd: joe     
tcp  0  0 localhost:47634   localhost:10001         ESTABLISHED 33830/sshd: joe   
tcp  0  0 localhost:40386   localhost:10001         ESTABLISHED 33830/sshd: joe     
tcp6 0  0 [::]:ssh          [::]:*                  LISTEN      1094/sshd: /usr/sbi 

Inbound ssh on the server after opening a remote Jupyter notebook via the web browser (netstat shows port 10000 under its /etc/services name, webmin):

(base) joe@hp-z820:~$ sudo netstat -atp | grep ssh
tcp  0  0 0.0.0.0:ssh             0.0.0.0:*               LISTEN      1094/sshd: /usr/sbi 
tcp  0  0 localhost:44384   localhost:webmin        ESTABLISHED 37430/sshd: joe     
tcp  0  0 localhost:56986   localhost:10001         ESTABLISHED 37430/sshd: joe     
tcp  0  0 hp-z820:ssh       Powerspec-g708:61080    ESTABLISHED 37331/sshd: joe [pr 
tcp  0  0 localhost:56990   localhost:10001         ESTABLISHED 37430/sshd: joe     
tcp  0  0 localhost:33990   localhost:10001         ESTABLISHED 37430/sshd: joe     
tcp  0  0 hp-z820:60436     hp-z820:ssh             ESTABLISHED 16648/ssh           
tcp  0  0 localhost:44356   localhost:webmin        ESTABLISHED 37430/sshd: joe     
tcp  0  0 localhost:44404   localhost:webmin        ESTABLISHED 37430/sshd: joe     
tcp  0  0 localhost:33984   localhost:10001         ESTABLISHED 37430/sshd: joe     
tcp  0  0 localhost:44368   localhost:webmin        ESTABLISHED 37430/sshd: joe     
tcp  0  0 localhost:44394   localhost:webmin        ESTABLISHED 37430/sshd: joe     
tcp  0  0 hp-z820:ssh       hp-z820:60436           ESTABLISHED 16649/sshd: joe [pr 
tcp  0  0 localhost:44386   localhost:webmin        ESTABLISHED 37430/sshd: joe    
tcp  0  0 localhost:44362   localhost:webmin        ESTABLISHED 37430/sshd: joe     
tcp  0  0 localhost:57002   localhost:10001         ESTABLISHED 37430/sshd: joe    
tcp6 0  0 [::]:ssh          [::]:*                  LISTEN      1094/sshd: /usr/sbi 

Inbound ssh on the server after closing the remote AI Workbench window:

(base) joe@hp-z820:~$ sudo netstat -atp | grep ssh
tcp  0 0 0.0.0.0:ssh       0.0.0.0:*         LISTEN      1094/sshd: /usr/sbi 
tcp  0 0 hp-z820:60436     hp-z820:ssh       ESTABLISHED 16648/ssh           
tcp  0 0 hp-z820:ssh       hp-z820:60436     ESTABLISHED 16649/sshd: joe [pr 
tcp6 0 0 [::]:ssh          [::]:*            LISTEN      1094/sshd: /usr/sbi 
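Counting the forwarded connections by eye gets tedious, so here is a small filter for the netstat output above. It is a sketch that assumes the column layout shown in these listings (foreign address in column 5, state in column 6); count_forwards is a hypothetical helper name.

```shell
# Count ESTABLISHED connections whose foreign address ends in the given port.
# count_forwards is a hypothetical helper; it reads netstat -atp output on stdin.
count_forwards() {
    # $1 = port number or service name exactly as netstat prints it, e.g. 10001 or webmin
    awk -v port="$1" '
        $6 == "ESTABLISHED" && $5 ~ (":" port "$") { n++ }
        END { print n + 0 }
    '
}

# Usage on the server:
#   sudo netstat -atp | count_forwards 10001
#   sudo netstat -atp | count_forwards webmin
```

Adding -n to netstat prints numeric ports instead of service names, in which case you would pass 10000 rather than webmin.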

Revision History

Created 2024/05
