Software Development in a Container - Mounting code into the container - A Primer

Containers make it easy to set up a complex data science development environment. A developer can spin up a Python, Jupyter Notebook, Spark, Hadoop, or other type of container on a local machine in minutes.

Containers can be confusing when you first work with them. Here we talk a little about how you can get code and data into your container environment and how you can get it back out.

I want to write code locally on my laptop and run it inside a fully configured Anaconda container. And, I'm lazy.

Two ways to get code onto a container for development

Containers are standalone mini machines with private disk space, CPU, networking, and other services. They are not intended to retain state, but retaining state is exactly what we want in a development environment. We need to get our code inside the container. We can do the same thing with data, or we can have our code pull the data in at runtime.

There are two primary ways of getting code onto a machine.  We can copy our code into the machine, work on it, and then push the updated code back to the original location.  Alternatively, we can mount a remote file system containing the code into the container where it can be seen and run.

Option 1: Copy Code to Container

Starting work
  1. Log into the container.
  2. Copy the code from "My Machine", the master machine: git clone, scp, sftp, etc.
Saving work external to the container
  1. Push the code back to the master machine: git push, scp, sftp, etc.
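
Here is a minimal sketch of the copy-in, copy-out round trip. It swaps in `docker cp` for the git, scp, and sftp tools listed above, because it needs nothing installed inside the container. The container name `dev-box` and the paths are hypothetical. The helpers only build and print the commands; they do not run Docker.

```python
import shlex


def copy_in_command(host_dir: str, container: str, container_dir: str) -> list[str]:
    """Build a `docker cp` command that copies host_dir into the container."""
    return ["docker", "cp", host_dir, f"{container}:{container_dir}"]


def copy_out_command(container: str, container_dir: str, host_dir: str) -> list[str]:
    """Build a `docker cp` command that copies container_dir back to the host."""
    return ["docker", "cp", f"{container}:{container_dir}", host_dir]


if __name__ == "__main__":
    # "dev-box" and both paths are placeholders, not values from this article.
    print(shlex.join(copy_in_command("./MyProject", "dev-box", "/workspace")))
    print(shlex.join(copy_out_command("dev-box", "/workspace/MyProject", ".")))
```

Forget the copy-out step before removing the container and the work is gone, which is the main reason I prefer Option 2.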

Option 2: Mount Code into the Container

Starting Work
  1. Start the container with the -v volume mount option. This mounts a directory from the host machine onto the _container's_ file system. The mounted folder looks like any other directory.
Do Work
  1. All changes are written to the mount, so they are already on the host. Any push back to the parent or another machine (for example, git push) happens from there.

Using local Mounts

Let us walk through an external mount example.

Local machine with a single drive
Our development machine has one hard drive. We have some work in `/home/[myuser]/Documents/MyProject`. That folder is visible on our development machine.

Docker Containers have their own drives

Each Docker container has its own private filesystem. By default, a running container does not share disk space with its host or with other containers. We will work around that.

Sharing a drive from the host into a container
Docker has a -v command-line option that tells it to mount a host directory into a Docker container. The -v option takes the path to the source directory on the development machine and the path to its mount location inside the container, separated by a colon. Everything in that source directory appears at the container location.
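
A rough sketch of what that looks like on the command line. The helper below builds a `docker run` command with a `-v host_path:container_path` mount. The image name `continuumio/anaconda3`, the `/workspace` mount point, and the `bash` command are my assumptions for illustration; swap in your own. It prints the command instead of running it.

```python
import shlex
from pathlib import Path


def mount_run_command(host_dir, container_dir, image, command=("bash",)):
    """Build a `docker run` command that mounts host_dir at container_dir."""
    # Expand "~" and make the host path absolute before handing it to -v.
    host_path = Path(host_dir).expanduser().resolve()
    if not container_dir.startswith("/"):
        raise ValueError("the container path must be absolute")
    return [
        "docker", "run", "--rm", "-it",
        "-v", f"{host_path}:{container_dir}",  # source on host : target in container
        image, *command,
    ]


if __name__ == "__main__":
    # Image and mount point are placeholders, not values from this article.
    cmd = mount_run_command("~/Documents/MyProject", "/workspace", "continuumio/anaconda3")
    print(shlex.join(cmd))
```

Inside the container, `/workspace` would then show the same files as the project folder on the host.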

Editing content on the host
The same files appear on the host and inside the Docker container. The host is the true home. You can run an IDE and modify and develop code using host-based tools. All host-made changes are immediately visible in the container.

I tend to edit on the host with Visual Studio Code and then run the code inside the container.
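
The edit-on-host, run-in-container loop can be sketched with `docker exec`, which runs a command in a container that is already up. This assumes the container was started with the project mounted at `/workspace` and that `python` is on its path. The container name `dev-box` and the script `train.py` are hypothetical.

```python
import posixpath
import shlex


def exec_script_command(container, script, container_dir="/workspace"):
    """Build a `docker exec` command that runs a mounted script with python."""
    # Paths inside the container are POSIX paths, whatever the host OS is.
    script_path = posixpath.join(container_dir, script)
    return ["docker", "exec", container, "python", script_path]


if __name__ == "__main__":
    # Save the file in the host editor, then rerun this command. No copy step.
    print(shlex.join(exec_script_command("dev-box", "train.py")))
```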

Editing content on the container with a container-based IDE
Some development environments like Jupyter Notebooks run as a web server. You can start the notebook server in the container on container startup.  The IDE runs on the server so it can see files in that remote server's workspace.

We mount the host directory at a location that can be seen by the Jupyter server. Data scientists can use the Jupyter notebook knowing that their work is actually saved on the host machine and will stay available after the container is stopped.
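
As one possible setup, the sketch below builds a `docker run` command for a Jupyter container. It assumes the `jupyter/base-notebook` image from the Jupyter Docker Stacks project, which starts the notebook server on port 8888 by default and uses `/home/jovyan/work` as a working folder. The host folder name is hypothetical, and other images will need a different mount point.

```python
import shlex
from pathlib import Path


def jupyter_run_command(host_dir, image="jupyter/base-notebook",
                        notebook_dir="/home/jovyan/work", host_port=8888):
    """Build a `docker run` command for a Jupyter server with a host mount."""
    host_path = Path(host_dir).expanduser().resolve()
    return [
        "docker", "run", "--rm",
        "-p", f"{host_port}:8888",            # publish the notebook port to the host
        "-v", f"{host_path}:{notebook_dir}",  # notebooks are saved on the host
        image,
    ]


if __name__ == "__main__":
    # "~/Documents/MyProject" is a placeholder host folder.
    print(shlex.join(jupyter_run_command("~/Documents/MyProject")))
```

Notebooks saved under the mounted folder in the Jupyter UI are written to the host directory, so they are still there after the container is stopped.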

Thanks for reading.
Created 2/2021