Learning about ML training with the NVidia Workbench Example Kaggle Competition Kernel

Kaggle runs a variety of machine learning and data science competitions. You can participate using their hosted containerized environments or by coding locally. NVidia simplified working locally, or in your own cloud, with an AI Workbench-compatible example Kaggle competition kernel. The project contains everything needed to download competition data from Kaggle, run train/test cycles, and upload the results for evaluation. I love this dockerized project because it lets me play in a competition sandbox with no configuration changes to my development machine.

The Handwritten Digits Recognizer competition is an open-ended training competition. Kaggle provides images of handwritten digits. You train against the training dataset and test against the test dataset. You then run your trained model against the candidate digits of the competition submission set and upload the results to the Kaggle competition.

The NVidia Workbench example Kaggle competition kernel is designed around the Kaggle competition workflow and is built on top of Kaggle-created GPU-enabled docker images. It contains an input data Jupyter notebook, a processing notebook, and a competition submission notebook. You configure the contest name in the first notebook to download the data, then create, train, and test your model in the second notebook. The third notebook packages and uploads the results of the competition dataset analysis. The project is set up to let you change competitions and change the processing used to train the model.
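For orientation, the same download-and-submit workflow can be driven directly with the official Kaggle Python package. This is a rough sketch of the steps the notebooks wrap, not the project's actual code; the competition slug, paths, and file names here are just examples.

# Rough sketch of the Kaggle download / submit steps the notebooks wrap.
# Illustrative only - the Workbench notebooks use their own variables and paths.
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()  # uses KAGGLE_USERNAME / KAGGLE_KEY or ~/.kaggle/kaggle.json

# 1. Download the competition data (arrives as a zip file)
api.competition_download_files("digit-recognizer", path="input/", quiet=False)

# 2. ...create, train, and test the model in the processing notebook...

# 3. Upload the generated submission file for scoring
api.competition_submit("output/submission.csv",
                       message="example submission",
                       competition="digit-recognizer")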

Parameter Tuning

Model training is done by tuning the model parameters while consuming the training data. Training is computationally intensive, requiring another run for each training parameter change. This is the biggest computational task, and one that is well-suited to GPU parallelization. The diagram shows how accuracy on the training and test datasets changes as the parameters are adjusted. We will look at the training computational cost below.
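The accuracy curves in that diagram are the kind of plot you can build from the History object Keras returns from model.fit(). A minimal sketch, assuming TensorFlow 2 and a model compiled with an "accuracy" metric:

# Plot training vs. validation accuracy per epoch from a Keras History
# object (the value returned by model.fit() later in this post).
import matplotlib.pyplot as plt

def plot_accuracy(history):
    epochs = range(1, len(history.history["accuracy"]) + 1)
    plt.plot(epochs, history.history["accuracy"], label="training accuracy")
    plt.plot(epochs, history.history["val_accuracy"], label="validation accuracy")
    plt.xlabel("epoch")
    plt.ylabel("accuracy")
    plt.legend()
    plt.show()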

Digits Competition 

The source images are sized 28x28. A sample set of digits at the model resolution is shown to the right.

I erroneously said in the video that:

The sample project scales down the images to speed up the training and analysis process.

That was incorrect.

Micro-benchmarking GPU vs GPU vs CPU

The NVidia Workbench example Kaggle competition kernel supports both GPU-based and CPU-based training and execution. It will automatically use any GPUs available to the container and falls back to CPU execution when no GPU is available. The RTX A6000 timing comes from the project documentation; the rest of the timings were generated on my machines. Processing on a GPU is significantly faster than processing on a CPU, and the time difference grows in proportion to the size of the datasets and the resolution of the images. The test data only contains 30 images. There are 20 training iterations.

Processor             | RTX A6000 | Titan RTX | RTX 3060 Ti | Ryzen 5900X | Xeon E5-2680 V2
Wall time             | 20 sec    | 40 sec    | 65 sec      | 386 sec     | 419 sec
CUDA cores            | 10752     | 4608      | 4864        |             |
Tensor cores          | 336       | 576       | 152         |             |
Tensor core gen       | 3         | 2         | 3           |             |
SM count              | 84        | 72        | 38          |             |
Tensor cores per SM   | 4         | 8         | 4           |             |
FP16 (TFLOP)          | 38.7      | 32.6      | 16.2        |             |
FP32 (TFLOP)          |           | 16.3      | 16.2        |             |
FP64 (GFLOP)          |           | 509       | 253         |             |
Cores (cores/threads) |           |           |             | 12/24       | 20/40
CPU clock             |           |           |             | 3.7 GHz     | 2.8 GHz
CPU time (H:MM:SS)    |           |           |             | 1:36:00     | 2:55:00

Training consumed about 600 MB of VRAM. These timings demonstrate how computationally expensive training can be. The Xeon run consumed about 65% of all 40 Xeon threads for 7 minutes of wall time, resulting in almost 3 hours of aggregate CPU time.

Train and Test used above

This single model.fit() call ran the train and test cycle.

history = model.fit(X_train, Y_train,
                    epochs=20, batch_size=64,
                    validation_data=(X_dev, Y_dev))
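That call assumes a compiled model and pre-split data arrays already exist. The sketch below shows a plausible setup for 28x28 digit images; it is not the project's actual model, and the layer sizes, optimizer, and loss are assumptions.

# Hypothetical setup behind the fit() call above: a small CNN for
# 28x28 grayscale digits. The example project's real model may differ.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),  # ten digit classes
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",  # assumes one-hot labels
              metrics=["accuracy"])
# X_train, Y_train, X_dev, Y_dev are the pre-split arrays consumed by fit() above.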

The training run output can be seen in the screenshot below.


Switching between CPU and GPU in the Kaggle Container

I use a brute-force approach to make the GPU available or unavailable: changing the number of GPUs assigned to the container. Setting it to "0" forces CPU execution; any other value makes a GPU available for execution. The container must be restarted to pick up the change.
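You can confirm what the notebook actually sees from inside the container. A small check, assuming TensorFlow 2; the commented-out line approximates the "0 GPUs" setting for a single process without restarting the container.

import tensorflow as tf

# List the GPUs TensorFlow can see inside the container.
gpus = tf.config.list_physical_devices("GPU")
print(f"GPUs visible: {len(gpus)}")

# Optional: hide all GPUs from this process to force CPU execution.
# tf.config.set_visible_devices([], "GPU")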


Kaggle Configuration Notes 

The NVidia Workbench example Kaggle competition kernel requires some minor configuration before use. I'm documenting it here for convenience.

Competition Name

The competition name is sourced from a variable. It defaults to the Kaggle Handwritten Digits Recognizer. You can change the variable to point at a different competition.
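As a sketch, the idea looks something like this; the actual variable name lives in the project's notebook and may differ:

# Hypothetical illustration of the competition-name variable.
COMPETITION = "digit-recognizer"   # default digits competition slug
# COMPETITION = "titanic"          # point at a different competition instead

# Downstream cells use the variable when downloading and submitting, e.g.
# api.competition_download_files(COMPETITION, path="input/")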

Kaggle Secrets

Kaggle competitions require registration and a generated API key. You can set the Kaggle credentials in the AI Workbench Environment tab.
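The official Kaggle package reads credentials from ~/.kaggle/kaggle.json or from environment variables, so the Environment tab values typically surface as variables like these. This is a sketch; check the project documentation for the exact names it expects.

import os

# The Kaggle package looks for these environment variables (or a
# ~/.kaggle/kaggle.json file) when authenticating. Values are placeholders.
os.environ["KAGGLE_USERNAME"] = "your-kaggle-username"
os.environ["KAGGLE_KEY"] = "your-generated-api-key"

from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api.authenticate()  # raises an error if the credentials are missing or wrong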



Input and Output File Locations

The project lets you store the downloaded input files and the generated output to a location outside of the containerized environment through the use of Host Mounts. The screenshot below configures a Windows/WSL machine to store the files on the Windows D drive.
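Inside the container, the notebooks only see the mount's target paths. Here is a quick sanity check, assuming hypothetical /data/input and /data/output targets; substitute whatever target paths you configured for the host mounts.

from pathlib import Path

# Hypothetical container-side mount targets - use the target paths you
# configured for the host mounts in AI Workbench.
INPUT_DIR = Path("/data/input")
OUTPUT_DIR = Path("/data/output")

for d in (INPUT_DIR, OUTPUT_DIR):
    print(d, "exists" if d.exists() else "MISSING - check the host mount")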






Making the input and output directories

I'm putting this here more for me than for other folks, so I don't forget what I did.

Windows WSL

We want the competition input and output files hosted locally on a Windows drive where they will persist even if we destroy the competition container and then rebuild it. In this case, I wanted them under D:/kaggle. The Windows file systems are mounted under /mnt/, so we end up with /mnt/d/kaggle/input and /mnt/d/kaggle/output.

I hopped into the workbench WSL environment to create the directories with bash. It could just as well have been done in Windows with PowerShell.

workbench@Powerspec-g708:~$ ls /mnt
c  d  e  wsl  wslg  x
workbench@Powerspec-g708:~$ mkdir -p /mnt/d/kaggle/input
workbench@Powerspec-g708:~$ mkdir -p /mnt/d/kaggle/output

Linux

We do the same thing for a Linux environment. I ran this on a remote Linux machine running a remote instance of AI Workbench. In this case, I put the two Kaggle directories in /tmp.

(base) joefreeman@hp-z820 ~ % mkdir -p /tmp/kaggle/input
(base) joefreeman@hp-z820 ~ % mkdir -p /tmp/kaggle/output


Revision History

Created 2024/11
