DASK so cool - Faster data analytics with Python and DASK - Run locally with Docker

Data scientists and others want to analyze and manipulate lots of data without having to be Software developers. Jupyter notebooks, Python and no DASK are vehicles that help make that happen.

DASK is a distributed Python library that can run in pretty much any Python environment.  It was designed to help data scientists scale up their data analysis by providing an easy to use distributed compute API and paradigm. DASK/Python can be run both inside Jupyter notebooks and as standalone Python programs.

Environment

The DASK environment consists of a Dask Scheduler and any number of worker nodes. The Python program sends a set of tasks to the Dask Scheduler which then distributes those across the worker nodes.  The worker node results are then aggregated  and returned to the original program.

Docker

I have created a docker-compose.yml file  that mounts persistent storage on the development node. This makes it easy to keep work across container deployments.  My docker-compose.yml file is derived from the DASK team's examples which do not have persistent storage, at this time.


Local DASK docker cluster deployed in Docker
The DASK team provides samples how to use docker-compose to create a DASK development and execution environment The environment consists of

  • A development node with Python and Jupyter Notebook server
  • A DASK scheduler
  • DASK worker nodes

DASK in Action

DASK with Jupyter notebooks running locally on Docker


DASK with VSCode running locally on Docker


Comments