DASK so cool - Faster data analytics with Python and DASK - Run locally with Docker
Data scientists and others want to analyze and manipulate lots of data without having to be software developers. Jupyter notebooks, Python and now DASK are vehicles that help make that happen. DASK is a distributed Python library that can run in pretty much any Python environment. It was designed to help data scientists scale up their data analysis by providing an easy-to-use distributed compute API and paradigm. DASK/Python can be run both inside Jupyter notebooks and as standalone Python programs.

Environment

The DASK environment consists of a Dask Scheduler and any number of worker nodes. The Python program sends a set of tasks to the Dask Scheduler, which then distributes them across the worker nodes. The worker node results are then aggregated and returned to the original program.

Docker

The DASK team provides samples showing how to use docker-compose to create a DASK development and execution environment. The environment consists of A development node with Python and Jupyt