Posts

Showing posts with the label DASK

Python comparison - JIT, CUDA and DASK

Python has become the language of choice for data scientists and data analysts. It is easy to use and has many analytical support libraries. Python programs, however, aren't particularly fast. This has driven people to create a set of tools that help Python programs scale up and scale out. We can compare approaches with a simple program that I adapted to JIT, GPU, and distributed computing.

Demonstration Caveat

Distributed computing works best on problems with lots of I/O that can be spread across workers. This program has no I/O and a high data-transfer-to-computation ratio. GPUs pay a high cost for loading data into and unloading data from GPU memory. This means they work best when the computation-to-data-transfer ratio is significantly higher than this program's.

Demonstration

The sample Python program creates 10,000,000 three-variable rows that it then uses as inputs for 10,000,000 iterations of log(x)*log(y)*log(z). The results are returned in a 10,000,000 ...
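As a rough illustration of the baseline computation described above, here is a minimal pure-Python sketch. It is not the post's actual program: the function names `make_rows` and `log_product`, the input range of 1 to 10, the fixed seed, and the reduced row count are all assumptions made so the example runs quickly.

```python
import math
import random


def make_rows(n, seed=0):
    """Create n rows of three floats (x, y, z).

    Assumption: values are drawn between 1 and 10 so that every
    log() call is defined. The original post does not state the range.
    """
    rng = random.Random(seed)
    return [
        (rng.uniform(1.0, 10.0), rng.uniform(1.0, 10.0), rng.uniform(1.0, 10.0))
        for _ in range(n)
    ]


def log_product(rows):
    """Return log(x) * log(y) * log(z) for each row, one result per row."""
    return [math.log(x) * math.log(y) * math.log(z) for x, y, z in rows]


# The post uses 10,000,000 rows; a smaller count keeps this sketch fast.
rows = make_rows(100_000)
results = log_product(rows)
```

This plain loop is the kind of code that the JIT, GPU, and distributed variants would each try to speed up; the per-row work is tiny, which is why data movement dominates.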

DASK so cool - Faster data analytics with Python and DASK - Run locally with Docker

Data scientists and others want to analyze and manipulate lots of data without having to be software developers. Jupyter notebooks, Python, and now DASK are vehicles that help make that happen. DASK is a distributed Python library that can run in pretty much any Python environment. It was designed to help data scientists scale up their data analysis by providing an easy-to-use distributed compute API and paradigm. DASK/Python can be run both inside Jupyter notebooks and as standalone Python programs.

Environment

The DASK environment consists of a Dask Scheduler and any number of worker nodes. The Python program sends a set of tasks to the Dask Scheduler, which then distributes them across the worker nodes. The worker node results are then aggregated and returned to the original program.

Docker

The DASK team provides samples showing how to use docker-compose to create a DASK development and execution environment. The environment consists of a development no...
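The submit, distribute, and aggregate flow described under Environment can be sketched with the standard library alone. This is a stand-in, not Dask: it uses `concurrent.futures.ThreadPoolExecutor` on a single machine in place of a Dask Scheduler and worker nodes, and the names `chunk_task` and `run_distributed` are hypothetical. Dask's distributed `Client` offers similarly shaped `map` and `gather` calls, but nothing here depends on Dask being installed.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def chunk_task(chunk):
    """One unit of work: sum of log(x) over a chunk of values."""
    return sum(math.log(x) for x in chunk)


def run_distributed(values, n_workers=4, chunk_size=1000):
    """Split the input into tasks, hand them to a worker pool, and
    aggregate the partial results in the calling program."""
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    # The pool plays the role of scheduler plus workers in this sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(chunk_task, chunks))
    # Aggregation step: combine the per-task results into one answer.
    return sum(partials)


values = [float(i) for i in range(1, 10_001)]
total = run_distributed(values)
```

Threads in CPython will not speed up this CPU-bound work; the point is only the shape of the pattern, where the program defines tasks, something else decides where they run, and the results come back to be combined.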