--> Skip to main content

Posts

Featured

Python multiprocessing.Pool improvement examples in Donor's Choice data

We're going to walk through a couple places where simple Python parallelization created big performance improvements using the Elasticsearch Donor's Choice prep scripts on GitHub .  This is a clone of the Elasticsearch GitHub repository of the same name.My rule of thumb for this particular processing section was that I would only move to parallel execution if execution time was reduced within 80% of the number of extra processors. So with 8 cores I wanted to get at least 6 times performance.Python Multiprocessing Parallel ExecutionPython is essentially single threaded in many situations.  Thread based parallelization is pretty much only useful for I/O bound situations like web requests where the threads are idle most of the time. Compute bound parallel execution is usually multi-processor with work being fed off to essentially different programs.   Each processing unit in multi-processing parallel execution runs in its own address space.  Data must be copied to the worker proce…

Latest Posts

Visualizing the Donors Choose data set with Kibana and Elasticsearch

Visualizing public data sets with Python and ElasticSearch

Differences between ML Build and Train vs Production

Kubernetes Dashboard with multi-node Kubernetes on a laptop.

Rethinking the Agile Standup in the age of Slack

Across Generations - why do interns laugh at 420 milestones?

Multi-Node Kubernetes with KIND and Docker Desktop