Cloud and Software Architecture, Soft skills, IOT and embedded
Visualizing the Donors Choose data set with Kibana and Elasticsearch
The Elasticsearch example codebase includes a Donors Choose public data set.
The example uses a set of Kibana visualizations. The following image shows a
subset of the visualizations used in the dashboard.
Donors Choose Kibana Dashboard
The map visualization uses the provided geopoint (Lat and Long) data.
You can see the summary metrics for the data set:

- 6.2 million donation records from roughly 2 million donors
- $500 million donated
- Data from 2003 through 2018
Video Talk
The talk describes how to get the data set, index it into Elasticsearch, and visualize it with the provided dashboard.
Importing the Dashboard
This assumes that you have already indexed the data using the scripts in the GitHub repository. See the related blog posts for more information.
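The repository's scripts are the source of truth for indexing. As a rough sketch of what bulk indexing one of the CSV files looks like (the file name, column names, and index name below are illustrative assumptions, not the repository's actual code):

```python
import csv
import io

def csv_to_actions(rows, index_name):
    """Turn CSV dict rows into Elasticsearch bulk-index actions."""
    for row in rows:
        yield {"_index": index_name, "_source": row}

def index_csv(path, index_name="donorschoose", es_url="http://localhost:9200"):
    """Bulk-index a CSV file into a local cluster. Requires: pip install elasticsearch"""
    from elasticsearch import Elasticsearch, helpers
    es = Elasticsearch(es_url)
    with open(path, newline="") as f:
        helpers.bulk(es, csv_to_actions(csv.DictReader(f), index_name))

# The bulk action format, shown without touching a live cluster
# (donor_state / amount are made-up column names):
sample = io.StringIO("donor_state,amount\nCA,25.00\n")
actions = list(csv_to_actions(csv.DictReader(sample), "donorschoose"))
```

Each action names its target index and carries one CSV row as the document source; `helpers.bulk` batches the actions into `_bulk` requests for you.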
Connect to the Kibana dashboard. If you ran Elasticsearch and Kibana locally, the URL is probably:
http://localhost:5601
Verify the index exists. You can explore the fields by clicking on donorschoose.
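If you prefer to verify from a script rather than the Kibana UI, a small sketch (assuming the default local Elasticsearch port) could look like:

```python
import urllib.request

ES_URL = "http://localhost:9200"  # default local Elasticsearch

def index_exists(index_name, es_url=ES_URL):
    """Return True if Elasticsearch answers for this index on the _cat API."""
    try:
        url = f"{es_url}/_cat/indices/{index_name}"
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, or HTTP error (e.g. 404)
        return False

# index_exists("donorschoose") should return True once the data is indexed
```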
Visualizations bind to index patterns. Create an index pattern using the new index.
First, specify the pattern value you will use to bind to the index.
Specify the field to be used for the time series. Kibana uses this time field to drive the time-based filtering in the visualizations.
You should be presented with the fields that make up the donorschoose index pattern.
The index pattern donorschoose should now appear in the Saved Objects list
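The same index pattern can be created programmatically through Kibana's saved objects API (Kibana 7.x) instead of clicking through the UI. This is a sketch: the time field name `donation_timestamp` is an assumption — use whichever field you selected above.

```python
import json
import urllib.request

def make_index_pattern_request(title, time_field, kibana_url="http://localhost:5601"):
    """Build the POST that creates a Kibana index-pattern saved object."""
    body = json.dumps({"attributes": {"title": title, "timeFieldName": time_field}})
    return urllib.request.Request(
        f"{kibana_url}/api/saved_objects/index-pattern/{title}",
        data=body.encode(),
        # kbn-xsrf is required by Kibana for API writes
        headers={"kbn-xsrf": "true", "Content-Type": "application/json"},
        method="POST",
    )

req = make_index_pattern_request("donorschoose", "donation_timestamp")
# urllib.request.urlopen(req) would create the pattern on a running Kibana
```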
Now load the saved dashboard configuration: import donorschoose_dashboard.ndjson from the git repository.
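Kibana also exposes an import endpoint for `.ndjson` saved-object files, so this upload can be scripted. A sketch of building that request against Kibana's `/api/saved_objects/_import` endpoint (the multipart body is hand-rolled here to stay dependency-free):

```python
import urllib.request
import uuid

def make_import_request(ndjson_path, kibana_url="http://localhost:5601"):
    """Build the multipart POST for Kibana's saved-objects import API."""
    boundary = uuid.uuid4().hex
    with open(ndjson_path, "rb") as f:
        content = f.read()
    # One form field named "file" carrying the ndjson export
    body = (
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="file"; filename="export.ndjson"\r\n'
         "Content-Type: application/ndjson\r\n\r\n").encode()
        + content
        + f"\r\n--{boundary}--\r\n".encode()
    )
    return urllib.request.Request(
        f"{kibana_url}/api/saved_objects/_import",
        data=body,
        headers={
            "kbn-xsrf": "true",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )

# urllib.request.urlopen(make_import_request("donorschoose_dashboard.ndjson"))
# would load the dashboard objects into a running Kibana
```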
The Index Pattern identifier will have changed, so you need to rebind the loaded items to the index pattern created above. Do this for every item flagged in the right-side panel, selecting donorschoose for each index pattern binding request.
The dashboard consists of a Dashboard parent and 30+ visualizations. Find the Dashboard parent and click on it to open the dashboard.
That's all, folks. You should now see a dashboard similar to the one at the top of this post. Set the date range to start in 2003.