entitopia - a Python tool for loading, customizing, and automating indexes and data loads into Elasticsearch

April 15, 2023

Elasticsearch is an extensible text search engine. It provides methods for loading data, customizing the data, applying analyzers, changing search weightings, and enriching data by merging subsets of multiple datasets. That lets us merge pieces of different datasets (indexes) into customized indexes that meet our data analysis needs.

We want to do all of that in a repeatable, automatable way while keeping some flexibility. The entitopia Python code lets us define pipelines made up of multiple steps, each with its own customized operations.

This diagram shows a three-step pipeline: data is loaded into two indexes (steps 1 and 3), with an enrichment and resource manipulation step (step 2) in between.

Each step is driven by a config file that names the phase processors to run, along with other configuration information.

{
    "steps": [
        {
            "name": "doctors-clinicians",
            "phases": [
                "index-create",
                "index-map",
                "index-populate"
            ]
        }
    ],
    "all_phases": [
        "index-create",
        "index-map",
        "enrichment-policies",
        "pipelines",
        "index-populate"
    ],
    "configurationDir": "configuration",
    "dataDir": "data",
    "logLevel": "INFO"
}
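
As a rough illustration of how a config like this could drive a run, here is a minimal sketch. It is not entitopia's actual code. The `run_pipeline` and `make_processor` names are hypothetical, and it assumes that a step's phases execute in the order given by `all_phases`, which the post does not state.

```python
import json

# The pipeline config from above, embedded so the sketch is self-contained.
CONFIG_TEXT = """
{
    "steps": [
        {
            "name": "doctors-clinicians",
            "phases": ["index-create", "index-map", "index-populate"]
        }
    ],
    "all_phases": [
        "index-create",
        "index-map",
        "enrichment-policies",
        "pipelines",
        "index-populate"
    ],
    "configurationDir": "configuration",
    "dataDir": "data",
    "logLevel": "INFO"
}
"""


def make_processor(phase_name, log):
    """Hypothetical stand-in for a real phase processor: it only records the call."""
    def run(step_name):
        log.append((step_name, phase_name))
    return run


def run_pipeline(config, processors):
    """Run every step's phases, ordered by their position in all_phases (an assumption)."""
    order = {phase: i for i, phase in enumerate(config["all_phases"])}
    for step in config["steps"]:
        unknown = [p for p in step["phases"] if p not in order]
        if unknown:
            raise ValueError(f"step {step['name']!r} has unknown phases: {unknown}")
        for phase in sorted(step["phases"], key=order.__getitem__):
            processors[phase](step["name"])


config = json.loads(CONFIG_TEXT)
calls = []
processors = {phase: make_processor(phase, calls) for phase in config["all_phases"]}
run_pipeline(config, processors)
```

After the run, `calls` holds one `(step, phase)` pair per executed phase, so a step that lists only three of the five phases skips the other two.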

Each phase has a configuration file that drives the associated phase processor:

{
    "alias": "doctors-clinicians-000001",
    "index": "doctors-clinicians-{now/d}-000001",
    "source": "DAC_NationalDownloadableFile.csv",
    "id_field": "NPI",
    "num_rows": 50000,
    "skip_rows": 0
}
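
To make the populate settings concrete, the sketch below turns CSV rows into bulk-style actions. It is an illustration, not the tool's implementation. It assumes `skip_rows` counts data rows after the header and `num_rows` caps how many rows are loaded. The `build_actions` function, the `target` parameter, and the sample rows (including the `name` column) are invented for the example. The resulting dicts use the `_index`, `_id`, `_source` action shape that the `elasticsearch.helpers.bulk` helper accepts.

```python
import csv
import io
from itertools import islice


def build_actions(csv_file, phase_config, target):
    """Yield one bulk-style action per CSV row, honoring skip_rows and num_rows."""
    reader = csv.DictReader(csv_file)
    start = phase_config["skip_rows"]
    stop = start + phase_config["num_rows"]
    for row in islice(reader, start, stop):
        yield {
            "_index": target,                      # index or alias to write to
            "_id": row[phase_config["id_field"]],  # document id taken from id_field
            "_source": row,                        # the whole row becomes the document
        }


# Invented sample data; the real source file has many more columns.
SAMPLE_CSV = "NPI,name\n1001,Alpha\n1002,Bravo\n1003,Charlie\n1004,Delta\n"

phase_config = {"id_field": "NPI", "num_rows": 2, "skip_rows": 1}
actions = list(build_actions(io.StringIO(SAMPLE_CSV), phase_config, "doctors-clinicians-000001"))
```

Using the configured `id_field` as the document id means that reloading the same file overwrites existing documents instead of creating duplicates.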

Video

Watch an overview in this video

Blog de Joe Freeman
