Differences between ML Build and Train vs Production
Machine Learning Build and Train and Production Execution often use different controls, management, runtime platforms, and languages. Model invocation and feature conversion techniques also differ across the exploration, training, and production execution phases.
The Machine Learning process is often referred to as Build and Train. This is where data scientists and data analysts attempt to understand the true inputs to their decision-making process. They manually manipulate data into forms that can be fed to Machine Learning models for training. Those models are then analyzed for predictive behavior, and the whole process repeats until a target model is created.
Production inputs (Features) must be transformed into the same form they had during Build and Train. This means that the production system needs to run the same data transformations that were done during Build and Train.
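One way to picture this requirement is a single transform function that both environments call. The sketch below is illustrative only: the function `to_features` and the field names `balance`, `segment`, and `age_days` are hypothetical and not taken from any particular system.

```python
import math


def to_features(raw: dict) -> list[float]:
    """Hypothetical transform: raw record -> model-ready feature vector."""
    return [
        math.log1p(raw["balance"]),                   # compress a skewed numeric input
        1.0 if raw["segment"] == "retail" else 0.0,   # categorical value as a 0/1 flag
        raw["age_days"] / 365.0,                      # rescale days to years
    ]


# Build and Train: the transform is applied across a historical data set.
training_rows = [
    {"balance": 1200.0, "segment": "retail", "age_days": 730},
    {"balance": 0.0, "segment": "business", "age_days": 365},
]
training_matrix = [to_features(row) for row in training_rows]

# Production: the same function is applied to one incoming request.
request = {"balance": 1200.0, "segment": "retail", "age_days": 730}
production_vector = to_features(request)
```

In practice the two environments often run on different platforms or languages, so the shared piece may be a reviewed specification of the transform rather than literally the same function.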
Production feature generation and model invocation are more rigorously created and executed than they are in the Build and Train environment.
Similar operations often use different tooling that is better adapted to the controls and governance of each phase.
Governance
Transforms and models are often governance artifacts. They need to be saved
and reviewed and bound to lineage tracking if used within regulated
environments.
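As a rough illustration of binding a model and its transforms to lineage, the sketch below fingerprints both artifacts and records them together. The record layout and every name in it (`lineage_record`, `training_data_ref`, `reviewed_by`) are assumptions made for this example, not a standard or a specific product's format.

```python
import hashlib
import json


def artifact_fingerprint(content: bytes) -> str:
    """Content hash, so any change to an artifact is detectable."""
    return hashlib.sha256(content).hexdigest()


def lineage_record(model_bytes: bytes, transform_source: str,
                   training_data_ref: str, reviewer: str) -> dict:
    """Hypothetical lineage entry tying a model to its transform and data."""
    return {
        "model_sha256": artifact_fingerprint(model_bytes),
        "transform_sha256": artifact_fingerprint(transform_source.encode("utf-8")),
        "training_data_ref": training_data_ref,  # assumed pointer to a data snapshot
        "reviewed_by": reviewer,
    }


record = lineage_record(
    model_bytes=b"serialized-model-placeholder",
    transform_source="def to_features(raw): ...",
    training_data_ref="snapshot-placeholder",
    reviewer="reviewer-placeholder",
)
print(json.dumps(record, indent=2))
```

Because the fingerprints come from content, a changed transform yields a different record even when the model file is untouched.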
Video
Video version of this blog. It is included here in the middle of the
flow to make it easier to find :-)
The Same but Not The Same
The two pipelines look the same but they have very different controls and
tooling.
- The exploratory environment may use Jupyter Notebooks, direct SQL, and a user-accessible compute environment. Model training may execute dozens or thousands of operations against the same Features. Models are tested and results gathered.
- The production environment has code quality standards, specific transformation platforms, and deployable-only computing environments. The model executes once for each set of inputs.
Batch or Activity/Event Driven
Models are initially trained in batch mode. Model execution happens as part of Model Testing. That is generally also done in batch mode as a post-training step. Features are calculated en masse and then applied to the model as part of a training cycle.
Production models can be executed in batch, near-real-time, or real-time. This means they may be deployed as batch tasks, as API endpoints, or by other methods.
Production features may be generated via batch or as needed. Features used only in batch model execution are often created via
batch. Features used in API model invocation are often a mix of batch
and real-time generated. Reference, account, demographic, and other
slow-changing data may be converted to Features via batch.
Clickstream, user action, event-driven Features may be created as part of
the model invocation.
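The mix of batch and real-time Features for an API invocation might be assembled roughly as follows. This is a minimal sketch under stated assumptions: `BATCH_FEATURES` stands in for a table written by a batch job, and the feature names, event shape, and the `-1.0` "no events" sentinel are all hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical output of a batch job: slow-changing Features keyed by account id.
BATCH_FEATURES = {
    "acct-1": {"tenure_years": 4.0, "region_code": 2.0},
}


def realtime_features(events: list[dict], now: datetime) -> dict:
    """Features derived at invocation time from recent user events."""
    click_count = sum(1 for e in events if e["type"] == "click")
    last_event_at = max((e["at"] for e in events), default=None)
    # -1.0 is an assumed sentinel meaning "no events seen".
    seconds_since_last = (
        (now - last_event_at).total_seconds() if last_event_at else -1.0
    )
    return {
        "click_count": float(click_count),
        "seconds_since_last_event": seconds_since_last,
    }


def assemble_features(account_id: str, events: list[dict], now: datetime) -> dict:
    """Merge precomputed batch Features with Features computed per request."""
    features = dict(BATCH_FEATURES[account_id])      # copy; do not mutate the table
    features.update(realtime_features(events, now))
    return features


now = datetime(2024, 1, 1, 12, 0, 30, tzinfo=timezone.utc)
events = [
    {"type": "click", "at": datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)},
    {"type": "view", "at": datetime(2024, 1, 1, 12, 0, 20, tzinfo=timezone.utc)},
]
feature_row = assemble_features("acct-1", events, now)
```

The batch table only changes when the batch job runs, while the event-driven values are recomputed on every call.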
Build and Train vs Traditional SDLC
Build and Train and Production execution with retraining are the same
but different. Their forcing functions are different.
- Data Exploration needs freewheeling access today and the ability to store intermediate results and work on them later. It needs flexible, secure compute that can change without extensive planning.
- Software engineering is about control and repeatability. It runs in constrained environments where compliance can be as important as results.
Exploratory work often involves Notebook technology and many iterative passes
at the data. Production automation needs the transforms in its own format and
needs to know every transform required, from beginning to end, in order to
re-create the feature without human intervention.
Training Data - Production and Incidental PII
Models can only be trained with production or very production-like data. In most companies or organizations this means the data scientists and the build system must have access to production data, which impacts data governance and access controls.
Incremental and automated model retraining systems must also have access to
current production data. This means automated ML training runs within the
production data scope.
PII is not generally required in training data. There are instances, though,
where PII may accidentally or incidentally be present. Customer call
recordings are a good example, where arbitrary PII may exist in the call
logs. This means model training must often operate within the most
sensitive data zones.
Production or Off-line Model Retraining
Models have to be retrained as data changes. This can be done manually or via
automation. There is a certain amount of complexity and governance in
this.
- Should this be data or time-triggered?
- If data-triggered, then how does that call back into non-production Build/Train or CI/CD?
- etc.
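The data-or-time question above can be made concrete with a small decision function. This is only a sketch: the 30-day age limit, the 10,000-row threshold, and the function name `retrain_reasons` are illustrative assumptions, and a real system might trigger on drift metrics instead of row counts.

```python
from datetime import datetime, timedelta


def retrain_reasons(last_trained: datetime, now: datetime, new_rows: int,
                    max_age: timedelta = timedelta(days=30),
                    row_threshold: int = 10_000) -> list[str]:
    """Return which triggers fired; an empty list means no retraining is due."""
    reasons = []
    if now - last_trained >= max_age:
        reasons.append("time")   # model is older than the allowed age
    if new_rows >= row_threshold:
        reasons.append("data")   # enough new data has arrived since training
    return reasons


last_trained = datetime(2024, 1, 1)
print(retrain_reasons(last_trained, datetime(2024, 1, 10), new_rows=500))
print(retrain_reasons(last_trained, datetime(2024, 2, 15), new_rows=25_000))
```

Returning the reasons rather than a bare boolean leaves a record of why a retraining run started, which is useful when the run has to be reviewed later.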