Typical ML Workflow
This diagram shows a typical real-time machine learning (ML) workflow in Aizen.
The ML workflow consists of these steps:
Configure data sinks in Aizen. A data sink is a table in Aizen storage. Each data sink connects to a data source.
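A data sink therefore needs two pieces of information: the sink table it creates in Aizen storage and the data source it connects to. The sketch below illustrates that idea only; the field names and source type are assumptions, not Aizen's actual configuration schema.

```python
# Hypothetical sink configuration; every field name here is an
# illustrative assumption, not Aizen's real syntax.
sink_config = {
    "sink": "transactions_sink",          # table created in Aizen storage
    "source": {                           # the data source this sink connects to
        "type": "jdbc",                   # assumed source type for illustration
        "url": "jdbc:postgresql://host:5432/sales",
        "table": "transactions",
    },
}

def validate_sink_config(cfg: dict) -> bool:
    """A sink config minimally names a sink table and a source table."""
    return bool(cfg.get("sink")) and bool(cfg.get("source", {}).get("table"))
```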
Define constraints and metrics on the data sink. Constraints specify the data checks to perform when pulling data from a data source into a data sink. Metrics specify the analytics to compute on the data as it is placed in the data sink.
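To make the distinction concrete, the sketch below shows the kind of logic constraints and metrics express. In Aizen these are defined declaratively on the sink; the functions here are illustrative assumptions, not platform APIs.

```python
from statistics import mean

def check_constraints(row):
    """Constraint-style check applied as data is pulled into the sink:
    the amount column must be present and non-negative."""
    return row.get("amount") is not None and row["amount"] >= 0

def compute_metrics(rows, column):
    """Metric-style analytics computed on data landing in the sink."""
    values = [r[column] for r in rows if r.get(column) is not None]
    return {
        "min": min(values),
        "max": max(values),
        "avg": mean(values),
        "missing": sum(1 for r in rows if r.get(column) is None),
    }

rows = [{"amount": 10.0}, {"amount": 25.0}, {"amount": None}]
valid = [r for r in rows if check_constraints(r)]
metrics = compute_metrics(rows, "amount")
```

Here the constraint filters out the row with a missing amount, while the metrics summarize the whole batch, including the count of missing values.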
Configure a dataset and define ML features for the dataset.
Create the training dataset. This action materializes the ML features that were defined for that dataset. It is achieved by scheduling a job and then waiting for the job to complete.
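The schedule-then-wait pattern used here (and again when training a model) amounts to polling a job's status until it reaches a terminal state. The sketch below shows that pattern with a stubbed status function; `get_job_status` and the status strings are assumptions, not Aizen's actual job interface.

```python
import time

def wait_for_job(get_job_status, job_id, poll_secs=0.01, timeout_secs=5.0):
    """Poll a scheduled job until it reaches a terminal state or times out."""
    deadline = time.monotonic() + timeout_secs
    while time.monotonic() < deadline:
        status = get_job_status(job_id)
        if status in ("COMPLETED", "FAILED"):
            return status
        time.sleep(poll_secs)
    raise TimeoutError(f"job {job_id} did not finish in {timeout_secs}s")

# Usage with a stub that reports completion on the third poll.
calls = iter(["RUNNING", "RUNNING", "COMPLETED"])
status = wait_for_job(lambda job_id: next(calls), "dataset-job-1")
```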
Explore training datasets. You may explore and analyze the training dataset to check metrics such as minimum, maximum, average, missing values, or the correlation of features and labels.
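Aizen provides these analytics on the platform; the sketch below only illustrates what such a summary computes, including a feature-label Pearson correlation, on a toy column pair.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation between a feature column and a label column."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

feature = [1.0, 2.0, 3.0, 4.0]
label = [2.0, 4.0, 6.0, 8.0]      # perfectly correlated with the feature
summary = {
    "min": min(feature),
    "max": max(feature),
    "avg": mean(feature),
    "corr": pearson(feature, label),
}
```

A correlation near 1.0, as here, signals a feature that strongly tracks the label; values near 0 suggest the feature carries little signal.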
Configure a training experiment. Aizen is an auto-ML platform. At a minimum, you specify the input feature names, the output (or label) feature names, and whether to use ML or deep learning (DL) algorithms.
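A minimal experiment definition therefore needs only three things. The sketch below captures that shape; the field names are illustrative assumptions, not Aizen's actual experiment schema.

```python
# Hypothetical experiment configuration; field names are assumptions.
experiment = {
    "name": "churn_experiment",
    "inputs": ["tenure", "monthly_charges", "num_support_calls"],
    "labels": ["churned"],
    "algorithm_family": "ML",   # "ML" for classical models, "DL" for deep learning
}

def validate_experiment(cfg):
    """An auto-ML experiment minimally needs inputs, labels, and a family."""
    return (bool(cfg.get("inputs"))
            and bool(cfg.get("labels"))
            and cfg.get("algorithm_family") in ("ML", "DL"))
```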
Train an ML model. This action runs the training experiment that was configured. It is achieved by scheduling a job and then waiting for the job to complete.
View training results. You may view the trained model results and metrics using tools such as TensorBoard and MLflow.
If the training results are satisfactory, proceed to Step 5. Otherwise, repeat Steps 1 to 4.
Configure real-time data sources to connect to data sinks. The real-time data sources, such as a Kafka stream, must provide real-time data that corresponds to the historical data sources that you used in Step 2. You may skip this step if you do not have real-time data.
Add the real-time data source to the data sink. This action runs a job that continuously fetches data from the real-time data source and stores it in the data sink. Any ML features that were defined on the data sink can now be materialized with real-time data.
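The ingestion job's behavior can be sketched as a repeated fetch-and-append cycle. Aizen runs this continuously; the stub stream and plain-list sink below are stand-ins so one cycle can be shown without a live broker.

```python
from collections import deque

class StubStream:
    """Stand-in for a real-time source such as a Kafka topic."""
    def __init__(self, records):
        self._pending = deque(records)

    def poll(self, max_records=100):
        """Return whatever records have arrived, up to max_records."""
        batch = []
        while self._pending and len(batch) < max_records:
            batch.append(self._pending.popleft())
        return batch

def ingest_once(stream, sink_table):
    """One cycle of the ingestion job: fetch a batch, append it to the sink."""
    batch = stream.poll()
    sink_table.extend(batch)
    return len(batch)

stream = StubStream([{"amount": 5.0}, {"amount": 7.5}])
sink_table = []          # stands in for the data sink table in Aizen storage
ingested = ingest_once(stream, sink_table)
```

In production the job would loop on `ingest_once` indefinitely, so features defined on the sink always see the latest records.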
Register a trained ML model. Models must be registered for deployment.
Configure a prediction deployment. This defines the name of the deployment, along with the registered ML model name and version. The deployment is started by scheduling a job. The job status provides the URL that external prediction applications can use to make prediction requests.
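A client would send JSON prediction requests to that URL. The sketch below builds such a request with the standard library; the endpoint URL and payload layout are assumptions, since the real URL comes from the deployment job's status. The network call itself is wrapped in a function and not executed here.

```python
import json
from urllib import request

def build_request(url, features):
    """Build a JSON prediction request for a deployed model endpoint."""
    body = json.dumps({"instances": [features]}).encode("utf-8")
    return request.Request(url, data=body,
                           headers={"Content-Type": "application/json"})

def predict(url, features):
    """Send the request and decode the response (requires a live endpoint)."""
    with request.urlopen(build_request(url, features)) as resp:
        return json.loads(resp.read())

# Hypothetical endpoint URL and feature payload, for illustration only.
req = build_request("http://aizen.example/predict/churn-v1",
                    {"tenure": 12, "monthly_charges": 70.5})
```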
Explore the prediction log. You may explore the prediction log for data drift analysis and prediction accuracy.
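One simple form of drift analysis on a prediction log is to flag a feature whose recent mean has moved far from its training-time distribution. Aizen's drift analysis is platform-provided; the check below is only an illustrative sketch of the idea.

```python
from statistics import mean, pstdev

def drifted(baseline, recent, threshold=3.0):
    """True if recent logged values have shifted more than `threshold`
    baseline standard deviations away from the baseline mean."""
    sd = pstdev(baseline)
    if sd == 0:
        return mean(recent) != mean(baseline)
    return abs(mean(recent) - mean(baseline)) > threshold * sd

baseline = [10.0, 11.0, 9.0, 10.0, 10.5, 9.5]   # feature values at training time
stable   = [10.2, 9.8, 10.1]                    # recent prediction-log values
shifted  = [25.0, 26.0, 24.5]                   # a clear distribution shift
```

A flagged feature is a cue to repeat the earlier steps: rebuild the training dataset with fresh data and retrain the model.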