Creating Training Datasets

A training dataset is required to train an ML model. To create a training dataset, follow these steps:

  1. Log in to the Aizen Jupyter console. See Using the Aizen Jupyter Console.

  2. Create an ML project if you have not already done so, or set an existing project as the current working project:

    create project <project name>

    or

    set project <project name>
  3. Configure the dataset by running the configure dataset command:

    configure dataset
  4. In the notebook, a template form with boxes and drop-down lists guides you through creating the features for the dataset. For each feature, the template asks you to choose one of the four feature types, starting with basis features, and then to provide the parameters for that type, such as join-key columns, aggregate function names, or expression functions.

  5. Create the training dataset by running the start dataset command, which schedules a job. Optionally, configure resources for the job first by running the configure resource command. If you do not configure resources, default resource settings are applied.

    configure resource
    start dataset <dataset name>
  6. Wait for the job to complete, and then check your training dataset:

    status dataset <dataset name>
    list datasets
    display dataset <dataset name>
  7. Explore your dataset using the data analysis commands. Run the loader command to load data for visualization. Data can be loaded from a data source, a data sink, or a dataset that has already been created. After loading the data, run the show or plot commands to visualize it.

    loader
    show stats
    show datatypes
    show unique
    plot
    run analysis
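
The parameters mentioned in step 4 (join-key columns, aggregate functions, and expression functions) can be easier to picture with a small example. The following is a minimal conceptual sketch in plain Python, not Aizen code and not how Aizen builds datasets internally. The table names, column names, and the helper functions `aggregate_by_key` and `build_dataset` are all hypothetical and exist only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample data; these tables and columns are illustrative only.
customers = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": 51},
]
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 40.0},
    {"customer_id": 2, "amount": 15.0},
]


def aggregate_by_key(rows, key, column, func):
    """Group rows by the join key and apply an aggregate function to one column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[column])
    return {k: func(values) for k, values in groups.items()}


def build_dataset(base_rows, key, aggregates, expressions):
    """Start from the base rows, attach aggregates by join key, then apply expressions."""
    dataset = []
    for row in base_rows:
        out = dict(row)  # columns taken directly from the base table
        for name, lookup in aggregates.items():
            out[name] = lookup.get(row[key])  # matched on the join-key column
        for name, expr in expressions.items():
            out[name] = expr(out)  # computed from other columns in the row
        dataset.append(out)
    return dataset


# Aggregate function "mean" applied to order amounts, keyed by customer_id.
avg_amount = aggregate_by_key(orders, "customer_id", "amount", mean)

dataset = build_dataset(
    customers,
    key="customer_id",
    aggregates={"avg_order_amount": avg_amount},
    expressions={"is_senior": lambda r: r["age"] >= 50},
)

for row in dataset:
    print(row)
```

In this sketch, the join key decides which rows of the second table contribute to each output row, the aggregate function reduces those rows to a single value, and the expression derives a new column from values already present in the row. In Aizen, the equivalent choices are made through the configure dataset template rather than in code.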
