# Creating Training Datasets

A training dataset is required to train an ML model. To create a training dataset, follow these steps:

1. Log in to the Aizen Jupyter console. See [Using the Aizen Jupyter Console](/docs/getting-started/using-the-aizen-jupyter-console.md).
2. Create an ML project if you have not already done so, or set an existing project as the current working project.

   <pre data-overflow="wrap"><code>create project &#x3C;project name>
   </code></pre>

   or

   <pre data-overflow="wrap"><code>set project &#x3C;project name>
   </code></pre>
3. Configure the dataset by running the `configure dataset` command:

   <pre data-overflow="wrap"><code>configure dataset
   </code></pre>
4. In the notebook, complete the template form, which uses boxes and drop-down lists to guide you through creating features for the dataset. The template asks you to define each feature as one of four feature types, starting with basis features. For each feature, provide the parameters that its type requires, such as join-key columns, aggregate function names, or expression functions.
5. Create the training dataset by running the `start dataset` command, which schedules a job. Optionally, run the `configure resource` command first to configure resources for the job. If you do not configure resources, default resource settings are applied.

   <pre data-overflow="wrap"><code>configure resource
   start dataset &#x3C;dataset name>
   </code></pre>
6. Wait for the job to complete, and then check your training dataset:

   <pre data-overflow="wrap"><code>status dataset &#x3C;dataset name>
   list datasets
   display dataset &#x3C;dataset name>
   </code></pre>
7. Explore your dataset using the data analysis commands. Run the `loader` command to load data for visualization. You can load data from a data source, a data sink, or a dataset that has already been created. After loading the data, run the `plot` or `show` commands to visualize it.

   <pre data-overflow="wrap"><code>loader
   show stats
   show datatypes
   show unique
   plot
   run analysis
   </code></pre>
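
The feature parameters mentioned in step 4 (join-key columns, aggregate functions, and expressions) can be easier to picture with a small example. The following is a conceptual sketch in plain Python, not Aizen commands or Aizen's implementation. The table names, column names, and the `build_rows` helper are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical input rows; table and column names are illustrative only.
customers = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": 45},
]
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 15.0},
]


def build_rows(customers, orders):
    """Combine base rows with aggregated and derived feature columns."""
    # Aggregate functions: group orders by the join key, then sum and count.
    totals = defaultdict(float)
    counts = defaultdict(int)
    for order in orders:
        totals[order["customer_id"]] += order["amount"]
        counts[order["customer_id"]] += 1

    rows = []
    for customer in customers:
        key = customer["customer_id"]  # join-key column linking the two tables
        row = dict(customer)
        row["total_amount"] = totals[key]
        row["order_count"] = counts[key]
        # Expression: a value computed from other columns in the same row.
        row["avg_order_amount"] = (
            row["total_amount"] / row["order_count"] if row["order_count"] else 0.0
        )
        rows.append(row)
    return rows


dataset = build_rows(customers, orders)
for row in dataset:
    print(row)
```

Here `customer_id` plays the role of a join-key column, the sum and count play the role of aggregate functions, and `avg_order_amount` plays the role of an expression. In Aizen itself, you supply these parameters through the `configure dataset` template rather than writing code.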

