# Creating Training Datasets for LLMs

You need a training dataset to fine-tune an LLM. To create one, follow these steps:

1. Log in to the Aizen Jupyter console. See [Using the Aizen Jupyter Console](/docs/getting-started/using-the-aizen-jupyter-console.md).
2. Create an ML project if you have not already done so, or set an existing project as the current working project.

   ```
   create project <project name>
   ```

   or

   ```
   set project <project name>
   ```
3. Configure the dataset by running the `configure dataset` command:

   ```
   configure dataset
   ```
4. The notebook displays a template form with boxes and drop-down lists. Complete the form to create features for the dataset.
   * If the input to the LLM is a single column in the dataset, then that column can contain the entire input text, including the prompt, or you can configure a prompt template separately during fine-tuning.
   * If the input to the LLM is two or more columns from the dataset, then you must configure a prompt template separately during fine-tuning.
5. Optionally, configure resources for the job by running the `configure resource` command. If you do not configure resources, default resource settings are applied. Then run the `start dataset` command to schedule the job that creates the training dataset.

   ```
   configure resource
   start dataset <dataset name>
   ```
6. Wait for the job to complete, and then check your training dataset:

   ```
   status dataset <dataset name>
   list datasets
   display dataset <dataset name>
   ```
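
To illustrate the prompt template note in step 4, the following Python sketch shows how a template can combine two dataset columns into a single input text for the LLM. It is only a conceptual illustration, not Aizen's template syntax: the column names `instruction` and `context`, the template wording, and the `render_prompt` helper are all hypothetical.

```python
# Conceptual sketch only: the column names, template text, and helper
# below are hypothetical and are not part of the Aizen commands.

PROMPT_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Context:\n{context}\n\n"
    "### Response:\n"
)


def render_prompt(row: dict, template: str = PROMPT_TEMPLATE) -> str:
    """Fill the template placeholders with the column values of one row."""
    # str.format raises KeyError if the row lacks a column the template uses.
    return template.format(**row)


# One dataset row with two input columns (hypothetical values).
row = {
    "instruction": "Summarize the support ticket.",
    "context": "The printer has been offline since Monday.",
}

prompt = render_prompt(row)
print(prompt)
```

When the dataset already holds the complete input text in a single column, no such merging step is needed, which is why the separate prompt template is optional in that case.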

