# Serving an LLM

You can deploy a pretrained or fine-tuned LLM on Aizen to handle text generation or chat completion requests. You can also deploy embeddings models.

To serve an LLM or an embeddings model, follow these steps:

1. Log in to the Aizen Jupyter console. See [Using the Aizen Jupyter Console](/docs/getting-started/using-the-aizen-jupyter-console.md).
2. Set the current working project:

   ```
   set project <project name>
   ```
3. If the LLM was fine-tuned on Aizen, register the fine-tuned LLM that you want to deploy:

   ```
   list trained-models <ML model name>
   register model <ML model name>,<run id>,PRODUCTION
   list registered-models
   ```

   This step is not required if the LLM is a pretrained model from the Hugging Face Hub.
4. Configure an LLM deployment by running the `configure llm` command:

   ```
   configure llm
   ```
5. In the notebook, complete the template form, which uses boxes and drop-down lists, to configure the deployment. You must specify the LLM deployment name and set the type to either **llm** or **embeddings**.
   * If the LLM was fine-tuned on Aizen, set the source type to **aizen**, and specify the registered model name and version.
   * If the LLM is a pretrained model from the Hugging Face Hub, set the source type to **huggingface**, and specify the model name.
6. Serve the model by running the `start llm` command, which schedules a job to deploy the model. Optionally, run the `configure resource` command first to configure resources for the job; otherwise, default resource settings are applied. If you require GPUs, you must run `configure resource`, because the default resource settings do not include GPU resources.

   ```
   configure resource
   start llm <llm deployment name>
   ```
7. Check the status of the LLM deployment job and obtain serving URLs by running this command:

   ```
   status llm <llm deployment name>
   ```
8. Use the URLs in the status output to reach the deployment. The base URL serves a REST API that lists the LLMs and embeddings models currently being served. The endpoint URL accepts text generation or chat completion requests. Both URLs expose the FastAPI docs, ReDoc, and OpenAPI paths.
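
The documentation paths mentioned in step 8 can be explored with a short script. The following is a minimal sketch, assuming the service keeps FastAPI's default documentation paths (`/docs`, `/redoc`, and `/openapi.json`); this page does not confirm the exact paths, so check them against your own status output. The helper names `doc_urls`, `list_operations`, and `fetch_openapi` are hypothetical and not part of Aizen.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumption: the served app uses FastAPI's default documentation paths.
FASTAPI_DEFAULT_DOC_PATHS = {
    "swagger": "docs",
    "redoc": "redoc",
    "openapi": "openapi.json",
}

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def doc_urls(url):
    """Build the documentation URLs under a base URL or endpoint URL."""
    root = url.rstrip("/") + "/"  # trailing slash so urljoin appends instead of replacing
    return {name: urljoin(root, path) for name, path in FASTAPI_DEFAULT_DOC_PATHS.items()}


def list_operations(spec):
    """Return sorted (METHOD, path) pairs described by an OpenAPI document."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Path items can also hold non-method keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                operations.append((key.upper(), path))
    return sorted(operations)


def fetch_openapi(url, timeout=10.0):
    """Download and parse the OpenAPI document published under the given URL."""
    with urlopen(doc_urls(url)["openapi"], timeout=timeout) as response:
        return json.load(response)
```

Calling `list_operations(fetch_openapi("<endpoint URL>"))` with the endpoint URL from `status llm` would then list the request paths the deployment accepts, which is one way to locate the text generation or chat completion route without guessing it.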

