Installing the Infrastructure Components



Before proceeding with these steps, make sure that you have the required software installed. See the Software Requirements.
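As a quick sanity check before you begin, you can confirm that your cluster is reachable and that the client tools are installed. This is a generic sketch, not an Aizen-specific requirement list; the Software Requirements page remains the authoritative reference for versions.

#Confirm the client tools are installed and the Kubernetes cluster is reachable
kubectl version
helm version
kubectl get nodes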

Aizen uses these open-source components as dependencies:

  • Apache Spark: Enables you to specify and run Spark applications on Kubernetes just as you would other Kubernetes workloads. The Spark operator uses Kubernetes custom resources to specify, run, and surface the status of Spark applications.

  • KubeRay: Simplifies the deployment and management of Ray applications on Kubernetes.

  • Apache Kafka: A distributed event store and stream-processing platform.

  • Filecopy server: Distributes custom preprocessor files to all nodes in the cluster.

  • MLflow: Manages the machine learning (ML) lifecycle, including experiments, deployment, and a central model registry. MLflow Tracking lets you log and query experiments using the Python, Java, or REST APIs. MLflow runs are recorded in the MySQL database.

  • Multi-Cluster Application Dispatcher (MCAD): Provides an abstraction for wrapping all of a job's resources. This controller queues jobs, applies queueing policies, and then dispatches the jobs when cluster resources become available.

  • Monitoring components:

    • Prometheus operator: Configures and manages a Prometheus monitoring stack that runs in a Kubernetes cluster.

    • Elasticsearch: A distributed search and analytics engine designed to handle large volumes of data. It stores, searches, and analyzes structured and unstructured data in real time.

    • Fluentd: A data collector that unifies data collection from various data sources and log files.

To install the Aizen infrastructure components, follow these steps:

  1. Create the namespace aizen-infra for the Aizen infrastructure components by running this command:

    kubectl create ns aizen-infra
  2. Using the Docker Hub credentials that you obtained for the Aizen microservice images (see Getting Credentials for the Aizen Microservice Images), create a Kubernetes Secret for accessing the Aizen images in the infrastructure namespace (a quick verification sketch follows these steps):

    kubectl create secret docker-registry aizenrepo-creds \
    --docker-username=aizencorp \
    --docker-password=<YOUR DOCKER CREDENTIALS> \
    -n aizen-infra
  3. Customize the settings (see Default Infrastructure Configuration Settings below) in the Aizen infrastructure Helm chart based on your environment and preferred configuration:

    • Update the STORAGE_CLASS, INGRESS_HOST, BUCKET_NAME, CLOUD_ENDPOINT_URL, CLOUD_ACCESSKEY_ID, CLOUD_SECRET_KEY, IMAGE_REPO_SECRET, MLFLOW_ACCESSKEY_ID, MLFLOW_SECRET_KEY, and MLFLOW_ENDPOINT_URL values.

    • For the MLFLOW_ENDPOINT_URL, use either an S3 bucket or a local MinIO bucket; in either case, remember to create the bucket. If you are using local MinIO as an object store, set up MinIO first and then create the buckets. See MinIO.

    • Adjust the default persistence sizes, which are hardcoded in the deployments of the various components, to suit your environment.

    • For an Azure AKS cluster, add these properties to both the infrastructure and core deployments (see the example after the configuration script below):

      global.s3.azure.enabled=true
      global.s3.azure.values.storage_account_name
      global.s3.azure.values.storage_access_key
      global.s3.azure.values.storage_connection_string
      global.mlflow.artifact.secrets.values.mlflow_endpoint_url=https://<your storage account name>.blob.core.windows.net
      global.mlflow.artifact.secrets.values.mlflow_artifacts_destination=wasbs://<storage containername>@<storage account name>.blob.core.windows.net/<destination folder>
    • For a cloud-based deployment, add these properties to both infrastructure and core deployments:

      infra.hashicorp-vault.vault.server.hostAliases[0].ip="$CLOUD_ENDPOINT_IP",\
      infra.hashicorp-vault.vault.server.hostAliases[0].hostnames[0]="<specify the cloud endpoint url without http>"

  4. For a GCP cluster, check whether istio-injection is enabled on the aizen-infra namespace and, if it is, disable it:

    #Check whether the namespaces have istio-injection enabled
    kubectl get ns -L istio-injection
    
    #Disable istio-injection on the aizen-infra namespace if it is enabled
    kubectl label ns aizen-infra istio-injection-
  5. Install the Aizen infrastructure components by running this command (the complete command, with all of its --set overrides, appears in Default Infrastructure Configuration Settings below):

    helm -n $NAMESPACE install aizen-infra $HELMCHART_LOCATION/aizen
  6. For a GCP cluster, re-enable istio-injection by running this command:

    kubectl label ns aizen-infra istio-injection=enabled
  7. If you plan to use LDAP-based authentication, which is the default authentication type for the Aizen platform, you will need to install OpenLDAP and then create users. For instructions, see OpenLDAP.
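To confirm that the namespace, the image-pull Secret, and the istio-injection label are in place before moving on, you can run the checks below. This is a minimal verification sketch that reuses the names from the steps above; adapt it if you changed any of them.

#Confirm the namespace exists
kubectl get ns aizen-infra

#Confirm the image-pull Secret was created in the namespace
kubectl -n aizen-infra get secret aizenrepo-creds

#Confirm the istio-injection label (GCP clusters only)
kubectl get ns aizen-infra -L istio-injection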

Default Infrastructure Configuration Settings

NAMESPACE=aizen-infra
HELMCHART_LOCATION=aizen-helmcharts-1.0.0

STORAGE_CLASS=
INGRESS_HOST=
BUCKET_NAME=
CLUSTER_NAME=

CLOUD_ENDPOINT_URL=
CLOUD_ACCESSKEY_ID=
CLOUD_SECRET_KEY=
CLOUD_PROVIDER_REGION=

#Needed only for cloudian
CLOUD_ENDPOINT_IP=

LDAP_ENABLED=true
VECTORDB_ENABLED=false

#IMAGE
IMAGE_REPO=aizencorp
IMAGE_REPO_SECRET=
IMAGE_TAG=1.0.0

#MLFLOW
MLFLOW_ACCESSKEY_ID=
MLFLOW_SECRET_KEY=
MLFLOW_ENDPOINT_URL=
MLFLOW_ARTIFACT_DESTINATION=s3://
MLFLOW_ARTIFACT_REGION=

#PVC
KAFKA_PERSISTENCE_SIZE=25Gi
KAFKA_BROKER_PERSISTENCE_SIZE=25Gi
KAFKA_ZOOKEEPER_PERSISTENCE_SIZE=25Gi
MLFLOW_MYSQL_PERSISTENCE_SIZE=200Gi
PROMETHEUS_PERSISTENCE_SIZE=55Gi
GRAFANA_PERSISTENCE_SIZE=20Gi
ELASTIC_SEARCH_LOG_SIZE=55Gi
VECTORDB_PERSISTENCE_SIZE=25Gi
VAULT_PERSISTENCE_SIZE=25Gi
MILVUS_MINIO_PERSISTENCE_SIZE=100Gi
MILVUS_ETCD_PERSISTENCE_SIZE=50Gi
MILVUS_ETCD_REPLICAS=1

#You don't need to change anything below this line
helm -n $NAMESPACE install aizen-infra $HELMCHART_LOCATION/aizen \
--set infra.enabled=true,\
infra.kafka.kafka.global.storageClass=$STORAGE_CLASS,\
infra.kafka.kafka.zookeeper.persistence.size=$KAFKA_ZOOKEEPER_PERSISTENCE_SIZE,\
infra.kafka.kafka.broker.size=$KAFKA_BROKER_PERSISTENCE_SIZE,\
infra.kafka.kafka.persistence.size=$KAFKA_PERSISTENCE_SIZE,\
infra.prometheus-operator.kube-prometheus-stack.prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=$STORAGE_CLASS,\
infra.prometheus-operator.kube-prometheus-stack.alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.storageClassName=$STORAGE_CLASS,\
infra.prometheus-operator.kube-prometheus-stack.prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=$PROMETHEUS_PERSISTENCE_SIZE,\
infra.prometheus-operator.kube-prometheus-stack.grafana.persistence.size=$GRAFANA_PERSISTENCE_SIZE,\
infra.mlflow.mysql.primary.persistence.storageClass=$STORAGE_CLASS,\
infra.mlflow.mysql.primary.persistence.size=$MLFLOW_MYSQL_PERSISTENCE_SIZE,\
global.mlflow.artifact.region=$MLFLOW_ARTIFACT_REGION,\
global.mlflow.artifact.secrets.values.mlflow_access_key_id=$MLFLOW_ACCESSKEY_ID,\
global.mlflow.artifact.secrets.values.mlflow_access_secret_key=$MLFLOW_SECRET_KEY,\
global.mlflow.artifact.secrets.values.mlflow_endpoint_url=$MLFLOW_ENDPOINT_URL,\
global.mlflow.artifact.secrets.values.mlflow_artifacts_destination=$MLFLOW_ARTIFACT_DESTINATION,\
global.image_registry=$IMAGE_REPO,\
global.storage_class=$STORAGE_CLASS,\
global.image_secret=$IMAGE_REPO_SECRET,\
global.image_tag=$IMAGE_TAG,\
global.ingress.host=$INGRESS_HOST,\
global.log.volume_size=$ELASTIC_SEARCH_LOG_SIZE,\
global.clustername=$CLUSTER_NAME,\
global.s3.endpoint_url=$CLOUD_ENDPOINT_URL,\
global.s3.endpoint_ip=$CLOUD_ENDPOINT_IP,\
global.s3.secrets.values.s3_access_key=$CLOUD_ACCESSKEY_ID,\
global.s3.secrets.values.s3_secret_key=$CLOUD_SECRET_KEY,\
global.customer_bucket_name=$BUCKET_NAME,\
infra.openldap.enabled=$LDAP_ENABLED,\
infra.vectordb.enabled=$VECTORDB_ENABLED,\
infra.vectordb.vectordb.primary.persistence.size=$VECTORDB_PERSISTENCE_SIZE,\
infra.hashicorp-vault.vault.injector.enabled=false,\
infra.hashicorp-vault.vault.server.enabled=true,\
infra.hashicorp-vault.vault.server.standalone.enabled=true,\
infra.hashicorp-vault.vault.server.dataStorage.enabled=true,\
infra.hashicorp-vault.vault.server.dataStorage.storageClass=$STORAGE_CLASS,\
infra.hashicorp-vault.vault.server.dataStorage.size=$VAULT_PERSISTENCE_SIZE,\
infra.hashicorp-vault.vault.server.standalone.config='ui = true listener "tcp" {  address = "[::]:8200"  cluster_address = "[::]:8201"  tls_disable = 1} storage "s3" { bucket = "'"$BUCKET_NAME"'"  access_key = "'"$CLOUD_ACCESSKEY_ID"'"  secret_key = "'"$CLOUD_SECRET_KEY"'"  endpoint = "'"$CLOUD_ENDPOINT_URL"'"  region = "'"$CLOUD_PROVIDER_REGION"'" s3_force_path_style = true  path = "vault"}',\
infra.milvus.milvus.cluster.enabled=false,\
infra.milvus.milvus.standalone.enabled=true,\
infra.milvus.milvus.pulsarv3.enabled=false,\
infra.milvus.milvus.pulsar.enabled=false,\
infra.milvus.milvus.etcd.replicaCount=$MILVUS_ETCD_REPLICAS,\
infra.milvus.milvus.minio.mode=standalone,\
infra.milvus.milvus.minio.persistence.size=$MILVUS_MINIO_PERSISTENCE_SIZE,\
infra.milvus.milvus.minio.address=$NAMESPACE-milvus-minio.$NAMESPACE.svc.cluster.local,\
infra.milvus.milvus.standalone.persistence.persistentVolumeClaim.storageClass=$STORAGE_CLASS,\
infra.milvus.milvus.etcd.global.storageClass=$STORAGE_CLASS,\
infra.milvus.milvus.etcd.persistence.size=$MILVUS_ETCD_PERSISTENCE_SIZE,\
infra.milvus.milvus.kafka.global.storageClass=$STORAGE_CLASS,\
infra.milvus.milvus.minio.global.storageClass=$STORAGE_CLASS,\
infra.milvus.milvus.minio.persistence.storageClass=$STORAGE_CLASS
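For reference, this is one way the Azure AKS properties from step 3 could be appended to the helm command above; Helm accepts multiple --set flags, and the angle-bracket placeholders follow the same convention as the rest of this page and must be replaced with your own values.

#Hypothetical Azure additions to the helm command above
--set global.s3.azure.enabled=true,\
global.s3.azure.values.storage_account_name=<storage account name>,\
global.s3.azure.values.storage_access_key=<storage access key>,\
global.s3.azure.values.storage_connection_string=<storage connection string>,\
global.mlflow.artifact.secrets.values.mlflow_endpoint_url=https://<storage account name>.blob.core.windows.net,\
global.mlflow.artifact.secrets.values.mlflow_artifacts_destination=wasbs://<storage containername>@<storage account name>.blob.core.windows.net/<destination folder>

Note that connection strings contain characters (such as = and ;) that helm --set treats specially, so you may prefer --set-string or a values file for those entries.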
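Before committing to a full install, you can preview what the chart would deploy. This is a standard Helm dry run, assuming the same NAMESPACE and HELMCHART_LOCATION values as above; append the same --set overrides to see the fully rendered manifests.

#Render the chart without installing anything
helm -n $NAMESPACE install aizen-infra $HELMCHART_LOCATION/aizen --dry-run --debug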

Checking the Deployment Status of the Infrastructure Components

Check the status of all the infrastructure components by running this command:

kubectl -n aizen-infra get pods

If any of the components are not in a Running state, see Installation Issues.
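If you would rather block until the components come up than poll manually, a kubectl wait one-liner can help. This is a convenience sketch; the timeout is an arbitrary assumption, and pods that run to completion never report Ready, so do not treat it as a strict gate.

#Wait until all pods in the namespace report Ready (timeout is an assumption)
kubectl -n aizen-infra wait --for=condition=Ready pods --all --timeout=600s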
