Installing the Infrastructure Components
Before proceeding with these steps, make sure that you have the required software installed. See the Software Requirements.
Aizen uses these open-source components as dependencies:
Apache Spark
Enables you to specify and run Spark applications on Kubernetes just as you would any other Kubernetes workload. The Spark operator uses Kubernetes custom resources to specify, run, and surface the status of Spark applications.
KubeRay
Simplifies the deployment and management of Ray applications on Kubernetes.
Apache Kafka
A distributed event store and stream processing platform.
Filecopy server
Provides the ability to distribute custom preprocessor files to all nodes in the cluster.
MLflow
Manages the machine learning (ML) lifecycle, including experimentation, deployment, and a central model registry. MLflow Tracking lets you log and query experiments using the Python, Java, or REST APIs. MLflow runs are recorded in the MySQL database.
Multi-Cluster Application Dispatcher (MCAD)
Provides an abstraction for wrapping all resources of jobs. This controller queues the jobs, applies queueing policies, and then dispatches the jobs when cluster resources are available.
Monitoring components:
Prometheus operator
An operator that configures and manages a Prometheus monitoring stack that runs in a Kubernetes cluster.
Elasticsearch
A distributed search and analytics engine designed for handling large volumes of data. It is used for storing, searching, and analyzing structured and unstructured data in real time.
Fluentd
A data collector that unifies data collection from various data sources and log files.
To install the Aizen infrastructure components, follow these steps:
Create the aizen-infra namespace for the Aizen infrastructure components by running this command:
kubectl create ns aizen-infra
Using the Docker Hub credentials that you obtained for the Aizen microservice images (see Getting Credentials for the Aizen Microservice Images), create a Kubernetes Secret for accessing the Aizen images in the infrastructure namespace:
kubectl create secret docker-registry aizenrepo-creds --docker-username=aizencorp --docker-password=<YOUR DOCKER CREDENTIALS> -n aizen-infra
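Under the hood, a docker-registry Secret stores a base64-encoded user:password auth string inside its .dockerconfigjson payload. A minimal sketch of that encoding follows; the password below is a placeholder, not a real credential:

```shell
# Sketch: how the registry auth string inside the Secret is formed.
# DOCKER_PASS is a fake placeholder value for illustration only.
DOCKER_USER="aizencorp"
DOCKER_PASS="example-token"
AUTH="$(printf '%s:%s' "$DOCKER_USER" "$DOCKER_PASS" | base64)"
echo "$AUTH"

# To confirm the Secret landed in the cluster, you would typically run:
#   kubectl -n aizen-infra get secret aizenrepo-creds
```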
Customize the settings (see Default Infrastructure Configuration Settings below) in the Aizen infrastructure Helm chart based on your environment and preferred configuration:
Update the STORAGE_CLASS, INGRESS_HOST, BUCKET_NAME, CLOUD_ENDPOINT_URL, CLOUD_ACCESSKEY_ID, CLOUD_SECRET_KEY, IMAGE_REPO_SECRET, MLFLOW_ACCESSKEY_ID, MLFLOW_SECRET_KEY, and MLFLOW_ENDPOINT_URL values.
For the MLFLOW_ENDPOINT_URL, use either an S3 bucket or a local MinIO bucket; in either case, create the bucket before installing. If you are using local MinIO as an object store, set up MinIO first and then create the buckets. See MinIO.
Change the default persistence sizes, which are hardcoded in the deployments of the various components, as needed for your environment.
For an Azure AKS cluster, add these properties to both the infrastructure and core deployments:
global.s3.azure.enabled=true
global.s3.azure.values.storage_account_name
global.s3.azure.values.storage_access_key
global.s3.azure.values.storage_connection_string
global.mlflow.artifact.secrets.values.mlflow_endpoint_url=https://<your storage account name>.blob.core.windows.net
global.mlflow.artifact.secrets.values.mlflow_artifacts_destination=wasbs://<storage container name>@<storage account name>.blob.core.windows.net/<destination folder>
For a cloud-based deployment, add these properties to both infrastructure and core deployments:
infra.hashicorp-vault.vault.server.hostAliases[0].ip="$CLOUD_ENDPOINT_IP",\
infra.hashicorp-vault.vault.server.hostAliases[0].hostnames[0]="<specify the cloud endpoint URL without http>"
For a GCP cluster, check whether istio-injection is enabled on the namespace; if it is, disable it before installing:
# Check if the gateway namespace has istio-injection enabled
kubectl get ns -L istio-injection
# Disable istio-injection if enabled
kubectl label ns aizen-infra istio-injection-
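The MLFLOW_ARTIFACT_DESTINATION setting used later is an s3:// URI rooted in the bucket you created. A small sketch of composing it, with an illustrative bucket name and an assumed "mlflow-artifacts" prefix (neither is a value required by the Aizen chart):

```shell
# Sketch: composing the MLflow artifact destination from the bucket name.
# "aizen-demo-bucket" and the "mlflow-artifacts" prefix are illustrative.
BUCKET_NAME="aizen-demo-bucket"
MLFLOW_ARTIFACT_DESTINATION="s3://${BUCKET_NAME}/mlflow-artifacts"
echo "$MLFLOW_ARTIFACT_DESTINATION"

# If you are using local MinIO, the bucket must exist before the install;
# with the MinIO client this is typically (assuming an alias named "local"):
#   mc mb local/${BUCKET_NAME}
```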
Install the Aizen infrastructure components by running this command:
helm -n $NAMESPACE install aizen-infra $HELMCHART_LOCATION/aizen
For a GCP cluster, re-enable istio-injection using this command:
kubectl label ns aizen-infra istio-injection=enabled
If you plan to use LDAP-based authentication, the default authentication type for the Aizen platform, install OpenLDAP and then create users. For instructions, see OpenLDAP.
Default Infrastructure Configuration Settings
NAMESPACE=aizen-infra
HELMCHART_LOCATION=aizen-helmcharts-1.0.0
STORAGE_CLASS=
INGRESS_HOST=
BUCKET_NAME=
CLUSTER_NAME=
CLOUD_ENDPOINT_URL=
CLOUD_ACCESSKEY_ID=
CLOUD_SECRET_KEY=
CLOUD_PROVIDER_REGION=
#Needed only for Cloudian
CLOUD_ENDPOINT_IP=
LDAP_ENABLED=true
VECTORDB_ENABLED=false
#IMAGE
IMAGE_REPO=aizencorp
IMAGE_REPO_SECRET=
IMAGE_TAG=1.0.0
#MLFLOW
MLFLOW_ACCESSKEY_ID=
MLFLOW_SECRET_KEY=
MLFLOW_ENDPOINT_URL=
MLFLOW_ARTIFACT_DESTINATION=s3://
MLFLOW_ARTIFACT_REGION=
#PVC
KAFKA_PERSISTENCE_SIZE=25Gi
KAFKA_BROKER_PERSISTENCE_SIZE=25Gi
KAFKA_ZOOKEEPER_PERSISTENCE_SIZE=25Gi
MLFLOW_MYSQL_PERSISTENCE_SIZE=200Gi
PROMETHEUS_PERSISTENCE_SIZE=55Gi
GRAFANA_PERSISTENCE_SIZE=20Gi
ELASTIC_SEARCH_LOG_SIZE=55Gi
VECTORDB_PERSISTENCE_SIZE=25Gi
VAULT_PERSISTENCE_SIZE=25Gi
MILVUS_MINIO_PERSISTENCE_SIZE=100Gi
MILVUS_ETCD_PERSISTENCE_SIZE=50Gi
MILVUS_ETCD_REPLICAS=1
#You don't need to change anything below this line
helm -n $NAMESPACE install aizen-infra $HELMCHART_LOCATION/aizen \
--set infra.enabled=true,\
infra.kafka.kafka.global.storageClass=$STORAGE_CLASS,\
infra.kafka.kafka.zookeeper.persistence.size=$KAFKA_ZOOKEEPER_PERSISTENCE_SIZE,\
infra.kafka.kafka.broker.persistence.size=$KAFKA_BROKER_PERSISTENCE_SIZE,\
infra.kafka.kafka.persistence.size=$KAFKA_PERSISTENCE_SIZE,\
infra.prometheus-operator.kube-prometheus-stack.prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=$STORAGE_CLASS,\
infra.prometheus-operator.kube-prometheus-stack.alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.storageClassName=$STORAGE_CLASS,\
infra.prometheus-operator.kube-prometheus-stack.prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=$PROMETHEUS_PERSISTENCE_SIZE,\
infra.prometheus-operator.kube-prometheus-stack.grafana.persistence.size=$GRAFANA_PERSISTENCE_SIZE,\
infra.mlflow.mysql.primary.persistence.storageClass=$STORAGE_CLASS,\
infra.mlflow.mysql.primary.persistence.size=$MLFLOW_MYSQL_PERSISTENCE_SIZE,\
global.mlflow.artifact.region=$MLFLOW_ARTIFACT_REGION,\
global.mlflow.artifact.secrets.values.mlflow_access_key_id=$MLFLOW_ACCESSKEY_ID,\
global.mlflow.artifact.secrets.values.mlflow_access_secret_key=$MLFLOW_SECRET_KEY,\
global.mlflow.artifact.secrets.values.mlflow_endpoint_url=$MLFLOW_ENDPOINT_URL,\
global.mlflow.artifact.secrets.values.mlflow_artifacts_destination=$MLFLOW_ARTIFACT_DESTINATION,\
global.image_registry=$IMAGE_REPO,\
global.storage_class=$STORAGE_CLASS,\
global.image_secret=$IMAGE_REPO_SECRET,\
global.image_tag=$IMAGE_TAG,\
global.ingress.host=$INGRESS_HOST,\
global.log.volume_size=$ELASTIC_SEARCH_LOG_SIZE,\
global.clustername=$CLUSTER_NAME,\
global.s3.endpoint_url=$CLOUD_ENDPOINT_URL,\
global.s3.endpoint_ip=$CLOUD_ENDPOINT_IP,\
global.s3.secrets.values.s3_access_key=$CLOUD_ACCESSKEY_ID,\
global.s3.secrets.values.s3_secret_key=$CLOUD_SECRET_KEY,\
global.customer_bucket_name=$BUCKET_NAME,\
infra.openldap.enabled=$LDAP_ENABLED,\
infra.vectordb.enabled=$VECTORDB_ENABLED,\
infra.vectordb.vectordb.primary.persistence.size=$VECTORDB_PERSISTENCE_SIZE,\
infra.hashicorp-vault.vault.injector.enabled=false,\
infra.hashicorp-vault.vault.server.enabled=true,\
infra.hashicorp-vault.vault.server.standalone.enabled=true,\
infra.hashicorp-vault.vault.server.dataStorage.enabled=true,\
infra.hashicorp-vault.vault.server.dataStorage.storageClass=$STORAGE_CLASS,\
infra.hashicorp-vault.vault.server.dataStorage.size=$VAULT_PERSISTENCE_SIZE,\
infra.hashicorp-vault.vault.server.standalone.config='ui = true listener "tcp" { address = "[::]:8200" cluster_address = "[::]:8201" tls_disable = 1} storage "s3" { bucket = "'"$BUCKET_NAME"'" access_key = "'"$CLOUD_ACCESSKEY_ID"'" secret_key = "'"$CLOUD_SECRET_KEY"'" endpoint = "'"$CLOUD_ENDPOINT_URL"'" region = "'"$CLOUD_PROVIDER_REGION"'" s3_force_path_style = true path = "vault"}',\
infra.milvus.milvus.cluster.enabled=false,\
infra.milvus.milvus.standalone.enabled=true,\
infra.milvus.milvus.pulsarv3.enabled=false,\
infra.milvus.milvus.pulsar.enabled=false,\
infra.milvus.milvus.etcd.replicaCount=$MILVUS_ETCD_REPLICAS,\
infra.milvus.milvus.minio.mode=standalone,\
infra.milvus.milvus.minio.persistence.size=$MILVUS_MINIO_PERSISTENCE_SIZE,\
infra.milvus.milvus.minio.address=$NAMESPACE-milvus-minio.$NAMESPACE.svc.cluster.local,\
infra.milvus.milvus.standalone.persistence.persistentVolumeClaim.storageClass=$STORAGE_CLASS,\
infra.milvus.milvus.etcd.global.storageClass=$STORAGE_CLASS,\
infra.milvus.milvus.etcd.persistence.size=$MILVUS_ETCD_PERSISTENCE_SIZE,\
infra.milvus.milvus.kafka.global.storageClass=$STORAGE_CLASS,\
infra.milvus.milvus.minio.global.storageClass=$STORAGE_CLASS,\
infra.milvus.milvus.minio.persistence.storageClass=$STORAGE_CLASS
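Because helm accepts empty --set values without complaint, a small preflight check that the required settings are non-empty can save a failed install. A sketch, checking a subset of the variable names from the settings above with illustrative values; in practice you would source your filled-in settings file instead:

```shell
# Preflight sketch: verify that required settings are non-empty before
# running helm install. Example values stand in for your settings file.
STORAGE_CLASS="standard"
INGRESS_HOST="aizen.example.com"
BUCKET_NAME="aizen-demo-bucket"

missing=""
for var in STORAGE_CLASS INGRESS_HOST BUCKET_NAME; do
  # Indirect expansion via eval: fetch the value of the variable named $var
  val="$(eval "printf '%s' \"\${$var}\"")"
  if [ -z "$val" ]; then
    missing="$missing $var"
  fi
done

if [ -n "$missing" ]; then
  echo "Missing required settings:$missing" >&2
else
  echo "All required settings are set"
fi
```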
Checking the Deployment Status of the Infrastructure Components
Check the status of all the infrastructure components by running this command:
kubectl -n aizen-infra get pods
If any of the components are not in a Running state, see Installation Issues.
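To spot problem pods quickly, the kubectl output can be filtered down to pods that are not Running. A sketch of such a filter; the demo feeds it canned output (with made-up pod names) so it runs without a cluster, and in a live cluster you would pipe real kubectl output into it instead:

```shell
# Sketch: list pods that are not in the Running (or Completed) state.
# Live usage would be:
#   kubectl -n aizen-infra get pods --no-headers | non_running
non_running() {
  # Column 3 of "kubectl get pods" output is STATUS
  awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Demo with canned --no-headers output (illustrative pod names only):
non_running <<'EOF'
aizen-kafka-0              1/1  Running            0  5m
aizen-mlflow-7d9f5b-x2k4q  0/1  CrashLoopBackOff   4  5m
aizen-vault-0              1/1  Running            0  5m
EOF
```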