Adding Real-Time Data Sources
After you have trained and selected an optimal ML model, you may want to deploy it to serve prediction requests. If your training dataset includes contextual features from an events data sink, you must provide a real-time data source that corresponds to the historical data source configured for that data sink. A real-time data source may be a Kafka stream or a database table read on a periodic batch schedule. Connect the real-time data source to the data sink before deploying the ML model; this ensures that the contextual features used for prediction are computed on fresh real-time data.
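To make the "periodic batch read" option concrete, here is a minimal sketch of an incremental, watermark-based read from a database table. This is illustrative only, not Aizen's implementation: the `events` table, the `event_ts` watermark column, and the idea that a scheduler invokes `fetch_new_rows` on a fixed interval are all assumptions.

```python
import sqlite3

def fetch_new_rows(conn, last_ts):
    """Incrementally read rows newer than the watermark `last_ts`.

    Hypothetical schema: an `events` table with an `event_ts` column.
    A real-time ingestion job would run this repeatedly on a schedule,
    carrying the returned watermark forward between runs.
    """
    rows = conn.execute(
        "SELECT event_ts, payload FROM events "
        "WHERE event_ts > ? ORDER BY event_ts",
        (last_ts,),
    ).fetchall()
    new_watermark = rows[-1][0] if rows else last_ts
    return rows, new_watermark

# Demo with an in-memory table standing in for the source database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_ts INTEGER, payload TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c")])

batch, wm = fetch_new_rows(conn, last_ts=0)   # first read picks up all rows
print(len(batch), wm)                          # 3 3
batch, wm = fetch_new_rows(conn, last_ts=wm)   # nothing new since watermark
print(len(batch), wm)                          # 0 3
```

Each run only fetches rows past the last watermark, so repeated scheduled reads never re-ingest old data.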
To add a real-time data source to an events data sink, follow these steps:
1. Log in to the Aizen Jupyter console. See Using the Aizen Jupyter Console.
2. Set the current working project.
3. Configure your real-time data source by running the configure datasource command. In the notebook, a template form with text boxes and drop-down lists guides you through the data source configuration.
4. Add the real-time data source to the data sink by running the alter datasink command. This command schedules a job that periodically and continuously fetches data from the real-time data source into the data sink. Optionally, you may configure resources for the job by running the configure resource command; if you do not, default resource settings are applied.
5. Check the health of your data sink by running the status datasink command.
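Because the scheduled fetch job needs time to start delivering fresh data, a common pattern is to poll the sink's health before deploying the model. The sketch below shows a generic poll-until-healthy loop; `get_status`, the status strings, and the timeout are hypothetical stand-ins, since the actual output of the status datasink command is not specified here.

```python
import time

def wait_until_healthy(get_status, timeout_s=60.0,
                       poll_interval_s=5.0, sleep=time.sleep):
    """Poll a status callable until it reports 'HEALTHY' or the timeout expires.

    `get_status` is a hypothetical callable wrapping whatever health
    check your platform exposes; it returns a status string.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_status() == "HEALTHY":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_interval_s)

# Demo with a stub sink that becomes healthy on the third poll.
responses = iter(["INITIALIZING", "CATCHING_UP", "HEALTHY"])
ok = wait_until_healthy(lambda: next(responses),
                        timeout_s=1.0, sleep=lambda _: None)
print(ok)  # True
```

Deploying only after this check passes avoids serving predictions while the sink is still catching up on real-time data.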