Adding Real-Time Data Sources
After you have trained and selected an optimal ML model, you may want to deploy it to serve prediction requests. If your training dataset includes contextual features from an events data sink, you must provide a real-time data source that corresponds to the historical data source configured for that data sink. A real-time data source may be a Kafka stream or a database table read on a periodic batch schedule. Connect the real-time data source to the data sink before deploying the ML model; this ensures that the contextual features used for prediction are computed on fresh real-time data.
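To make the "periodic batch read" option concrete, here is a minimal sketch of an incremental, watermark-based read from a database table. This is illustrative only, not Aizen's implementation: the `events` table, the `event_ts` watermark column, and the idea that a scheduler invokes `fetch_new_rows` on a fixed interval are all assumptions.

```python
import sqlite3

def fetch_new_rows(conn, last_ts):
    """Incrementally read rows newer than the watermark `last_ts`.

    Hypothetical schema: an `events` table with an `event_ts` column.
    A real-time ingestion job would run this repeatedly on a schedule,
    carrying the returned watermark forward between runs.
    """
    rows = conn.execute(
        "SELECT event_ts, payload FROM events "
        "WHERE event_ts > ? ORDER BY event_ts",
        (last_ts,),
    ).fetchall()
    new_watermark = rows[-1][0] if rows else last_ts
    return rows, new_watermark

# Demo with an in-memory table standing in for the source database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_ts INTEGER, payload TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c")])

batch, wm = fetch_new_rows(conn, last_ts=0)   # first read picks up all rows
print(len(batch), wm)                          # 3 3
batch, wm = fetch_new_rows(conn, last_ts=wm)   # nothing new since watermark
print(len(batch), wm)                          # 0 3
```

Each run only fetches rows past the last watermark, so repeated scheduled reads never re-ingest old data.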
To add a real-time data source to an events data sink, follow these steps:
1. Log in to the Aizen Jupyter console. See Using the Aizen Jupyter Console.
2. Set the current working project.
3. Configure your real-time data source by running the configure datasource command. In the notebook, a template form with text boxes and drop-down lists guides you through the data source configuration.
4. Add the real-time data source to the data sink by running the alter datasink command. This command schedules a job that periodically and continuously fetches data from the real-time data source into the data sink. Optionally, you may configure resources for the job by running the configure resource command; if you do not, default resource settings are applied.
5. Check the health of your data sink by running the status datasink command.
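Because the scheduled fetch job needs time to start delivering fresh data, a common pattern is to poll the sink's health before deploying the model. The sketch below shows a generic poll-until-healthy loop; `get_status`, the status strings, and the timeout are hypothetical stand-ins, since the actual output of the status datasink command is not specified here.

```python
import time

def wait_until_healthy(get_status, timeout_s=60.0,
                       poll_interval_s=5.0, sleep=time.sleep):
    """Poll a status callable until it reports 'HEALTHY' or the timeout expires.

    `get_status` is a hypothetical callable wrapping whatever health
    check your platform exposes; it returns a status string.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_status() == "HEALTHY":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_interval_s)

# Demo with a stub sink that becomes healthy on the third poll.
responses = iter(["INITIALIZING", "CATCHING_UP", "HEALTHY"])
ok = wait_until_healthy(lambda: next(responses),
                        timeout_s=1.0, sleep=lambda _: None)
print(ok)  # True
```

Deploying only after this check passes avoids serving predictions while the sink is still catching up on real-time data.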