Web service¶
Palladium includes an HTTP service that can be used to make predictions over the web using models that were trained with the framework. There are two endpoints: /predict, that makes predictions, and /alive which provides a simple health status.
Predict¶
The /predict service uses HTTP query parameters to accept input features, and outputs a JSON response. The number and types of parameters depend on the application. An example is provided as part of the Tutorial.
On success, /predict will always return an HTTP status of 200. An error is indicated by either status 400 or 500, depending on whether the error was caused by malformed user input, or by an error on the server.
The PredictService
must be configured to
define what parameters and types are expected. Here is an example
configuration from the Tutorial:
'predict_service': {
'__factory__': 'palladium.server.PredictService',
'mapping': [
('sepal length', 'float'),
('sepal width', 'float'),
('petal length', 'float'),
('petal width', 'float'),
],
},
An example request might then look like this (assuming that you’re running a server locally on port 5000):
The usual output for a successful prediction has both a result
and
a metadata
entry. The metadata
provides the service name and
version as well as status information. An example:
{
"result": "Iris-virginica",
"metadata": {
"service_name": "iris",
"error_code": 0,
"status": "OK",
"service_version": "0.1"
}
}
An example that failed contains a status
set to ERROR
, an
error_code
and an error_message
. There is generally no
result
. Here is an example:
{
"metadata": {
"service_name": "iris",
"error_message": "BadRequest: ...",
"error_code": -1,
"status": "ERROR",
"service_version": "0.1"
}
}
It’s also possible to send a POST request instead of GET and predict for a number of samples at the same time. Say you want to predict for the class for two Iris examples, then your POST body might look like this:
[
{"sepal length": 6.3, "sepal width": 2.5, "petal length": 4.9, "petal width": 1.5},
{"sepal length": 5.3, "sepal width": 1.5, "petal length": 3.9, "petal width": 0.5}
]
The response will generally look the same, with the exception that now there’s a list of predictions that’s returned:
{
"result": ["Iris-virginica", "Iris-versicolor"],
"metadata": {
"service_name": "iris",
"error_code": 0,
"status": "OK",
"service_version": "0.1"
}
}
Should a different output format be desired than the one implemented
by PredictService
, it is possible to use a
different class altogether by setting an appropriate __factory__
(though that class will likely derive from
PredictService
for reasons of convenience).
A list of decorators may be configured such that they will be called
every time the /predict web service is called. To configure such a
decorator, that will act exactly as if it were used as a normal Python
decorator, use the predict_decorators
list setting. Here is an
example:
'predict_decorators': [
'my_package.my_predict_decorator',
],
Alive¶
The /alive service implements a simple health check. It’ll provide
information such as the palladium_version
in use, the current
memory_usage
by the web server process, and all metadata that has
been defined in the configuration under the service_metadata
entry. Here is an example for the Iris service:
{
"palladium_version": "0.6",
"service_metadata": {
"service_name": "iris",
"service_version": "0.1"
},
"memory_usage": 78,
"model": {
"updated": "2015-02-18T10:13:50.024478",
"metadata": {
"version": 2,
"train_timestamp": "2015-02-18T09:59:34.480063"
}
}
}
/alive can optionally check for the presence of data loaded into the
process’ cache (process_store
). That is because some scenarios
require the model and/or additional data to be loaded in memory before
they can answer requests efficiently
(cf. palladium.persistence.CachedUpdatePersister
and
palladium.dataset.ScheduledDatasetLoader
).
Say you expect the process_store
to be filled with a data
entry (because maybe you’re using
ScheduledDatasetLoader
) before you’re able to
answer requests. And you want /alive to return an error status (of
503) when that data hasn’t been loaded yet, then you’d add to your
configuration the following entry:
'alive': {
'process_store_required': ['data'],
},