Coursera

Week 4

Serving

Building models is just a small part of ML. A production solution requires so much more.

Data ingestion -> Data validation -> Data transform -> Model training -> Model analysis -> Production model -> Model serving.

Serving centralizes the model on a server (or on multiple servers behind a load balancer, typically cloud-based), so every client gets inferences from the same deployed model.

TensorFlow Serving is part of TFX.

Installing TensorFlow Serving
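
The quiz below names the package: tensorflow-model-server. A minimal sketch of installing it on Debian/Ubuntu, following the TensorFlow Serving setup docs (in Colab these run as root; elsewhere prefix with sudo):

!echo "deb http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | tee /etc/apt/sources.list.d/tensorflow-serving.list
!curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | apt-key add -
!apt-get update
!apt-get install tensorflow-model-server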

Setup for serving
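
TensorFlow Serving loads a SavedModel from a numbered version subdirectory under a base path. A minimal sketch of choosing that directory (MODEL_DIR and export_path are the names used in the code below; tempfile is just a convenient scratch location):

import os
import tempfile

# The server watches MODEL_DIR and serves the highest-numbered version it finds
MODEL_DIR = tempfile.gettempdir()
version = 1
export_path = os.path.join(MODEL_DIR, str(version))  # e.g. /tmp/1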

import tensorflow as tf
from tensorflow import keras

# TF 1.x-style export of the trained Keras model (model) as a SavedModel;
# in TF 2.x, model.save(export_path) writes the same SavedModel format
tf.saved_model.simple_save(
    keras.backend.get_session(),
    export_path,
    inputs={'input_image': model.input},
    outputs={t.name: t for t in model.outputs}
)
!saved_model_cli show --dir {export_path} --all

Serving

os.environ["MODEL_DIR"] = MODEL_DIR


%%bash --bg
nohup tensorflow_model_server \
 --rest_api_port=8501 \
 --model_name=helloworld \
 --model_base_path="${MODEL_DIR}" > server.log 2>&1

(Outputs are sent to server.log)
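
A quick way to confirm the server came up cleanly is to look at that log:

!tail server.log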

Passing data to serving

Use JSON to serialize the input data into a tensor-like shape the server can parse.

import json
import numpy as np

# A batch of two inputs; "instances" is a list with one entry per example
xs = np.array([[9.0], [10.0]])
data = json.dumps({
    "signature_name": "serving_default",
    "instances": xs.tolist(),
})

print(data)
# e.g. {"signature_name": "serving_default", "instances": [[9.0], [10.0]]}

Getting the predictions back


!pip install -q requests

import requests

# POST the JSON payload to the model's REST predict endpoint
headers = {"content-type": "application/json"}

json_response = requests.post(
    "http://localhost:8501/v1/models/helloworld:predict",
    data=data,
    headers=headers
)

print(json_response.text)

# The response body is JSON with a "predictions" field
predictions = json.loads(json_response.text)["predictions"]
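
For the simple helloworld model these predictions are just numbers, but for a classifier like the one in quiz question 10 each entry is a list of per-class probabilities. A minimal sketch (assuming such a classifier) of turning them into class labels:

import numpy as np

# Each row is one input's probability distribution over the model's classes
for i, probs in enumerate(predictions):
    print("item {} -> class {} (p = {:.3f})".format(i, int(np.argmax(probs)), max(probs)))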

Quiz

1. What’s the name of the package you install to get TensorFlow Serving?
   Answer: tensorflow-model-server

2. What Unix command is used to start TensorFlow Serving in a way that will keep it running even if the session is disconnected?
   Answer: nohup

3. What’s the name of the production-scale ML platform for TensorFlow?
   Answer: TFX

4. What advantages do you get by running inference on a server instead of distributing the model to all your clients?
   Answer: All of the above

5. How do you prepare your model for serving?
   Answer: Use TensorFlow SavedModel to save it, and then deploy it to the server

6. If you want to inspect the inputs and outputs for your model, what command do you use?
   Answer: saved_model_cli

7. If you want to start the model server on port 8501, what parameter do you use?
   Answer: --rest_api_port

8. If I want to pass a list of values (i.e. 8, 9, 10) to the server and have it perform inferences on them, what’s the correct syntax for this data?
   Answer: [[8], [9], [10]]

9. If I publish V1 of a model called ‘helloworld’ and run it with a REST API on port 8501, what’s the URL of the endpoint used to run inference on localhost?
   Answer: http://localhost:8501/v1/models/helloworld:predict

10. After running inference using a model hosted on TF Serving, the following is returned. Can you explain what data was sent to the model, and what these return values mean?
    [[5.77123615e-07, 2.66907847e-08, 4.7217938e-08, 1.97792871e-09, 5.31984341e-08, 0.00734644197, 3.1462946e-07, 0.0439051725, 0.000500570168, 0.948246837],
     [0.00227244, 6.12080342e-09, 0.967876315, 3.0579281e-06, 0.0183339939, 3.18483538e-11, 0.011510049, 1.38639566e-14, 4.19033222e-06, 4.40264526e-11],
     [1.45221502e-05, 0.999841571, 3.96758715e-08, 0.000131023204, 1.22008023e-05, 1.18227668e-08, 5.97860179e-08, 1.31281848e-08, 5.49047854e-07, 2.97885189e-10]]
    Answer: You passed three items to a model that recognizes 10 classes, and it returned the probabilities for each item in each class.