#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
In this notebook you will serve your first TensorFlow model with TensorFlow Serving. We will start by building a very simple model to infer the relationship:
$$ y = 2x - 1 $$
between a few pairs of numbers. After training our model, we will serve it with TensorFlow Serving, and then we will make inference requests.
Warning: This notebook is designed to be run in a Google Colab only. It installs packages on the system and requires root access. If you want to run it in a local Jupyter notebook, please proceed with caution.
try:
  %tensorflow_version 2.x
except:
  pass
Colab only includes TensorFlow 2.x; %tensorflow_version has no effect.
import os
import json
import tempfile
import requests
import numpy as np
import tensorflow as tf
print("\u2022 Using TensorFlow Version:", tf.__version__)
• Using TensorFlow Version: 2.14.0
We will install TensorFlow Serving using apt (the Debian package manager), since Google’s Colab runs in a Debian environment.
Before we can install TensorFlow Serving, we need to add the tensorflow-model-server package to the list of packages that apt knows about. Note that we’re running as root.
Note: This notebook is running TensorFlow Serving natively, but you can also run it in a Docker container, which is one of the easiest ways to get started using TensorFlow Serving. The Docker Engine is available for a variety of Linux platforms, Windows, and Mac.
# This is the same as you would do from your command line, but without the [arch=amd64], and no sudo
# You would instead do:
# echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list && \
# curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -
!echo "deb http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | tee /etc/apt/sources.list.d/tensorflow-serving.list && \
curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | apt-key add -
!apt update
deb http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0Warning: apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)).
100 2943 100 2943 0 0 3674 0 --:--:-- --:--:-- --:--:-- 3674
OK
Get:1 http://security.ubuntu.com/ubuntu jammy-security InRelease [110 kB]
Get:2 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,626 B]
Get:3 https://ppa.launchpadcontent.net/c2d4u.team/c2d4u4.0+/ubuntu jammy InRelease [18.1 kB]
Hit:4 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64 InRelease
Get:5 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ Packages [44.8 kB]
Hit:6 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:7 http://archive.ubuntu.com/ubuntu jammy InRelease
Hit:8 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Get:9 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [119 kB]
Get:10 http://security.ubuntu.com/ubuntu jammy-security/universe amd64 Packages [1,009 kB]
Hit:11 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease
Get:12 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages [1,131 kB]
Get:13 https://ppa.launchpadcontent.net/c2d4u.team/c2d4u4.0+/ubuntu jammy/main Sources [2,230 kB]
Get:14 https://ppa.launchpadcontent.net/c2d4u.team/c2d4u4.0+/ubuntu jammy/main amd64 Packages [1,145 kB]
Get:15 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [109 kB]
Get:16 http://storage.googleapis.com/tensorflow-serving-apt stable InRelease [3,026 B]
Get:17 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 Packages [1,274 kB]
Get:18 http://storage.googleapis.com/tensorflow-serving-apt stable/tensorflow-model-server amd64 Packages [340 B]
Get:19 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages [1,400 kB]
Get:20 http://storage.googleapis.com/tensorflow-serving-apt stable/tensorflow-model-server-universal amd64 Packages [348 B]
Fetched 8,597 kB in 2s (4,186 kB/s)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
29 packages can be upgraded. Run 'apt list --upgradable' to see them.
W: http://storage.googleapis.com/tensorflow-serving-apt/dists/stable/InRelease: Key is stored in legacy trusted.gpg keyring (/etc/apt/trusted.gpg), see the DEPRECATION section in apt-key(8) for details.
Now that the package lists have been updated, we can use the apt-get command to install the TensorFlow model server.
!apt-get install tensorflow-model-server
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
tensorflow-model-server
0 upgraded, 1 newly installed, 0 to remove and 29 not upgraded.
Need to get 463 MB of archives.
After this operation, 0 B of additional disk space will be used.
Get:1 http://storage.googleapis.com/tensorflow-serving-apt stable/tensorflow-model-server amd64 tensorflow-model-server all 2.14.0 [463 MB]
Fetched 463 MB in 19s (24.9 MB/s)
Selecting previously unselected package tensorflow-model-server.
(Reading database ... 120874 files and directories currently installed.)
Preparing to unpack .../tensorflow-model-server_2.14.0_all.deb ...
Unpacking tensorflow-model-server (2.14.0) ...
Setting up tensorflow-model-server (2.14.0) ...
Now, we will create a simple dataset that expresses the relationship:
$$ y = 2x - 1 $$
between inputs (xs) and outputs (ys).
xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=float)
We’ll use the simplest possible model for this example. Since we are going to train our model for 500 epochs, in order to avoid clutter on the screen, we will use the argument verbose=0 in the fit method. The verbosity mode can be:
0: silent.
1: progress bar.
2: one line per epoch.
As a side note, since the progress bar is not particularly useful when logged to a file, verbose=2 is recommended when not running interactively (e.g., in a production environment).
model = tf.keras.Sequential([tf.keras.layers.Dense(units=1, input_shape=[1])])
model.compile(optimizer='sgd',
              loss='mean_squared_error')
history = model.fit(xs, ys, epochs=500, verbose=0)
print("Finished training the model")
Finished training the model
Now that the model is trained, we can test it. If we give it the value 10, we should get a value very close to 19.
print(model.predict([10.0]))
1/1 [==============================] - 0s 76ms/step
[[18.986599]]
To load the trained model into TensorFlow Serving, we first need to save it in the SavedModel format. This will create a protobuf file in a well-defined directory hierarchy, and will include a version number. TensorFlow Serving allows us to select which version of a model, or “servable”, we want to use when we make inference requests. Each version will be exported to a different sub-directory under the given path.
MODEL_DIR = tempfile.gettempdir()
version = 1
export_path = os.path.join(MODEL_DIR, str(version))
if os.path.isdir(export_path):
  print('\nAlready saved a model, cleaning up\n')
  !rm -r {export_path}
model.save(export_path, save_format="tf")
print('\nexport_path = {}'.format(export_path))
!ls -l {export_path}
export_path = /tmp/1
total 60
drwxr-xr-x 2 root root 4096 Oct 27 15:10 assets
-rw-r--r-- 1 root root 58 Oct 27 15:10 fingerprint.pb
-rw-r--r-- 1 root root 4421 Oct 27 15:10 keras_metadata.pb
-rw-r--r-- 1 root root 40880 Oct 27 15:10 saved_model.pb
drwxr-xr-x 2 root root 4096 Oct 27 15:10 variables
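TensorFlow Serving watches the model base path for new numbered sub-directories and serves the highest version it finds, so a retrained model could be published simply by exporting it under the next version number. A minimal sketch (the version 2 export is hypothetical; in this notebook we only export version 1):
# Hypothetical: export a retrained model as version 2.
# TensorFlow Serving would automatically pick up and serve the highest
# version number found under MODEL_DIR.
new_version = 2
new_export_path = os.path.join(MODEL_DIR, str(new_version))  # e.g. /tmp/2
model.save(new_export_path, save_format="tf")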
We’ll use the command-line utility saved_model_cli to look at the MetaGraphDefs and SignatureDefs in our SavedModel. The signature definition is defined by the input and output tensors, and is stored with the default serving key.
!saved_model_cli show --dir {export_path} --all
2023-10-27 15:10:51.936840: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-10-27 15:10:51.936895: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-10-27 15:10:51.936936: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-10-27 15:10:53.548893: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:
signature_def['__saved_model_init_op']:
The given SavedModel SignatureDef contains the following input(s):
The given SavedModel SignatureDef contains the following output(s):
outputs['__saved_model_init_op'] tensor_info:
dtype: DT_INVALID
shape: unknown_rank
name: NoOp
Method name is:
signature_def['serving_default']:
The given SavedModel SignatureDef contains the following input(s):
inputs['dense_input'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 1)
name: serving_default_dense_input:0
The given SavedModel SignatureDef contains the following output(s):
outputs['dense'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 1)
name: StatefulPartitionedCall:0
Method name is: tensorflow/serving/predict
The MetaGraph with tag set ['serve'] contains the following ops: {'ReadVariableOp', 'MergeV2Checkpoints', 'Select', 'Pack', 'Placeholder', 'Const', 'SaveV2', 'VarHandleOp', 'StringJoin', 'NoOp', 'DisableCopyOnRead', 'Identity', 'ShardedFilename', 'StatefulPartitionedCall', 'AssignVariableOp', 'RestoreV2', 'MatMul', 'BiasAdd', 'StaticRegexFullMatch'}
2023-10-27 15:10:56.003634: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:47] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
Concrete Functions:
Function Name: '__call__'
Option #1
Callable with:
Argument #1
dense_input: TensorSpec(shape=(None, 1), dtype=tf.float32, name='dense_input')
Argument #2
DType: bool
Value: True
Argument #3
DType: NoneType
Value: None
Option #2
Callable with:
Argument #1
dense_input: TensorSpec(shape=(None, 1), dtype=tf.float32, name='dense_input')
Argument #2
DType: bool
Value: False
Argument #3
DType: NoneType
Value: None
Function Name: '_default_save_signature'
Option #1
Callable with:
Argument #1
dense_input: TensorSpec(shape=(None, 1), dtype=tf.float32, name='dense_input')
Function Name: 'call_and_return_all_conditional_losses'
Option #1
Callable with:
Argument #1
dense_input: TensorSpec(shape=(None, 1), dtype=tf.float32, name='dense_input')
Argument #2
DType: bool
Value: True
Argument #3
DType: NoneType
Value: None
Option #2
Callable with:
Argument #1
dense_input: TensorSpec(shape=(None, 1), dtype=tf.float32, name='dense_input')
Argument #2
DType: bool
Value: False
Argument #3
DType: NoneType
Value: None
We will now launch the TensorFlow model server with a bash script. We will use the argument --bg to run the script in the background.
Our script will start TensorFlow Serving and load our model. Here are the parameters we will use:
rest_api_port: The port that you’ll use for requests.
model_name: You’ll use this in the URL of your requests. It can be anything.
model_base_path: This is the path to the directory where you’ve saved your model.
Also, because the variable that points to the directory containing the model is in Python, we need a way to tell the bash script where to find the model. To do this, we will write the value of the Python variable to an environment variable using the os.environ dictionary.
os.environ["MODEL_DIR"] = MODEL_DIR
%%bash --bg
nohup tensorflow_model_server \
--rest_api_port=8501 \
--model_name=helloworld \
--model_base_path="${MODEL_DIR}" >server.log 2>&1
Now we can take a look at the server log.
!tail server.log
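Because the server was launched in the background, it may take a few seconds before it starts accepting requests. As an optional sketch (the polling loop and timings are our own additions, not part of the original tutorial), we can query TensorFlow Serving’s model status endpoint until the server is reachable:
import time

# Optional: poll the model status endpoint until the server is reachable.
status_url = 'http://localhost:8501/v1/models/helloworld'
for _ in range(10):
    try:
        print(requests.get(status_url).text)
        break
    except requests.exceptions.ConnectionError:
        time.sleep(1)  # server not ready yet; wait and retry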
We are now ready to construct a JSON object with some data so that we can make a couple of inferences. We will use $x=9$ and $x=10$ as our test data.
xs = np.array([[9.0], [10.0]])
data = json.dumps({"signature_name": "serving_default", "instances": xs.tolist()})
print(data)
{"signature_name": "serving_default", "instances": [[9.0], [10.0]]}
Finally, we can make the inference request and get the inferences back. We’ll send a predict request as a POST to our server’s REST endpoint, and pass it our test data. We’ll ask our server to give us the latest version of our model by not specifying a particular version. The response will be a JSON payload containing the predictions.
# If this cell fails with a "...Failed to establish a new connection..." error,
# try replacing 'localhost' with '127.0.0.1' in the URL below.
headers = {"content-type": "application/json"}
json_response = requests.post('http://localhost:8501/v1/models/helloworld:predict', data=data, headers=headers)
print(json_response.text)
---------------------------------------------------------------------------
ConnectionRefusedError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/urllib3/connection.py in _new_conn(self)
202 try:
--> 203 sock = connection.create_connection(
204 (self._dns_host, self.port),
/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py in create_connection(address, timeout, source_address, socket_options)
84 try:
---> 85 raise err
86 finally:
/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py in create_connection(address, timeout, source_address, socket_options)
72 sock.bind(source_address)
---> 73 sock.connect(sa)
74 # Break explicitly a reference cycle
ConnectionRefusedError: [Errno 111] Connection refused
The above exception was the direct cause of the following exception:
NewConnectionError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, preload_content, decode_content, **response_kw)
790 # Make the request on the HTTPConnection object
--> 791 response = self._make_request(
792 conn,
/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, body, headers, retries, timeout, chunked, response_conn, preload_content, decode_content, enforce_content_length)
496 try:
--> 497 conn.request(
498 method,
/usr/local/lib/python3.10/dist-packages/urllib3/connection.py in request(self, method, url, body, headers, chunked, preload_content, decode_content, enforce_content_length)
394 self.putheader(header, value)
--> 395 self.endheaders()
396
/usr/lib/python3.10/http/client.py in endheaders(self, message_body, encode_chunked)
1277 raise CannotSendHeader()
-> 1278 self._send_output(message_body, encode_chunked=encode_chunked)
1279
/usr/lib/python3.10/http/client.py in _send_output(self, message_body, encode_chunked)
1037 del self._buffer[:]
-> 1038 self.send(msg)
1039
/usr/lib/python3.10/http/client.py in send(self, data)
975 if self.auto_open:
--> 976 self.connect()
977 else:
/usr/local/lib/python3.10/dist-packages/urllib3/connection.py in connect(self)
242 def connect(self) -> None:
--> 243 self.sock = self._new_conn()
244 if self._tunnel_host:
/usr/local/lib/python3.10/dist-packages/urllib3/connection.py in _new_conn(self)
217 except OSError as e:
--> 218 raise NewConnectionError(
219 self, f"Failed to establish a new connection: {e}"
NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7cf7d41084f0>: Failed to establish a new connection: [Errno 111] Connection refused
The above exception was the direct cause of the following exception:
MaxRetryError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
485 try:
--> 486 resp = conn.urlopen(
487 method=request.method,
/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, preload_content, decode_content, **response_kw)
844
--> 845 retries = retries.increment(
846 method, url, error=new_e, _pool=self, _stacktrace=sys.exc_info()[2]
/usr/local/lib/python3.10/dist-packages/urllib3/util/retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
514 reason = error or ResponseError(cause)
--> 515 raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
516
MaxRetryError: HTTPConnectionPool(host='localhost', port=8501): Max retries exceeded with url: /v1/models/helloworld:predict (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7cf7d41084f0>: Failed to establish a new connection: [Errno 111] Connection refused'))
During handling of the above exception, another exception occurred:
ConnectionError Traceback (most recent call last)
<ipython-input-15-4f595b169d73> in <cell line: 5>()
3
4 headers = {"content-type": "application/json"}
----> 5 json_response = requests.post('http://localhost:8501/v1/models/helloworld:predict', data=data, headers=headers)
6
7 print(json_response.text)
/usr/local/lib/python3.10/dist-packages/requests/api.py in post(url, data, json, **kwargs)
113 """
114
--> 115 return request("post", url, data=data, json=json, **kwargs)
116
117
/usr/local/lib/python3.10/dist-packages/requests/api.py in request(method, url, **kwargs)
57 # cases, and look like a memory leak in others.
58 with sessions.Session() as session:
---> 59 return session.request(method=method, url=url, **kwargs)
60
61
/usr/local/lib/python3.10/dist-packages/requests/sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
587 }
588 send_kwargs.update(settings)
--> 589 resp = self.send(prep, **send_kwargs)
590
591 return resp
/usr/local/lib/python3.10/dist-packages/requests/sessions.py in send(self, request, **kwargs)
701
702 # Send the request
--> 703 r = adapter.send(request, **kwargs)
704
705 # Total elapsed time of the request (approximately)
/usr/local/lib/python3.10/dist-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
517 raise SSLError(e, request=request)
518
--> 519 raise ConnectionError(e, request=request)
520
521 except ClosedPoolError as e:
ConnectionError: HTTPConnectionPool(host='localhost', port=8501): Max retries exceeded with url: /v1/models/helloworld:predict (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7cf7d41084f0>: Failed to establish a new connection: [Errno 111] Connection refused'))
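If the request fails with a connection error like the one above, the server may not have finished starting yet, or resolving localhost may be the problem. As the comment in the cell suggests, a simple fallback (a sketch, assuming the server is otherwise healthy) is to wait briefly and retry against 127.0.0.1:
import time

# Fallback sketch: give the server a few more seconds, then retry via 127.0.0.1.
time.sleep(5)
json_response = requests.post('http://127.0.0.1:8501/v1/models/helloworld:predict',
                              data=data, headers=headers)
print(json_response.text)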
We can also look at the predictions directly by loading the value for the predictions key.
predictions = json.loads(json_response.text)['predictions']
print(predictions)
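Since the model was trained to approximate $y = 2x - 1$, the two predictions should be close to 17 and 19. A quick sanity check, just for illustration:
# Compare the served predictions with the true values of y = 2x - 1.
for x, pred in zip([9.0, 10.0], predictions):
    print(f"x = {x}, predicted = {pred[0]:.4f}, expected = {2 * x - 1}")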