
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Getting Started with TensorFlow Serving

In this notebook you will serve your first TensorFlow model with TensorFlow Serving. We will start by building a very simple model to infer the relationship:

$$ y = 2x - 1 $$

between a few pairs of numbers. After training our model, we will serve it with TensorFlow Serving, and then we will make inference requests.

Warning: This notebook is designed to run in Google Colab only. It installs packages on the system and requires root access. If you want to run it in a local Jupyter notebook, please proceed with caution.

Setup

try:
    %tensorflow_version 2.x
except:
    pass
Colab only includes TensorFlow 2.x; %tensorflow_version has no effect.
import os
import json
import tempfile
import requests
import numpy as np

import tensorflow as tf

print("\u2022 Using TensorFlow Version:", tf.__version__)
• Using TensorFlow Version: 2.14.0

Add TensorFlow Serving Distribution URI as a Package Source

We will install TensorFlow Serving using apt, the Debian package manager, since Google’s Colab runs in a Debian-based environment.

Before we can install TensorFlow Serving, we need to add the tensorflow-model-server package to the list of packages that apt knows about. Note that we’re running as root, so there is no need for sudo.

Note: This notebook is running TensorFlow Serving natively, but you can also run it in a Docker container, which is one of the easiest ways to get started using TensorFlow Serving. The Docker Engine is available for a variety of Linux platforms, Windows, and Mac.
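
For reference, a minimal sketch of the Docker alternative (not executed in this Colab; it assumes Docker is installed on your own machine, and the source path below is a placeholder for wherever your SavedModel versions live):

# docker pull tensorflow/serving
# docker run -p 8501:8501 \
#   --mount type=bind,source=/path/to/saved_model_dir,target=/models/helloworld \
#   -e MODEL_NAME=helloworld -t tensorflow/serving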

# This is the same as you would do from your command line, but without [arch=amd64] and without sudo.
# You would instead do:
# echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list && \
# curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -

!echo "deb http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | tee /etc/apt/sources.list.d/tensorflow-serving.list && \
curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | apt-key add -
!apt update
deb http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0Warning: apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)).
100  2943  100  2943    0     0   3674      0 --:--:-- --:--:-- --:--:--  3674
OK
Get:1 http://security.ubuntu.com/ubuntu jammy-security InRelease [110 kB]
Get:2 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,626 B]
Get:3 https://ppa.launchpadcontent.net/c2d4u.team/c2d4u4.0+/ubuntu jammy InRelease [18.1 kB]
Hit:4 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Get:5 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ Packages [44.8 kB]
Hit:6 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:7 http://archive.ubuntu.com/ubuntu jammy InRelease
Hit:8 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Get:9 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [119 kB]
Get:10 http://security.ubuntu.com/ubuntu jammy-security/universe amd64 Packages [1,009 kB]
Hit:11 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease
Get:12 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages [1,131 kB]
Get:13 https://ppa.launchpadcontent.net/c2d4u.team/c2d4u4.0+/ubuntu jammy/main Sources [2,230 kB]
Get:14 https://ppa.launchpadcontent.net/c2d4u.team/c2d4u4.0+/ubuntu jammy/main amd64 Packages [1,145 kB]
Get:15 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [109 kB]
Get:16 http://storage.googleapis.com/tensorflow-serving-apt stable InRelease [3,026 B]
Get:17 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 Packages [1,274 kB]
Get:18 http://storage.googleapis.com/tensorflow-serving-apt stable/tensorflow-model-server amd64 Packages [340 B]
Get:19 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages [1,400 kB]
Get:20 http://storage.googleapis.com/tensorflow-serving-apt stable/tensorflow-model-server-universal amd64 Packages [348 B]
Fetched 8,597 kB in 2s (4,186 kB/s)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
29 packages can be upgraded. Run 'apt list --upgradable' to see them.
W: http://storage.googleapis.com/tensorflow-serving-apt/dists/stable/InRelease: Key is stored in legacy trusted.gpg keyring (/etc/apt/trusted.gpg), see the DEPRECATION section in apt-key(8) for details.

Install TensorFlow Serving

Now that the package lists have been updated, we can use the apt-get command to install the TensorFlow model server.

!apt-get install tensorflow-model-server
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
  tensorflow-model-server
0 upgraded, 1 newly installed, 0 to remove and 29 not upgraded.
Need to get 463 MB of archives.
After this operation, 0 B of additional disk space will be used.
Get:1 http://storage.googleapis.com/tensorflow-serving-apt stable/tensorflow-model-server amd64 tensorflow-model-server all 2.14.0 [463 MB]
Fetched 463 MB in 19s (24.9 MB/s)
Selecting previously unselected package tensorflow-model-server.
(Reading database ... 120874 files and directories currently installed.)
Preparing to unpack .../tensorflow-model-server_2.14.0_all.deb ...
Unpacking tensorflow-model-server (2.14.0) ...
Setting up tensorflow-model-server (2.14.0) ...

Create Dataset

Now, we will create a simple dataset that expresses the relationship:

$$ y = 2x - 1 $$

between inputs (xs) and outputs (ys).

xs = np.array([-1.0,  0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=float)
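
As a quick sanity check, we can confirm that every (x, y) pair in the dataset satisfies y = 2x - 1:

# Verify that the labels follow y = 2x - 1
assert np.allclose(ys, 2 * xs - 1)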

Build and Train the Model

We’ll use the simplest possible model for this example. Since we are going to train our model for 500 epochs, we will use the argument verbose=0 in the fit method to avoid clutter on the screen. The verbosity mode can be 0 (silent), 1 (progress bar), or 2 (one line per epoch).

As a side note, since the progress bar is not particularly useful when logged to a file, verbose=2 is recommended when not running interactively (e.g., in a production environment).

model = tf.keras.Sequential([tf.keras.layers.Dense(units=1, input_shape=[1])])

model.compile(optimizer='sgd',
              loss='mean_squared_error')

history = model.fit(xs, ys, epochs=500, verbose=0)

print("Finished training the model")
Finished training the model

Test the Model

Now that the model is trained, we can test it. If we give it the value 10, we should get a value very close to 19.

print(model.predict([10.0]))
1/1 [==============================] - 0s 76ms/step
[[18.986599]]

Save the Model

To load the trained model into TensorFlow Serving, we first need to save it in the SavedModel format. This will create a protobuf file in a well-defined directory hierarchy, and will include a version number. TensorFlow Serving allows us to select which version of a model, or “servable”, we want to use when we make inference requests. Each version will be exported to a different sub-directory under the given path.

MODEL_DIR = tempfile.gettempdir()

version = 1

export_path = os.path.join(MODEL_DIR, str(version))

if os.path.isdir(export_path):
    print('\nAlready saved a model, cleaning up\n')
    !rm -r {export_path}

model.save(export_path, save_format="tf")

print('\nexport_path = {}'.format(export_path))
!ls -l {export_path}
export_path = /tmp/1
total 60
drwxr-xr-x 2 root root  4096 Oct 27 15:10 assets
-rw-r--r-- 1 root root    58 Oct 27 15:10 fingerprint.pb
-rw-r--r-- 1 root root  4421 Oct 27 15:10 keras_metadata.pb
-rw-r--r-- 1 root root 40880 Oct 27 15:10 saved_model.pb
drwxr-xr-x 2 root root  4096 Oct 27 15:10 variables
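
If we retrained the model later, we could export the new weights as version 2 under the same base directory, and TensorFlow Serving would serve the highest version by default. A minimal sketch of that export, kept as comments since this notebook only needs version 1:

# version = 2
# export_path_v2 = os.path.join(MODEL_DIR, str(version))
# model.save(export_path_v2, save_format="tf")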

Examine Your Saved Model

We’ll use the command line utility saved_model_cli to look at the MetaGraphDefs and SignatureDefs in our SavedModel. The serving signature is defined by the input and output tensors, and is stored with the default serving key.

!saved_model_cli show --dir {export_path} --all
2023-10-27 15:10:51.936840: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-10-27 15:10:51.936895: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-10-27 15:10:51.936936: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-10-27 15:10:53.548893: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['__saved_model_init_op']:
  The given SavedModel SignatureDef contains the following input(s):
  The given SavedModel SignatureDef contains the following output(s):
    outputs['__saved_model_init_op'] tensor_info:
        dtype: DT_INVALID
        shape: unknown_rank
        name: NoOp
  Method name is: 

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['dense_input'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1)
        name: serving_default_dense_input:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['dense'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1)
        name: StatefulPartitionedCall:0
  Method name is: tensorflow/serving/predict
The MetaGraph with tag set ['serve'] contains the following ops: {'ReadVariableOp', 'MergeV2Checkpoints', 'Select', 'Pack', 'Placeholder', 'Const', 'SaveV2', 'VarHandleOp', 'StringJoin', 'NoOp', 'DisableCopyOnRead', 'Identity', 'ShardedFilename', 'StatefulPartitionedCall', 'AssignVariableOp', 'RestoreV2', 'MatMul', 'BiasAdd', 'StaticRegexFullMatch'}
2023-10-27 15:10:56.003634: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:47] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.

Concrete Functions:
  Function Name: '__call__'
    Option #1
      Callable with:
        Argument #1
          dense_input: TensorSpec(shape=(None, 1), dtype=tf.float32, name='dense_input')
        Argument #2
          DType: bool
          Value: True
        Argument #3
          DType: NoneType
          Value: None
    Option #2
      Callable with:
        Argument #1
          dense_input: TensorSpec(shape=(None, 1), dtype=tf.float32, name='dense_input')
        Argument #2
          DType: bool
          Value: False
        Argument #3
          DType: NoneType
          Value: None

  Function Name: '_default_save_signature'
    Option #1
      Callable with:
        Argument #1
          dense_input: TensorSpec(shape=(None, 1), dtype=tf.float32, name='dense_input')

  Function Name: 'call_and_return_all_conditional_losses'
    Option #1
      Callable with:
        Argument #1
          dense_input: TensorSpec(shape=(None, 1), dtype=tf.float32, name='dense_input')
        Argument #2
          DType: bool
          Value: True
        Argument #3
          DType: NoneType
          Value: None
    Option #2
      Callable with:
        Argument #1
          dense_input: TensorSpec(shape=(None, 1), dtype=tf.float32, name='dense_input')
        Argument #2
          DType: bool
          Value: False
        Argument #3
          DType: NoneType
          Value: None
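
If we only care about the serving signature, saved_model_cli can also display a single SignatureDef. For example, this optional extra inspection step shows just the serving_default signature:

!saved_model_cli show --dir {export_path} --tag_set serve --signature_def serving_default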

Run the TensorFlow Model Server

We will now launch the TensorFlow model server from a bash cell. We will pass the --bg argument to the %%bash magic so that the cell runs in the background.

Our script will start running TensorFlow Serving and will load our model. Here are the parameters we will use:

rest_api_port: the port that we will use for REST requests.
model_name: the name we will use in the URL of our REST requests. It can be anything; here we use helloworld.
model_base_path: the path to the directory where we saved our model.

Also, because the variable that points to the directory containing the model is in Python, we need a way to tell the bash script where to find the model. To do this, we will write the value of the Python variable to an environment variable using the os.environ dictionary.

os.environ["MODEL_DIR"] = MODEL_DIR
%%bash --bg
nohup tensorflow_model_server \
  --rest_api_port=8501 \
  --model_name=helloworld \
  --model_base_path="${MODEL_DIR}" >server.log 2>&1

Now we can take a look at the server log.

!tail server.log
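
Once the server is up, we can also query TensorFlow Serving's model status REST endpoint to confirm that our model has been loaded. This is a small optional check, assuming the server is reachable on port 8501:

# Query the model status endpoint; the response lists the loaded versions and their state
status_response = requests.get('http://localhost:8501/v1/models/helloworld')
print(status_response.text)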

Create JSON Object with Test Data

We are now ready to construct a JSON object with some data so that we can make a couple of inferences. We will use $x=9$ and $x=10$ as our test data.

xs = np.array([[9.0], [10.0]])
data = json.dumps({"signature_name": "serving_default", "instances": xs.tolist()})
print(data)
{"signature_name": "serving_default", "instances": [[9.0], [10.0]]}

Make Inference Request

Finally, we can make the inference request and get the inferences back. We’ll send a predict request as a POST to our server’s REST endpoint, and pass it our test data. We’ll ask our server to give us the latest version of our model by not specifying a particular version. The response will be a JSON payload containing the predictions.

# if this cell fails execution because of an "...Failed to establish a new connection..." error,
# try replacing in the link below 'localhost' with '127.0.0.1'

headers = {"content-type": "application/json"}
json_response = requests.post('http://localhost:8501/v1/models/helloworld:predict', data=data, headers=headers)

print(json_response.text)
ConnectionRefusedError: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

ConnectionError: HTTPConnectionPool(host='localhost', port=8501): Max retries exceeded with url: /v1/models/helloworld:predict (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7cf7d41084f0>: Failed to establish a new connection: [Errno 111] Connection refused'))

We can also look at the predictions directly by loading the value for the predictions key.

predictions = json.loads(json_response.text)['predictions']
print(predictions)
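
If we wanted to pin our request to a particular version rather than the latest one, the REST URL can name the version explicitly. A minimal sketch, assuming version 1 is the version we exported and loaded above:

# Request predictions from a specific model version
versioned_url = 'http://localhost:8501/v1/models/helloworld/versions/1:predict'
json_response_v1 = requests.post(versioned_url, data=data, headers=headers)
print(json_response_v1.text)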