Hands-On Artificial Intelligence on Google Cloud Platform: Build intelligent applications powered by TensorFlow, Cloud AutoML, BigQuery, and Dialogflow 1789538467, 9781789538465

Develop robust AI applications with TensorFlow, Cloud AutoML, TPUs, and other GCP services.


English | 350 pages [341] | 2020



Table of contents:
Cover
Title Page
About Packt
Copyright and Credits
Contributors
Table of Contents
Preface
Section 1: Basics of Google Cloud Platform
Chapter 1: Overview of AI and GCP
Understanding the Cloud First strategy for advanced data analytics
Advantages of a Cloud First strategy
Anti-patterns of the Cloud First strategy 
Google data centers
Overview of GCP
AI building blocks
Data
Storage
Processing
Actions
Natural language processing 
Speech recognition
Machine vision
Information processing and reasoning
Planning and exploring
Handling and control
Navigation and movement
Speech generation
Image generation
AI tools available on GCP
Sight
Language
Conversation
Summary
Chapter 2: Computing and Processing Using GCP Components
Understanding the compute options
Compute Engine
Compute Engine and AI applications
App Engine
App Engine and AI applications
Cloud Functions
Cloud Functions and AI applications
Kubernetes Engine
Kubernetes Engine and AI applications
Diving into the storage options
Cloud Storage
Cloud Storage and AI applications
Cloud Bigtable
Cloud Bigtable and AI applications
Cloud Datastore
Cloud Datastore and AI applications
Cloud Firestore
Cloud Firestore and AI applications
Cloud SQL
Cloud SQL and AI applications
Cloud Spanner
Cloud Spanner and AI applications
Cloud Memorystore
Cloud Memorystore and AI applications
Cloud Filestore
Cloud Filestore and AI applications
Understanding the processing options
BigQuery
BigQuery and AI applications
Cloud Dataproc
Cloud Dataproc and AI applications
Cloud Dataflow
Cloud Dataflow and AI applications
Building an ML pipeline 
Understanding the flow design
Loading data into Cloud Storage
Loading data to BigQuery
Training the model
Evaluating the model
Testing the model
Summary
Section 2: Artificial Intelligence with Google Cloud Platform
Chapter 3: Machine Learning Applications with XGBoost
Overview of the XGBoost library
Ensemble learning
How does ensemble learning decide on the optimal predictive model?
Reducible errors – bias
Reducible errors – variance
Irreducible errors
Total error
Gradient boosting
eXtreme Gradient Boosting (XGBoost)
Training and storing XGBoost machine learning models
Using XGBoost trained models
Building a recommendation system using the XGBoost library
Creating and testing the XGBoost recommendation system model 
Summary
Chapter 4: Using Cloud AutoML
Overview of Cloud AutoML 
The workings of AutoML
AutoML API overview
REST source – pointing to model locations
REST source – for evaluating the model
REST source – the operations API
Document classification using AutoML Natural Language
The traditional machine learning approach for document classification
Document classification with AutoML
Navigating to the AutoML Natural Language interface
Creating the dataset
Labeling the training data
Training the model
Evaluating the model
The command line
Python
Java
Node.js
Using the model for predictions
The web interface
A REST API for model predictions
Python code for model predictions
Image classification using AutoML Vision APIs
Image classification steps with AutoML Vision 
Collecting training images
Creating a dataset
Labeling and uploading training images
Training the model
Evaluating the model
The command-line interface
Python code
Testing the model
Python code
Performing speech-to-text conversion using the Speech-to-Text API
Synchronous requests
Asynchronous requests
Streaming requests
Sentiment analysis using AutoML Natural Language APIs
Summary
Chapter 5: Building a Big Data Cloud Machine Learning Engine
Understanding ML
Understanding how to use Cloud Machine Learning Engine
Google Cloud AI Platform Notebooks
Google AI Platform deep learning images
Creating Google Platform AI Notebooks
Using Google Platform AI Notebooks
Automating AI Notebooks execution
Overview of the Keras framework 
Training your model using the Keras framework
Training your model using Google AI Platform
Asynchronous batch prediction using Cloud Machine Learning Engine
Real-time prediction using Cloud Machine Learning Engine
Summary
Chapter 6: Smart Conversational Applications Using DialogFlow
Introduction to DialogFlow
Understanding the building blocks of DialogFlow
Building a DialogFlow agent
Use cases supported by DialogFlow
Performing audio sentiment analysis using DialogFlow
Summary
Section 3: TensorFlow on Google Cloud Platform
Chapter 7: Understanding Cloud TPUs
Introducing Cloud TPUs and their organization
Advantages of using TPUs
Mapping of software and hardware architecture
Available TPU versions
Performance benefits of TPU v3 over TPU v2
Available TPU configurations
Software architecture
Best practices of model development using TPUs
Guiding principles for model development on a TPU
Training your model using TPUEstimator
Standard TensorFlow Estimator API
TPUEstimator programming model
TPUEstimator concepts
Converting from TensorFlow Estimator to TPUEstimator
Setting up TensorBoard for analyzing TPU performance
Performance guide
XLA compiler performance
Consequences of tiling
Fusion
Understanding preemptible TPUs
Steps for creating a preemptible TPU from the console
Preemptible TPU pricing
Preemptible TPU detection 
Summary
Chapter 8: Implementing TensorFlow Models Using Cloud ML Engine
Understanding the components of Cloud ML Engine
Training service
Using the built-in algorithms
Using a custom training application
Prediction service
Notebooks
Data Labeling Service
Deep learning containers
Steps involved in training and utilizing a TensorFlow model
Prerequisites
Creating a TensorFlow application and running it locally
Project structure recommendation
Training data
Packaging and deploying your training application in Cloud ML Engine
Choosing the right compute options for your training job
Choosing the hyperparameters for the training job
Monitoring your TensorFlow training model jobs
Summary
Chapter 9: Building Prediction Applications
Overview of machine-based intelligent predictions
Understanding the prediction process
Maintaining models and their versions
Taking a deep dive into saved models
SignatureDef in the TensorFlow SavedModel
TensorFlow SavedModel APIs
Deploying the models on GCP
Uploading saved models to a Google Cloud Storage bucket
Testing machine learning models
Deploying models and their version
Model training example
Performing prediction with service endpoints
Summary
Section 4: Building Applications and Upcoming Features
Chapter 10: Building an AI Application
A step-by-step approach to developing AI applications
Problem classification 
Classification
Regression
Clustering
Optimization
Anomaly detection
Ranking
Data preparation
Data acquisition 
Data processing 
Problem modeling 
Validation and execution
Holdout
Cross-validation
Model evaluation parameters (metrics)
Classification metrics
Model deployment
Overview of the use case – automated invoice processing (AIP)
Designing AIP with AI platform tools on GCP
Performing optical character recognition using the Vision API
Storing the invoice with Cloud SQL
Creating a Cloud SQL instance
Setting up the database and tables
Enabling the Cloud SQL API 
Enabling the Cloud Functions API 
Creating a Cloud Function 
Providing the Cloud SQL Admin role
Validating the invoice with Cloud Functions
Scheduling the invoice for the payment queue (pub/sub)
Notifying the vendor and AP team about the payment completion
Creating conversational interface for AIP
Upcoming features
Summary
Other Books You May Enjoy
Index

Citation preview

Hands-On Artificial Intelligence on Google Cloud Platform

Build intelligent applications powered by TensorFlow, Cloud AutoML, BigQuery, and Dialogflow

Anand Deshpande Manish Kumar Vikram Chaudhari

BIRMINGHAM - MUMBAI

Packt.com

Subscribe to our online digital library for full access to over 7,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

Spend less time learning and more time coding with practical eBooks and videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Fully searchable for easy access to vital information
Copy and paste, print, and bookmark content

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Hands-On Artificial Intelligence on Google Cloud Platform

Copyright © 2020 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Acquisition Editor: Devika Battike
Content Development Editor: Nazia Shaikh
Senior Editor: Ayaan Hoda
Technical Editor: Utkarsha S. Kadam
Copy Editor: Safis Editing
Language Support Editor: Storm Mann
Project Coordinator: Aishwarya Mohan
Proofreader: Safis Editing
Indexer: Rekha Nair
Production Designer: Joshua Misquitta

First published: March 2020
Production reference: 1050320

Published by Packt Publishing Ltd.
Livery Place, 35 Livery Street, Birmingham B3 2PB, UK.

ISBN 978-1-78953-846-5

www.packt.com

About the authors

Anand Deshpande has over 19 years' experience in IT services and product development. He is currently working as Vice President of Advanced Analytics and Product Development at VSquare Systems Pvt. Ltd. (VSquare). He has developed a special interest in data science and an algorithmic approach to data management and analytics, and co-authored a book entitled Artificial Intelligence for Big Data in May 2018.

This book, and anything worthwhile in my life, is possible only with the blessings of my spiritual guru, my parents, and in-laws, and the unconditional love and support of my wife, Mugdha, and my daughters, Devyani and Sharvari. Special thanks to my friends and co-authors of this book, Manish Kumar and Vikram Chaudhari. Lastly, I would like to thank Mr. Sunil Kakade and Mr. Ajay Deshmukh for their mentoring and support.

Manish Kumar works as Director of Technology and Architecture at VSquare. He has over 13 years' experience in providing technology solutions to complex business problems. He has worked extensively on web application development, IoT, big data, cloud technologies, and blockchain. Aside from this book, Manish has co-authored three books (Mastering Hadoop 3, Artificial Intelligence for Big Data, and Building Streaming Applications with Apache Kafka).

I would like to thank my parents, Dr. N.K. Singh and Dr. Rambha Singh, for their blessings. The time spent on this book has taken some precious time from my wife, Mrs. Swati Singh, and my adorable son, Lakshya Singh, and I would like to thank them for their support throughout this time. Lastly, I would like to thank my friends and co-authors, Anand Deshpande and Vikram Chaudhari.

Vikram Chaudhari works as Director of Data and Advanced Analytics at VSquare. He has over 10 years' IT experience. He is a certified AWS and Google Cloud Architect and has completed multiple implementations of data pipelines with Amazon Web Services and Google Cloud Platform. With implementation experience on multiple data pipelines across platforms, Vikram has been instrumental in creating reusable components and accelerators that reduce costs and implementation time.

I would like to thank my mother, Mrs. Jyoti Chaudhari, for her encouragement and blessings; my wife, Amruta Chaudhari, for her continuous support and love; and my precious little gem, Aahana, who had to compromise on some of her weekend plans while I was busy writing the book. Finally, I would like to thank my mentor, Anand Deshpande, who has always guided me and helped me get to the next level in my career, and Mr. Manish Kumar, the co-author and one of the best architects I have worked with. Lastly, I would like to thank the wonderful Packt team for their support throughout the project.

About the reviewers

Arvind Ravulavaru is a full-stack architect and consultant with over 11 years' experience in software development and 2 years' experience in hardware and product development. For the last 5 years, he has been working extensively on JavaScript, both on the server side and the client side, and, for the last few years, on IoT, AI, ML, and big data.

Alexey Bokov is an experienced Azure architect and has been a Microsoft technical evangelist since 2011. He works closely with Microsoft's top-tier customers all around the world to develop applications based on the Azure cloud platform. Building cloud-based applications in challenging scenarios is his passion, along with helping the development community to upskill and learn new things through hands-on exercises and hacking. He is a long-time contributor to, and co-author and reviewer of, many Azure books, and, from time to time, speaks at Kubernetes events.

Judy T Raj is a Google Certified Professional Cloud Architect with extensive experience across the three leading cloud platforms: AWS, Azure, and GCP. She has co-authored a book on Google Cloud Platform with Packt Publishing. She has also worked with a wide range of technologies, such as data science, deep learning, computer vision, blockchain, big data, and IoT, and has published many relevant academic papers. Currently employed as a Data and AI Engineer, she is a passionate coder, a machine learning practitioner, and a computer vision enthusiast.

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insights with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.


Preface

We are at an interesting juncture in the journey of computation; the majority of enterprise workloads (databases, computation, and analytics) are moving over to the cloud. Cloud computing is making it feasible and easy for anyone to have access to high-end computing power. As a result, we are seeing a fundamental shift in the way computational resources are utilized by individuals and organizations of all sizes. The providers of cloud computing infrastructure have also enabled the Software as a Service (SaaS) paradigm. With this, various services related to storage, compute, machine learning, and visualization are available without having to perform any server management or administration. The fundamental shift we are seeing, based on our experience, is toward a serverless architecture.

At this juncture, Google Cloud Platform (GCP) is a complete platform provided by Google, based on years of innovation and research that contributed toward building the most powerful search engine available to date. The same technology stack is now made commercially available by Google and, as a result, everyone has access to massive computational power. With the tools available on GCP, it is very easy to build complex data management pipelines as well as machine learning/AI workflows. This democratization brings enormous power to the development community in building cutting-edge, innovative, intelligent applications that complement human intelligence and increase human capabilities many times over.

This book is an attempt to make it easy to get started with building intelligent AI systems using GCP. We have taken a hands-on approach to explaining the various concepts and components available on GCP that facilitate AI development. We hope that this book will be a good starting point from which to explore an exciting and ever-expanding world of computation and to build AI-enabled applications on GCP.


Who this book is for

This book is for software developers, technical leads, and architects who are planning to build AI applications with GCP. Students, and anyone who has a great idea for an AI application and wants to understand the available tools and techniques to quickly build prototypes and, eventually, production-grade applications, will also benefit from this book. This book is also useful for business analysts and anyone who understands the data landscape from a business perspective. Without a great deal of prior hands-on experience, it is possible to follow this book and build AI-enabled applications based on domain knowledge. We have attempted to provide the instructions in a step-by-step manner so that the reader finds them easy to follow and implement.

What this book covers

We have divided this book into four sections based on the logical grouping of the content covered. Section 1 provides the fundamental introduction to GCP and introduces the reader to the various tools available on the platform.

Chapter 1, Overview of AI and GCP, sets the context for serverless computation on the cloud and introduces the reader to GCP.

Chapter 2, Computing and Processing Using GCP Components, introduces the reader to various tools and applications available on GCP for end-to-end data management, which is essential for building AI applications on GCP.

Chapter 3, Machine Learning Applications with XGBoost, shows how one of the most popular machine learning algorithms, XGBoost, is utilized on GCP. The idea is to enable readers to understand that machine learning algorithms can be used on GCP without having to worry about the underlying infrastructure and computation resources.

Chapter 4, Using Cloud AutoML, will help us take our first step toward the democratization of machine learning. AutoML intends to provide machine learning as a service and makes it easy for anyone with a limited understanding of machine learning models and core implementation details to build applications with machine learning models. We will be introduced to natural language and vision interfaces using AutoML.

Chapter 5, Building a Big Data Cloud Machine Learning Engine, will explore some of the fundamentals of machine learning in the cloud. There is a paradigm shift with regard to machine learning models on the cloud. We need to understand various concepts of cloud computing and how storage and compute are leveraged to build and deploy models. This is an essential chapter to understand if we want to optimize the costs of training and running machine learning models on the cloud. We will take a look at various frameworks, such as Keras, and we'll see how to use them on GCP.

Chapter 6, Smart Conversational Applications Using DialogFlow, discusses conversational interfaces to machine intelligence, an essential component of overall AI capabilities. In this chapter, we will understand how to use DialogFlow to build conversational applications. DialogFlow provides easy web and API interfaces for quickly building a conversational application. It is possible to provide human-like verbal communication once the model is trained for a large number of conversation paths.

Chapter 7, Understanding Cloud TPUs, discusses Tensor Processing Units (TPUs), which are the fundamental building blocks behind the machine learning models on GCP. In this chapter, we will introduce readers to TPUs and discuss their organization and significance for accelerating machine learning workflows on GCP. If we want to optimize performance and increase speed, it is imperative to utilize the strength of TPUs.

Chapter 8, Implementing TensorFlow Models Using Cloud ML Engine, further explores ML Engine and shows how to build TensorFlow models on it. We will take a stepwise approach to training and deploying machine learning models on GCP and look at recommended best practices for building a machine learning pipeline on GCP.

Chapter 9, Building Prediction Applications, explains the process of building prediction applications on GCP. We begin by discussing the basics of the prediction process and take a step-by-step, hands-on approach to building a prediction application with GCP. We will train and deploy a model on the platform and utilize the API layer to interface with the deployed model.

Chapter 10, Building an AI Application, utilizes various components of GCP to build an end-to-end AI application. We will illustrate this with an example use case: automating an invoice processing workflow using the tools on GCP.
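As a small preview of the API-driven workflow described for Chapter 9, the sketch below assembles the JSON body for a Cloud ML Engine batch prediction request (the same `body['predictionInput']` structure manipulated by the code snippet shown later in the Conventions used section). This is a minimal illustration, not code from the book: the project, bucket, model, and version names are hypothetical placeholders.

```python
# Minimal sketch: building a batch prediction request body for the
# Cloud ML Engine (AI Platform) projects.jobs.create REST API.
# All resource names below are illustrative placeholders.
def build_prediction_body(project_id, model_id, input_paths, output_path,
                          version_id=None, region='us-central1'):
    body = {
        'predictionInput': {
            'dataFormat': 'TEXT',          # format of the input files
            'inputPaths': input_paths,     # GCS URIs of input files
            'outputPath': output_path,     # GCS URI for prediction results
            'region': region,
        }
    }
    # Target a specific model version if one is given; otherwise the
    # model's default version serves the request.
    if version_id:
        body['predictionInput']['versionName'] = (
            'projects/{}/models/{}/versions/{}'.format(
                project_id, model_id, version_id))
    else:
        body['predictionInput']['modelName'] = (
            'projects/{}/models/{}'.format(project_id, model_id))
    return body

body = build_prediction_body(
    'my-gcp-project', 'invoice_model',
    ['gs://my-bucket/inputs/batch1.json'],
    'gs://my-bucket/outputs/')
print(body['predictionInput']['modelName'])
# → projects/my-gcp-project/models/invoice_model
```

In practice, this body would be submitted to the service (for example, via the Google API client library), which is exactly what the hands-on exercises in Chapter 9 walk through.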


To get the most out of this book

In order to quickly build AI applications on GCP, it is recommended to follow the chapters sequentially, as each chapter builds on the concepts covered in the prior chapters. We highly recommend that readers treat this book as a hands-on guide and follow along by performing the exercises on GCP with an individual or organization account.

Download the example code files

You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

1. Log in or register at www.packt.com.
2. Select the Support tab.
3. Click on Code Downloads.
4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Hands-On-Artificial-Intelligence-on-Google-Cloud-Platform. In case there's an update to the code, it will be updated on the existing GitHub repository. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packtcdn.com/downloads/9781789538465_ColorImages.pdf.


Conventions used
There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Lead_Stage is the label on which we will do our predictions on the test data."

A block of code is set as follows:

    if version_name:
        body['predictionInput']['versionName'] = version_id
    else:
        body['predictionInput']['modelName'] = model_id

Any command-line input or output is written as follows:

    curl -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" -H "Content-Type: application/json" https://automl.googleapis.com/v1beta1/model-name/modelEvaluations

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "In Advanced option, mark skip row as 1 if your dataset has a header." Warnings or important notes appear like this.

Tips and tricks appear like this.

Get in touch
Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected]


Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report it to us. Please visit www.packtpub.com/support/errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.


Section 1: Basics of Google Cloud Platform
In this section, we will get to grips with the fundamentals of serverless computing on Google Cloud Platform (GCP). We will also provide an overview of AI components available on GCP, and introduce you to various computing and processing options on GCP.

This section comprises the following chapters:
Chapter 1, Overview of AI and GCP
Chapter 2, Computing and Processing Using GCP Components

Overview of AI and GCP
At this juncture in the evolution of computational technology, we are constantly generating data from more and more devices and platforms. It is now clear that data is a key factor: it can confer a competitive advantage, or become a disadvantage if not utilized properly. In the last few years, we have seen that the storage and processing of large volumes of data are possible with infrastructure available on the cloud. In this chapter, we will introduce the Cloud First strategy, which enterprises are adopting for performing advanced analytics on data. We will see some of the advantages of the Cloud First strategy, along with its anti-patterns. In the subsequent sections of this chapter, we will see a high-level overview of the Google Cloud Platform (GCP).

We will cover the following topics in this chapter:
Understanding the Cloud First strategy for advanced data analytics
Google data centers
Overview of GCP
AI building blocks
AI tools available on GCP


Understanding the Cloud First strategy for advanced data analytics
2018 saw a major shift for organizations, which have begun treating data as the most critical asset for staying relevant. There is a noticeable shift from "on-premise" deployments of big data-processing infrastructure over to the cloud. As distributed data management systems started to mature, businesses began to rely on distributed computing platforms for their analytical and operational needs. However, volumes of data, and the possibilities of subsequent analytics, are constantly growing with the addition of new data sources. With this, it is evident that there is a need for quick experiments and for scaling the environment (up and down) as required. Some advanced analytics workloads utilize machine learning algorithms that require a cluster of Graphics Processing Units (GPUs) in order to produce results within a reasonable time frame. In a cloud environment, GPUs can be procured on demand and released back to the resource pool seamlessly. Some of these experiments need to get into production quickly in order to achieve maximum business impact. In terms of agility and rapid deployment, a cloud-based environment is ideal, and deployment on the cloud has been embraced by the industry. Enterprises are seeing the benefits of not only Infrastructure-as-a-Service (IaaS) but also Advanced-analytics-as-a-Service (AAaaS) on a cloud platform.

There has been a rapid shift toward the Cloud First strategy since the beginning of 2018. Organizations that are just getting started on their data strategy are adopting the cloud as their first playground instead of investing in on-premise deployments. However, "Cloud First" does not mean a "data-last" strategy. Data still plays a central role in the strategy to become fully data-driven. Let's look at some of the advantages of data management in the cloud compared to on-premise deployments.


Advantages of a Cloud First strategy
Adopting a Cloud First strategy has the following advantages:

Minimal upfront cost: Storage and computation infrastructure are available as services around the clock, with virtually unlimited capacity and a minimal scaling-up cost. There is no need to procure any hardware or set up applications and software from scratch; infrastructure as well as software can be spun up virtually on cloud platforms. This results in minimal upfront cost and investment for enterprises. The model is especially suitable for developing prototypes and testing proofs of concept. In certain cases, the business viability of a concept can be quickly validated without committing to capital expenses and other overhead costs. With the cloud, innovative ideas and concepts can be quickly tested and deployed for immediate business benefit, with minimal cost pressure.

Flexible capacity: Pay-as-you-go is the core principle on which cloud services are set up. Although careful capacity planning is important for any successful data strategy, the cloud gives us some flexibility in capacity planning. If more capacity is required (for example, an online retailer can expect high sales volume during Black Friday), a company can scale up the capacity just for a small time span, and then scale down to the routine capacity. This type of flexibility is not feasible in an on-premise deployment of data management systems.

Global connectivity: Infrastructure, platforms, and applications available as services on the cloud can be accessed from virtually anywhere on the globe, as long as internet connectivity, along with the appropriate authentication/authorization, is available. Connectivity is ensured, with implicit redundancy from the cloud providers across regions and physical locations. Internal deployment and topology are not a consideration or area of focus for clients. The cloud endpoints are consistent and seamless, irrespective of the location of the client.

Seamless upgrades: The operating systems and application software provisioned on the cloud can be seamlessly upgraded by cloud providers. This ensures consistency and reliability across all deployments and works well for internet-scale enterprise applications. Compare this with traditional on-premise deployments, where application versions and patches need to be carefully applied to all nodes while managing service downtime as well as business disruption. In a cloud environment, this responsibility passes entirely to the cloud provider, and enterprises can focus on their core business applications.


Serverless DevOps: With this, application development teams can fully focus on the core business logic. They simply prescribe the requirements for storage and computation based on the application's scope, in terms of data volume and computation needs. These teams do not need to worry about any deployments, and they can start using the services with minimal configuration and scaling-up time. A serverless architecture exposes cloud functions as services, where the applications are hosted by the cloud provider, along with the responsibility for managing the hardware and related software components. This pattern provides a jump-start into business applications, algorithms, and model development without having to worry about the underlying infrastructure; therefore, the core functionality can be built quickly in order to start realizing business benefits.

Fast time-to-market (TTM): With all the benefits listed previously, the TTM for various concepts and prototypes is reduced to a minimum with a Cloud First strategy.

Google has taken an innovative approach in making cloud services available to enterprises by building services from the ground up. These services were originally built for internal consumption by Google itself, for its search and other internet-scale services. The platform is quickly maturing into a complete suite for the development of applications across the complexity spectrum, starting with simple web applications and going on to microservices and advanced analytics, which leverage extremely large volumes of structured and unstructured data and the computation power of GPUs and Tensor Processing Units (TPUs) for training compute-intensive models. In Chapter 7, Understanding Cloud TPUs, we will take a deep dive into TPUs.

In this book, we are going to understand the various components of GCP in detail, and will specifically look at utilizing GCP for the deployment of Artificial Intelligence (AI) workloads and the seamless integration of various applications as services. While there are proven benefits and advantages of a Cloud First strategy, cloud adoption is also criticized, for the reasons stated next.


Anti-patterns of the Cloud First strategy
While cloud computing is a new paradigm, there are certain fundamental assumptions and requirements around consistent connectivity and security that need to be addressed. Here are some of the anti-patterns of the Cloud First strategy:

Downtime: Cloud services are fully dependent on the availability of reliable internet connectivity. When business-critical applications are deployed on the cloud, the risks and impact of internet downtime increase. However, the risks of downtime are equally prevalent with on-premise deployments, and architectural patterns need to be carefully considered to minimize these risks. With the cloud, services should be designed with high availability, redundancy, and disaster recovery in mind. Cloud vendors provide multiple availability zones for the cloud infrastructure, and applications should leverage these redundancy zones for deployments of critical services. In the next section, we will understand how Google mitigates downtime risks with geographically distributed data centers.

Security and privacy: Many enterprises that manage sensitive data assets are concerned about the security and privacy of data when it comes to cloud adoption. The cloud is a shared space, and hence the risk is evident and obvious. However, the following strategies can easily counter this anti-pattern:
Implementation of security governance practices and processes at all levels of deployment.
Implementation of carefully defined access-control levels, providing a minimal level of access to all users and processes. When in doubt, provide more restricted access rather than wider access to services and infrastructure.
Implementation of multi-factor authentication for all user accounts.
Deployment of anomaly detection procedures and constant monitoring (automated as well as sample-based manual monitoring) of the cloud infrastructure at all endpoints.

Cloud service providers have made significant investments in taking care of these anti-patterns, and cloud deployments are as reliable and secure as on-premise deployments at this point in time. In the next section, we will look at the current state of Google's data centers in terms of geographical zones and service availability.
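The least-privilege access-control point above can be made concrete with the JSON shape of a GCP IAM policy binding. The sketch below only builds the policy document locally (no API call is made); the role, member, and helper names are illustrative, not a prescribed configuration:

```python
import json

def least_privilege_policy(member: str, role: str) -> dict:
    """Build a minimal IAM policy document granting a single, narrow role
    to a single member (the least-privilege pattern described above)."""
    return {"bindings": [{"role": role, "members": [member]}]}

# Grant read-only object access instead of a broad editor/owner role.
policy = least_privilege_policy("user:analyst@example.com",
                                "roles/storage.objectViewer")
print(json.dumps(policy, indent=2))
```

Auditing such policy documents for overly broad roles is one way to automate the "constant monitoring" strategy listed above.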


Google data centers
As a custodian of data on an internet scale, Google has established a sophisticated network of data centers. The same infrastructure is now made available through GCP. At the time of this writing, GCP has 18 global regions that are divided into 55 zones, spread across 35 countries. The following screenshot shows GCP regions across the globe:


The regions are divided into "Americas" (North America and South America), "Europe", and "Asia Pacific". The selection of the correct region and zone is crucial for ensuring an acceptable level of latency, availability, and durability of the services and data. The round-trip latency for network packets between locations within a region is kept below one millisecond at the 95th percentile. A service deployed in a single zone within a region will not be fault-tolerant; a zone failure will cause service disruption in such cases. In order to develop a fault-tolerant application, it needs to be deployed across zones; and for mission-critical applications, a multi-region deployment strategy should be implemented. Based on the criticality level of the applications and services, the resources are categorized as follows:

Zonal resources: These resources operate within a zone. If the zone is unavailable, the service becomes inaccessible. Compute Engine is a zonal resource, since the computation can be reinstated if the underlying data is available. If a zone that contains a Compute Engine instance goes down, the computation needs to be restarted once the zone becomes available.

Regional resources: These resources are deployed within a region, with redundancy across its zones. The services catered for by these resources are not disrupted by a zone failure. As a result, a higher level of availability is ensured with regional resources.

Multi-region resources: Some mission-critical services are deployed with redundancy across regions. Due to the geographical separation between regions, these services need to align with the trade-offs between latency and consistency. The trade-offs are chosen based on the service and the acceptable service-level agreements (SLAs). The data within services deployed on multi-region resources does not belong to a specific region and maintains fluidity; it can be transferred across regions in order to meet service levels.

Considering the existing available capacity and projections in demand for cloud computing resources, Google is already expanding its footprint across geographies. With the use of machine learning algorithms, it has ensured that capacity is optimally utilized. From an AI practitioner's perspective, GCP provides a reliable platform on which we can focus on building great applications by leveraging a serverless architectural paradigm. In the next section, we will look at various components of GCP and get familiar with the interface. This is a basic introduction to the platform within the context of this book.


Overview of GCP
Although any cloud platform is available as a virtual service over the network, there are systems and physical resources at its core. As discussed in the previous section, GCP has created a global network of data centers that provide redundancy, and hence reliability, of services across regions and zones. One of the advantages of choosing a region closer to the client location is lower latency, which plays a significant role when we are dealing with large volumes of data transfer; such use cases require minimal latency between event time and processing time. All the resources on GCP, such as storage and compute, are treated as services. This paradigm is referred to as Everything-as-a-Service (XaaS), and it includes IaaS, Platform-as-a-Service (PaaS), and so on. The resources and services on GCP are categorized as global, regional, and zonal, depending on the level of abstraction and applicability.

Any resource that is governed by an organization on GCP needs to be part of a project. A project is the top-level abstraction of all the resources provisioned by the organization. A project has various attributes, metadata, resources, and access controls. The resources within a project boundary are connected to each other according to regional and zonal restrictions and communicate over an internal network; resources across projects, however, communicate over an external network. In GCP, a project is uniquely identified by a project name, project ID, and project number. GCP provides a web console, a command-line interface (CLI), and Cloud Shell for interacting with its various services. Here is a screenshot of the GCP console:


The Cloud Software Development Kit (SDK) provides a CLI tool called gcloud that can be used for performing all configurations and interacting with the platform. It can also be used for management of the development workflow. A similar interface for interacting with GCP is provided through Cloud Shell, which is a temporary browser-based shell environment that can be accessed from within the cloud console.
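The project scoping described earlier shows up directly in how GCP REST APIs address resources: most resource names are paths rooted at a project, optionally scoped to a location. A minimal sketch follows; the helper function and sample IDs are hypothetical illustrations, not part of any GCP library:

```python
def resource_name(project_id, collection, resource_id, location=None):
    """Build a GCP-style fully qualified resource name.

    Many GCP REST APIs address resources as slash-separated paths under a
    project, optionally scoped to a location (a region or zone).
    """
    parts = ["projects", project_id]
    if location:
        parts += ["locations", location]
    parts += [collection, resource_id]
    return "/".join(parts)

print(resource_name("my-project", "datasets", "sales"))
# projects/my-project/datasets/sales
print(resource_name("my-project", "models", "churn", location="us-central1"))
# projects/my-project/locations/us-central1/models/churn
```

Keeping all resource references project-qualified like this is what lets GCP apply per-project access controls and billing.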

AI building blocks
In order to build an intelligent machine or system, some fundamental building blocks are required. "Data" is at the core of AI development and large-scale adoption. In this section, we will review all the building blocks that surround data. Here is a diagram that shows all the building blocks of AI:


Data
The general concept of intelligent machines that can match human capabilities was proposed about a century ago, and a considerable amount of thinking and research has been devoted to creating such machines. These efforts laid the foundation for modern-day AI. However, research was long limited by the available data storage and processing capabilities, since the machine learning models that form the basis of AI require large amounts of data for training and a large amount of processing power for algorithmic computation. The field of AI research has now been reinvigorated by the volumes of data we can store and process. Data is the central piece and focal point of the building blocks of AI. Data capacity is categorized into three areas: storage, processing, and actions driven by data.

Storage
Starting around the turn of the century, we have seen a steep rise in digitization and in the capacity for storing digital assets. The accessibility of storage capacity for general use has also increased considerably due to the adoption of cloud computing. In the cloud computing paradigm, storage is available as a service and does not require the procurement and management of storage infrastructure. As a result, the availability of data assets for AI research has increased exponentially. The following diagram illustrates the growth of storage capacity over time. Demand for storage is growing exponentially, beyond our current limits, and very rapidly; in order to convey the velocity of this data volume growth, the diagram uses an out-of-bound scale:


Processing
We have also seen a rise in overall processing power due to distributed computing frameworks. Processing units are distributed across various machines for parallel processing and computation. The frameworks are responsible for keeping track of the computation across nodes and consolidating the results to derive actionable insights. Processing power has also increased considerably due to the cloud computing paradigm, where compute is available as a service; in this case, there is no need for the procurement and management of a specialized infrastructure for large-scale data processing. With the added processing capacity, we are now able to experiment with machine learning algorithms by leveraging large volumes of data. This, in turn, is fueling rapid advancements in the field of AI.

Actions
Due to the availability of large volumes of data and the processing power to churn this data to derive meaningful insights, we are able to predict future events and actions based on probabilistic modeling. These actions are based on data instead of subjective judgment. One of the fundamental building blocks of intelligent machines is the capability to take action based on the environmental context; the action should result in maximum rewards for the agent. Actions based on data instead of subjective human judgment need to be facilitated by AI systems in order to fully augment human capabilities.

Natural language processing
One of the key components of any AI system is a Natural Language Processing (NLP) interface to intelligent machines and agents. Interaction with an AI-enabled system needs to happen in the natural manner in which humans interact with each other. Language processing is complex due to the level of variation and fuzziness with which we communicate. The synthesis and analysis of human speech can be achieved by training deep neural networks (DNNs) with large volumes of training data. However, even with large volumes of training data, it is difficult for DNNs to reach human-level accuracy, due to the semantic variation in languages. Nonetheless, NLP is a fundamental building block of AI. Later in this book, we are going to look at various options and tools available on GCP for NLP.


Speech recognition
In order for the interface to an intelligent machine to be as close to human interaction as possible, we need speech-recognition capabilities. Instructions need to be given as voice commands, and we require a sophisticated speech-recognition engine in order to convert spoken words into a machine-readable format. Once again, due to the variation in the way a specific word is spoken by various people, it is difficult to get 100% accuracy with speech-recognition systems. The interface needs to be calibrated and trained for a specific individual, and a generic model needs to be enhanced continuously in order to improve the overall efficiency of speech-recognition engines.

Machine vision
In order for an AI system to fully augment human capabilities, it needs to develop a way to gather an understanding of the environmental context by building capabilities for processing visual information in a similar manner to the human eye. Cameras capture the visuals, and the models need to be trained with large volumes of video data in order to develop an accurate understanding of the environment. Machine vision is a key ingredient of AI. In the subsequent chapters, we will explore machine vision APIs, along with example code on GCP.

Information processing and reasoning
Humans are good at processing information gathered by the various senses and applying reasoning to respond to it in a meaningful way. An intelligent machine requires similar capabilities in order to mimic and augment human capabilities. In the case of AI systems, the reasoning can be based on large volumes of training data, as well as a reinforcement learning engine. Logical reasoning based on the environmental context of the information is a key ingredient of an AI system.


Planning and exploring
One of the most evolved functions the human brain can perform with ease is planning for events and actions ahead of time. This skill requires the exploration of past data along with current contextual data and real-time information. Actions are planned with short-term and long-term goals in mind. An intelligent agent needs the capability to explore contextual environmental data and plan based on past available data. Navigating with geographical maps is a good example of the planning and exploring function of AI: a mapping application can suggest the best route for a particular moment based on real-time data exploration, and it can adjust the route plan based on new information encountered along the way.

Handling and control
Some intelligent agents also need to be capable of handling and controlling physical objects. This includes industrial robots that handle various machine parts on the assembly line, place them at the right location, and apply them based on a predefined routine. This type of system needs to have a level of fuzziness as well as a self-learning loop that can act according to the environment. It is not possible to program the handling and controlling function for all possible situations; the model needs to be built in such a way that the agent can act in accordance with changes to the environmental state, while still maximizing the overall reward for the agent within the environment.

Navigation and movement
An intelligent agent needs to be able to navigate and move through a physical environment. Self-driving cars, or autonomous vehicles, are examples of this capability of AI systems. In order to build this capability, the agent needs to be trained extensively on the road conditions that will be encountered in real life. A DNN also needs to be trained on as many scenarios as possible. The model execution in the real (non-training) environment needs to be extremely efficient in order for the agent to survive in a mission-critical environment that requires ultra-low latency between event time and action time for the agent.


Speech generation
In order for an intelligent agent to interact in a natural form, it needs to be capable of generating human speech. Voice-enabled systems are now mainstream and comparatively easier to build than speech-to-text interfaces. Google provides simple-to-use APIs for speech generation. We are going to take a look at these APIs and services in the next section of this chapter, exploring detailed examples in the later chapters of this book.

Image generation
Image-generation capability is another building block of AI systems. Images are processed and reformatted in order to get more meaning and information out of the pixel data. Image-generation capabilities are used in medical image processing as well as in high-end forensic studies. In the next section, we will take a look at the tools available on GCP that facilitate the various building blocks of AI.

AI tools available on GCP
Google has made it simple to build AI systems with ready-to-use AI building blocks on GCP. There are three categories of components and tools available on GCP: Sight, Language, and Conversation.

Sight
Sight refers to the visual interface for intelligent machines. GCP provides the following APIs for visual information and intelligence:

Cloud Vision API: This is a Representational State Transfer (REST) API abstraction on top of pre-trained models on GCP. The API can classify images into generic categories as well as recognize specific objects, and it can also read text within images. Image metadata management, along with the moderation of unwanted content for a specific application, is provided out of the box with the Cloud Vision API. This makes it very easy and seamless to gather insights from images. Some common use cases of this API are image search, document classification, and product search (retail). The following diagram shows various applications and use cases for the Cloud Vision API:
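To make the REST interface concrete, here is a hedged sketch of the request body that the Vision API's images:annotate endpoint expects for label and text detection. It only constructs the JSON payload (no credentials or network call are involved), and the placeholder bytes stand in for a real image file:

```python
import base64
import json

def vision_annotate_body(image_bytes, max_results=5):
    """Build the JSON body for a Cloud Vision images:annotate request,
    asking for label detection and OCR (text detection)."""
    return {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [
                {"type": "LABEL_DETECTION", "maxResults": max_results},
                {"type": "TEXT_DETECTION"},
            ],
        }]
    }

body = vision_annotate_body(b"<raw image bytes here>")
print(json.dumps(body, indent=2))
# The body would be POSTed to https://vision.googleapis.com/v1/images:annotate
# with an OAuth access token or API key attached.
```

The same body shape, with different feature types, drives the other detection capabilities (faces, landmarks, and so on).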


Cloud Video Intelligence API: This is a REST API that can extract information from video feeds and enable searching and extraction of metadata from video data. The API is easy to use and contains a list of more than 20,000 predefined labels. The API also provides interoperability between video tags and contents, enabling a text-based search across video assets when those are stored in Google Cloud Storage. The following diagram shows various applications and use cases for the Cloud Video Intelligence API:


AutoML Vision: This service makes it possible to train custom models for classifying visual images. The models can be trained and evaluated with an easy-to-use interface. They can also be registered with a unique namespace in order to use them through the AutoML API. If the user has a large number of images to be labeled, there is a human labeling service that complements the AutoML Vision API. Human labeling can be initiated directly from the AutoML Vision user interface.

Language
GCP provides APIs for linguistic information and intelligence through the Translation and Natural Language APIs, as follows:

Cloud Translation API: This API offers bidirectional translation between two languages, based on pre-trained models as well as custom models that can be trained using the AutoML Translation framework. The API also facilitates language detection when the language of the source text is unknown. Like the other AI services, translation services are available as a REST API for programmatic access and integration within applications. At the time of writing, support is available for 100 languages. A unique feature for translating HTML content without requiring explicit parsing makes it easy to provide web page translations and create multilingual sites and applications.

Cloud Natural Language API: This API provides insights into unstructured text and audio data based on pre-trained models, as well as custom models that can be trained using the AutoML Natural Language framework. The API can gather information about people, places, events, sentiments, and so on from various forms of unstructured text. Internally, the service leverages a rich ontological graph and a constantly evolving model for improved accuracy. Some of the common use cases possible with this API are customer sentiment analysis and product classification (retail market research). The simple-to-use REST API facilitates syntax analysis, entity recognition, sentiment analysis, and content classification, and supports multiple languages.

GCP provides APIs for enabling a vocal and conversational interface with intelligent machines through Dialogflow and the Google Cloud Text-to-Speech/Speech-to-Text APIs.
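As a sketch of the sentiment-analysis use case above, the Natural Language API's documents:analyzeSentiment endpoint takes a JSON body of the following shape. Only the payload is constructed here (no API call is made), and the sample review text is invented for illustration:

```python
def sentiment_body(text, language="en"):
    """Build the JSON body for a Natural Language documents:analyzeSentiment
    request over a plain-text document."""
    return {
        "document": {
            "type": "PLAIN_TEXT",
            "language": language,
            "content": text,
        },
        "encodingType": "UTF8",
    }

body = sentiment_body("The support team resolved my issue quickly. Great service!")
# The body would be POSTed to
# https://language.googleapis.com/v1/documents:analyzeSentiment; the response
# carries a sentiment score (negative to positive) and a magnitude.
```

Swapping the endpoint (analyzeEntities, classifyText) while keeping the same document shape covers the other capabilities listed above.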


Conversation The conversational interface is an essential aspect of an AI-enabled application. GCP provides the Dialogflow engine for creating an enterprise-grade conversational application via a simple interface as well as an API, as follows: Dialogflow Enterprise Edition: This service facilitates the creation of conversational interfaces for applications using underlying deep learning models that are pre-trained and ready to be used. The conversation service can be utilized for a natural interface for users with websites, mobile applications, and even Internet of Things (IoT) devices. At the time that this book is being written, the service supports conversational interfaces in 20 languages. Dialogflow has seamless integration with the Natural Language API in order to perform sentiment analysis while a live conversation is ongoing. This helps with providing customer-specific and targeted services to customers. Some of the use cases that are possible with this interface include improvement in enterprise productivity, self-service business transactions for end customers, and enabling a natural language communication with IoT devices. Cloud Text-to-Speech API: This API facilitates synthesizing human speech from the input text. The service is available in multiple languages and variations, enabling the creation of a natural language interface for applications. The machine learning models responsible for text-to-speech conversion are pretrained and constantly evolving for improved accuracy and fidelity, as close to the natural human voice as possible. Some of the common use cases that can be implemented with the Text-to-Speech API include call center automation, interactions with IoT devices, and transformation of text into audio for readers. Cloud Speech-to-Text API: This API is based on powerful models that are pretrained for the conversion of audio inputs to text in multiple languages. The API supports real-time streaming or pre-recorded audio inputs. 
It is also capable of detecting the language automatically, and supports conversion to text in real time for both short- and long-form audio clips. As this book is being written, there are four categories of pre-trained models for the Speech-to-Text interface, each suited to specific use cases and conversational interfaces:

command_and_search: This can be used for short commands and voice search.
phone_call: This is used for audio originating from a phone conversation.
video: This is suitable for audio that is part of a video signal or recorded at a higher sample rate.
default: This is a generic model.
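As an illustration of how the model choice might be captured in application code, here is a minimal sketch. The use-case labels in the mapping are hypothetical names invented for this example; only the four model names themselves come from the list above.

```python
# Map an application use case to one of the four pre-trained
# Speech-to-Text models. The use-case keys are illustrative labels,
# not official API values; the model names come from the text above.
SPEECH_MODELS = {
    "voice_search": "command_and_search",
    "phone_audio": "phone_call",
    "video_audio": "video",
}

def pick_speech_model(use_case: str) -> str:
    """Return the Speech-to-Text model name for a use case,
    falling back to the generic 'default' model."""
    return SPEECH_MODELS.get(use_case, "default")

print(pick_speech_model("phone_audio"))   # phone_call
print(pick_speech_model("dictation"))     # default
```

The returned string would be passed as the model setting of a recognition request.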


All services and APIs listed in this section enable a natural interface with intelligent machines and are the building blocks for AI. In this book, we will be exploring these AI tools in detail, with code examples.

Summary

In this chapter, we looked at the Cloud First strategy and why it is an imperative choice for the development of modern applications that leverage AI, along with a brief review of the anti-patterns of cloud-based implementations of various systems. We also introduced GCP, which is built on Google's experience of managing large volumes of data, and which is being enriched and expanded over time. We studied the various building blocks of generic AI systems and reviewed the tools available on GCP that facilitate the development of AI applications seamlessly, and in a serverless manner.

In the next chapter, we are going to look at the various components that are available on GCP for computing and processing data assets.


Chapter 2: Computing and Processing Using GCP Components

Before building and running an Artificial Intelligence (AI) application in the cloud, it is very important to know the different options available. This will help us to choose the right option for our application, making sure that we get optimized performance in a cost-effective way. In this chapter, we will dive deep into the options available for building and running our AI applications on Google Cloud Platform (GCP). We'll look at the compute, processing, and storage options available on GCP, along with orchestration and visualization. Once we have a good understanding of these options, we will go over a hands-on example at the end of the chapter.

In this chapter, we will look at the following topics:

Understanding the compute options
Diving into the storage options
Understanding the processing options
Creating an example of building a Machine Learning (ML) pipeline


Understanding the compute options

GCP provides various compute options to deploy your application and to use the actual Google infrastructure to run it. The options available are as follows:

Infrastructure as a Service (IaaS)
Containers
Platform as a Service (PaaS)

All compute options communicate with other GCP services and products, such as storage, networking, Stackdriver, security, and the big data product suite. Based on a given application's needs, an appropriate compute option is selected from Compute Engine, Kubernetes Engine, App Engine, and Cloud Functions. Google's compute options help you to run virtual machines of multiple sizes on Google infrastructure and to customize them. They enable you to run containerized applications, and you can deploy your code directly on the engines if you don't want to take care of infrastructure-related items. So, based on your needs, Google provides multiple options to use compute power, expedite the development effort, and reduce the time to market.

Next, we will discuss the following compute options in detail:

Compute Engine
App Engine
Cloud Functions
Kubernetes Engine

Compute Engine

Compute Engine is the IaaS offering of Google Cloud; it provides virtual machines running in Google's infrastructure. Compute Engine is available across all the zones and regions provided by Google Cloud. It comes with storage options of persistent disks and local Solid-State Drives (SSDs). SSDs are built with integrated circuits and contain no spinning platters or mechanical read heads, which makes them more durable and faster to read from than hard disk drives. A persistent disk is network storage that can be extended up to 64 TB, while a local SSD is an encrypted drive that is physically attached to the server and can extend up to 3 TB.


While spinning up an instance, the user can either select one of the predefined compute options or go with a customized configuration. Compute Engine instances can be launched with either a Linux or Windows operating system, and with CPUs, GPUs, and TPUs; since the infrastructure is provided by Google, users can perform operating system-level customization.

Users can create managed and unmanaged instance groups in Compute Engine:

A managed instance group always contains identical virtual machines and supports autoscaling, high availability, rolling updates, and more.
An unmanaged instance group can contain machines with different configurations. Users can use instance templates while creating a managed instance group, but not with an unmanaged instance group.

It is suggested to select a managed and uniform instance group unless there is a very specific need for machines of different configurations in the same pool.

Let's quickly talk about an option that will help to reduce pricing: if possible, use preemptible machines. Preemptible virtual machines are short-lived, low-cost instances that can be utilized when workloads are known and expected to finish within 24 hours. These virtual machines provide a significant cost advantage, with savings of up to 80% compared with regular instances. The catch is that Google can always take the instance back with 30 seconds' notice. Google charges per second and gives sustained use discounts.
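To make the saving concrete, here is a small sketch. The 80% figure is the one quoted above; the hourly rate used in the example is a hypothetical number, not a real Compute Engine price.

```python
PREEMPTIBLE_DISCOUNT = 0.80  # up to 80% cheaper, per the text above

def preemptible_cost(regular_hourly_rate: float, hours: float,
                     discount: float = PREEMPTIBLE_DISCOUNT) -> float:
    """Estimated cost of running a preemptible VM instead of a regular one."""
    return regular_hourly_rate * hours * (1 - discount)

# Hypothetical $0.10/hour machine running a 10-hour batch job:
regular = 0.10 * 10                    # $1.00
preempt = preemptible_cost(0.10, 10)   # $0.20
print(f"regular ${regular:.2f} vs preemptible ${preempt:.2f}")
```

The trade-off is that the job must tolerate instances disappearing on short notice, which is why preemptible machines suit batch workloads rather than serving.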

Compute Engine and AI applications

While training models for AI and ML applications, there is always a need for powerful machines that increase the efficiency of the model by handling ample training data and reducing training time. Google Compute Engine has multiple options to spin up powerful compute instances and groups that can train and run models. For training and running models, you can use the power of CPUs and GPUs, while TensorFlow applications can use machines with TPUs.


App Engine

App Engine is the PaaS offering of Google Cloud; it is a fully managed serverless application platform, available in most of the regions covered by Google Cloud. You can consider App Engine deployment-ready infrastructure: the developer just has to focus on building the application and deploying it to App Engine, and everything else is taken care of. App Engine comes with great features such as autoscaling, traffic splitting, application security, monitoring, and debugging, all of which are essential for deploying, securing, and scaling any application. Using tools such as the Cloud SDK and IntelliJ IDEA, developers can connect directly to App Engine and perform operations such as debugging source code and running the API backend. One limitation of App Engine is that its operating system cannot be customized.

App Engine comes in two different environments:

Standard
Flexible

App Engine standard environment applications run in a sandbox environment and support Python, Java, Node.js, Go, and PHP. App Engine flexible environment applications, on the other hand, run in Docker containers on Google Compute Engine virtual machines, and additionally support Ruby and .NET. For more details on choosing between the standard and flexible environments, please refer to: https://cloud.google.com/appengine/docs/the-appengine-environments.

App Engine is very useful for deploying any web or mobile application. Based on resource usage, the infrastructure autoscales, and Google only charges for what the application has used.

App Engine and AI applications

While running any mobile or web application on App Engine, there are a lot of use cases where AI is needed within the application. These can be fulfilled while deploying your application on App Engine. A Python application that loads trained machine learning models can be deployed on App Engine, with its service exposed through Cloud Endpoints. Once the models are accessible via App Engine, clients can send requests to the Python application and get responses in a consistent manner.
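As a minimal illustration of this pattern, the handler below is a hypothetical sketch: the model weights, feature names, and threshold are all invented for the example. A real App Engine service would deserialize a trained model artifact at startup and expose this function through a web framework.

```python
# Hypothetical stand-in for a trained model loaded at application startup.
# In a real service this would be loaded from a saved model file; here it
# is a simple linear scorer so the sketch stays self-contained.
MODEL_WEIGHTS = {"visits": 0.3, "page_views": 0.1}
BIAS = -1.0

def predict(features: dict) -> dict:
    """Request-handler body: score the input features, return a label."""
    score = BIAS + sum(MODEL_WEIGHTS[name] * features.get(name, 0.0)
                       for name in MODEL_WEIGHTS)
    return {"score": score, "converted": score > 0}

print(predict({"visits": 5, "page_views": 2}))
```

The consistency mentioned above comes from every client calling the same deployed handler, so all consumers score features against the same model version.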


Cloud Functions

Cloud Functions is an event-driven, serverless PaaS offering from Google Cloud, and fits well within a microservice architecture. It is available in most of the regions covered by Google Cloud. Cloud Functions are mostly used for small, single-purpose functions, such as invoking other services or writing an event to a Pub/Sub topic. There are some great features in Cloud Functions that provide agility and zero-downtime maintenance: Cloud Functions can autoscale, and they are highly available. You can connect to most Google Cloud services using Cloud Functions. Cloud Functions can be developed in JavaScript or Python. The user only pays for Cloud Functions while they are running, which makes them very cost-effective.

Cloud Functions and AI applications

While running any application, if a user wishes to invoke an API such as Cloud ML or Cloud Vision based on a specific event, Cloud Functions are the way to go.
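The shape such a function takes can be sketched as follows. The event fields mimic what a Cloud Storage trigger delivers, and the annotate step is a placeholder comment: a real function would call the Vision or ML API there, which this self-contained sketch does not do.

```python
def on_file_uploaded(event: dict, context=None) -> str:
    """Background-function-style handler reacting to a storage event.

    'event' mimics the dict a Cloud Storage trigger delivers; the
    return value stands in for a real Vision API call.
    """
    file_name = event.get("name", "<unknown>")
    bucket = event.get("bucket", "<unknown>")
    # Placeholder for e.g. a label-detection request to the Vision API.
    return f"annotate gs://{bucket}/{file_name}"

print(on_file_uploaded({"bucket": "uploads", "name": "cat.png"}))
```

Because the function is stateless and single-purpose, the platform can scale instances up and down with the event rate, which is exactly the pay-while-running model described above.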

Kubernetes Engine

Kubernetes Engine is a managed service provided by Google Cloud; it is used to deploy and run containerized applications. The following are the features of Kubernetes Engine:

It is available across all the zones and regions provided by Google Cloud.
Beneath the Kubernetes cluster, Google actually runs Compute Engine, so most of the advantages of Compute Engine are available with Kubernetes Engine, along with the additional services it provides.
In a Kubernetes cluster, virtual machines with customized OS images can be used, and the cluster will autoscale the customized images.
Kubernetes clusters are highly secure and backed by HIPAA and PCI DSS 3.1 compliance.
It comes with a dashboard service allowing the user to manage it.
It auto-upgrades and auto-repairs itself.
It supports common Docker images and a private container registry through which users can access private Docker images.
Kubernetes clusters can be integrated with Stackdriver to enable monitoring and logging of the cluster.


If your application can manage scenarios where one of the virtual machines goes down, then it makes a lot of sense to use preemptible instances with a Kubernetes cluster as it will save a lot of costs.

Kubernetes Engine and AI applications

While training models for AI and ML applications, there is always a need for powerful machines that increase the efficiency of the model by handling ample training data and reducing training time. Kubernetes clusters can be built with GPUs to train models and run ML workloads. This can benefit a lot of machine learning applications that require managed containerized clusters with powerful GPU machines.

Diving into the storage options

GCP provides various storage options to store your application data. Different applications have different storage needs, and choosing the appropriate storage improves performance. Looking at the GCP storage options, it is very clear that GCP can support various storage needs, such as NoSQL, document databases, object storage, a Relational Database Management System (RDBMS), and so on. You can use Google's managed services for your storage needs, or you can use the Google infrastructure and install the services that you need. Choosing the right storage option for your application is important. Based on the available storage options in Google, the following chart will help you to identify the right storage option:

Next, we will discuss the following storage options in detail:

Cloud Storage
Cloud Bigtable
Cloud Datastore
Cloud Firestore
Cloud SQL


Cloud Spanner
Cloud Memorystore
Cloud Filestore

Cloud Storage

Cloud Storage is the object storage service provided by GCP. Here are the features of Cloud Storage:

It can store any amount of data in various formats, including structured data, unstructured data, video files, images, binary data, and so on.
Users can store data in Cloud Storage in four different classes of bucket, namely Multi-Regional, Regional, Nearline, and Coldline, depending on the following requirements:
    If data is frequently accessed across the world, go for a Multi-Regional bucket.
    If data is frequently accessed in the same geographical region, go for a Regional bucket.
    For data accessed about once a month, go for Nearline, and for data accessed about once a year, go for a Coldline bucket.
    The bucket choice is important because of the cost associated with it.
Data stored in Cloud Storage is highly available and encrypted by default. If you want to customize the default encryption, you can easily do that.
Cloud Storage provides APIs and tools to transfer data in and out. Users can transfer data from on-premises using the gsutil tool, and from other clouds using cloud transfer services. All data transfer is safe and encrypted in flight.
There are features such as object lifecycle management for moving data to cheaper, less frequently accessed storage classes, and users can use access control lists (ACLs) to secure data access.
It is a centralized service and is integrated with all the compute and processing options. Data stored in Cloud Storage can be accessed by services such as BigQuery and Dataproc to create tables and use them in processing.

With all its features, Cloud Storage is the most frequently used storage option on GCP and one of the cheapest as well. Its pricing varies from $0.007 per GB a month to $0.036 per GB a month, based on the storage class and access pattern.
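The bucket-class decision above can be sketched as a small helper. The numeric access thresholds are an illustrative simplification of the "frequent / monthly / yearly" guidance; only the four class names come from the text.

```python
def choose_storage_class(accesses_per_month: float, global_access: bool) -> str:
    """Pick a Cloud Storage class from a rough access pattern, following
    the guidance above: frequent -> Multi-Regional/Regional,
    ~monthly -> Nearline, ~yearly -> Coldline."""
    if accesses_per_month >= 2:          # frequently accessed
        return "Multi-Regional" if global_access else "Regional"
    if accesses_per_month >= 1:          # about once a month
        return "Nearline"
    return "Coldline"                    # about once a year or less

print(choose_storage_class(30, global_access=True))       # Multi-Regional
print(choose_storage_class(1, global_access=False))       # Nearline
print(choose_storage_class(1 / 12, global_access=False))  # Coldline
```

In practice the same decision is usually made once per bucket, and lifecycle rules then demote objects to Nearline or Coldline as they age.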


Cloud Storage and AI applications

Cloud Storage can help in various AI and ML use cases. Most big data migrations and modern data platforms use Cloud Storage to stage and store their data. For example, a Spark ML application can read its input data from Cloud Storage and store its results back in it. Cloud Storage is already being used for use cases such as genomics, video transcoding, data analytics, and compute.

Cloud Bigtable

Cloud Bigtable is a fully managed NoSQL database service provided by GCP. It can scale to petabytes of data with very low latency and high throughput. The features of Cloud Bigtable are as follows:

Cloud Bigtable is best suited for use cases where there are very heavy read and write operations; having said that, Bigtable can be used for both streaming and batch operations.
Each table in Bigtable is organized into column families, and each column family has multiple column qualifiers. Column qualifiers can be added to a column family at any given point in time. Data is stored in the table as key-value pairs.
The most important thing to take care of while designing a table in Bigtable is the row key. Based on this key alone, data is distributed through the table, and the user gets optimized performance while reading the data. If the row-key data is skewed, hotspotting will occur. For example, if the row key is a city and the data is skewed toward one city only, then the data will not be distributed evenly and reads will not be optimized. On the other hand, if the application receives data for multiple cities evenly, then the data will be fairly distributed and read operations will be optimized.
Update operations can be performed at the row level in Bigtable.
Google Cloud Bigtable is integrated with services such as Stackdriver, Cloud Dataflow, Dataproc, and Hadoop. It also supports the industry-standard HBase API.
All data stored in Bigtable is encrypted by default, and access controls are available to give appropriate access to users.
For each Bigtable node, the user is charged $0.65 an hour; SSD storage is $0.17 per GB a month, and HDD storage is $0.026 per GB a month.
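A common way to avoid the hotspotting described above is to design row keys that spread writes across the key space. The sketch below combines field promotion with a short hash prefix and a reversed timestamp; the field names (city, device ID) are hypothetical, and the exact scheme is one illustrative choice, not the only one.

```python
import hashlib

def make_row_key(city: str, device_id: str, epoch_seconds: int) -> str:
    """Build a Bigtable-style row key designed to avoid hotspotting.

    - Promoting device_id ahead of the timestamp spreads writes across
      devices instead of piling them all onto 'now'.
    - A short hash prefix further distributes keys even if one city
      dominates the traffic.
    - A reversed timestamp makes the newest rows sort first for scans.
    """
    prefix = hashlib.md5(f"{city}#{device_id}".encode()).hexdigest()[:4]
    reversed_ts = 10**10 - epoch_seconds
    return f"{prefix}#{city}#{device_id}#{reversed_ts}"

k1 = make_row_key("pune", "sensor-1", 1_700_000_000)
k2 = make_row_key("pune", "sensor-2", 1_700_000_000)
print(k1)
print(k2)
```

Even with all traffic coming from one city, the two sensors land under different prefixes, so consecutive writes hit different parts of the key space.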


Cloud Bigtable and AI applications

Cloud Bigtable can act as storage in various AI and ML use cases. Most big data migrations and modern data platforms use Cloud Bigtable to build their NoSQL databases. For example, a streaming ML application can use Bigtable very well as a backend.

Cloud Datastore

Cloud Datastore is the fully managed, scalable NoSQL database provided by GCP. Datastore is built on top of Bigtable, which is why it is highly scalable. The features of Cloud Datastore are as follows:

Cloud Datastore offers some important RDBMS features, such as ACID support, SQL-like queries, and so on.
To work easily with Datastore, you should understand its basic terms and their similarity to RDBMS terms. Here is a comparison chart of RDBMS and Datastore terms:

Concept             Datastore    RDBMS
Object category     Kind         Table
Single object       Entity       Row
Unique identifier   Key          Primary key
Individual data     Property     Field

Why do we need Datastore if it is so similar to an RDBMS? The simple reason is its scalability, which an RDBMS cannot achieve.

Google Cloud Datastore is integrated with services such as Stackdriver, App Engine, and Compute Engine. It is highly available and comes with an admin dashboard. All data stored in Datastore is encrypted by default, and access controls are available to give appropriate access to users. For 1 GB of storage, the user is charged $0.18; for writing 100,000 entities, $0.18; for reading 100,000 entities, $0.06; and for deleting the same number, $0.02.


Cloud Datastore and AI applications Cloud Datastore can act as storage in AI and ML use cases for large web applications. Any e-commerce website hosted on GCP can use Datastore to save data, and using this data, ML models can be trained and can provide required recommendations to the user and in turn can increase customer satisfaction.

Cloud Firestore

Cloud Firestore is a scalable NoSQL document database. It is a suitable database for web, server, and mobile development, and comes from Firebase. Data stored in Firestore is synced in near real time globally and can be accessed from multiple devices. Firestore stores data in documents and collections. Let's quickly see an example of how data is stored: Employee is a collection containing all the documents; Anand and Vikram are documents representing employees; and inside the documents there are fields that map to values. Here is the structure of the collection:

Employee -- Collection
    Anand -- Document
        Name: Anand Deshpande
        Department: IT
        Designation: Vice President
    Vikram -- Document
        Name: Vikram Chaudhari
        Department: IT
        Designation: Director

The Firestore user interface in GCP presents this same collection and document structure.
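The collection/document hierarchy above can be mirrored as plain nested data, which is roughly how client libraries represent it once fetched. The employee records repeat the example from the text; the lookup helper is an illustrative stand-in for resolving a document path such as Employee/Anand.

```python
# Nested-dict mirror of the Employee collection described above:
# collection -> document ID -> field/value pairs.
employee_collection = {
    "Anand": {
        "Name": "Anand Deshpande",
        "Department": "IT",
        "Designation": "Vice President",
    },
    "Vikram": {
        "Name": "Vikram Chaudhari",
        "Department": "IT",
        "Designation": "Director",
    },
}

def get_field(collection: dict, doc_id: str, field: str):
    """Look up a field the way a document path resolves it:
    collection -> document -> field."""
    return collection[doc_id][field]

print(get_field(employee_collection, "Anand", "Designation"))
```

Note that, unlike a relational row, each document could carry a different set of fields; the schema lives in the documents themselves.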


Documents are uniquely identified by their location in the database. All data stored in Firestore is encrypted by default, and access controls are available to give appropriate access to users. For 1 GB of storage, the user is charged $0.18; for writing 100,000 documents, $0.18; for reading 100,000 documents, $0.06; and for deleting the same number of documents, $0.02.

Cloud Firestore and AI applications Cloud Firestore can act as storage in AI and ML use cases for applications that are hosted on mobile and web devices. Any application hosted on GCP with both website and mobile applications can save the data in Firestore, and using this data, ML models can be trained and can provide required recommendations to users on both their mobile devices and website applications.

Cloud SQL

Cloud SQL is the fully managed database service for MySQL and PostgreSQL. The features of Cloud SQL are as follows:

It is scalable and is the right choice for up to 10 TB of data.
Instances can be created almost instantly, in the region that suits the application.
Instances range from 1 virtual CPU (vCPU) to 64 vCPUs, and from 600 MB of RAM to 400 GB of RAM.
For persistent storage, users have a choice of either SSD or HDD; an SSD, being faster, costs more than an HDD. Automatic storage increase can be enabled to make sure the database does not run out of space.
High availability is optional, and it is up to the user to enable it.


A Cloud SQL instance can be accessed from most of the GCP compute and processing services; one of the quickest ways to access it is by using Google Cloud Shell. All updates and patches to the instance are automatic, and users don't have to worry about them. Cloud SQL is highly secure, and all data stored in it is encrypted by default. Applications accessing Cloud SQL from outside GCP have to go through robust security layers to get access. Cloud SQL is cheap and comes with a lot of sustained use discount pricing for instances. Pricing ranges from $0.0150 to $8.0480 per hour, based on the type of instance. Persistent storage costs $0.17 per GB a month for SSD, $0.09 for HDD, and $0.08 for backups.

Cloud SQL and AI applications Cloud SQL can serve all the AI and ML use cases for large and complex structured data. Another service named Cloud Spanner can serve similar use cases, which can be served by Cloud SQL, but on a very large scale.

Cloud Spanner

Cloud Spanner is the fully managed, horizontally scalable relational database management service. It can scale to very large numbers of nodes across all the available regions. The features of Cloud Spanner are as follows:

Instances can be created almost instantly, in a single region or multiple regions, as suits the application.
Cloud Spanner instances can scale from a single node to very large node counts.
Even though Cloud Spanner is a distributed database, it still supports ACID transactions, and its availability is 99.999%. Cloud Spanner offers the features of an RDBMS with the scalability of a distributed database.
A Cloud Spanner instance can be accessed from most of the GCP compute and processing services, and from outside GCP as well with the right set of permissions. One of the quickest ways to access it is by using Google Cloud Shell.
All updates and patches to an instance are automatic, and users don't have to concern themselves with them.


Cloud Spanner is highly secure, and all data stored in it is encrypted by default. It is integrated with identity and access management, and provides features such as auditing and logging. It offers two levels of pricing, one for a regional setup and one for a multi-regional setup. The per-node price for a regional instance is $0.90 an hour, and for a multi-regional instance it is $3 per node an hour. Persistent storage is $0.30 per GB a month for a regional instance, and $0.50 per GB a month for a multi-regional instance.

Cloud Spanner and AI applications

Cloud Spanner can serve the AI and ML use cases that are suitable for MySQL and PostgreSQL, but at a much larger scale. While Cloud SQL is right for AI and ML use cases that require up to 10 TB of structured data, Cloud Spanner suits the same kind of workloads beyond that scale; for example, a machine learning use case whose data preparation involves complex SQL joins over very large tables can increase the efficiency of the process by running on Spanner.

Cloud Memorystore

Cloud Memorystore is a fully managed in-memory data store service built on top of Redis. Here are the features of Cloud Memorystore:

Redis is a versatile database that can be used for a lot of use cases. Generally, people use Redis as a caching service, but it also has various data structures that can serve other use cases in the Internet of Things (IoT), ML, streaming, and so on.
With Cloud Memorystore, Google essentially provides scalable and highly available Redis instances.
Google's Service Level Agreement (SLA) for Cloud Memorystore is 99.9% for the standard tier, and it is fully secured by Google's network policies and role-based access controls.
Cloud Memorystore comes in two tiers, basic and standard:
    The basic tier is good for use as a caching service and does not have an SLA attached to it.
    The standard tier provides high availability and deploys a replica in another zone, so it can recover from a zone failure.


There are five capacity tiers, M1 to M5, based on storage capacity and network throughput, which vary from 1 GB to 300 GB and from 3 Gbps to 12 Gbps respectively. As Cloud Memorystore follows all Redis protocols, it is very easy to lift and shift existing Redis projects onto it. Cloud Memorystore is integrated with identity and access management and Stackdriver, which provide features such as 24/7 system monitoring to protect your data, checking the performance of Memorystore, and so on. Its data always resides in a VPC. As Cloud Memorystore comes in five different capacity tiers, its pricing changes accordingly, ranging from $0.016 per GB an hour to $0.049 per GB an hour.
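As a rough illustration of that per-GB-hour pricing, the sketch below estimates a monthly bill. The rates are the ones quoted above (which may change over time), and a 730-hour month is assumed as a common billing approximation.

```python
HOURS_PER_MONTH = 730  # common approximation for monthly billing

def memorystore_monthly_cost(capacity_gb: float,
                             rate_per_gb_hour: float) -> float:
    """Estimated monthly cost for a Memorystore instance of a given size."""
    return capacity_gb * rate_per_gb_hour * HOURS_PER_MONTH

# A 10 GB instance at the low and high ends of the quoted range:
low = memorystore_monthly_cost(10, 0.016)
high = memorystore_monthly_cost(10, 0.049)
print(f"${low:.2f} to ${high:.2f} per month")
```

The spread between the tiers is wide enough that picking the right capacity tier for the working set, rather than over-provisioning, matters for the bill.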

Cloud Memorystore and AI applications

Cloud Memorystore can serve various AI and ML use cases using Redis ML modules. Redis ML provides various ML primitives as built-in data types. Cloud Memorystore can serve machine learning modules for linear and logistic regression, decision tree matrix calculations, and so on.

Cloud Filestore

Cloud Filestore is the fully managed, high-performance network file storage provided by GCP. The following are the features of Cloud Filestore:

Cloud Filestore can be used with Google Compute Engine and Kubernetes Engine applications that require network file storage.
Cloud Filestore is fast and reliable, which is perfect for applications requiring low latency and high throughput.
Filestore is consistent and provides reliable performance over time. It follows the NFSv3 protocol with a high availability of 99.99%, and the maximum capacity per share is 63.9 TB.
Cloud Filestore is highly secure, with Google-grade security, and comes in two tiers: standard and premium. The prices of both tiers vary by region: the standard tier varies from $0.20 to $0.24 per GB a month, and the premium tier from $0.30 to $0.36 per GB a month.


Cloud Filestore and AI applications

Cloud Filestore can serve the AI and ML use cases that require high throughput for datasets that are not very complex in nature. In the next section, we'll discuss the processing options.

Understanding the processing options

Apart from the IaaS options, which you can use to build your own AI and ML pipelines on top of the compute options, Google provides a few managed services that can be used to process your data and build AI and ML pipelines. The following are the fully managed processing options:

BigQuery
Dataproc
Cloud Dataflow

All the managed processing options are integrated with other Google Cloud services, such as networking, identity and access management, Stackdriver, and so on, which makes it easy to track activity and tighten application security. BigQuery can be used for offloading an existing data warehouse and creating a new one, and with BigQuery ML you can build an ML pipeline. Dataproc can be used for migrating existing Hadoop projects to GCP and running AI and ML pipelines on them. Cloud Dataflow can be used for building completely new pipelines on GCP.

BigQuery BigQuery is a cloud data warehouse for GCP and comes in a machine learning flavor (BigQuery ML). It is a very powerful tool that can process petabytes of data, and gives you readily available models that you can use in SQL programming for building your machine learning pipeline.


BigQuery is fast, scalable, and serverless. You can build BigQuery datasets in just a few clicks and start loading data into them. BigQuery uses Colossus to store data in native tables in a columnar format, and the data is compressed; this makes data retrieval very fast. Apart from storage, BigQuery uses the following tools and network components to make it fast, reliable, and efficient:

The Jupiter network for shuffling the data
The Dremel engine for processing
Borg for cluster management

In other words, it uses Google's great infrastructure and best services, making it fast, highly available, and scalable. BigQuery comes with additional features such as data and query sharing, and the saving of required queries; it is ANSI SQL 2011-compliant and integrated with native as well as external tools, including Informatica, Talend, and so on. All data saved in BigQuery is encrypted. It is federated and can query data from other services such as Cloud Storage and Bigtable. BigQuery also supports real-time analytics with BigQuery streaming. BigQuery has a friendly user interface from which users can perform all operations, and a command-line tool, bq, which can be used to connect to BigQuery. BigQuery comes in two pricing models: pay as you go, which costs users $5 per TB of query processing, and flat-rate pricing, which is approximately $40,000 per month, for which the user gets 2,000 dedicated slots for processing. Storage costs $0.02 per GB a month for short-term storage and $0.01 per GB a month for long-term storage.
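Using the two pricing models quoted above, a quick sketch of the break-even point between on-demand and flat-rate pricing follows. The figures are the ones from the text and may be out of date; the calculation itself is the point.

```python
ON_DEMAND_PER_TB = 5        # $5 per TB scanned (from the text)
FLAT_RATE_MONTHLY = 40_000  # approximate flat-rate price (from the text)

def cheaper_model(tb_scanned_per_month: float) -> str:
    """Return which BigQuery pricing model is cheaper for a workload."""
    on_demand = tb_scanned_per_month * ON_DEMAND_PER_TB
    return "on-demand" if on_demand < FLAT_RATE_MONTHLY else "flat-rate"

print(cheaper_model(500))     # on-demand ($2,500/month)
print(cheaper_model(10_000))  # flat-rate (on-demand would be $50,000/month)
# Break-even point: 40,000 / 5 = 8,000 TB scanned per month.
```

Below roughly 8,000 TB of queries a month, per-query billing wins; beyond that, dedicated slots become the cheaper option.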

BigQuery and AI applications

BigQuery ML is BigQuery's machine learning flavor, which has a few built-in algorithms that can be used directly in SQL queries for training models and predicting output. BigQuery ML currently supports linear regression, binary logistic regression, and multiclass logistic regression for classification models.

Cloud Dataproc

Cloud Dataproc is a fully managed Hadoop and Spark cluster service whose clusters can be spun up within a few seconds. Cloud Dataproc is an autoscaling cluster and can be used to run Hadoop, Spark, and AI and ML applications very effectively. At peak hours, nodes can be added to the cluster based on usage, and the cluster can scale down when requirements are lower.


Dataproc is integrated with other services such as Cloud Storage, BigQuery, Stackdriver, identity and access management, and networking, which makes the cluster very easy and secure to use. Beneath a Dataproc cluster, Google actually runs Compute Engine instances. Users can choose from a wide range of machine configurations to build the cluster, or, if the existing machine configurations do not suffice, build a cluster with a custom machine configuration.

One very important thing to note here is the use of preemptible instances with a Dataproc cluster, which can work wonders for the cluster's pricing. Preemptible instances come at much lower prices, at approximately 20% of the price of a regular instance with the same configuration, with the catch that Google can take the instance back with 30 seconds' notice. With a Dataproc cluster, preemptible instances can be used as worker nodes, because a Dataproc cluster is generally used for compute purposes and all the data is saved in Cloud Storage. So, in this case, even if a preemptible instance goes down, the job shifts to another node and there is no impact. Cloud Dataproc cluster pricing varies with the instances used, but is very competitive.
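A small sketch of that saving for a mixed cluster follows. The hourly node rate is hypothetical; the 20%-of-regular-price figure for preemptible instances comes from the text above.

```python
def dataproc_hourly_cost(regular_nodes: int, preemptible_nodes: int,
                         regular_rate: float) -> float:
    """Hourly compute cost of a cluster whose preemptible workers cost
    ~20% of the regular per-node rate, per the text above."""
    preemptible_rate = regular_rate * 0.20
    return (regular_nodes * regular_rate
            + preemptible_nodes * preemptible_rate)

# Hypothetical $0.50/hour nodes: 3 regular + 7 preemptible workers
# versus 10 regular workers.
mixed = dataproc_hourly_cost(3, 7, 0.50)         # 1.50 + 0.70 = 2.20
all_regular = dataproc_hourly_cost(10, 0, 0.50)  # 5.00
print(f"${mixed:.2f}/hour vs ${all_regular:.2f}/hour")
```

Keeping a core of regular workers alongside the preemptible ones is what makes the job resilient to reclaimed instances while still capturing most of the discount.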

Cloud Dataproc and AI applications Cloud Dataproc can serve various AI and ML use cases using Apache Spark, Hadoop, and other tools. Consider Dataproc a fully managed Hadoop and Spark cluster in the cloud: all the AI and ML use cases that can be built on Hadoop and Spark can be built on a Cloud Dataproc cluster.

Cloud Dataflow Cloud Dataflow is a fully managed service for running both batch and streaming applications, with rich integration for running AI and ML jobs. It is a serverless service provided by Google and is built on top of Apache Beam; because of this, the same code can largely be shared between batch and streaming pipelines. Cloud Dataflow applications can be built in Java and Python in a very simplified way. Cloud Dataflow is integrated with other GCP services such as Cloud Pub/Sub, Cloud Machine Learning, Stackdriver, BigQuery, and Bigtable, which makes it very easy to build Cloud Dataflow jobs. Beneath Cloud Dataflow, Compute Engine instances are running, giving users practically unlimited capacity to scale their jobs. Cloud Dataflow autoscales on its own based on the requirements of the job.


Cloud Dataflow pricing is mostly the same for batch and streaming jobs, apart from the charge for data processed. Jobs are charged on the basis of vCPUs, RAM, persistent storage, and the amount of data processed.

Cloud Dataflow and AI applications Cloud Dataflow can serve various AI and ML use cases in integration with Cloud Machine Learning. Fraud detection is a classic use case that can be implemented with Cloud Dataflow and Cloud Machine Learning for streaming jobs. So far, we have learned the essentials of GCP that will help us use the platform efficiently, make the right choices, and build effective pipelines. You now have knowledge of all the available compute, storage, and processing options on GCP. Now, let's dive into building an ML pipeline.

Building an ML pipeline Let's go through a detailed example in which we will build an end-to-end pipeline, from loading the data into Cloud Storage, to creating BigQuery datasets on top of it, training the model using BigQuery ML, and testing it. In this use case, we will use a logistic regression model to find the lead conversion probability. You can use any suitable dataset of your choice and follow this example. The leads data contains various attributes of prospective customers. BigQuery ML has built-in functionality with which we can directly train a model over any dataset, predict the output variable, and obtain the conversion probability. BigQuery provides an SQL interface to train and evaluate the machine learning model, and the model can be deployed on the platform for consumption. We have two datasets: leads training data and test data, where the training data is 80% of the actual overall data and the test data is the remaining 20%. Once the model is trained using the training data, we will evaluate it on the test data and find the lead conversion probability for each prospect in the following categories:
Junk lead
Qualified
Interested
Closed
Not interested
Not eligible
Unreachable

Understanding the flow design The following chart represents the end-to-end process of loading data into Cloud Storage and BigQuery, and of training and testing a model using the leads data. You can choose a dataset of your choice:

From the preceding diagram, we can see the following:
1. We have loaded the training and test datasets for leads into Cloud Storage buckets.
2. After loading the data into Cloud Storage, we will create the Leads dataset in BigQuery with two tables, namely, leads_training and leads_test.
3. Once the dataset is created, we will use the leads_training table to train our model and the leads_test table to test it.
We will discuss each step in detail in the following sections.


Loading data into Cloud Storage Let's discuss the step-by-step process of loading data into Cloud Storage:
1. You should have the training and test data.
2. Create a training and a test bucket in Cloud Storage.
3. From the GCP Console, click on the navigation menu in the top left, and from the storage section, click on Storage (Cloud Storage).
4. Click on Create a bucket at the top. You will see the following screen:

5. Give a globally unique name to the bucket.
6. Choose a regional bucket for your use case.
7. Select the Location where you want to create the bucket.


8. Click on Create.
9. Upload the training and test data to their respective buckets by clicking on the bucket, and then either use the upload files option or drop the files into the bucket.

Loading data to BigQuery Now, we will discuss BigQuery datasets and loading data into BigQuery:
1. First, create the Leads dataset in BigQuery by following these steps:
   1. From the GCP console, click on the navigation menu at the top left and, from the big data section, click on BigQuery.
   2. Click on the project name in the left panel.
   3. Click on the create dataset link.
   4. Give the dataset the name Leads, choose the location of your preference, and create the dataset.
2. Next, create the Leads_Training and Leads_Test tables from the available data in the Cloud Storage buckets by following these steps:
   1. Click on the Leads dataset in the project in the left panel.
   2. Click on Create table.
   3. Instead of an empty table, select Create table from: Google Cloud Storage.
   4. Give the location of the file.
   5. Choose the file format as CSV.
   6. Give the Table name as Leads_Test_Data or Leads_Training_Data, based on the table being created.
   7. Click on Auto detect schema.
   8. In Advanced options, set Header rows to skip to 1 if your dataset has a header.
   9. Click on Create table.


The preceding steps of table creation are also shown in the following screenshot:

The following screenshot depicts how to skip a header row:


Now that we've created the Leads dataset in BigQuery and created the Leads_Training and Leads_Test tables from the available data in the Cloud Storage buckets, we will train the model next.

Training the model The following BigQuery code snippet will be used to train the leads model using logistic regression on the Leads_Training table. Please load the query from the leads_model.sql file at https://github.com/vss-vikram/Hands-On-Artificial-Intelligence-on-Google-Cloud-Platform.

CREATE MODEL `Leads.lead_model_optimum`
OPTIONS (model_type = 'logistic_reg') AS
SELECT
  Lead_Stage AS label,
  lead_origin,
  lead_source,
  ...,
  ...,
  ...,
  ...,
  receive_more_updates_about_our_courses,
  update_me_on_supply_chain_content,
  Get_updates_on_PGDMHBSCM,
  city_new,
  ...,
  ...,
  Asymmetrique_Activity_Score,
  Asymmetrique_Profile_Score,
  Last_Notable_Activity
FROM
  Leads.Leads_Training_Data;

From the preceding code, we can see the following: We are creating a model named lead_model_optimum in the Leads dataset. A logistic regression model is created, as you can see in OPTIONS, where model_type is logistic_reg.


Lead_Stage is the label that we will predict on the test data; it is the column from which we can identify the status of a lead.

The columns from lead_origin to Last_Notable_Activity will be used by the model to predict Lead_Stage on the test data. This model is created using the data in the Leads_Training_Data table. Once the model is created, it is saved in the Leads dataset with the name lead_model_optimum. By clicking on the model, you will be able to see Model Details, Model Stats, and Model Schema, which include complete details such as the algorithm used, the number of iterations, the learning rate, the completion time, and so on. So, just by copying and pasting the preceding code into the BigQuery window, you will be able to create your first model. Next, we will see how to evaluate the model we have created.

Evaluating the model In BigQuery, you can use the ml.evaluate() function to evaluate any model; it returns the evaluation metrics for that model. The following code block shows the BigQuery query used for model evaluation. Let's have a look at the following code:

SELECT *
FROM ml.evaluate(MODEL `Leads.lead_model_optimum`,
  (
    SELECT Lead_Stage AS label, *
    FROM `Leads.Leads_Training_Data`
  )
)

In the preceding code, we have evaluated lead_model_optimum to find its details. Let's have a look at the following results, after executing the preceding query:


The preceding screenshot shows the complete model evaluation details. As we can see, the SQL interface trains a model with fairly good accuracy as well as good training data coverage. The score values also suggest that the model is optimally fit, neither underfitting nor overfitting the evaluation data. The point is that a business analyst can also perform model training and deployment with the simple SQL interface provided by BigQuery.

Testing the model In BigQuery, the ml.predict() function is used to predict outcomes using a model. Execute the following BigQuery code to test your model:

SELECT
  prospect_id,
  predicted_label
FROM
  ml.predict(MODEL `Leads.lead_model_optimum`,
    (
      SELECT *
      FROM `Leads.Leads_Test_Data`
    )
  )

In the preceding code, based on prospect_id, the model predicts Lead_Stage for the test data, as you can see in the resulting screenshot. Please compare the model's prediction with the Lead_Stage column of the test data, matched on prospect_id, to gauge the accuracy of the model:

This brings us to the end of this example. Please try the same example on the Dataproc cluster using Spark.


Summary In this chapter, we learned about all the components that help us build AI applications on GCP. We looked at the different compute, storage, and processing options, and where each option can help us. Remember that choosing the right storage, compute, and processing options is very important for building cost- and performance-effective applications. In addition to learning about the components, we worked through a hands-on example of leads prediction using BigQuery and Cloud Storage, and you can try the same example with Spark on a Dataproc cluster. In the next chapter, we will dive into leveraging the power of GCP to deal with large volumes of data using its autoscaling features.


2 Section 2: Artificial Intelligence with Google Cloud Platform In this section, we will perform hands-on experiments with various algorithms. In Chapter 3, we will cover XGBoost, a powerful algorithm, and how it can be leveraged to build machine learning applications on Google Cloud Platform (GCP). In Chapter 4, we will cover Cloud AutoML, which provides machine learning as a service on GCP. In Chapter 5, we will build a machine learning pipeline with Cloud ML Engine. In Chapter 6, we will provide step-by-step guidelines for building conversational applications with Dialogflow. This section comprises the following chapters:
Chapter 3, Machine Learning Applications with XGBoost
Chapter 4, Using Cloud AutoML
Chapter 5, Building a Big Data Cloud Machine Learning Engine
Chapter 6, Smart Conversational Applications Using Dialogflow

3 Machine Learning Applications with XGBoost In many areas, machine learning-based, data-driven approaches are becoming very important. One good example of this is machine learning-based smart email spam classifiers, which protect our email by learning from massive amounts of spam data and user feedback. Other examples are targeted advertising systems that present ads to users based on certain contexts, and banking fraud detection systems that protect us from malicious attackers. In order to make machine learning applications successful in the areas mentioned, there are some important factors that have to be kept in mind. One is building and using an effective statistical model that represents all the complex data dependencies covering the most important scenarios, and the other is building those models scalably, to ensure that they work on larger and larger datasets. Scalability in machine learning enables quick learning via parallel and distributed computing, and also provides efficient memory utilization. In this chapter, we will be discussing the XGBoost library, which implements a type of ensemble learning algorithm. XGBoost is a machine learning algorithm based on decision-tree ensembles that uses a gradient boosting framework. Artificial neural networks tend to outperform other algorithms when predicting problems involving unstructured data (such as images and text). Nonetheless, decision tree-based algorithms are currently considered best-in-class for small-to-medium structured/tabular data. This is because, in some cases, the results of only one machine learning model may not be sufficient. Ensemble learning provides a systematic solution for combining the predictive power of multiple machine learning algorithms. The result is a single model that aggregates the output of several models. The models in an ensemble, also referred to as base learners, may either use the same learning algorithm or different algorithms.
In this chapter, you will learn how to implement XGBoost algorithms using Python. You will also learn about gradient boosting concepts and how they support XGBoost algorithms.


The chapter will cover the following topics:
Overview of the XGBoost library
Training and storing XGBoost machine learning models
Using XGBoost-trained models
Building a recommendation system using the XGBoost library

Overview of the XGBoost library XGBoost is a library that helps run ensemble machine learning algorithms on very large datasets in a scalable and performance-centric way. For this reason, it focuses on computational speed and model performance. Put another way, XGBoost benefits more than most libraries from an increase in hardware capability, because it is designed to exploit high-end hardware. XGBoost implements machine learning algorithms within the framework of gradient boosting. We will shortly delve into the XGBoost library, but before that, we should understand ensemble learning and gradient boosting.

Ensemble learning Ensemble learning algorithms combine multiple base models to produce an optimal predictive model. As opposed to normal machine learning methods, which try to learn one hypothesis from the training data, ensemble methods attempt to build and combine a number of hypotheses. With ensemble methods, the focus is on generalization, which is at times ignored by base learners due to the specific nature of the algorithm or the bias-variance trade-off of the chosen training set. Usually, a base-learning algorithm generates base learners from training data; these could be decision trees, neural networks, or other machine learning algorithms. Most ensembles use a single base-learning algorithm to produce homogeneous base learners, but there are also several ways to produce heterogeneous learners using multiple different types of learning algorithm; in the latter case, there is no single base-learning algorithm. Bagging and boosting are widely used ensemble methods. These are simple assembly techniques that build many independent predictors/models/learners and combine them using model-averaging techniques (such as weighted average, majority vote, or simple average).
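The model-averaging techniques just mentioned can be illustrated in a few lines of Python. This is a toy sketch; the function names are our own, not from any library:

```python
def simple_average(preds):
    """Plain average of numeric predictions from several base learners."""
    return sum(preds) / len(preds)

def weighted_average(preds, weights):
    """Average where stronger base learners can be given larger weights."""
    return sum(p * w for p, w in zip(preds, weights)) / sum(weights)

def majority_vote(labels):
    """Most frequent class label among the base learners' votes."""
    return max(set(labels), key=labels.count)

simple_average([0.2, 0.6, 0.7])                 # plain average: 0.5
weighted_average([0.2, 0.6, 0.7], [1, 2, 3])    # weighted average
majority_vote(["spam", "ham", "spam"])          # majority vote: "spam"
```

Each combiner takes the outputs of several base learners and reduces them to one ensemble prediction, which is the essence of bagging-style averaging.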


Another, bigger question is how to ensure that base learners are not correlated. This is important to ensure that the predictions made by the base learners are independent of each other; otherwise, it would preclude an optimized predictive ensemble model. By different base learners, we mean that the final base models should be different from one another. Models can be different if they use datasets that are completely different and represent different contexts altogether, if they take into account different hypotheses, or if they use altogether different classes of algorithm.

How does ensemble learning decide on the optimal predictive model? The optimal model is the one for which the error produced by the ensemble learning model is as low as possible, which corresponds to a low value of the loss function. The loss function is a measure of how well a prediction model can predict the expected result. The most common method of finding the minimum point of a function is gradient descent. In summary, we must first understand what causes errors in a model to really understand what lies behind an ensemble pattern. We will briefly introduce these errors and give an insight into how each of them affects a learner. The error of any model can be mathematically divided into three types.
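To make the gradient descent idea concrete, here is a minimal one-parameter sketch (our own toy example, not from the book's repository): we minimize the loss L(w) = (w - 3)^2, whose gradient is 2(w - 3).

```python
def gradient_descent(lr=0.1, steps=100):
    """Minimize L(w) = (w - 3)^2 by repeatedly stepping against the gradient."""
    w = 0.0                     # starting point
    for _ in range(steps):
        grad = 2 * (w - 3)      # dL/dw at the current w
        w -= lr * grad          # move a small step downhill
    return w

w_opt = gradient_descent()      # converges towards the minimum at w = 3
```

Each iteration moves w in the direction that decreases the loss, which is exactly how the boosting procedures discussed below drive their loss functions down.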

Reducible errors – bias Bias is the distance between the predicted values and the actual values. In other words, bias is the difference between the average model prediction and the correct value that we are trying to predict. A highly biased model pays very little attention to the training data and oversimplifies the model. High bias causes the algorithm to miss relevant relationships between the input and output variables. If a model has a high degree of bias, it means the model is too simple and does not capture the complexity underpinning the data. Quantitatively, bias is the average difference between the predicted values and the actual values.

Reducible errors – variance Variance occurs when the model performs well on the dataset it was trained on but does not do well on a new dataset, such as a test or validation dataset. Variance tells us how scattered the predictions are around the actual value. High variance causes overfitting, which implies that the algorithm models the random noise present in the training data. If a model shows high variance, it becomes very flexible and adapts itself to the data points of the training set; when a high-variance model meets a new data point that it did not learn from, it cannot predict correctly. Variance quantifies the spread of predictions for the same observation.


Irreducible errors Irreducible errors are errors that are impossible to minimize regardless of the machine learning algorithm you use. They are usually caused by unknown variables that can affect the output variable. The only way to improve the prediction of an irreducible error is to identify and predict those external influences.

Total error The total error is defined as follows: Total Error = Bias² + Variance + Irreducible Error
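The decomposition Total Error = Bias² + Variance + Irreducible Error can be checked numerically with a small sketch (our own illustrative numbers, not from the book): we take the predictions of several hypothetical retrained models for one observation whose true value is 10, plus a zero-mean noise term.

```python
preds = [8.0, 8.5, 7.5, 8.2, 7.8]    # predictions from five retrained models
noise = [-1.0, 0.0, 1.0]             # zero-mean irreducible noise values
true = 10.0

mean_pred = sum(preds) / len(preds)
bias_sq = (mean_pred - true) ** 2                                  # squared bias
variance = sum((p - mean_pred) ** 2 for p in preds) / len(preds)   # spread of predictions
irreducible = sum(n ** 2 for n in noise) / len(noise)              # noise variance

# Expected squared error over all prediction/noise combinations:
total = sum((p - (true + n)) ** 2 for p in preds for n in noise) / (
    len(preds) * len(noise))
# total equals bias_sq + variance + irreducible (up to float rounding)
```

Because the noise has zero mean, the cross terms vanish and the expected squared error splits exactly into the three components.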

Normally, you will see a reduction in error as the bias of the model falls and the model becomes more complex. This happens only up to a certain point, however. As your model continues to become more complex, you end up overfitting it, and thus its variance will start to rise. A correctly optimized model should balance variance and bias, as shown in the following graph:


Gradient boosting Gradient boosting is a type of ensemble learning, meaning that a final model is created from a set of individual models. The predictive capacity of each of these individual models is weak, but when they are combined in an ensemble, the results are far better overall. In gradient boosting machines, decision trees are the most common type of weak model used. So, in a nutshell, gradient boosting is a regression- and classification-based machine learning methodology that produces a prediction model as an ensemble of weak prediction models, typically decision trees. Let's now see how we can define it mathematically. Any supervised learning algorithm aims to define and minimize a loss function. The mean square error (MSE) loss function is defined as follows: MSE = (1/N) Σᵢ (yᵢ − ŷᵢ)², where yᵢ is the actual value and ŷᵢ is the predicted value for the ith observation.

We want our loss function to be minimal. The point of gradient boosting, in fact, is to find the function that best approximates the data while minimizing the loss function. This can be mathematically expressed as follows: F*, P* = argmin over F, P of Σᵢ L(yᵢ, F(xᵢ; P)); that is, we search for both the function F and its parameters P that minimize the total loss.

So, in gradient boosting, the best function F has to be found in addition to the best parameters P. This, as opposed to simple logistic regression, makes the problem much more complex. Before, the number of parameters we were optimizing was fixed (for example, a logistic regression model is defined before we start training it); now, it can change as we go through the optimization process if the function F changes. Obviously, it would take too long to search every single function and parameter for the best ones, so gradient boosting finds the best function F by using many simple functions and combining them. The following are the steps involved in gradient boosting:
1. First, model the data with simple models and analyze the errors. These errors point to data points that are hard for a simple model to fit.
2. Then, for subsequent models, we particularly focus on the hard-to-fit data in order to correct the errors.
3. Finally, we combine all the predictors, giving each predictor a certain weight.
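The three steps above can be sketched in pure Python (a minimal illustration of the idea, not the code from the book's repository; the stump-based weak learner and all names are our own):

```python
def fit_stump(x, residuals):
    """Fit a decision stump (one threshold, two leaf means) to the residuals."""
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi, t=t, lm=lm, rm=rm: lm if xi <= t else rm

def gradient_boost(x, y, n_rounds=20, lr=0.5):
    """Steps 1-3: repeatedly fit a weak model to the current residuals."""
    stumps = []
    pred = [0.0] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # errors of the ensemble so far
        stump = fit_stump(x, residuals)                   # focus on the hard-to-fit points
        stumps.append(stump)
        pred = [pi + lr * stump(xi) for pi, xi in zip(pred, x)]
    # combine all predictors, each weighted by the learning rate
    return lambda xi: sum(lr * s(xi) for s in stumps)

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.2, 1.1, 1.3, 1.0, 4.9, 5.2, 5.1, 4.8]  # a step-shaped target
model = gradient_boost(x, y)
mse = sum((yi - model(xi)) ** 2 for xi, yi in zip(x, y)) / len(y)
```

After 20 rounds, the residual MSE is far below the variance of y, mirroring the shrinking residual pattern that the book's plots illustrate.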


The code at https://github.com/PacktPublishing/Hands-On-Artificial-Intelligence-on-Google-Cloud-Platform shows how gradient boosting can be done in Python. The intent is also to show how the error is reduced after multiple iterations. The scatter plots show how the output (Y) of the machine learning algorithm is distributed for the input (X). The following outputs also show the residual scatter plot, which becomes less random after the 20th iteration, indicating that gradient boosting has found the optimal output:

The preceding scatter plot is a representation of the input and output of the machine learning algorithm. The following shows what the actual output looks like before gradient boosting is applied:


We can see the following in the preceding screenshots: The residuals start becoming randomly distributed around the 18th iteration and are close to random by the 20th. Once the residual distribution is uniformly random, we know that gradient boosting has given the optimized output. In other words, gradient boosting has identified the maximum number of patterns from the residuals and given the most optimized output.

eXtreme Gradient Boosting (XGBoost) XGBoost was developed by Tianqi Chen; its full name is eXtreme Gradient Boosting. XGBoost is a scalable and accurate implementation of gradient boosting machines, developed to push the limits of the computing power available for boosted tree algorithms, targeting both model performance and computational speed. According to Tianqi Chen, this is what makes it superior to, and different from, other libraries. The following are some important features of XGBoost that make it unique:
Parallelization: In XGBoost, the loops used for building base learners can be interchanged: the outer loop that lists the leaf nodes of a tree can be interchanged with the inner loop that calculates the features. In the original formulation, the more computationally intensive inner loop has to be completed before the outer loop can start. In XGBoost, to improve the run time, the order of the loops is exchanged by initializing a global scan and sorting all instances with parallel threads. This switch enhances algorithmic efficiency by offsetting any parallelization overhead.
Stopping criterion: In the classic gradient boosting framework, the stopping criterion for tree splitting is based on the negative loss at the time of the split. XGBoost, by contrast, grows trees up to the specified max_depth parameter and then prunes them backward. This depth-first approach improves the overall performance and efficiency of the algorithm.
Maximum hardware resource usage: XGBoost is designed to maximize hardware resource usage. It utilizes system caches via internal buffers in each tree to store gradient statistics. Additional improvements include out-of-core computing and optimizing the available disk space when handling large data frames that do not fit into memory.


Training and storing XGBoost machine learning models In this section, we will look into how we can train and store machine learning models using Google AI Hub. AI Hub serves as a one-stop store where machine learning models can be discovered, shared, and deployed. It is a reusable model catalog that can be installed rapidly in an AI Platform execution environment. The catalog consists of a compilation of model designs based on common frameworks: TensorFlow, PyTorch, Keras, scikit-learn, and XGBoost. Each model can be packaged with a deep learning VM backed by a GPU or TPU, Jupyter Notebooks, and Google's own AI APIs, in a format that can be deployed on Kubeflow. Each model has labels that facilitate searching and discovery based on a range of characteristics. We will use XGBoost with the RAPIDS framework for text classification, with the help of the code at .../text_classification_rapids_framework.py. Algorithmically, this code performs the following steps:
1. The necessary packages are imported. At a high level, this code uses os, google.cloud, cudf (RAPIDS), sklearn, pandas, and xgboost. It also imports pynvml, a Python binding to NVIDIA's lower-level library for GPU management and monitoring.
2. Next, the code installs the miniconda library and the RAPIDS platform, and then sets some environment variables required for the NVIDIA GPU.
3. The next section of the code sets some constants that are required for accessing Google APIs, such as the project ID and bucket ID.
4. The code then downloads the training data (text_classification_emp.csv) from the GCS bucket and stores it in the local job directory for further use.
5. The next part of the code splits the CSV on \n and creates two arrays, one for the labels (target variable) and another for the texts (predictor variable).
6. It then creates a pandas DataFrame, which is converted into a cuDF DataFrame that is compatible with the underlying GPU.
This conversion is done to ensure that all subsequent operations utilize the underlying GPU.
7. The data is then split into training and test datasets, following the 80%-20% rule.
8. Then, a label encoder is used to encode the labels as vector values.
9. After that, TF-IDF is calculated at the character level.
10. Finally, the model is trained using the XGBoost library.
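Step 9 (character-level TF-IDF) can be illustrated with a small pure-Python sketch. This is not the RAPIDS/cuDF implementation used by the actual code; it is our own toy version that only shows the term-frequency times inverse-document-frequency idea at the character level:

```python
import math
from collections import Counter

def char_tfidf(docs):
    """Return one {char: tf-idf weight} dict per document."""
    df = Counter()                       # in how many documents each char occurs
    for doc in docs:
        df.update(set(doc))
    n_docs = len(docs)
    vectors = []
    for doc in docs:
        tf = Counter(doc)                # character counts in this document
        total = len(doc)
        vectors.append({ch: (count / total) * math.log(n_docs / df[ch])
                        for ch, count in tf.items()})
    return vectors

vecs = char_tfidf(["spam spam", "ham"])
# characters shared by every document (such as 'a' and 'm') get weight 0,
# while characters distinctive to one document get a positive weight
```

A real pipeline would also use character n-grams rather than single characters, but the weighting logic is the same.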


To submit the preceding code to train the model, you have to run the following command. This is the standard gcloud ai-platform CLI command, which submits a training job to the Google Cloud AI Platform:

gcloud ai-platform jobs submit training $JOB_NAME \
  --job-dir=$JOB_DIR \
  --package-path=$TRAINING_PACKAGE_PATH \
  --module-name=$MAIN_TRAINER_MODULE \
  --region=$REGION \
  --runtime-version=$RUNTIME_VERSION \
  --python-version=$PYTHON_VERSION \
  --config=config.yaml

The environment variables used in this command can be set in job.properties as follows, and you have to source job.properties before running the gcloud ai-platform command. An example can be seen in the following code:

PROJECT_ID=test-project-id
BUCKET_ID=ml-assets
JOB_NAME=gpu_text_classification_$(date +"%Y%m%d_%H%M%S")
JOB_DIR=gs://${BUCKET_ID}/xgboost_job_dir
TRAINING_PACKAGE_PATH="text_classification"
MAIN_TRAINER_MODULE=text_classification.train
REGION=us-central1
RUNTIME_VERSION=1.13
PYTHON_VERSION=3.5

The contents of the GPU-specific config.yml file are as follows:

trainingInput:
  scaleTier: CUSTOM
  # Configure a master worker with 4 K80 GPUs
  masterType: complex_model_m_gpu
  # Configure 9 workers, each with 4 K80 GPUs
  workerCount: 9
  workerType: complex_model_m_gpu
  # Configure 3 parameter servers with no GPUs
  parameterServerCount: 3
  parameterServerType: large_model


The package structure will look as shown in the following block:

text_classification
|
|__ __init__.py
|__ config.yml
|__ run.sh
|__ job.properties
|__ train.py

After submitting the code, you can check the job status with the commands shown in the following screenshot:


The following screenshot shows how the driver Cloud ML GPU logs will look:

In this section, we saw how to use the Google Cloud AI Platform for XGBoost model training. The steps to deploy the code and use GCP's powerful parallel computing are important. Try to perform each of the steps demonstrated here in your own work environment.

Using XGBoost trained models After you have stored the model in Google Cloud Storage, you need to put the data in the right format for prediction. It has to be in a dense (non-sparse) vector format; that means no values can be omitted, and if a value is 0, it has to be represented explicitly as 0.0. Let's have a look at the following example:
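The following sketch (our own illustrative example, not from the book's repository) shows what this formatting means in practice: every value, including zeros, appears explicitly as a float in the JSON instance list:

```python
import json

# Hypothetical feature values for one prediction instance; the feature names
# are made up for illustration. The zero must survive as 0.0, not be dropped.
features = {"f1": 3, "f2": 0, "f3": 1.5}
instance = [float(v) for v in features.values()]   # dense list of floats

payload = json.dumps({"instances": [instance]})
# payload == '{"instances": [[3.0, 0.0, 1.5]]}'
```

A payload shaped like this is what the prediction request body in the later predict_json call expects under the 'instances' key.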

Perform the following steps to use XGBoost-trained models:
1. Go to console.google.com from your web browser.
2. Open the AI Platform models page in the GCP console:


3. Next, you will have to create a model resource with the input shown in the following screenshot:


4. After that, create the version of the model shown in the following screenshot:

5. After creating a model version successfully, AI Platform starts a fresh server ready to serve prediction requests. You can now run the following Python program to call the Cloud Machine Learning APIs:

import googleapiclient.discovery

def predict_json(project, model, instances, version=None):
    """Send json data to a deployed model for prediction.

    Args:
        project (str): project where the AI Platform Model is deployed.
        model (str): model name.
        instances ([[float]]): List of input instances, where each input
            instance is a list of floats.
        version (str): version of the model to target.
    Returns:
        Mapping[str: any]: dictionary of prediction results defined by the
            model.
    """
    # Create the AI Platform service object.
    # To authenticate set the environment variable
    # GOOGLE_APPLICATION_CREDENTIALS=
    service = googleapiclient.discovery.build('ml', 'v1')
    name = 'projects/{}/models/{}'.format(project, model)

    if version is not None:
        name += '/versions/{}'.format(version)

    response = service.projects().predict(
        name=name,
        body={'instances': instances}
    ).execute()

    if 'error' in response:
        raise RuntimeError(response['error'])

    return response['predictions']

The preceding code is the client-side machine learning Python code, and it utilizes a model deployed using the Google Cloud AI Platform. It takes JSON as input and returns the prediction output. In the next section, we will see how to build a recommendation system using the XGBoost library. You can find the details of the Python client library at https://cloud.google.com/ml-engine/docs/tensorflow/python-client-library.


Building a recommendation system using the XGBoost library Modern internet retail customers demand personalized products. This improves their satisfaction and thus also boosts e-retailer income, which is why recommendation systems have been used for many years. Current recommendation systems deliver outcomes that mostly include common products. However, distributors also want systems that suggest less common items, because these still account for a big share of the retailer's sales in the long tail. Amazon's recommendation system is renowned: it proposes additional products, often connected to the item currently being viewed by a consumer. Recommendation systems can also help customers find products (or videos, when they visit Netflix or YouTube) when they don't fully know what they're looking for. They are often based on a technique called collaborative filtering, in which predictions of user preference are produced by collecting the preferences of many users. It is assumed that, if two users share a subset of preferences, they are more likely to share preferences for other, as yet unseen, products as well. Collaborative filtering relies only on user history, compared against users with a corresponding history. In other words, the retailer has a restricted influence on the suggestions made, and common products are recommended most strongly. This is undesirable for many online companies, because much of their income still comes from the long tail of less famous products. It is therefore essential for companies to find an algorithm that promotes less common products alongside famous products. In order to develop a balanced item recommendation system that combines the most famous products with less common products from a retailer's long tail, XGBoost, the tree-based gradient boosting algorithm, is recommended. Gradient boosted tree learning, with the open source XGBoost software library, can be the correct approach for balanced product suggestions.
Because recommendation systems have been in use for so long, many recommendation algorithms have been developed. The literature suggests that the most promising class is model-based algorithms, in which machine learning methods are fitted to user-item relationships, interaction data, and user and item characteristics. Decision tree learning has been very successful in previous research. In this technique, a decision tree predicts the target value of an item from its observed features. The gradient boosting method adds models sequentially, each correcting the mistakes of the previous models, until no further improvement is possible. The final prediction of the target value is formed by combining them. More specifically, the model predicts the likelihood that a particular user will purchase a given item.


To explain this further, let's take an example of a user-item matrix:

In this user-item matrix, rows represent users, each column is an item, and every cell is a user rating. There are j+1 users and n+1 items. Here, Ajn is the score given by user uj to item in, and can be between 1 and 5. If the matrix instead records only whether user uj viewed item in, Ajn is binary: either 0 or 1. In our case, we treat Ajn as a score from 1 to 5.

This matrix is a very sparse matrix, meaning many of its cells are empty. Since there are many items, no single user can rate them all; in practice, a single user rates only a tiny fraction of the items, so the vast majority of cells are empty. The empty cells can be represented as Not a Number (NaN). For example, assume there are 10 million users and 20,000 items. The matrix then has 10^7 * 2*10^4 = 2*10^11 cells, which is a very large number. If on average a user gives ratings to 5 items, the total number of ratings given will be 5 * 10 million = 5*10^7 ratings. This is captured by the sparsity of a matrix. The formula for the sparsity of a matrix is as follows: Sparsity of Matrix = Number of Empty cells / Total Number of cells. Hence, the sparsity of the matrix = (2*10^11 - 5*10^7) / (2*10^11) = 0.99975. This means that 99.975% of cells are empty, which is extreme sparsity. With sparsity this high, standard recommendation systems are likely to overlook below-the-line items that are profitable but not rated by a particular user. So, the XGBoost algorithm would come in handy.
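Working through the sparsity arithmetic with the stated 10 million users and 20,000 items:

```python
# Checking the sparsity arithmetic for the user-item matrix example above.
n_users = 10_000_000        # 10 million users
n_items = 20_000            # 20,000 items

total_cells = n_users * n_items     # 2 * 10**11 cells in the user-item matrix
filled_cells = 5 * n_users          # ~5 ratings per user -> 5 * 10**7 ratings

sparsity = (total_cells - filled_cells) / total_cells
print(f"sparsity = {sparsity}")     # 0.99975 -> 99.975% of cells are empty
```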


Creating and testing the XGBoost recommendation system model

Let's create a recommendation system using XGBoost. The code at https://github.com/PacktPublishing/Hands-On-Artificial-Intelligence-on-Google-Cloud-Platform shows how to build a recommendation system model using XGBoost. It is based on a user-movie rating dataset. We are going to leverage the RAPIDS package for this implementation. We will install the runtime libraries with conda, install the packages, and use a graphics library to plot the results. The following table has the sample data for users.csv (columns: user ID, item ID, rating, timestamp):

user_id  item_id  rating  timestamp
0        50       5       881250949
0        172      5       881250949
0        133      1       881250949
196      242      3       881250949
186      302      3       891717742
22       377      1       878887116
244      51       2       880606923
166      346      1       886397596
298      474      4       884182806
115      265      2       881171488

The following table has the sample data for movie_lens.csv. This is an illustrative sample. The actual training data can be as large as multiple gigabytes, and the XGBoost algorithm can train on it efficiently once we use the right set of underlying infrastructure:

item_id  title
1        Toy Story (1995)
2        GoldenEye (1995)
3        Four Rooms (1995)
4        Get Shorty (1995)
5        Copycat (1995)
6        Shanghai Triad (Yao a yao yao dao waipo qiao) (1995)
7        Twelve Monkeys (1995)
8        Babe (1995)
9        Dead Man Walking (1995)
10       Richard III (1995)

These sample files can be used to train and evaluate the model with the XGBoost algorithm. There is a significant performance gain when training with an extremely large dataset.


Summary

Tree boosting is a highly effective and widely used machine learning technique. In this chapter, we described XGBoost, a scalable end-to-end tree boosting system that data scientists use extensively to obtain state-of-the-art results on many machine learning problems. It features a novel sparsity-aware algorithm and a weighted quantile sketch for approximate tree learning, and it scales to billions of examples while using far fewer resources than comparable systems. We covered different code examples in this chapter, and by now you know how to use the Google Cloud AI Platform to submit models and use those models for prediction. In the next chapter, we will demonstrate using streaming components to perform analytics on data in motion.


4
Using Cloud AutoML

In the previous chapter, we learned about one of the most popular and handy algorithms in machine learning. In this chapter, we will understand how Google Cloud Platform (GCP) makes it easy to use various machine learning models with the AutoML service. AutoML makes it easy for developers and analysts with limited experience of data engineering and machine learning programming to build machine learning models. We will take a practical approach with AutoML and learn how it can be leveraged for building and deploying models for some practical use cases such as document and image classification, speech-to-text conversion, and sentiment analysis. This chapter will cover the following topics:

Overview of Cloud AutoML
Document classification using AutoML Natural Language
Image classification using AutoML Vision APIs
Performing speech-to-text conversion using Speech-to-Text APIs
Sentiment analysis using AutoML Natural Language APIs

Overview of Cloud AutoML

Supervised machine learning models follow a consistent life cycle. Supervised learning depends on historical data: the model is trained on it. Model training simply builds a hypothesis function that is capable of predicting the output (dependent) variables based on the input (independent) variables. For example, in the case of a sales forecasting model for a retail store, historical sales data is used for training. The data can be horizontally spread across a large number of factors that impact sales. The training data is randomly split into a training dataset and an evaluation dataset. Typically, there is an 80-20 split between training and evaluation data respectively.
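The random 80-20 split described above can be sketched with scikit-learn; the data here is a placeholder for a historical sales table:

```python
# 80/20 train/evaluation split of a historical dataset (illustrative data).
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(1000).reshape(500, 2)   # 500 historical records, 2 factors
y = np.arange(500)                    # e.g. observed sales per record

X_train, X_eval, y_train, y_eval = train_test_split(
    X, y, test_size=0.20, random_state=7)

print(len(X_train), len(X_eval))      # 400 training rows, 100 evaluation rows
```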


The model is trained based on the selected algorithm and it is then used for evaluating the accuracy based on the evaluation dataset. The training parameters are tuned for improving model accuracy and performance. Once the model is performing well with various evaluation samples, it is ready for deployment and use in the real environment. At this stage, the model encounters entirely new datasets, which it might not have seen during training. This is where a generalization of the hypothesis function is leveraged for predictions. The following diagram depicts the generic supervised learning process:

Traditionally, in a non-cloud environment, this process needs to be fully managed by the data scientist. GCP Cloud AutoML is a fully managed environment that takes care of all of the operational processes, infrastructure, and model management. GCP Cloud AutoML provides interfaces for the models related to Natural Language Processing (NLP) and Computer Vision (CV). In this chapter, we will deep-dive into GCP Cloud AutoML with sample use cases that leverage the NLP and CV interfaces. The advantages of Cloud AutoML are as follows:

Ease of use: Developers with limited machine learning experience can easily use the AutoML platform for training custom machine learning models. The models can be quickly customized based on specific business use cases. The underlying complexity is managed by GCP, and the end user does not need to worry about computation capacity or storage requirements while training the models and running various experiments.


High performance: By using the GCP AutoML engine for training, evaluating, and deploying models, the user gets access to Google's state-of-the-art compute infrastructure and distributed computing platform, which consistently provides reliable outcomes. The cluster is dynamically autoscaled based on the resource requirements, in proportion to the data volumes and the model complexity.

Speed and agility: The supervised learning process shown in the preceding diagram is fully managed through a simple and intuitive user interface provided by AutoML. The GUI makes it easy to train, evaluate, tune, and deploy models quickly. Because various experiments can be conducted rapidly, different models can be tried and tested quickly, parameters can be tuned, and results can be quickly validated. This makes the process of model development and deployment extremely agile.

The workings of AutoML

AutoML simplifies the supervised learning process by creating a high-level abstraction over the training, evaluation, and deployment of machine learning models. The following diagram shows the workings of AutoML:

Comparing AutoML with a traditional supervised learning pipeline, the following is evident:

AutoML simplifies the process of training, evaluating, and deploying ML models.
Additionally, AutoML supports interacting with the model via a RESTful API.


Integration with a REST API makes it easy to leverage the machine learning models from a variety of applications and endpoints. The AutoML user interface is typically used for the experiments and quickly testing the hypothesis. However, the REST API is used for training, evaluating, and utilizing machine learning models. Here is a quick overview of generic and representative API endpoints supported by AutoML.

AutoML API overview

RESTful APIs are used by AI applications built on GCP to invoke AutoML services. The user interface is typically used for proofs of concept, but enterprise applications require a rich set of API libraries for interacting with components such as AutoML. In this section, we will take a look at a representative set of APIs and the significant fields within these RESTful APIs. Let's look at the API endpoint for interacting with models: v1beta1.projects.locations.models.

REST source – pointing to model locations

The following are some basic method calls on the model locations API. The calling application needs to pass appropriate parameters to the API endpoints to create, delete, and get a model, and to use it for predictions:

create — POST /v1beta1/{parent}/models: Creates a model. Returns a model in the response field when the operation completes.
delete — DELETE /v1beta1/{name}: Deletes a model. Returns google.protobuf.Empty in the response field when the operation completes, and deleteDetails in the metadata field.
get — GET /v1beta1/{name}: Gets a model. If successful, the response body contains an instance of the model.
predict — POST /v1beta1/{name}:predict: Performs a prediction.


Let's look at the actual payload that can be used for calling the API. Here is the JSON representation of the machine learning model. The key fields and the corresponding data types are depicted in the following snippet:

{
  "name": string,
  "displayName": string,
  "datasetId": string,
  "createTime": string,
  "updateTime": string,
  "deploymentState": enum(DeploymentState),

  // Union field model_metadata can be only one of the following:
  "imageClassificationModelMetadata": {
    object(ImageClassificationModelMetadata)
  },
  "textClassificationModelMetadata": {
    object(TextClassificationModelMetadata)
  },
  "translationModelMetadata": {
    object(TranslationModelMetadata)
  }
  // End of list of possible types for union field model_metadata.
}

The application runtime needs to provide generic stubs to utilize the response content in the calling application. Let's have a look at the descriptions of the fields in the preceding code:

name: This is an output-only field that represents the resource name of the model. The output format of this field is projects/{project_id}/locations/{locationId}/models/{modelId}.

displayName: This field is the name of the model that is displayed in the GCP web interface. The name can be selected by the user. The rules that govern the name of this field are as follows:
- The length can be up to 32 characters.
- It can contain ASCII Latin letters A-Z and a-z, underscores (_), and ASCII digits 0-9.
- The first character must be a letter.


datasetId: This field points to the resource ID of the dataset used to create the model. The dataset must belong to the same ancestor project and location.

createTime/updateTime: This is the date-timestamp when the model was created/updated. GCP uses RFC3339 UTC format with nanosecond precision, for example, 2019-03-05T15:01:23.045123456Z.

deploymentState: This is the current state of the model. The possible states of a model on the GCP platform are as follows:
- DEPLOYMENT_STATE_UNSPECIFIED: Should not be used.
- DEPLOYED: The model is deployed.
- UNDEPLOYED: The model is not deployed.

imageClassificationModelMetadata, textClassificationModelMetadata, and translationModelMetadata are used for image, text, and translation models respectively. We will explain these in the subsequent sections of this chapter.

Here is the REST source API for model evaluations: v1beta1.projects.locations.models.modelEvaluations.

REST source – for evaluating the model

Here are the method calls for getting and listing the model evaluations:

get — GET /v1beta1/{name}: Gets a model evaluation.
list — GET /v1beta1/{parent}/modelEvaluations: Lists model evaluations.

The model evaluation parameters can be listed with this simple API and used for iterative model improvement within the calling applications. The following API enables applications to analyze model operations at runtime. In some cases, model training and evaluation take a long time due to the data volume and model complexity; these operational calls help the application report the status of model training and evaluation to end users.


REST source – the operations API

The methods of the v1beta1.projects.locations.operations API are listed as follows:

cancel — POST /v1beta1/{name}:cancel: Starts asynchronous cancellation of a long-running operation.
delete — DELETE /v1beta1/{name}: Deletes a long-running operation.
get — GET /v1beta1/{name}: Gets the latest state of a long-running operation.
list — GET /v1beta1/{name}/operations: Lists operations that match the specified filter in the request.
wait — POST /v1beta1/{name}:wait: Waits for the specified long-running operation until it is done or reaches at most a specified timeout, returning the latest state.

Note that the requests for the operations API are asynchronous and non-blocking for the calling applications. These are useful for calling applications in reporting the progress of model training and evaluations.
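The polling pattern behind these status reports can be sketched generically. The get_operation callable below stands in for a GET /v1beta1/{name} request, and the operation payloads are illustrative; the real AutoML client libraries provide equivalent helpers:

```python
# Generic poll-until-done loop for a long-running operation.
import time

def wait_for_operation(get_operation, timeout_s=300.0, poll_interval_s=5.0):
    """Poll until the operation reports done=True or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        op = get_operation()              # e.g. GET /v1beta1/{name}
        if op.get("done"):
            return op
        if time.monotonic() >= deadline:
            raise TimeoutError("operation still running after timeout")
        time.sleep(poll_interval_s)

# Simulated server responses: the operation finishes on the third poll.
responses = iter([
    {"done": False},
    {"done": False},
    {"done": True, "response": {"name": "models/123"}},
])
result = wait_for_operation(lambda: next(responses),
                            timeout_s=10, poll_interval_s=0.01)
print(result["response"]["name"])
```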

Document classification using AutoML Natural Language

Document classification is a very important use case, primarily for analyzing the contents of a document or a large archive of documents (for example, legal documents). Manual classification can require a lot of effort and may not be feasible. GCP provides an easy-to-use natural language interface that can be custom trained for document classification based on AutoML. Let's understand the classification process with the publicly available 20 newsgroups dataset, which can be downloaded from http://qwone.com/~jason/20Newsgroups/. It is a collection of approximately 20,000 newsgroup documents, partitioned evenly across 20 different newsgroups, each corresponding to a different topic. The goal is to train a model based on the training data, evaluate the model, and eventually use it for document classification.


The traditional machine learning approach for document classification

The document classification process goes through a standard set of steps. Before the classification model can be trained, the document contents need to be curated for classification accuracy. The following diagram shows the overall process of document classification using machine learning in a traditional manner:

Unlike the approach depicted in this diagram, AutoML simplifies the pipeline for document classification.
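For contrast, the traditional pipeline can be sketched end to end with scikit-learn. The tiny corpus below is made up, and the two category names are borrowed from the 20 newsgroups label set for illustration:

```python
# Minimal traditional document-classification pipeline:
# vectorize text (TF-IDF), train a classifier, then predict.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = [
    "the patch renders polygons on the gpu",
    "shaders and textures for 3d rendering",
    "the pitcher threw a no-hitter last night",
    "the team won the baseball game in extra innings",
]
labels = ["comp.graphics", "comp.graphics",
          "rec.sport.baseball", "rec.sport.baseball"]

clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(docs, labels)

print(clf.predict(["opengl texture rendering bug"])[0])
```

Every curation, vectorization, and tuning step here is the user's responsibility, which is exactly what AutoML abstracts away.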

Document classification with AutoML

In this section, we will look at the AutoML interface within GCP for using AutoML for document classification.


Navigating to the AutoML Natural Language interface

Log in to https://console.cloud.google.com with your GCP credentials. On the navigation menu, go to the ARTIFICIAL INTELLIGENCE section and click on the Natural Language submenu:

AutoML Natural Language makes it easy to upload datasets and train models. In our example, the dataset is a collection of newsgroup documents.


Creating the dataset

To create a new dataset, click on the New Dataset button in the title bar:

Provide a unique name for the dataset (in this case, newsgroups). GCP provides the following options for uploading datasets:

Upload a CSV file from a computer: The CSV file should contain the list of Google Cloud Storage paths and corresponding labels, separated by a comma.


Upload text items from a computer: The interface allows the selection of multiple text files or a ZIP archive containing multiple files.

Select a CSV on Cloud Storage: The labeled CSV file containing the paths and labels can be selected from Cloud Storage.

Once the dataset is created, additional items can be imported from Google Cloud Storage URIs and local drives.

Labeling the training data

The training data can be labeled by including labels in the CSV file that is uploaded to the dataset, or by using the AutoML Natural Language UI to label the text documents:

Consider the following tips while creating a labeled training dataset:

- Create the dataset with variable-length documents and a variety of writing styles and authors. A higher level of variation improves model accuracy.
- The documents need to be classifiable by human readers. The AutoML interface depends on pre-labeled training data, so the accuracy of the model depends on the accuracy of the manual labeling process.


- GCP recommends having 1,000 training documents per label. The minimum number of documents per label is 10. The more training samples per label, and the more variation in the content, the higher the model accuracy.
- Use None_of_the_above or an equivalent label for documents that cannot be categorized under one of the predefined labels. This will increase the accuracy of the model compared with incorrect labeling or leaving labels blank.
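These per-label counts can be sanity-checked before upload with a few lines of Python; the bucket paths and labels in this CSV snippet are illustrative:

```python
# Count documents per label in an AutoML-style CSV (gcs_path,label) and flag
# labels below the 10-document minimum or the recommended 1,000.
import csv
import io
from collections import Counter

sample_csv = """\
gs://my-bucket/docs/a.txt,comp.graphics
gs://my-bucket/docs/b.txt,comp.graphics
gs://my-bucket/docs/c.txt,rec.autos
"""

counts = Counter(row[1] for row in csv.reader(io.StringIO(sample_csv)))
for label, n in counts.items():
    status = ("ok" if n >= 1000
              else "below recommended 1,000" if n >= 10
              else "below minimum of 10")
    print(f"{label}: {n} documents ({status})")
```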

Training the model

When the dataset is created and fully labeled, the model can be trained. Click on the TRAIN NEW MODEL button to start the asynchronous training process, providing a name for the model. The model name can be up to 32 characters and can contain letters, numbers, and underscores. The time required for model training depends on the volume and variation of the training data. It may take anywhere from a few minutes to several hours to train the model. Once model training is completed, an email notification is sent to the registered email ID.

As you can see in the following screenshot, once the model is trained, model attributes such as the date-time when the model was created and the quantitative nature of the training data are displayed along with quality parameters such as Precision and Recall:


Evaluating the model

Unlike the traditional machine learning pipeline, where we need to evaluate the model on a separate evaluation dataset, GCP internally uses items from the test set to evaluate the model. The model quality and accuracy are checked at this stage. AutoML provides two levels of aggregated quality metrics, indicating how well the model is trained overall across all of the features and labels, and for each category label. The following metrics are reported by GCP:

AuPRC (Area under Precision-Recall Curve): This represents the average precision. The typical value is between 0.5 and 1.0. Higher values indicate more accurate models.

Precision and recall curves: AutoML provides an interactive way of setting the threshold value for the labels. A lower threshold increases recall but lowers precision:

Confusion Matrix: This allows visualization of the model accuracy in predicting the output classes. It represents a percentage of output labels in training data during the evaluation phase:


These readily available AutoML metrics are useful in evaluating the model's reliability in predicting the output class on actual datasets. If the confusion level is high and the precision and recall scores are low, the model may require additional training data. Along with the web interface for model evaluation, GCP provides a programmable API interface for evaluation using the command line, Python, Java, and Node.js.
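To make the threshold/precision/recall trade-off concrete, here is a small hand-computed example; the scores and labels are toy values, not AutoML output:

```python
# Precision and recall at a given confidence threshold, computed by hand.
def precision_recall(scores, labels, threshold):
    predicted = [s >= threshold for s in scores]
    tp = sum(p and l for p, l in zip(predicted, labels))
    fp = sum(p and not l for p, l in zip(predicted, labels))
    fn = sum((not p) and l for p, l in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

scores = [0.95, 0.80, 0.60, 0.40, 0.20]   # model confidence per document
labels = [True, True, False, True, False]  # ground truth per document

# Lowering the threshold raises recall but can lower precision.
for t in (0.9, 0.5, 0.1):
    p, r = precision_recall(scores, labels, t)
    print(f"threshold={t}: precision={p:.2f} recall={r:.2f}")
```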

The command line

Use the following command to get the model evaluation parameters in JSON format, replacing model-name with the actual model name:

curl -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json" \
  https://automl.googleapis.com/v1beta1/model-name/modelEvaluations

This command returns the evaluation parameters in JSON format, as follows:

{
  "modelEvaluation": [
    {
      "name": "projects/434039606874/locations/us-central1/models/7537307368641647584/modelEvaluations/9009741181387603448",
      "annotationSpecId": "17040929661974749",
      "classificationMetrics": {
        "auPrc": 0.99772006,
        "baseAuPrc": 0.21706384,
        "evaluatedExamplesCount": 377,
        "confidenceMetricsEntry": [
          {
            "recall": 1,
            "precision": -1.3877788e-17,
            "f1Score": -2.7755576e-17,
            "recallAt1": 0.9761273,
            "precisionAt1": 0.9761273,
            "f1ScoreAt1": 0.9761273
          },
          {
            "confidenceThreshold": 0.05,
            "recall": 0.997,
            "precision": 0.867,
            "f1Score": 0.92746675,
            "recallAt1": 0.9761273,
            "precisionAt1": 0.9761273,
            "f1ScoreAt1": 0.9761273


There are two primary encapsulated objects for the representation of the model evaluation parameters: the header-level classification metrics and confidence metrics. The precision and recall scores are also reported in this JSON response.

Python

Here is the Python code snippet for getting the model evaluation parameters:

project_id = 'PROJECT_ID'
compute_region = 'COMPUTE_REGION'
model_id = 'MODEL_ID'
filter_ = 'Filter expression'

from google.cloud import automl_v1beta1 as automl

client = automl.AutoMlClient()

# Get the fully qualified path of the model based on project, region, and model.
model_full_id = client.model_path(project_id, compute_region, model_id)

# Apply the filter for listing all the model evaluations.
response = client.list_model_evaluations(model_full_id, filter_)

print("Loop for listing all the model evaluations received based on the filter condition")
for element in response:
    print(element)

This code snippet gets the model evaluation parameters and iterates over the response and prints individual parameters such as precision and recall.

Java

Here is the equivalent Java code for getting the model evaluation parameters:

public static void autoMLModelEvaluation(
    String projectId, String computeRegion, String modelId, String filter)
    throws IOException {
  // Instantiate a client.
  AutoMlClient client = AutoMlClient.create();

  // Get the full path of the model.
  ModelName modelFullId = ModelName.of(projectId, computeRegion, modelId);

  // List all the model evaluations in the model by applying the filter.
  ListModelEvaluationsRequest modelEvaluationsRequest =
      ListModelEvaluationsRequest.newBuilder()
          .setParent(modelFullId.toString())
          .setFilter(filter)
          .build();

  // Iterate through the results.
  String modelEvaluationId = "";
  for (ModelEvaluation element :
      client.listModelEvaluations(modelEvaluationsRequest).iterateAll()) {
    if (element.getAnnotationSpecId() != null) {
      modelEvaluationId =
          element.getName().split("/")[element.getName().split("/").length - 1];
    }
  }

  // Resource name for the model evaluation.
  ModelEvaluationName modelEvaluationFullId =
      ModelEvaluationName.of(projectId, computeRegion, modelId, modelEvaluationId);

  // Get a model evaluation.
  ModelEvaluation modelEvaluation = client.getModelEvaluation(modelEvaluationFullId);

  ClassificationEvaluationMetrics classMetrics =
      modelEvaluation.getClassificationEvaluationMetrics();
  List<ConfidenceMetricsEntry> confidenceMetricsEntries =
      classMetrics.getConfidenceMetricsEntryList();

  // Iterate over the list and get the individual evaluation parameters.
}

This code snippet gets the model evaluation parameters and iterates over the response and prints individual parameters such as precision and recall. It can also be packaged as an independent API call and developed as a microservice.


Node.js

Here is the Node.js implementation of the evaluation code. The code is concise and asynchronously calls the evaluation API:

const automl = require(`@google-cloud/automl`);
const util = require(`util`);
const client = new automl.v1beta1.AutoMlClient();

const projectId = `PROJECT_ID`;
const computeRegion = `REGION_NAME`;
const modelId = `MODEL_ID`;
const modelEvaluationId = `MODEL_EVAL_ID`;

// Get the full path of the model evaluation.
const modelEvaluationFullId = client.modelEvaluationPath(
  projectId,
  computeRegion,
  modelId,
  modelEvaluationId
);

// Get the complete detail of the model evaluation.
const [response] = await client.getModelEvaluation({
  name: modelEvaluationFullId,
});
console.log(util.inspect(response, false, null));

A non-blocking asynchronous microservice can be developed with this code snippet, which can respond with the evaluation parameters to the calling application.

Using the model for predictions

Once the model is built and evaluated, GCP provides a simple interface for using the model for predictions. In this case, the model is ready for the classification of documents in various categories based on keywords in the input data.


The web interface

GCP provides a web interface for quickly testing the model, and a REST API that can be used in a production environment for document classification:

In this case, the model has predicted that the document is related to ComputerGraphics with 100% confidence based on the text entered in the text area.

A REST API for model predictions

We can use a simple REST API to use the deployed model for predictions:

export GOOGLE_APPLICATION_CREDENTIALS=key-file-path

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json" \
  https://automl.googleapis.com/v1beta1/projects/ai-gcp-ch4/locations/us-central1/models/TCN2853684430635363573:predict \
  -d '{
    "payload": {
      "textSnippet": {
        "content": "YOUR TEXT HERE",
        "mime_type": "text/plain"
      }
    }
  }'

The API can be called with Python in the production environment.

Python code for model predictions

Save the following code in a file named document_classifier.py. This code uses the deployed model for predictions:

import sys

from google.cloud import automl_v1beta1
from google.cloud.automl_v1beta1.proto import service_pb2


def get_prediction(content, project_id, model_id):
    prediction_client = automl_v1beta1.PredictionServiceClient()
    name = 'projects/{}/locations/us-central1/models/{}'.format(project_id, model_id)
    payload = {'text_snippet': {'content': content, 'mime_type': 'text/plain'}}
    params = {}
    request = prediction_client.predict(name, payload, params)
    return request  # waits until the request returns


if __name__ == '__main__':
    content = sys.argv[1]
    project_id = sys.argv[2]
    model_id = sys.argv[3]
    print(get_prediction(content, project_id, model_id))

Call the document classifier with python document_classifier.py "Document Text" <project_id> <model_id>. The Python code can be executed with this command by passing the project ID and the deployed model ID.

Image classification using AutoML Vision APIs

GCP provides Vision APIs, which can be used for building intelligent applications for unstructured data in the form of visual inputs (images and videos), and can be accessed via the web console and through APIs.


Image classification steps with AutoML Vision

The process of image classification on GCP follows steps similar to the document classification process we saw in the Document classification with AutoML section. Here are the steps involved in image classification:

It is assumed that you already know how to set up a Google account and have created a GCP project. Let's walk through the steps involved in image classification from the Collect Training Images step.

Collecting training images

The AutoML Vision API uses a supervised learning model and hence requires a collection of training images. These are the pre-labeled images that are used for training purposes. The images can contain one or more objects, and the individual output labels need to be defined and validated on a sample basis for accuracy. We need to build a dataset for training the image classification models.

Creating a dataset

As a first step, we need to create a placeholder for the images that we will use for training and evaluation of the models. When the dataset is created, we need to specify the classification type as either multi-class or multi-label. In the case of a multi-class classifier, a single label is assigned to each classified document, whereas a multi-label classifier can assign multiple labels to a document.


The web interface provides an intuitive way of creating the dataset:

1. Click on the New Dataset button in the title bar. The following screenshot shows the user interface for creating the dataset:


2. Provide a unique name within your project for the new dataset. There are two ways to upload the images within the dataset creation UI:

- Upload images from your computer: AutoML supports the JPG and PNG formats. A ZIP archive containing a set of JPG or PNG images can also be uploaded in bulk.
- Select a CSV file on Cloud Storage: A comma-separated file containing a list of paths to the images on Google Cloud Storage, along with their labels if available at the time of the creation of the dataset.

The image import can also be deferred until later; a dataset can be created as a placeholder without any images in it. The classification type can be specified by checking the Enable multi-label classification checkbox. The default classification type is multi-class.

Labeling and uploading training images

We will utilize Cloud Storage to upload the images and create a CSV file that labels the image file contents. We will create a Google Cloud Storage bucket and store the documents that will be used to train the custom model. Use the following command from Google Cloud Shell to create a Cloud Storage bucket:

gsutil mb -p ai-gcp-ch4 -c regional -l us-central gs://ai-gcp-ch4-vcm/

The bucket name must be in the format PROJECT_ID-vcm.

This command will create a storage bucket within the project with the name ai-gcp-ch4-vcm. We will copy a publicly available dataset containing flower images into the newly created storage bucket with the following command:

gsutil -m cp -R gs://cloud-ml-data/img/flower_photos/ gs://ai-gcp-ch4-vcm/img/

Once the images are loaded into the storage bucket in a batch, the easiest way to label them is with a CSV file. The generic format of the CSV file is Image_Path, Label(s). In the case of multiple labels (multiple objects within one image), the individual labels need to be separated by a comma (,), as shown in the following example:

gs://ai-gcp-ch4-vcm/img/flower_photos/roses/15674450867_0ced942941_n.jpg,roses
gs://ai-gcp-ch4-vcm/img/flower_photos/daisy/1031799732_e7f4008c03.jpg,daisy

In this example, the first image contains roses and the second image contains daisy flowers. Let's import the images and labels into the dataset:

The images listed in the CSV file and present in the storage bucket will be loaded into the dataset and labeled in this step. At this point, our dataset is ready to train the classification model. Here is the snapshot of the labeled dataset in the AutoML Vision interface:
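The labeling CSV described above is easy to generate with a short Python helper. This is only an illustrative sketch: the helper name, bucket, and file names are hypothetical placeholders, not values from a real project.

```python
def make_csv_rows(items):
    """Build AutoML-style labeling CSV lines from (gcs_path, [label, ...]) pairs.

    Multiple labels for one image are appended as extra comma-separated
    fields, matching the Image_Path,Label(s) format shown above.
    """
    rows = []
    for path, labels in items:
        rows.append(",".join([path] + list(labels)))
    return rows


# Hypothetical paths and labels, for illustration only
rows = make_csv_rows([
    ("gs://my-bucket/img/roses/001.jpg", ["roses"]),
    ("gs://my-bucket/img/mixed/002.jpg", ["roses", "daisy"]),
])
print("\n".join(rows))
```

The resulting lines can be written to a file and uploaded to the bucket alongside the images.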


Our dataset contains 3,665 labeled images distributed among five labels. The web interface allows modifying the labels and deleting images as necessary. The dataset can also be populated with labeled images by using the REST API and the command-line interface, which can be invoked programmatically. Here is the HTTP POST URL that can be used for invoking the API (replace your Project_ID, Region, and datasetId in this URL string):

POST https://automl.googleapis.com/v1beta1/projects/ai-gcp-ch4/locations/us-central1/datasets/ICN7902227254662260284:importData

The request body contains the payload in the following JSON format:

{
  "inputConfig": {
    "gcsSource": {
      "inputUris": ["gs://ai-gcp-ch4-vcm/img/flower_photos/all_data_updated.csv"]
    }
  }
}

Save the request body in an import_images_request.json file and send a curl request from Cloud Shell:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @import_images_request.json \
  https://automl.googleapis.com/v1beta1/projects/ai-gcp-ch4/locations/us-central1/datasets/ICN7902227254662260284:importData

The API can also be called from Python by importing the automl_v1beta1 package. Here is the Python snippet for importing the images into the dataset:

project_id = 'ai-gcp-ch4'
compute_region = 'us-central1'
dataset_id = 'ICN7902227254662260284'
paths = 'gs://ai-gcp-ch4-vcm/img/flower_photos/all_data_updated.csv'

from google.cloud import automl_v1beta1 as automl

client = automl.AutoMlClient()
dataset_full_id = client.dataset_path(project_id, compute_region, dataset_id)
input_config = {'gcs_source': {'input_uris': [paths]}}


response = client.import_data(dataset_full_id, input_config)
print('Processing import...')
print('Data imported. {}'.format(response.result()))

Once the images are loaded into the dataset, the next step is to train the classification model.

Training the model

GCP AutoML abstracts away the complexity of actual model training and creates three unique sets from the images within the dataset: 80% of the images are randomly tagged as the training set, with 10% each for the evaluation and test sets. A simple interface initiates the classification model training process:
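AutoML performs this 80/10/10 split internally. Purely as an illustration of the idea, a minimal version of such a split can be sketched as follows (the seed and item values are arbitrary, and this helper is not part of any AutoML library):

```python
import random


def split_dataset(items, seed=42):
    """Shuffle items and split them roughly 80/10/10 into
    train/validation/test sets, mirroring what AutoML does internally."""
    rng = random.Random(seed)
    items = list(items)
    rng.shuffle(items)
    n_train = int(len(items) * 0.8)
    n_val = int(len(items) * 0.1)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # 80 10 10
```

A fixed seed keeps the split reproducible across runs, which matters when comparing model iterations.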


Depending on the number of training images, the number of labels, and the allocated computation resources, the model training may take from 15 minutes to several hours. The process is asynchronous and an email is triggered once the model training is completed. The model training can also be initiated via the command line or by calling the API programmatically. Here is the POST request for training the model (use appropriate parameters for Project_ID, Region, datasetId, and displayName; displayName is the name of the model, which can be selected by the user):

curl \
  -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json" \
  https://automl.googleapis.com/v1beta1/projects/ai-gcp-ch4/locations/us-central1/models \
  -d '{
    "datasetId": "ICN7902227254662260284",
    "displayName": "aigcpch4-image-classification-model",
    "image_classification_model_metadata": {
      "model_type": "cloud-low-latency-1"
    }
  }'

There are two options available for model_type. cloud-low-latency-1 minimizes training time at the cost of model accuracy, whereas cloud-high-accuracy-1 minimizes training error by going through a higher number of training iterations, creating a more accurate model at the cost of more time and computation resources. The choice needs to be made based on the specific use case along with the time and cost budget. The model can also be trained programmatically by leveraging the AutoML Vision API. Here is the Python code for training the model:

project_id = 'ai-gcp-ch4'
compute_region = 'us-central1'
dataset_id = 'ICN7902227254662260284'
model_name = 'aigcpch4-image-classification-model'

from google.cloud import automl_v1beta1 as automl

client = automl.AutoMlClient()
project_location = client.location_path(project_id, compute_region)
my_model = {
    'display_name': model_name,
    'dataset_id': dataset_id,


    'image_classification_model_metadata': {
        'model_type': 'cloud-low-latency-1'
    }
}
response = client.create_model(project_location, my_model)
print('Training image classification model...')

Once the model training is complete, the model is listed in the Models section in the AutoML Vision web interface:

The next step is to evaluate the model for accuracy. The evaluation statistics are available on the web interface under the EVALUATE tab.

Evaluating the model

Once the model is trained with the training set, the evaluation set is used for model evaluation. The evaluation results are available in the EVALUATE tab and present Avg Precision, Precision, and Recall. Here is a screenshot of the web interface for the evaluation of the model:


As seen in the screenshot, we get the model evaluation metrics on the user interface. The following model training parameters are important:

Avg Precision: This measures model performance across all of the score thresholds.

Precision: This is the proportion of positive predictions that are correct. Mathematically, precision is defined as Precision = TP / (TP + FP). A true positive (TP) is an outcome where the model correctly predicts the positive class; a false positive (FP) is an outcome where the model incorrectly predicts the positive class.

Recall: This is the proportion of actual positives that are identified correctly. Mathematically, recall is defined as Recall = TP / (TP + FN). A false negative (FN) is an outcome where the model incorrectly predicts the negative class.

A model can be fully evaluated only by using both the precision and recall measures, and hence average precision is significant in understanding a model's effectiveness. AutoML provides a consolidated view of the model parameters across all of the labels, along with the parameter values for a specific label:
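These definitions can be captured in a couple of lines of Python; the counts used in the example are hypothetical, chosen only to illustrate the formulas:

```python
def precision(tp, fp):
    # Proportion of positive predictions that are correct: TP / (TP + FP)
    return tp / (tp + fp)


def recall(tp, fn):
    # Proportion of actual positives that are found: TP / (TP + FN)
    return tp / (tp + fn)


# Hypothetical confusion counts for a single label
print(precision(tp=90, fp=10))  # 0.9
print(recall(tp=90, fn=30))     # 0.75
```

Note the trade-off: raising the score threshold usually increases precision while lowering recall, which is why average precision across all thresholds is the headline metric.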


The model can also be evaluated by using the REST APIs, which can be invoked via the command line as well as programmatically.

The command-line interface

Here is the command for getting the model evaluations from the model deployed on GCP AutoML:

curl -X GET \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  https://automl.googleapis.com/v1beta1/projects/ai-gcp-ch4/locations/us-central1/models/ICN7883804797052012134/modelEvaluations

We need to provide the project name, region, and model ID for getting the evaluations using the command-line interface.

Python code

Here is a snippet of Python code that can be used for getting the model evaluations for the deployed model. We need to pass project_id, region, and model_id as parameters:

project_id = 'ai-gcp-ch4'
compute_region = 'us-central1'
model_id = 'ICN7883804797052012134'
filter_ = ''

from google.cloud import automl_v1beta1 as automl

client = automl.AutoMlClient()
model_full_id = client.model_path(project_id, compute_region, model_id)
response = client.list_model_evaluations(model_full_id, filter_)
print('List of model evaluations:')
for element in response:
    print(element)

Once the model is built and evaluated for threshold accuracy, it is time to test the model with a set of new images.


Testing the model

GCP AutoML provides a simple interface for testing the model. A new image can be uploaded from the web UI and tested against the deployed model. Here is a screenshot of the web UI demonstrating the required steps:

The model can also be tested and utilized via the REST API from the command line, as well as programmatically. Create an image_classification_request.json file as follows:

{
  "payload": {
    "image": {
      "imageBytes": "IMAGE_BYTES"
    }
  }
}
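For a REST request, the IMAGE_BYTES placeholder must hold the Base64-encoded image content. A minimal sketch of building this request body follows; the helper name and sample bytes are illustrative, not part of the AutoML tooling:

```python
import base64
import json


def build_image_request(image_bytes):
    """Build the JSON request body shown above, with the binary image
    content Base64-encoded into the imageBytes field."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"payload": {"image": {"imageBytes": encoded}}})


# Stand-in bytes; in practice this would be the contents of an image file
body = build_image_request(b"fake-image-bytes")
print(body)
```

The resulting string can be saved as image_classification_request.json and sent with curl as shown next.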


Send the following request to the web service:

curl -X POST -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  https://automl.googleapis.com/v1beta1/projects/ai-gcp-ch4/locations/us-central1/models/ICN7883804797052012134:predict \
  -d @image_classification_request.json

Python code

Here is the Python code that can be used within applications to invoke the model on a new set of images:

import sys

from google.cloud import automl_v1beta1


def get_prediction(content, project_id, model_id):
    prediction_client = automl_v1beta1.PredictionServiceClient()
    name = 'projects/{}/locations/us-central1/models/{}'.format(project_id, model_id)
    payload = {'image': {'image_bytes': content}}
    params = {}
    request = prediction_client.predict(name, payload, params)
    return request  # waits until the request returns


if __name__ == '__main__':
    file_path = sys.argv[1]
    project_id = sys.argv[2]
    model_id = sys.argv[3]
    with open(file_path, 'rb') as ff:
        content = ff.read()
    print(get_prediction(content, project_id, model_id))

As illustrated in this section, AutoML makes it seamless and easy to train a model for image classification and the model is deployed on the cloud and can be accessed by authenticated users and service accounts with a simple API interface.


Performing speech-to-text conversion using the Speech-to-Text API

GCP provides a very efficient and easy-to-use API for speech-to-text conversion. Even though the interface is simple and intuitive, there are deep neural networks underneath that are continuously trained and enrich the speech-to-text models. The application developer does not need to understand the underlying details or specific neural network configurations and tuning. At the time of writing, the API recognizes more than a hundred languages and dialects. The platform provides speech-to-text conversion as a service in batch as well as real-time modes. The accuracy of the models improves over time as the platform is used by more and more users for conversions. The platform also provides APIs for automatically detecting the spoken language. This feature is handy in use cases that allow voice commands. The API allows the selection of pre-built models that cater to specific use cases. For example, the Command and Search model is best suited for voice commands, and the Phone Call model is best suited for transcribing phone conversations. The following diagram depicts all of the supported features of the Speech-to-Text API:

In this section, we will walk through the API and understand how to build applications by leveraging the Speech-to-Text API.


There are three ways to interact with the Speech-to-Text API:

Synchronous Recognition: This is a blocking API call that is suitable for audio content shorter than 1 minute. The audio data is sent to the REST or gRPC endpoint, and the API responds only when the entire audio content is transcribed.

Asynchronous Recognition: This is a non-blocking API call that is suitable for longer audio content (up to 480 minutes). This call initiates a long-running operation on the cloud, and the calling service needs to poll periodically for the transcription results. The calling service needs to manage the text data across subsequent calls to optimize performance.

Streaming Recognition: This is used for live transcription and is supported via gRPC as a bi-directional stream. The text conversion process happens in real time, and the response text is available as a continuous stream. The calling service needs to gather the time-series data and utilize it as a stream.
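As a rough summary of the three modes, the choice of API method can be expressed as a small helper. The method names follow the API surface described above, the duration thresholds come from the limits just listed, and the function itself is only an illustration, not part of the client library:

```python
def choose_recognition_method(duration_seconds, live_stream=False):
    """Pick the Speech-to-Text method for a given audio input,
    based on the mode descriptions above."""
    if live_stream:
        return "StreamingRecognize"    # real-time, bi-directional gRPC
    if duration_seconds < 60:
        return "Recognize"             # synchronous, blocking call
    if duration_seconds <= 480 * 60:
        return "LongRunningRecognize"  # asynchronous, polled operation
    raise ValueError("audio exceeds the 480-minute limit")


print(choose_recognition_method(30))     # Recognize
print(choose_recognition_method(3600))   # LongRunningRecognize
```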

Synchronous requests

The request body consists of configuration parameters and the path to an audio file. A sample synchronous request is as follows:

{
  "config": {
    "encoding": "LINEAR16",
    "sampleRateHertz": 16000,
    "languageCode": "en-US"
  },
  "audio": {
    "uri": "gs://bucket-name/path_to_audio_file"
  }
}

The config field in the JSON request body is a manifestation of the RecognitionConfig object, whose JSON representation is as follows:

{
  "encoding": enum (AudioEncoding),
  "sampleRateHertz": number,
  "audioChannelCount": number,
  "enableSeparateRecognitionPerChannel": boolean,
  "languageCode": string,
  "maxAlternatives": number,
  "profanityFilter": boolean,
  "speechContexts": [
    {
      object (SpeechContext)
    }
  ],
  "enableWordTimeOffsets": boolean,
  "enableAutomaticPunctuation": boolean,
  "metadata": {
    object (RecognitionMetadata)
  },
  "model": string,
  "useEnhanced": boolean
}

The fields in the JSON template are defined as follows:

encoding (enum): This field defines the encoding of the audio file that needs to be transcribed. The following values are supported by the API:
- ENCODING_UNSPECIFIED
- LINEAR16: 16-bit uncompressed format
- FLAC (Free Lossless Audio Codec): This encoding is more reliable than LINEAR16 and requires half the bandwidth
- MULAW / AMR (Adaptive Multi-Rate Narrowband codec) / AMR_WB (WideBand) / OGG_OPUS / SPEEX_WITH_HEADER_BYTE

sampleRateHertz (number): This field defines the sampling rate of the audio data, ranging from 8,000 to 48,000 Hz. In the case of the WAV and FLAC audio formats, this field is optional.

audioChannelCount (number): This field indicates the number of channels in the input audio data.

enableSeparateRecognitionPerChannel (boolean): If audioChannelCount is greater than one, this parameter needs to be explicitly set to true so that each channel is recognized separately.

languageCode (string): This is a mandatory field that indicates the language of the conversation. For example, en-US refers to US English.

maxAlternatives (number): This is an optional parameter that indicates the maximum number of alternative recognition messages returned in the response. Depending on the trained model and the speech context, the server may return fewer than the set number of alternatives. The parameter values range between 0 and 30. A value of 0 or 1 will return at most one recognition and, if the field is not part of the request, no more than one recognition is returned.

profanityFilter (boolean): This is an optional attribute that filters out blasphemous or obscene language expressions if set to true.

speechContexts[] (object (SpeechContext)): This is an optional but important attribute that provides hints to the recognition model for more accurate transcription. By sending speech contexts in this object, the potential errors introduced by phonetically similar words are eliminated, resulting in more accurate recognition based on the context of the speech.

enableWordTimeOffsets (boolean): This is an optional field. If set to true, each word in the transcribed speech is tagged with its start and end time within the audio signal. By default, the value of this parameter is false.

enableAutomaticPunctuation (boolean): This optional field only affects selected languages and is currently available as an experimental feature. When enabled, the transcription includes punctuation. By default, the parameter is set to false.

metadata (object): This is an optional field that provides metadata about the audio signal. The field is of the RecognitionMetadata type and contains the following sub-fields: interactionType, industryNaicsCodeOfAudio, microphoneDistance, originalMediaType, recordingDeviceType, recordingDeviceName, originalMimeType, obfuscatedId, and audioTopic. The metadata provides additional hints to the model, which are used for more accurate and context-sensitive transcription.

model (string): This is an optional field that selects an appropriate machine learning model for performing speech-to-text recognition. The selection of an appropriate model greatly enhances the accuracy of the transcription. If the model is not set, a runtime model is selected based on the parameters in RecognitionConfig. Here is a list of currently available models on GCP:
- command_and_search: A model best suited for voice commands or voice search
- phone_call: A model best suited for transcription of telephonic conversations
- video: A model best suited for the audio signal extracted out of original video data
- default: A model used when a specific conversation model is not specified or available

useEnhanced (boolean): This is an optional parameter that is set to true for using enhanced models. The enhanced models are more accurate and cost more than the regular models.

The audio content can be embedded into the request body by sending a content parameter within the request's audio field. The embedded content can be made part of a gRPC or REST request. In the case of a gRPC request, the audio must be compatible with the Protocol Buffers Version 3 Language Specification and embedded as binary data. When the embedded audio is sent within a REST request, it needs to be JSON-serialized and Base64-encoded. Here is an example of sending Base64-encoded audio content within the request body:

{
  "config": {
    "encoding": "WAV",
    "sampleRateHertz": 18000,
    "languageCode": "en-US"
  },
  "audio": {
    "content": "XtxhQwJJJCIQABQLLTUJABtAA+gA8AB+W8FZndQvQAyjv..."
  }
}


Here is the Python code for Base64-encoding audio data:

import base64

def encode_audio(audio):
    audio_content = audio.read()
    return base64.b64encode(audio_content)

Alternatively, a reference to the audio content can be sent in the request by pointing to it with a URI. The referenced audio should be in raw binary format instead of the Base64-encoded format. A sample request with a URI pointing to an audio file on Google Cloud Storage is as follows:

{
  "config": {
    "encoding": "WAV",
    "sampleRateHertz": 18000,
    "languageCode": "en-US"
  },
  "audio": {
    "uri": "gs://bucket-name/path_to_audio_file"
  }
}

The audio file pointed to by the URI should either be publicly accessible over the internet or reachable by the service account on GCP. A synchronous request provides a response in time proportional to the length of the audio content. The response is received in the following format:

{
  "name": "8214202757943088943",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
    "progressPercent": 100,
    "startTime": "2019-09-22T08:16:32.013650Z",
    "lastUpdateTime": "2019-09-22T08:16:55.278630Z"
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeResponse",
    "results": [
      {
        "alternatives": [
          {
            "transcript": "One Two Three Four",
            "confidence": 0.97186122,
            "words": [
              { "startTime": "1.300s", "endTime": "1.400s", "word": "One" },
              { "startTime": "1.400s", "endTime": "1.600s", "word": "two" },
              { "startTime": "1.600s", "endTime": "1.600s", "word": "three" },
              { "startTime": "1.600s", "endTime": "1.900s", "word": "four" },
              ...
            ]
          }
        ]
      },
      {
        "alternatives": [
          {
            "transcript": "one too thee four",
            "confidence": 0.9041967
          }
        ]
      }
    ]
  }
}

The response contains two distinct sections:

Header Information: This is the header-level information regarding the speech-to-text transcription process. The header contains the following fields:
- name: A unique name assigned by the Speech-to-Text API.
- metadata: This includes the following:
  - @type: The type of the response as defined by GCP. The field points to the object definition URI.
  - progressPercent: The percentage of completion of the transcription.
  - startTime: The start time of the speech-to-text translation.
  - lastUpdateTime: The time when the status was last updated by the API.

Response: This contains the following fields:
- @type: The type of the response body as defined by GCP. The field points to the object definition URI (type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeResponse).
- results: A collection object, which is a sequential list of speech-to-text conversion units based on sequential sections of the audio input.
- alternatives: Each individual sequential result contains one or more alternative transcriptions with varying levels of confidence. The alternatives are sorted in descending order of confidence. Typically, the first alternative is the best and can be used by applications. The number of alternatives in the response is controlled by the maxAlternatives request parameter.
- transcript: Each alternative presents a transcript for the section of audio.
- confidence: A numeric field that indicates the model's confidence in the transcript.
- words: Within each alternative transcript, there are multiple words. Optionally (based on the value of the enableWordTimeOffsets request parameter), these words are presented on a timeline:
  - startTime: Denotes the start time of the word within the audio signal
  - endTime: Denotes the end time of the word within the audio signal
  - word: The actual transcription


For synchronous transcription, the request is sent to the following endpoint: POST https://speech.googleapis.com/v1/speech:recognize

This is a simple HTTP POST message that returns the speech recognition response in JSON format.

Asynchronous requests

Asynchronous or non-blocking requests are similar in terms of the request body and headers. However, when a request is intended to be asynchronous, the LongRunningRecognize method is called instead, and the response is not returned immediately. Instead of a response containing the transcription, the call to LongRunningRecognize immediately returns a progress-check message in the following format:

{
  "name": "operation_name",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
    "progressPercent": 37,
    "startTime": "2019-09-03T13:24:39.579144Z",
    "lastUpdateTime": "2019-09-03T13:24:39.826903Z"
  }
}

The process will continue to transcribe the audio signal sent in the request. Once the transcription is completed, a response is sent with progressPercent as 100. At that point, the format of the response is the same as the synchronous response format.
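The calling service typically polls the operation until it completes. A minimal, illustrative completion check based on the fields shown in the sample responses above (the helper and sample dicts are hypothetical, not part of the client library):

```python
def operation_finished(operation):
    """Check whether a long-running recognition operation has completed,
    using the done flag and progressPercent fields described above."""
    if operation.get("done"):
        return True
    return operation.get("metadata", {}).get("progressPercent", 0) >= 100


pending = {"name": "op", "metadata": {"progressPercent": 37}}
finished = {"name": "op", "done": True,
            "metadata": {"progressPercent": 100}}
print(operation_finished(pending))   # False
print(operation_finished(finished))  # True
```

In a real service, this check would sit inside a polling loop with a back-off delay between requests to the operations endpoint.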

Streaming requests

Depending on the requirements of the application, a two-way streaming capability may be needed for real-time, continuous transcription of audio signals. The Speech-to-Text API provides a method for real-time transcription within a bi-directional stream. The sender application can send a continuous stream of audio to the API and receive both discrete and complete forms of transcription from the service. Just-in-time results represent the current audio window in transcribed form, while the final response contains the entire transcription (similar to synchronous and asynchronous responses).


In terms of the API, streaming requests are sent to the StreamingRecognize method. Since the API is a continuous streaming API, multiple requests are sent to it, each with a different audio window. However, the first message must contain the configuration of the streaming request. The configuration is defined by the StreamingRecognitionConfig object, which provides a hint to the API on how to process the specific streaming audio signal. The StreamingRecognitionConfig object is defined as follows:

config: This is the RecognitionConfig object that we discussed earlier in this chapter.

single_utterance: This is an optional boolean flag. When set to false, the streaming recognition API continues to transcribe the input signal despite long pauses within the speech. The stream remains open until it is explicitly closed by the calling process or until a certain time threshold has elapsed. In this case, the API may return multiple StreamingRecognitionResult objects. If this flag is set to true, the model detects a pause within the audio signal, the API returns an END_OF_SINGLE_UTTERANCE event, and the recognition process completes. In this case, the API will return only one occurrence of StreamingRecognitionResult.

interim_results: This is an optional flag. If set to true, interim results are returned by the API; if set to false, the API returns results only once the transcription is complete.

The API returns a response in the form of a StreamingRecognizeResponse message object, which is the only object returned by the streaming Speech-to-Text API. The response object contains the following significant fields:

speechEventType: This represents a pause in the audio conversation as detected by the underlying model. There are two event types recognized by the API: SPEECH_EVENT_UNSPECIFIED indicates that no event is specified, and END_OF_SINGLE_UTTERANCE indicates that the model has detected a pause within the audio signal and the API does not expect any additional audio data in the stream. This event is sent only when the single_utterance request parameter is set to true.

results: This is the main wrapper object that contains the transcription results as a collection:
- alternatives: Similar to synchronous and asynchronous transcription requests, this collection provides various transcription alternatives with varying confidence levels.
- isFinal: This flag is set to true when a section of the audio signal has been fully transcribed by the model.
- stability: In the context of streaming speech recognition, overlapping parts of the speech are transcribed over a moving time window. This means a particular position within the audio signal may be transcribed more than once in subsequent frames. The speech-to-text model generates a stability score that indicates the possibility of change in the transcription. A score of 0 indicates an unstable transcription that will eventually change, and a score of 1 indicates that there will be no change from the original transcription.

Sentiment analysis using AutoML Natural Language APIs

Sentiment analysis is one of the key practical capabilities in building intelligent platforms that recommend business actions based on the end user's perception of a service or subject. The process is highly complex considering the variations in the context and nature of the data. Typically, the data is unstructured or semi-structured, which makes it difficult to achieve a high level of accuracy. As a generic process, sentiment analysis can be broken down into the following steps:


Goal Setting: The goal for sentiment analysis needs to be clearly defined within the context of the use case. For example, when buyer sentiment is to be analyzed, we need to center it around products and their features.

Text Processing: Based on the contextual goal and the sources of content, the next step is to perform text processing, which involves eliminating noise and unwanted words and organizing emotional content along with slang.

Parsing the Contents: At this stage, the content is parsed and grouped based on logical and meaningful connotation. The content is segmented based on polarity and semantics.

Text Refinement: The stop words and synonyms need to be identified and aligned at this stage.

Analysis and Scoring: In this stage, the parsed and refined content is scored based on training data and the semantic meaning of the text. The scoring is used in training the model for actual sentiment analysis of new content.

AutoML Natural Language sentiment analysis creates an abstraction layer and relieves the application development process from the core complexities of sentiment analysis. Once the training data is carefully prepared with appropriate scoring, the platform takes care of training, evaluating, and deploying the model with a simple web interface as well as APIs. In this section, we will walk through the process of performing sentiment analysis on GCP using AutoML. The most important and time-consuming step in the process is creating the training data for Natural Language sentiment analysis with GCP AutoML. Similar to any other supervised learning model training process, we need to provide sample data that labels the text content on an integer scale. Typically, the sentiment score starts from 0 and can be as granular as required. The higher the integer label, the more positive the sentiment. The maximum sentiment score can be defined based on the intended granularity.
As a general guideline, the sentiment scores must start from 0 and should not have any gaps in the training data for accurate model training. It is also recommended to have an even distribution of samples across the individual scores within the training data; this helps avoid underfitting the model. AutoML accepts training data in CSV format with three fields in each row. The first field indicates whether the row is a training, validation, or test sample. This field is optional within the dataset; if it is not provided in the data file, AutoML creates a split automatically, allocating approximately 80% of the samples to training and 10% each to validation and testing.
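The no-gaps guideline can be checked before uploading the training data. Here is a small, illustrative validation helper (our own sketch, not part of the AutoML tooling):

```python
def missing_scores(labels, max_score):
    """Return the sentiment scores absent from the training labels.
    Per the guideline above, the range 0..max_score should have no gaps."""
    present = set(labels)
    return [s for s in range(max_score + 1) if s not in present]


# Hypothetical label columns from two candidate CSV files
print(missing_scores([0, 1, 1, 3, 3], max_score=3))  # [2]
print(missing_scores([0, 1, 2], max_score=2))        # []
```

A non-empty result means more samples are needed for the missing scores before training.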


The second field is the placeholder for the actual content that needs to be analyzed by the model. The third field indicates the actual score that represents the sentiment, which starts at 0 (most negative) and can have a maximum value of 10 (most positive):

1. Let's experiment with AutoML Sentiment Analysis using a sample dataset of labeled Twitter data. Launch AutoML Sentiment Analysis from the navigation menu > Artificial Intelligence > Natural Language:

2. Once the application is launched, click on the New Dataset button on the top menu bar.
3. Provide a Dataset name and select the Sentiment analysis option, as shown in the following screenshot:


There are four options for uploading the text content:

- Upload a CSV file from your computer: The CSV file can be a text file with the actual data or a list of GCS paths.
- Upload text items from your computer.
- Select a CSV file on Cloud Storage.
- Import text items later: The dataset can be created by creating a set of text items and labeling them directly in the workspace.

Once the dataset is loaded, AutoML provides statistics about the labeled data on the console:

4. At this stage, the model is ready to be trained. Click on the TRAIN tab and click on the Start Training button. AutoML will split the training data based on a user-defined split or the default split percentages. Once the model is trained, it is evaluated based on the evaluation sample and the detailed model performance analysis is available on the EVALUATE tab. The model is automatically deployed on the platform and can be used for performing sentiment analysis on the new set of data via the web interface or API. Here is the Python code snippet for using the deployed model to perform sentiment analysis: project_id = 'ai-gcp-ch4' compute_region = 'us-central1'


model_id = '[MODEL_ID]'
file_path = '/local/path/to/file'

from google.cloud import automl_v1beta1 as automl

automl_client = automl.AutoMlClient()
prediction_client = automl.PredictionServiceClient()
model_full_id = automl_client.model_path(
    project_id, compute_region, model_id
)
with open(file_path, "rb") as content_file:
    snippet = content_file.read()
payload = {"text_snippet": {"content": snippet, "mime_type": "text/plain"}}
params = {}
response = prediction_client.predict(model_full_id, payload, params)
print("Prediction results:")
for result in response.payload:
    print("Predicted sentiment label: {}".format(result.text_sentiment.sentiment))
for key, data in response.metadata.items():
    if key == 'sentiment_score':
        print("Normalized sentiment score: {}".format(data))

This code can be used to call the API from an external application and needs authentication keys to access the service.
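When running outside GCP, the Google client libraries pick up those keys through Application Default Credentials. The following is a minimal sketch; the key file path is hypothetical:

```python
import os

# Hypothetical path to a service account key downloaded from the GCP console.
# Google client libraries read this environment variable automatically
# (Application Default Credentials), so no explicit credential object is needed.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account-key.json"

# The AutoML clients from the preceding snippet would then authenticate
# transparently:
# from google.cloud import automl_v1beta1 as automl
# prediction_client = automl.PredictionServiceClient()
```

Alternatively, the variable can be exported in the shell before starting the application.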

Summary

In this chapter, we saw how GCP makes it easy to build, deploy, and use machine learning models using the web interface and APIs. We demonstrated the ease of use and scalability of the platform based on some of the most common use cases. The API layer can be exposed securely, and the models can be continuously upgraded based on new labeled datasets. In the next chapter, we will see how to build a cloud-based machine learning engine and walk through the step-by-step application of machine learning as a service in production environments.


5
Building a Big Data Cloud Machine Learning Engine

Machine learning (ML) has revolutionized the technology landscape through the implementation of Artificial Intelligence (AI). In fields such as customer support, fraud detection, and business intelligence, ML has become a favorite of enterprises, and its importance is influencing the cloud computing sector as well. Every cloud provider, including Google, is playing a large role in revolutionizing the AI services on its platform. They have invested heavily in recent years, from the development of new services to major reorganizations that strategically position AI in their organizational structure. In contrast to other cloud-native services, ML and AI platforms on the cloud come in various delivery models, such as speech recognition, video analytics, other forms of cognitive computing, ML automation, ML model management, ML model serving, and GPU-based computing.

In this chapter, we will investigate various elements of ML, including Google Cloud ML and how Google Cloud's Machine Learning Engine is used. Cloud Machine Learning Engine is also known as the Cloud AI Platform. So, let's start with an understanding of ML in the cloud.

This chapter will cover the following topics:

- Understanding ML
- Understanding how to use Cloud Machine Learning Engine
- Overview of the Keras framework
- Training your model using the Keras framework
- Training your model using the Google AI Platform
- Asynchronous batch prediction using Cloud Machine Learning Engine
- Real-time prediction using Cloud Machine Learning Engine


Understanding ML

For a few years now, there has been talk of ML as if it promises enormous advantages that can affect every aspect of human existence. Efforts have also been made to develop ML to the point where it does not require human participation. The bright future of ML can be seen in AI, which utilizes ML models and gets them to learn from observations. Enormous quantities of processing capacity and storage are required to properly evaluate all the relevant information for accurate ML results in an AI system. Companies that wished to use ML technologies for predictive analytics had to make significant investments in software and hardware. This is no longer the case, as the cloud has changed everything.

The cloud eliminates the need to own physical hardware and software outright in the present era of big data. Intelligent analysis of data is within easy reach for many companies because these complicated systems can be rented in the cloud. Anyone with access to the cloud can use this technology to the fullest extent possible.

You don't need a cloud provider to build an ML solution. There are readily available open source frameworks such as TensorFlow, MXNet, and CNTK that allow businesses to operate on their own hardware and help them build ML models. However, advanced ML models developed in-house can run into accuracy issues, because training on real-world big data usually requires large clusters with huge compute and storage capacities. The barriers to entry are high on many fronts when it comes to bringing ML skills into business apps: the specialized knowledge needed to construct, train, and implement ML models, along with the computing and hardware requirements, leads to greater labor, production, and infrastructure expenses.
These issues can be solved by cloud computing, as major public cloud platforms are designed to enable businesses to leverage ML skills to solve business problems without any technological burdens. To summarize, the following are some of the advantages of using ML on the cloud:

- The pay-per-use model of the cloud is useful for heavy AI or ML workloads with short-term requirements for specialized hardware.
- The cloud makes it easy for companies to experiment with ML capacity in terms of compute instances or storage space and to expand as performance and demand targets increase.
- Specialized hardware (GPUs/TPUs) that can speed up AI development is readily available in the cloud.
- The cloud provides smart services without requiring the user to have sophisticated skills in AI or data science.


To develop an AI framework, you can rely on cloud providers for everything: the hardware, the software, and all the other required frameworks. You can also deploy cost control strategies of your own in the cloud; for example, by renting only the hardware and developing a custom solution using open source software, you can save money for your organization. In the next section, we will look specifically at the AI platform offering of Google Cloud Platform and how to use it. We will also briefly touch on some of the ML services provided by Google Cloud.

Understanding how to use Cloud Machine Learning Engine

Cloud Machine Learning Engine is a Google Cloud managed service that allows developers and data scientists to construct, operate, and productionize ML models. Cloud ML Engine (or AI Platform) provides training and prediction services that can be used separately or together. These services are now called AI Platform Training and AI Platform Prediction. The following diagram represents the Google Cloud AI Platform:


Cloud AI Platform is used in three steps. As shown in the preceding diagram, you can build your projects using Google AI Platform Notebooks, Google Cloud Machine Learning model training, and Google Cloud AI Platform Prediction services. In this chapter, we'll take a look at leveraging Google AI Platform Notebooks. Google Cloud Machine Learning model training and Prediction services are covered in Chapter 8, Implementing TensorFlow Models Using Cloud ML Engine, and Chapter 9, Building Prediction Applications, respectively.

Google Cloud AI Platform Notebooks

AI Platform Notebooks is a managed service that offers an embedded JupyterLab environment in which developers and data scientists can create JupyterLab instances, preloaded with the latest data science and ML frameworks, with a single click. BigQuery, Cloud Dataproc, and Cloud Dataflow are integrated within the notebooks, which makes data processing and preprocessing simple to implement. This ultimately makes modeling, training, and deployment simple, from data ingestion to exploration. New JupyterLab instances can be deployed with one click and your data can be analyzed instantly. Every instance is pre-configured with optimized variants of the most common data science and ML libraries: the JupyterLab interface comes preinstalled with TensorFlow, PyTorch, scikit-learn, pandas, SciPy, and Matplotlib. By adding CPU, RAM, and GPU, you can start small and then scale up. If your data is too large for one machine, you can seamlessly move to services such as BigQuery, Cloud Dataproc, Cloud Dataflow, and AI Platform Training and Prediction.

Google AI Platform deep learning images

Google AI Platform Notebooks saves you the hassle of creating compute instances specialized for running deep learning algorithms. Google Cloud AI Platform provides deep learning virtual machines with verified, optimized, and tested operating system images, saving you the trouble of constructing and configuring compute instances for deep learning. The Deep Learning VM Images platform is a series of Debian 9-based Compute Engine VM images that have been optimized for data science and ML. All images come with the main ML frameworks and tools pre-installed and can be used out of the box on GPU instances to speed up data processing. In short, Deep Learning VM Images is a collection of pre-packaged virtual machine images that provide a ready-to-run ML platform.


The Deep Learning VM images help you build an environment for deep learning models through pre-configured dependencies, pre-installed essential tools, and performance optimization. Regarding Google Platform Deep Learning Images, there are three important concepts that you need to be aware of. Let's take a look.

The first concept is called an image family. An image family is a series of images that have been pre-configured for a certain purpose or for a particular architecture. Google Cloud provides various Deep Learning VM image families, each specific to a certain ML framework. Some examples of the image families provided by Google Cloud are as follows:

- A TensorFlow family, with or without GPU
- A PyTorch family, with or without GPU
- A Chainer experimental family, with or without GPU
- A base or common image family that you can add your preferred framework to, with or without GPU

The second concept is called images. An image is a single blueprint for a virtual machine; Deep Learning VM images are Compute Engine virtual machine images that come pre-configured. There are two image types, as follows:

- Custom images: Only your project can view custom images. These images are tailored specifically to the needs of the project you are working on. A custom image can be created from boot disks and other images, and you can then use this custom image to create an instance.
- Public images: Google, open source communities, and third-party vendors provide and maintain public images. By default, all projects can access and use these images to create instances. You can use most public images at no extra cost, but you can also add some premium images to your project. You are not charged for creating custom images in Compute Engine, but you do incur a storage charge while maintaining a custom image in your project.

The third concept is called instances.
An instance is a virtual machine hosted on Google's infrastructure. A Deep Learning VM instance is based on one of the pre-configured Deep Learning VM images. Using the Google Cloud Platform console or the command-line tool, you can create an instance from an image. Deep Learning images are always the first step toward using Google AI Platform Notebooks; without selecting one of these Deep Learning images, you cannot launch the notebooks. In the next section, we will look at how to launch and run these notebooks.
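The mapping from framework choice to image family can be sketched as a small helper. The suffixes below follow the naming convention used later in this chapter (tf-latest-cpu, tf-latest-cu100); treat the exact family strings as illustrative assumptions rather than an authoritative list:

```python
def image_family(framework: str, gpu: bool) -> str:
    """Pick a plausible Deep Learning VM image family name.

    Family names mirror the tf-latest-cpu / tf-latest-cu100 convention
    used in this chapter's scripts; verify against the current families
    published under the deeplearning-platform-release project.
    """
    base = {
        "tensorflow": "tf-latest",
        "pytorch": "pytorch-latest",
        "chainer": "chainer-latest",
    }[framework]
    # GPU images bundle a CUDA toolkit (cu100 = CUDA 10.0 in this sketch).
    return base + ("-cu100" if gpu else "-cpu")

print(image_family("tensorflow", gpu=False))  # tf-latest-cpu
```

This string is what gets passed as --image-family when creating the instance.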


Creating Google Platform AI Notebooks

Let's start by creating a Google Platform AI Notebook, as follows:

1. Create a new instance. The following screenshot shows you how to do this from the Google Cloud console's UI:

2. Next, you have to choose from the available custom or public images that are going to be used for your AI Notebook's compute instance:

3. After selecting a virtual machine image, you will be presented with the following screen, which contains some important instance information. You should review it carefully. If you need to change anything, you can click on the CUSTOMIZE option, as depicted:


4. The customize screen provides options for changing RAM, regions, zones, and so on, as shown in the following screenshot:


5. You can also have additional customization available around GPUs and boot disks, as shown in the following screenshot:

6. After the Notebook has been created, it will appear in the list of available notebooks in the Google Cloud AI Platform UI, as shown in the following screenshot:


7. Run the following command to access the JupyterLab user interface on the VM instance you created. The command sets up port forwarding:

export INSTANCE_NAME=""
gcloud compute ssh $INSTANCE_NAME -- -L 8080:localhost:8080

8. The following screenshot shows what the UI looks like. You have to type http://localhost:8080/lab? into your web browser to get this:

You can also automate this with the following shell script:

#!/bin/bash
IMAGE=--image-family=tf-latest-cpu
INSTANCE_NAME=dlvm
GCP_LOGIN_NAME= # CHANGE THIS
gcloud config set compute/zone us-central1-a # CHANGE THIS
echo "Launching $INSTANCE_NAME"
gcloud compute instances create ${INSTANCE_NAME} \
    --machine-type=n1-standard-2 \
    --scopes=https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/userinfo.email \
    ${IMAGE} \
    --image-project=deeplearning-platform-release \
    --boot-disk-device-name=${INSTANCE_NAME} \
    --metadata="proxy-user-mail=${GCP_LOGIN_NAME}"
echo "Looking for Jupyter URL on $INSTANCE_NAME"
while true; do
    proxy=$(gcloud compute instances describe ${INSTANCE_NAME} 2> /dev/null | grep dot-datalab-vm)
    if [ -z "$proxy" ]
    then
        echo -n "."
        sleep 1
    else
        echo "done!"
        echo "$proxy"
        break
    fi
done
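The gcloud invocation in the preceding script can also be assembled programmatically, which is convenient when driving provisioning from Python (for example via subprocess.run). This is a sketch: the flag values simply mirror the shell script, and the defaults are illustrative:

```python
def build_create_command(instance_name: str, image_family: str, zone: str,
                         machine_type: str = "n1-standard-2") -> list:
    """Assemble the gcloud arguments from the script above as an argv list.

    The list form avoids shell quoting issues and can be passed directly
    to subprocess.run(cmd).
    """
    return [
        "gcloud", "compute", "instances", "create", instance_name,
        f"--zone={zone}",
        f"--image-family={image_family}",
        "--image-project=deeplearning-platform-release",
        f"--machine-type={machine_type}",
        "--scopes=https://www.googleapis.com/auth/cloud-platform,"
        "https://www.googleapis.com/auth/userinfo.email",
        f"--boot-disk-device-name={instance_name}",
    ]

cmd = build_create_command("dlvm", "tf-latest-cpu", "us-central1-a")
```

Running the command still requires the gcloud CLI to be installed and authenticated.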

Here is the pseudocode for this automation script:

1. Define the virtual machine image type.
2. Provide the Google Cloud Platform credentials.
3. Set the compute zone where the computation behind the notebook will take place.
4. Define the instance size.
5. Iterate through the compute instances and look for the Jupyter Notebook URL on the instance.

In the next section, we will explore AI Notebooks and learn how to utilize them for specific tasks.

Using Google Platform AI Notebooks

You can set up notebooks locally on the AI Notebooks JupyterLab platform, or you can clone from a Git repository. In the UI, click the Git icon on the top menu to clone a repository, as shown in the following screenshot:


As we can see, Google Cloud Platform allows us to select various versions of Python, as well as a way to create specific types of supporting files:

JupyterLab also provides a command-line interface for pulling code from a Git repository. You can view this code by using the file browser in the left pane. This is a handy feature that makes it easy and seamless for multiple developers to collaborate.


Automating AI Notebooks execution

Once you've developed the code and can run it based on runtime configuration parameters, you can execute and evaluate these notebooks using a library called Papermill. Let's go over what Papermill can do for us:

- The library allows you to spawn various notebooks and run them simultaneously.
- Papermill can also help you gather and summarize metrics from a collection of notebooks.
- It allows you to read and write data from various places. This means you can store your output notebook on a storage system that offers greater durability and easier access, enabling a reliable pipeline. At the time of writing, Papermill had recently added support for Google Cloud Storage buckets; we'll demonstrate how to use this feature in this chapter.

The following diagram represents how Papermill can be used with Google Cloud AI Notebooks:


Papermill changes the paradigm of how notebooks are used. Because Papermill does not modify the source notebook, we gain a functional property that is usually lacking in notebook environments: our input, a JSON notebook document, and our input parameters are treated as immutable execution inputs that generate an immutable output document. This output document contains the executed code, the outputs, and the logs, as well as a repeatable template that can readily be rerun at any time.

Papermill's ability to read and write from several locations is another of its features. To build a reliable pipeline, we can store the output notebook somewhere with high durability and simple access, such as a Google Cloud Storage bucket. Storing output notebooks as isolated documents supports our users as much as possible: it makes it simple to analyze operations such as connecting to services or scoping a storage prefix on Google Cloud. Users can use these outputs to debug problems, verify results, and generate fresh templates, all without affecting the initial workflows. Furthermore, because Papermill manages its own runtime processes, you do not need a notebook server or any other infrastructure to execute against notebook kernels. This eliminates some of the complexity that hosted notebook services otherwise have to handle. Check out the following diagram for a deeper look into the Papermill library:


The following code is the standard way of creating a Deep Learning VM. Remember that you have to choose a VM image that contains the key dependencies you need in order to run your notebook; if your notebook requires PyTorch, don't use a TensorFlow image, and vice versa:

# Compute Engine Instance parameters
export IMAGE_FAMILY="tf-latest-cu100"
export ZONE="us-central1-b"
export INSTANCE_NAME="notebook-executor"
export INSTANCE_TYPE="n1-standard-8"
# Notebook parameters
export INPUT_NOTEBOOK_PATH="gs://my-bucket/input.ipynb"
export OUTPUT_NOTEBOOK_PATH="gs://my-bucket/output.ipynb"
export PARAMETERS_FILE="params.yaml" # Optional
export PARAMETERS="-p batch_size 128 -p epochs 40" # Optional
export STARTUP_SCRIPT="papermill ${INPUT_NOTEBOOK_PATH} ${OUTPUT_NOTEBOOK_PATH} -y ${PARAMETERS_FILE} ${PARAMETERS}"

gcloud compute instances create $INSTANCE_NAME \
    --zone=$ZONE \
    --image-family=$IMAGE_FAMILY \
    --image-project=deeplearning-platform-release \
    --maintenance-policy=TERMINATE \
    --accelerator='type=nvidia-tesla-t4,count=2' \
    --machine-type=$INSTANCE_TYPE \
    --boot-disk-size=100GB \
    --scopes=https://www.googleapis.com/auth/cloud-platform \
    --metadata="install-nvidia-driver=True,startup-script=${STARTUP_SCRIPT}"

gcloud --quiet compute instances delete $INSTANCE_NAME --zone $ZONE

The parameters used in the preceding code are as follows:

- INPUT_NOTEBOOK_PATH: The input notebook located in the Cloud Storage bucket; for example, gs://my-bucket/input.ipynb.
- OUTPUT_NOTEBOOK_PATH: The output notebook located in the Cloud Storage bucket; for example, gs://my-bucket/output.ipynb.
- PARAMETERS_FILE: Users can provide a YAML file from which notebook parameter values should be read; for example, gs://my-bucket/params.yaml.
- PARAMETERS: Users can pass parameters via -p key value pairs for notebook execution; for example, -p batch_size 128 -p epochs 40.
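Assembling the STARTUP_SCRIPT line from these parameters can be done in Python, which is handy when the parameter set varies per run. This is a sketch; it only builds the command string and does not claim to cover every papermill flag:

```python
def papermill_startup_script(input_nb: str, output_nb: str,
                             parameters: dict = None,
                             parameters_file: str = None) -> str:
    """Render a papermill command line like the STARTUP_SCRIPT above.

    parameters is rendered as repeated '-p key value' flags, and
    parameters_file as a '-y' flag, matching the options described
    in this section.
    """
    parts = ["papermill", input_nb, output_nb]
    if parameters_file:
        parts += ["-y", parameters_file]
    for key, value in (parameters or {}).items():
        parts += ["-p", key, str(value)]
    return " ".join(parts)

script = papermill_startup_script(
    "gs://my-bucket/input.ipynb", "gs://my-bucket/output.ipynb",
    parameters={"batch_size": 128, "epochs": 40})
```

The resulting string can then be injected as the startup-script metadata value.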


The following is an example GPU instance, where we're selecting the *-cu100 image family. Depending on the specific requirements and the availability of versions, a different configuration can be used. Additionally, we need to provide paths for the input and output notebooks. Pass all these parameters to the gcloud command to create the compute instance:

IMAGE_FAMILY="tf-latest-cu100" # Or use any required DLVM image.
ZONE="us-central1-b"
INSTANCE_NAME="notebook-executor"
INSTANCE_TYPE="n1-standard-8"
INPUT_NOTEBOOK_PATH=$1
OUTPUT_NOTEBOOK_PATH=$2
GPU_TYPE=$3
GPU_COUNT=$4
STARTUP_SCRIPT="papermill ${INPUT_NOTEBOOK_PATH} ${OUTPUT_NOTEBOOK_PATH}"

# Create DLVM
gcloud compute instances create $INSTANCE_NAME \
    --zone=$ZONE \
    --image-family=$IMAGE_FAMILY \
    --image-project=deeplearning-platform-release \
    --maintenance-policy=TERMINATE \
    --accelerator="type=nvidia-tesla-${GPU_TYPE},count=${GPU_COUNT}" \
    --machine-type=$INSTANCE_TYPE \
    --boot-disk-size=100GB \
    --scopes=https://www.googleapis.com/auth/cloud-platform \
    --metadata="install-nvidia-driver=True,startup-script=${STARTUP_SCRIPT}"

gcloud --quiet compute instances delete $INSTANCE_NAME --zone $ZONE

The following is a CPU instance example. It's similar to procuring a VM with a GPU: we need to select the appropriate image family, that is, *-cpu, and provide similar parameters to spin up the instance with the gcloud command:

IMAGE_FAMILY="tf-latest-cpu" # Or use any required DLVM image.
ZONE="us-central1-b"
INSTANCE_NAME="notebook-executor"
INSTANCE_TYPE="n1-standard-8"
INPUT_NOTEBOOK_PATH=$1
OUTPUT_NOTEBOOK_PATH=$2
STARTUP_SCRIPT="papermill ${INPUT_NOTEBOOK_PATH} ${OUTPUT_NOTEBOOK_PATH}"

# Create DLVM
gcloud compute instances create $INSTANCE_NAME \
    --zone=$ZONE \
    --image-family=$IMAGE_FAMILY \
    --image-project=deeplearning-platform-release \
    --machine-type=$INSTANCE_TYPE \
    --boot-disk-size=100GB \
    --scopes=https://www.googleapis.com/auth/cloud-platform \
    --metadata="startup-script=${STARTUP_SCRIPT}"

gcloud --quiet compute instances delete $INSTANCE_NAME --zone $ZONE

The startup script does the following:

1. Creates a Compute Engine instance using the TensorFlow Deep Learning VM and two NVIDIA Tesla T4 GPUs.
2. Installs the NVIDIA GPU drivers.
3. Executes the notebook using the Papermill tool.
4. Uploads the notebook's result (with all the cells pre-computed) to the Cloud Storage bucket, which in this case is gs://my-bucket/.
5. Papermill emits a save after each cell executes. This could generate 429 Too Many Requests errors, which are handled by the library itself.
6. Terminates the Compute Engine instance.

Take a look at the following link if you want to see the complete code for the startup script: https://shorturl.at/tBJ12.

Hopefully, you now have a good understanding of the Papermill library. You need to fulfill the following steps in order to use AI Notebooks in a production environment. We have covered these steps in discrete pieces already, but it is always good to show them in a consolidated list, too:

1. Develop code using Google AI Notebooks
2. Schedule and automate Deep Learning VM image provisioning
3. Schedule and automate Jupyter Notebooks


Overview of the Keras framework

Keras is a deep learning framework for Python that can help us define and train virtually any type of deep learning model. Keras was created to enable rapid experimentation. It is an open source Python library that runs on TensorFlow or Theano. Let's take a look at Keras and its features:

- It is modular, quick, and simple to use.
- It was designed by Google engineer François Chollet.
- Low-level computation isn't handled by Keras itself. Instead, it uses a separate library, called the backend, to carry out the work. Keras is therefore a high-level API wrapper over a low-level API, and it can run on top of TensorFlow, CNTK, or Theano.
- Keras' high-level API handles how we create models, define layers, or set up various input-output models. It allows the same code to run seamlessly on CPU or GPU.

Keras has some important characteristics:

- It has an easy-to-use API for rapidly prototyping deep learning models.
- It supports convolutional (computer vision) networks, recurrent (sequence) networks, and any combination of them.
- It supports arbitrary network architectures: multiple-input or multiple-output models, layer sharing, model sharing, and so on. This makes Keras suitable for constructing essentially any deep learning model, from a generative adversarial network to a neural Turing machine.

Keras is a model-level library that provides high-level building blocks for the development of deep learning systems. It does not handle low-level operations such as tensor manipulation and differentiation. Instead, it relies on a dedicated, well-optimized tensor library that serves as Keras' backend engine.


Instead of selecting a single tensor library and tying Keras to it, Keras manages this issue in a modular manner; several backend engines can be plugged into Keras seamlessly. The TensorFlow backend, the Theano backend, and the Microsoft Cognitive Toolkit (CNTK) backend are the three current backend implementations, and Keras will be able to work with even more deep learning engines in the future.

Keras was built to work with Python so that it's user-friendly, modular, and simple to extend. The API is designed for humans, not machines, and follows best practices for reducing cognitive load. Neural layers, cost functions, optimizers, initialization schemes, activation functions, and regularization schemes are all standalone modules that can be combined to produce new models. With fresh classes and functions, it's easy to add new models, and models are described in Python code, not in separate model configuration files.

The main reasons for using Keras come from its guiding principles, chiefly that of being user-friendly. Beyond ease of learning and ease of model building, Keras offers broad adoption, support for a wide spectrum of production deployment options, integration with at least five backend engines (TensorFlow, CNTK, Theano, MXNet, and PlaidML), and strong support for multiple GPUs and distributed training. Moreover, Google, Microsoft, Amazon, Apple, Nvidia, Uber, and others support Keras.

Keras itself does not perform low-level operations such as tensor products and convolutions; it relies on a backend engine for those. Although Keras supports several backend engines, its main (and default) backend is TensorFlow, and its primary supporter is Google. The Keras API is packaged in TensorFlow as tf.keras, which, as mentioned previously, has become the main TensorFlow API as of TensorFlow 2.0.
Here is a basic example of how to use the Keras framework for image classification using the MNIST dataset:

import keras
keras.__version__

from keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images.shape
len(train_labels)
train_labels
test_images.shape
len(test_labels)
test_labels

from keras import models
from keras import layers

network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))
network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])

train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255

from keras.utils import to_categorical

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

network.fit(train_images, train_labels, epochs=5, batch_size=128)
test_loss, test_acc = network.evaluate(test_images, test_labels)
print('test_acc:', test_acc)

In the preceding code, we load the image dataset and train on the images with the Keras library after reshaping each image into a flat pixel vector. The neural network is trained in batches of 128 over five passes (epochs) through the training data. The training and test labels are converted into categorical variables before being fed to the neural network. The network.fit method trains the neural network on the training dataset, while the network.evaluate method evaluates the model's accuracy. Finally, the accuracy metrics are printed for analysis. If the model's accuracy is above the threshold, it can be used for running predictions on new data.


Here is the output of the preceding code:

As we can see, there are five passes through the neural network, and with each epoch, accuracy on the evaluation dataset improves.
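The preprocessing in the preceding listing (scaling pixel intensities to [0, 1] and one-hot encoding the labels) can be mimicked in pure Python, which makes the two transformations concrete. A sketch, not the Keras implementation:

```python
def one_hot(labels, num_classes=10):
    """Pure-Python equivalent of keras.utils.to_categorical:
    each integer label becomes a vector with a 1.0 at its index."""
    return [[1.0 if i == label else 0.0 for i in range(num_classes)]
            for label in labels]

def normalize(pixels):
    """Scale 0-255 pixel intensities into [0, 1], matching the
    astype('float32') / 255 step in the listing."""
    return [p / 255.0 for p in pixels]

print(one_hot([3], num_classes=5))  # [[0.0, 0.0, 0.0, 1.0, 0.0]]
```

One-hot labels are what the categorical_crossentropy loss in the listing expects.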

Training your model using the Keras framework

In this section, we will take a look at another code sample for training neural networks with the Keras framework. To run this code, we need access to the following libraries:

- os: Provides miscellaneous operating system-dependent functionality.
- glob: Handy for UNIX shell-style pathname pattern expansion.
- numpy: Used for fundamental mathematical functions.

The following link provides the necessary code for training neural networks with Keras: https://shorturl.at/pwGIX.


The code referenced in the preceding link loads the training data from disk and splits it into training and evaluation sets. The model's structure is initiated as a Keras sequential model and various layers are added to the network before we feed training data to it. In the end, the model.fit method is used to train the data in batches of 256 through five iterations (epochs). The following screenshot shows what the output would look like for different code segments:

As we can see, the model has classified the image as a bicycle based on the training data images that were used for training.
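The training above runs in mini-batches of 256; the number of gradient updates per epoch follows directly from the dataset size. A small helper makes this explicit (the sample counts below are illustrative):

```python
import math

def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of weight updates per epoch when training in mini-batches,
    as with model.fit(..., batch_size=256). The final, smaller batch
    still counts as one step, hence the ceiling."""
    return math.ceil(num_samples / batch_size)

# For example, the 60,000-image MNIST training set from the earlier
# listing, with batch_size=128:
print(steps_per_epoch(60000, 128))  # 469
```

This is the per-epoch step count that Keras reports in its fit progress bar.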


Now, let's take a look at the structure of the neural network and how the output changes through each iteration through the neural network. The output shape defines the neural network structure and we can configure it based on the evaluation results by passing parameter values to the Keras library:

Now, let's take a look at the output from training the model with the Keras library on a TensorFlow backend. We have configured the training to run for five iterations (epochs). As we can see, the model's accuracy increases with each iteration. Once we have trained the model so that its accuracy is above the set threshold, we can use it to classify new records:

Now, we can use this model to predict new data points. The prediction outputs a class based on the training data provided:


At this point, we have completed the model training, evaluation, and prediction steps with Keras. We can also visualize the model's performance across the epochs, and see how the loss function is optimized by the model, using the Matplotlib library:


The preceding screenshot shows how the training and validation loss is minimized over training iterations of the model using Keras.
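The stopping criterion mentioned above ("accuracy above the set threshold") can be sketched in plain Python. The accuracy values here are simulated to mimic the improving trend in the training output; they are not real results:

```python
def epochs_until_threshold(accuracies, threshold):
    """Return the 1-based epoch at which validation accuracy first
    meets the threshold, or None if it never does."""
    for epoch, acc in enumerate(accuracies, start=1):
        if acc >= threshold:
            return epoch
    return None

# Simulated per-epoch validation accuracies (illustrative numbers).
history = [0.62, 0.74, 0.81, 0.88, 0.93]
print(epochs_until_threshold(history, threshold=0.90))  # 5
```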

Training your model using Google AI Platform

In the previous section, you learned how to train a model using the Keras framework. In this section, we will train the same model on Google Cloud AI Platform. The core code base will remain the same, but the way you train the model is going to differ. Let's get started:

1. The first thing you have to do is set up the directory structure of your code. Make sure to name your files as required by Google Cloud AI Platform. This will look something like this:

2. An additional piece of code needs to be added to the preceding code to ensure that it saves the model to a Google Cloud bucket. This code is as follows:

PROJECT_ID = "" #change this
BUCKET_ID = "ml_assets"
JOB_NAME = 'my_first_keras_job'
JOB_DIR = 'gs://' + BUCKET_ID + '/keras-job-dir'
REGION = "" #change this

! gsutil ls -al gs://$BUCKET_ID

print(JOB_DIR)

export_path = tf.contrib.saved_model.save_keras_model(model, JOB_DIR + '/keras_export')
print("Model exported to: ", export_path)

# Verify the model is created
! gsutil ls -al $JOB_DIR/keras_export


The following screenshot shows the output of the preceding code:

3. Let's take a look at the buckets from the Google Cloud console window:


4. Finally, you have to submit the training job with the following gcloud ai-platform command:

#!/bin/sh
PROJECT_ID="" #change this
BUCKET_ID="ml_assets"
JOB_NAME='my_keras_job'
JOB_DIR="gs://$BUCKET_ID/keras-job-dir"
REGION="us-central1" #change this

gcloud config set project $PROJECT_ID

gcloud ai-platform jobs submit training $JOB_NAME \
  --package-path trainer/ \
  --module-name trainer.task \
  --region $REGION \
  --python-version 3.5 \
  --runtime-version 1.13 \
  --job-dir $JOB_DIR \
  --stream-logs

The output of the preceding command is as follows:

5. Additionally, we can visualize the running job with the Google Cloud console, as follows:


The Google Cloud Platform stores the process logs for easy monitoring and troubleshooting. We can access these logs via Stackdriver Logging from the console window, as follows:

Now that our training job has been successful, we can utilize our saved ML model for online and offline prediction. We will look at this next.


Asynchronous batch prediction using Cloud Machine Learning Engine

To deliver predictions using the model we trained and exported in the previous section, we have to create a model resource in AI Platform and a version resource within it. The version resource is what actually uses your trained model to serve predictions. With this framework, you can tune, retrain, and manage all the versions in AI Platform over time. A model or version is an instance of a machine learning solution that has been stored in the Model Service of AI Platform. You can create a version by using a trained model that has been exported as a SavedModel. You can also provide custom code (beta) to handle predictions when you create a version. Let's take a look:

1. The following command creates a model in the Google Cloud AI Platform:

MODEL_NAME = "keras_model"

! gcloud ai-platform models create $MODEL_NAME \
  --regions $REGION

The output of the preceding command is as follows:

2. Next, the model version is created using the following commands:

MODEL_VERSION = "v1"

# Get a list of directories in the `keras_export` parent directory
KERAS_EXPORT_DIRS = ! gsutil ls $JOB_DIR/keras_export/

# Pick the directory with the latest timestamp, in case you've trained
# multiple times
SAVED_MODEL_PATH = KERAS_EXPORT_DIRS[-1]


# Create model version based on that SavedModel directory
! gcloud ai-platform versions create $MODEL_VERSION \
  --model $MODEL_NAME \
  --runtime-version 1.13 \
  --python-version 3.5 \
  --framework tensorflow \
  --origin $SAVED_MODEL_PATH

The following is the output from Google Cloud UI:

The following screenshot shows the version details of the Keras model:


3. Now that you have created your prediction model version, you need to create the Python script for batch prediction. This starts with creating the prediction input JSON in the format shown in the following code:

{
  "dataFormat": enum (DataFormat),
  "outputDataFormat": enum (DataFormat),
  "inputPaths": [ string ],
  "maxWorkerCount": string,
  "region": string,
  "runtimeVersion": string,
  "batchSize": string,
  "signatureName": string,
  // Union field model_version can be only one of the following:
  "modelName": string,
  "versionName": string,
  "uri": string,
  // End of list of possible types for union field model_version.
  "outputPath": string
}

Let's go over each of the parameters that are used in the preceding structure:

Data format: The format type of your input files for prediction. All the input files for a specific job must use the same format: JSON, TF_Record, or TF_Record_GZIP.

Output data format: The format type of your output files for prediction.

Input paths: The URIs of your input data files, which need to be stored in Google Cloud Storage.

Output path: The Cloud Storage location where the prediction service should save your outputs. Your project needs permission to write to this location.

Model name and version name: The name of the model and the version that you wish to receive predictions from. The default version of the model is used if you do not specify a version.


Model URI: As an alternative to a deployed model name and version, you can specify the Cloud Storage path of an undeployed SavedModel to get predictions from.

Region: The Google Compute Engine region where your job will run. To run your prediction job and store the input and output data for very large datasets, everything needs to be in the same region.

Maximum worker count (optional): The maximum number of prediction nodes to use for this job in the processing cluster. This is how you can set an upper limit on the automatic scaling of batch prediction. It defaults to 10 if you don't set a value.

Runtime version (optional): The AI Platform runtime version to use. This option lets you specify a runtime version for models that are not deployed on AI Platform. For deployed model versions, you should omit this value to tell the service to use the same version that was specified when the model version was deployed.

Signature name (optional): If your SavedModel has multiple signatures, you can use this to select a custom TensorFlow signature name, that is, an alternative input/output map defined in the TensorFlow SavedModel.

4. The following Python code shows how to start building the JSON body:

import time
import re

def make_batch_job_body(project_name, input_paths, output_path,
        model_name, region, data_format='JSON',
        version_name=None, max_worker_count=None,
        runtime_version=None):

    project_id = 'projects/{}'.format(project_name)
    model_id = '{}/models/{}'.format(project_id, model_name)
    if version_name:
        version_id = '{}/versions/{}'.format(model_id, version_name)

5. Make a jobName in the model_name_batch_predict_YYYYMMDD_HHMMSS format:

    timestamp = time.strftime('%Y%m%d_%H%M%S', time.gmtime())

    # Make sure the project name is formatted correctly to work as the basis
    # of a valid job name.
    clean_project_name = re.sub(r'\W+', '_', project_name)
    job_id = '{}_{}_{}'.format(clean_project_name, model_name, timestamp)

6. Start building the request dictionary with the required information:

    body = {'jobId': job_id,
            'predictionInput': {
                'dataFormat': data_format,
                'inputPaths': input_paths,
                'outputPath': output_path,
                'region': region}}

7. Use the version if it's present, or the model (its default version) if not:

    if version_name:
        body['predictionInput']['versionName'] = version_id
    else:
        body['predictionInput']['modelName'] = model_id

8. Only include a maximum number of workers or a runtime version if specified. Otherwise, let the service use its defaults:

    if max_worker_count:
        body['predictionInput']['maxWorkerCount'] = max_worker_count

    if runtime_version:
        body['predictionInput']['runtimeVersion'] = runtime_version

    return body

9. Similarly, the Python code that calls the Prediction APIs is as follows:

import googleapiclient.discovery as discovery
from googleapiclient import errors

project_id = 'projects/{}'.format(project_name)

ml = discovery.build('ml', 'v1')
request = ml.projects().jobs().create(parent=project_id,
                                      body=batch_predict_body)

try:
    response = request.execute()
    print('Job requested.')

    # The state returned will almost always be QUEUED.
    print('state : {}'.format(response['state']))
except errors.HttpError as err:
    # Something went wrong, print out some information.
    print('There was an error getting the prediction results.' +
          'Check the details:')
    print(err._get_reason())

The preceding code sends a prediction request with a request.execute() method call, which executes the batch prediction job in an asynchronous manner. In the next section, we will take a look at real-time prediction using Cloud Machine Learning Engine, which shifts the paradigm to fully serverless ML on the Google Cloud Platform.
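Putting steps 4 to 8 together, the body-building logic can be exercised on its own as a self-contained sketch. The project, bucket, and model names here are hypothetical stand-ins for your own values:

```python
import re
import time

def make_batch_job_body(project_name, input_paths, output_path,
                        model_name, region, data_format='JSON',
                        version_name=None, max_worker_count=None,
                        runtime_version=None):
    # Steps 4 to 8 from the text, assembled into one function.
    project_id = 'projects/{}'.format(project_name)
    model_id = '{}/models/{}'.format(project_id, model_name)
    timestamp = time.strftime('%Y%m%d_%H%M%S', time.gmtime())
    clean_project_name = re.sub(r'\W+', '_', project_name)
    job_id = '{}_{}_{}'.format(clean_project_name, model_name, timestamp)
    body = {'jobId': job_id,
            'predictionInput': {'dataFormat': data_format,
                                'inputPaths': input_paths,
                                'outputPath': output_path,
                                'region': region}}
    if version_name:
        body['predictionInput']['versionName'] = \
            '{}/versions/{}'.format(model_id, version_name)
    else:
        body['predictionInput']['modelName'] = model_id
    if max_worker_count:
        body['predictionInput']['maxWorkerCount'] = max_worker_count
    if runtime_version:
        body['predictionInput']['runtimeVersion'] = runtime_version
    return body

# Hypothetical project and bucket names, for illustration only.
body = make_batch_job_body('my-project', ['gs://ml_assets/inputs/*'],
                           'gs://ml_assets/outputs', 'keras_model',
                           'us-central1', version_name='v1')
print(body['predictionInput']['versionName'])
```

Note how the non-word character in the project name is replaced with an underscore to keep the job ID valid.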

Real-time prediction using Cloud Machine Learning Engine

Online prediction is optimized to minimize the latency of serving predictions. We can process one or more instances per request. Input data is passed directly as a JSON string in the request, and predictions are returned in the response message as soon as possible. You should generally use online prediction when you are making requests in response to application input, or in other situations where timely inference is needed.

Batch prediction has its own drawbacks. If you use a single model and a small set of input instances, you can see a considerable difference in how long it takes to complete the same prediction requests using online versus batch prediction. Predictions that an online request returns almost immediately can take a long time to complete as a batch job. This is a side effect of the different infrastructure used by the two prediction techniques. AI Platform allocates and initializes batch prediction resources when the job is created, whereas online prediction is generally ready to process requests when they arrive. You can learn more about online prediction by going to https://github.com/PacktPublishing/Hands-On-Artificial-Intelligence-on-Google-Cloud-Platform.

The preceding link shows that it is easy to interact with the ML engine through a simple and consistent API on the Google Cloud Platform. This API can be used for real-time predictions due to its high throughput and its efficient use of the underlying elastic compute power, all of which is enabled by the Google Cloud Platform.
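For online prediction, the request body is much simpler than the batch job body: the inputs go under a single instances key, one JSON object per instance. This is a hedged sketch; the fields inside each instance depend on your model's input signature, and the image values here are hypothetical:

```python
import json

def make_online_predict_body(instances):
    """Build the request body for online prediction:
    one JSON object per input instance under the `instances` key."""
    return {'instances': instances}

# Two hypothetical input records, for illustration.
body = make_online_predict_body([{'image': [0.0, 0.5, 1.0]},
                                 {'image': [0.2, 0.2, 0.2]}])
print(json.dumps(body))
```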


Summary

In this chapter, we demonstrated the use of Keras in conjunction with Google Cloud Platform through a practical example use case, and we learned how to use the Cloud Machine Learning Engine. Keras is an open source neural network library that is written in Python. It can run on top of Microsoft Cognitive Toolkit, TensorFlow, or Theano. We saw that Keras supports convolutional networks, recurrent networks, and combinations of both. We then performed asynchronous batch prediction and real-time prediction using Cloud Machine Learning Engine.

In the next chapter, we will learn about DialogFlow, which is used to build interfaces, such as conversational IVRs and chatbots, that enable interactions between your business and users.


6
Smart Conversational Applications Using DialogFlow

Artificial intelligence (AI) is continuously changing the way we search for things and get things done, and chatbots are a true illustration of people's desire to offload tasks that they do not want to do themselves. AI-powered chatbots can do amazing things without involving humans. A chatbot is a smart chat program: it should be able to convincingly simulate a person, and when it is combined with AI technologies, it is called an intelligent chatbot. The most common instance of a chatbot is the customer support system that some companies use. These systems have developed to the point that 70-80% of the conversation is conducted without an actual person from the company engaging the customer. The banking and financial markets are already making significant use of chatbots to deal with client requests and rapidly assist customers with their bank transactions. For precisely this reason, chatbot platforms are being provided by cloud providers to reduce time to market.

In this chapter, we will learn how to build conversational applications using a Google Cloud Platform (GCP) service named DialogFlow. DialogFlow provides an easy way to build conversational applications for businesses and can greatly reduce operational costs. We will learn about the core concepts of DialogFlow and illustrate how to build a conversational application with an example. Here are the topics that we will cover in this chapter:

Introduction to DialogFlow
Building a DialogFlow agent
Performing audio sentiment analysis using DialogFlow


Introduction to DialogFlow

Before we start with DialogFlow, we need to understand, at a high level, how different technologies are used to power an intelligent chatbot system. Most chatbots are a kind of interface for email or dialog where bots respond to your text instead of people. These chatbots operate within the context of the containing application; the catch centers around the user interface layer that you communicate with. The conversations that humans have with bots are driven by machine learning (ML) algorithms that break your messages down into natural human language with natural language understanding (NLU) methods, and answer queries in a manner that is comparable to what one would expect from a human on the other side.

Understanding the building blocks of DialogFlow

Let's learn about the various building blocks of a conversational application by looking at the high-level architecture of DialogFlow. The following diagram represents the different components of any chatbot application at a high level:

Fig. 6.1: Components of a chatbot application

The independent building blocks of the system require fine-tuning and improvement over time, along with constant feedback and training loops. In certain cases, conversational applications are built based on a set of predefined responses to a set of known inputs. Obviously, such systems are not capable of providing a human-like experience and lack a natural conversational style. DialogFlow abstracts away a lot of these processes and allows application developers to focus on the context and semantics of the dialog with a simple-to-use interface.


The engine also leverages ML models that are constantly enhanced to support additional intents, entities, and contexts. In this section, we will learn about the various building blocks and interfaces of DialogFlow, as shown in the following list:

DialogFlow agent: This component, in a nutshell, is similar to a human agent who needs to be trained to handle calls from users. For example, a call center employee for a bank needs to understand some of the fundamental workflows, terminologies, and common scenarios that they are going to encounter while attending customer calls; however, the training cannot cater to all the possible questions and conversational branches. The agent needs to understand the intent within the context and respond with the best possible option, or a leading question if the information is insufficient to satisfy the customer's queries. Similarly, a DialogFlow agent is a module that replicates the human agent and understands natural language with a certain level of fuzziness. As DialogFlow application developers, we need to design and develop the DialogFlow agent to handle conversations within the context of the intended application.

DialogFlow intent: Once the agent converts the text into segments, certain keywords are marked and tagged to understand the intent within the agent's context. As in the case of human conversation, the DialogFlow agent and the human user on the other end take turns in the conversation in order for it to be meaningful dialog. The information gathered from the person in one turn is called the end user expression. The DialogFlow agent needs to be trained to match the end user expression to a preconfigured intent; this process is called intent classification. As an example, if we are designing a DialogFlow agent for handling restaurant reservations, the agent needs to respond to the user's questions and queries related to the menu, timings, and reservations. This is called the context of the conversation, and the agent needs to classify the user's intent within the context of restaurant reservations. Based on the intent classification, the agent either responds by seeking additional information from the user, or queries the application's backend to find the answer to the question. The intent contains the following components:

Training phrases: These are the predefined sets of keywords that the agent looks for in the conversation within the context of the application. DialogFlow uses ontological mapping, based on the primary set of phrases used by the application developer, to expand the vocabulary of the agent. This means that the application developer does not need to train the agent for all the possible intent keywords and phrases. The DialogFlow engine internally enhances the possible set of intent expressions within the agent's context.


Action: The application developer can define actions for the intents. The actions need to be predefined and configured in the system. An action can be a specific activity that makes modifications to the underlying dataset, or a leading question that the agent poses in the next conversational output. For example, in the case of the hotel reservation system, if the intent of the end user is understood as a reservation for a specific time for a specific number of people, then the agent can go ahead and trigger the action to book the table. If the agent requires additional information to figure out the time of the reservation, a supplementary question can be posed as an action.

Parameters: The intent is validated within the context of the application, and DialogFlow extracts values from the end user expression as parameters. Each parameter is of a predefined type of entity. The system entities provided by DialogFlow match common conversational data types, such as dates, parameter values, ranges, and email IDs. The parameters define how the data from the end user is extracted. Parameters are structured data constructs that can be used for building logical expressions.

Responses: The application developer can define responses to the end user based on the context, intent, and derived action. Depending on the context, the agent can end the conversation, take the intended action, or pose a question to gather additional information.

DialogFlow entity: When the agent extracts the intent from the end user conversation, it is mapped to an entity. The entity associates a semantic meaning with the keyword. DialogFlow provides a set of system entities that are common conversational entities across various contexts, for example, amounts and units, date and time, and so on. DialogFlow also provides an interface for defining developer entities. These are context-specific, custom entities that the application developer can create so that the agent understands the conversation within the context of the application. For example, the restaurant reservation agent can be trained with developer entities that map to the specific menu items served by the restaurant.


DialogFlow context: Similar to human interaction, a DialogFlow conversation happens within a context. The application caters to a specific business scenario, so the keywords need to be understood within the context of the application. It is important to correctly match the intent with the context of the application for the conversation to be meaningful. With the use of context, the conversation can be steered in a particular direction. There are two types of context that need to be addressed:

Input context: This allows DialogFlow to match the intent when the end user expression is a close match within the context.

Output context: This enables DialogFlow to activate a new context if the user expression is not a close match within the current context. As an example, if the end user says What is on the menu?, the output context is triggered and asks specific questions to clarify further, such as The vegetarian or non-vegetarian menu? Depending on the user's choice, a specific context is activated and the options are spelled out by DialogFlow. At any given point in the conversation, multiple output contexts can be activated. This allows finer control of the output and so leads the conversation in the intended direction. In this case, if the user chooses Vegetarian, the items on the vegetarian menu can be provided to the user. Output contexts have a lifespan and expire after 5 requests or 20 minutes once the intent is matched.

Follow-up intents: We can use follow-up intents to set the context for various intents. There is a parent-child relationship between a parent intent and a follow-up intent, and it is possible to create a nested hierarchy of follow-up intents within the context of the conversation. DialogFlow provides a set of predefined follow-up intents that represent the majority of expressions used during a conversation. The platform also provides a way to define custom follow-up intents for more granular control and conversation flows. Here is a list of the follow-up intents provided by DialogFlow:

Fallback: This is the expression used when the intent and the context are unclear based on the user's input.

Yes/No: Captures a positive/negative response to a follow-up question.

Later: This links to a timed event that the user intends to occur sometime in the near future.


Cancel: This is generally applicable to the cancellation of a particular action or event. Typically, the conversation moves toward the closure of the intent at this point, or follows an alternate path.

More: This is used when more information is required by the user, or when the DialogFlow agent requires additional information from the caller in order to fulfill the intent.

Next/Previous: This is used when dealing with a set of possible options. For example, if the conversation is about the menu items, the caller and DialogFlow agent can use this follow-up intent to navigate to the next or previous option.

Repeat: This is used for repeating the conversation.

Select number: This is the follow-up intent used when selecting numbered options.

DialogFlow events: Using DialogFlow events, the agent can trigger conversation flows in response to external events. External events are the non-conversational inputs within the context; for example, the receipt of an email, a social media message, or a phone call to a particular number can be configured as an external event. DialogFlow agents can be configured to listen to such events and take a conversational path based on the specific event. There are some predefined platform events available on DialogFlow; these are events that occur on the integrated platform. Here is a representative list of the platform events supported by DialogFlow:

TELEPHONY_WELCOME: This event is generated when a call is received on the phone number registered with DialogFlow.

ALEXA_WELCOME: This event is generated when a user starts interacting with a specific skill.

MEDIA_STATUS: This event is generated based on the status of a specific media file (for example, when the playback of an audio file is completed). A DialogFlow action can be triggered on such media status events.

SIGN_IN: This event is generated when a user logs in to an integrated service (Twitter, Google, and so on). On this event, a conversation flow can be triggered.


DialogFlow fulfillment: There are times when the conversation requires data from external sources in order to serve the user's desired information. In such cases, DialogFlow provides a fulfillment interface to fork the request to external sources, such as databases and APIs, and request specific data points. Once the data is received from the external services, DialogFlow integrates the data into the intent and context of the conversation and provides the response to the caller. A fulfillment setting can be enabled for each intent. If there is no fulfillment defined, DialogFlow uses the static response defined within the intent. Interaction with the fulfillment agent is enabled via a webhook service. A webhook makes it easy to integrate two heterogeneous applications. DialogFlow serializes the context and intent data and sends it to the webhook service. The webhook service, in turn, calls the external API endpoint or accesses the database for the requested information. The following workflow defines the life cycle of a DialogFlow request that requires the use of external source data via a fulfillment service:

Figure 6.2: Life cycle of a DialogFlow request via a fulfillment service
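The webhook side of this workflow can be sketched as a plain Python handler that reads the matched intent and parameters from a DialogFlow v2 webhook request and returns a response body carrying a fulfillment text. The intent name, parameter key, and menu data here are hypothetical stand-ins; a real service would look the data up in a database or API:

```python
def handle_webhook(request_json):
    """Minimal fulfillment webhook sketch for a DialogFlow v2
    webhook request. Returns a response body with fulfillmentText."""
    query = request_json.get('queryResult', {})
    intent = query.get('intent', {}).get('displayName', '')
    params = query.get('parameters', {})

    if intent == 'menu.lookup':  # hypothetical intent name
        # Hypothetical stand-in for a real database or API call.
        menu = {'vegetarian': 'soup, salad', 'non-vegetarian': 'grill'}
        choice = params.get('menu-type', 'vegetarian')
        text = 'Today we have: {}'.format(menu.get(choice, 'nothing'))
    else:
        text = "Sorry, I can't help with that yet."
    return {'fulfillmentText': text}

# A trimmed-down example of what DialogFlow would POST to the webhook.
sample_request = {'queryResult': {
    'intent': {'displayName': 'menu.lookup'},
    'parameters': {'menu-type': 'vegetarian'}}}
print(handle_webhook(sample_request))
```

In production, this function would sit behind an HTTP endpoint that DialogFlow is configured to call.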

Following this introduction to the DialogFlow core concepts, in the next section, we will look at the process of building a DialogFlow agent on the platform.


Building a DialogFlow agent

As a generic GCP principle, any service exists within a GCP project. One GCP project can contain one DialogFlow agent; if we want multiple DialogFlow agents, they need to be managed under different projects. A project organizes all the resources required by the DialogFlow agent. As prerequisites, we need to enable APIs, monitoring tools, and billing information. We need to provide user accounts with access to the project and set access controls at a granular level, so that each user has access to a minimum service footprint. We will work from the DialogFlow console by navigating to https://dialogflow.cloud.google.com/#/getStarted.

Let's build a DialogFlow agent for a bookstore. Once you navigate to the console, click on the Create Agent button on the side menu or the console landing page. Select the agent name based on the context of the application, along with the default language and the timezone. The DialogFlow console provides an option to use an existing GCP project, as well as the creation of a new project, during the agent creation workflow. The following screenshot shows the agent creation screen in the DialogFlow console:

Figure 6.3: Agent creation screen in the DialogFlow console

We will create a new Google project from the DialogFlow console. Click on the Create button in the top-right corner. Once the agent is created, we are taken to the intents screen. DialogFlow provides two default intents for every agent. These are preconfigured intents that are typically required by any application:

Welcome intent: This is a default intent for beginning the conversation. As a best practice, the agent needs to greet the user and match the overall style of the user's greeting. It is also recommended that the welcome intent replies with the domain-specific capabilities the agent provides. For example, in the case of the bookstore agent, the agent needs to greet the user and talk briefly about the bookstore.


Fallback intent: This is a default intent that is invoked when the agent cannot match the user expression with any of the configured intents.

All the intents are configured with contexts, events, training phrases, actions and parameters, responses, and fulfillments. Let's look at the default welcome intent. The welcome intent can be configured and modified to engage the user based on the application context:

Figure 6.4: Default welcome intent

The default welcome intent is configured with a set of training phrases and responses. Let's modify the default welcome intent to suit our bookstore agent. We can add new responses, as well as delete the default responses provided by DialogFlow.
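Under the hood, the training phrases and responses we configure in the console correspond to fields on the intent resource. Here is a hedged sketch of how a customized welcome intent might look as a DialogFlow v2 Intent resource; the field names follow the v2 REST API, while the phrases and response text are illustrative:

```python
# Simplified sketch of a customized welcome intent as a
# DialogFlow v2 Intent resource (illustrative values).
welcome_intent = {
    'displayName': 'Default Welcome Intent',
    'trainingPhrases': [
        {'type': 'EXAMPLE', 'parts': [{'text': 'Hello'}]},
        {'type': 'EXAMPLE', 'parts': [{'text': 'Hi there'}]},
    ],
    'messages': [
        {'text': {'text': ['Hi, Thank you for calling My Book Store. '
                           'We are open from 9 am to 6 pm']}},
    ],
}

# Adding a response variation is just appending another string to the
# message's text list; DialogFlow can then choose among the variations.
welcome_intent['messages'][0]['text']['text'].append(
    'Welcome to My Book Store! How can I help you?')
print(len(welcome_intent['messages'][0]['text']['text']))  # 2
```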


The following screenshot illustrates the configuration of the welcome intent:

Figure 6.5: Configuring the default welcome intent

The DialogFlow console provides an easy way to quickly test the configured responses. In the right pane, DialogFlow accepts audio as well as text input and simulates the response from the DialogFlow agent based on the configured intent. Here is a screenshot that illustrates the response from the welcome intent:

Figure 6.6: Intent testing


We can configure the agent with the following tools:

Text field for entering test phrases: The console enables the user to type a test string, and also integrates with the system's microphone for testing with a spoken conversation.

User expression: The DialogFlow testing pane reproduces the entered or spoken text for validation and testing.

Response: Based on the intent configuration, the response from the DialogFlow agent is displayed in this area.

Diagnostic Info: This is a handy tool provided by DialogFlow for troubleshooting the intent requests/responses. Click on the Diagnostic Info button to see the response from the DialogFlow agent in JSON format. Here is a sample response from the agent in JSON format:

{
  "responseId": "af1b533d-b107-42c1-b2af-1fcc30ed6b01-b4ef8d5f",
  "queryResult": {
    "queryText": "Hello",
    "action": "input.welcome",
    "parameters": {},
    "allRequiredParamsPresent": true,
    "fulfillmentText": "Hi, Thank you for calling My Book Store. We are open from 9 am to 6 pm",
    "fulfillmentMessages": [
      {
        "text": {
          "text": [
            "Hi, Thank you for calling My Book Store. We are open from 9 am to 6 pm"
          ]
        }
      }
    ],
    "intent": {
      "name": "projects/mybookstoreefjbhb/agent/intents/6fa880d8-1ed2-4999-86dc-a58211a164c4",
      "displayName": "Default Welcome Intent"
    },
    "intentDetectionConfidence": 1,
    "languageCode": "en"
  }
}
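The Diagnostic Info JSON can also be inspected programmatically, which is useful when building automated tests for an agent. Here is a minimal sketch that parses an abridged copy of the sample response above and pulls out the matched intent and its detection confidence:

```python
import json

# Abridged copy of the diagnostic response shown above.
diagnostic = json.loads("""{
  "responseId": "af1b533d-b107-42c1-b2af-1fcc30ed6b01-b4ef8d5f",
  "queryResult": {
    "queryText": "Hello",
    "action": "input.welcome",
    "fulfillmentText": "Hi, Thank you for calling My Book Store. We are open from 9 am to 6 pm",
    "intent": {"displayName": "Default Welcome Intent"},
    "intentDetectionConfidence": 1,
    "languageCode": "en"
  }
}""")

result = diagnostic['queryResult']
print(result['intent']['displayName'])      # Default Welcome Intent
print(result['intentDetectionConfidence'])  # 1
```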


We can add response variations for the welcome intent. DialogFlow randomly selects a response based on the context and user expressions. In order to have granular control over the specific response to provide, we need to leverage the fulfillment API by writing custom code.

When we provide an open-ended response such as How can I help you?, the agent expects a response from the user to drive the conversation in a specific direction. We can handle these forks in the conversation by creating a custom intent. Before we create the custom intent, let's look at the default fallback intent provided by DialogFlow. Fallback intents are activated when the user's expression cannot be matched with any of the configured intents. DialogFlow provides a default fallback intent with a preconfigured set of responses for when intent matching based on the user's expression fails. Here is a snapshot of the default fallback intent in DialogFlow:

Figure 6.7: Default fallback intent

Let's create a couple of custom intents to help the MyBookStore DialogFlow agent continue a conversation with the caller beyond the customized welcome intent. We want the agent to perform the following two tasks:

1. Mention a list of new arrivals in the current month.
2. Reserve a copy of the book for the user to pick up.


We need to create two intents in this case. Let's create our first intent, which informs users about the latest arrivals at the bookstore in the current month. For example, when the user says 'I want to know about the latest arrivals', the agent should respond with 'Here is a list of new books that have arrived this month. Book - 1, Author, Publication'. The following screenshot shows the creation of a new custom intent:

Figure 6.8: Custom intent creation


As seen from the preceding screenshot, we perform the following steps:

1. Click on the Create Intent button in the DialogFlow console.
2. Provide a name for the custom intent. In this case, we are naming it New Arrivals.
3. Click on the Save button.
4. As a basic configuration, configure the various training phrases that the agent can respond to within the current intent.

The following screenshot shows the process of configuring training phrases and the agent's response within the New Arrivals intent:

Figure 6.9: Configuration of training phrases and agent responses for the custom intent

Add multiple variations of possible user phrases within a specific custom intent. When we use natural language for communication, there are numerous ways in which we can express a particular thing. The more training phrases we configure, the more DialogFlow is able to train the model to accurately handle various ways of communicating. Internally, DialogFlow leverages ML algorithms that get better with more training data.


It is recommended that each intent be configured with at least 10 variations of the training phrases.

1. DialogFlow provides a training tool for implementing a feedback loop and improving the accuracy of the agent. Once the DialogFlow agent is deployed and users start communicating with it, their phrases and utterances are logged in DialogFlow, provided we enable the setting to log interactions to DialogFlow. The setting can be found on the General tab of the agent settings page. The following screenshot shows the general settings for the DialogFlow agent:

Figure 6.10: General configuration for logging interactions with DialogFlow

We need to enable the Log Settings so that interactions are logged to DialogFlow. With this, we can collect and store the user queries, which then become available for manual training. This creates a training loop that ensures the agent performs better over time as more and more users interact with it.

2. Add various responses for the New Arrivals intent. Within the context of this intent, all variations of the input expressions expect a response from the agent mentioning the list of new arrivals at the bookstore. Similar to what we did with the default welcome intent, we can configure multiple responses that provide information about new arrivals to the caller. DialogFlow randomly chooses one of the configured responses for each user expression that matches the configured training phrases within the intent.

3. Test the agent configuration with a set of training phrases.


4. DialogFlow shows the user expression that was interpreted and matched with the intent.
5. The agent responds based on the intent configuration.

Once the user is aware of the new arrivals in the bookstore, the agent can lead the conversation to see if the user is interested. If they are, the intent response can be configured to lead the user to reserve a copy of a particular book for pickup. In order to facilitate this, we need to capture parameters from the user's expression. Here is the flow of the conversation with the MyBookStore agent so far:

Caller: Hello
My Book Store Agent: Hi, Thank you for calling My Book Store. We are open from 9 a.m. to 6 p.m.
Caller: What are the new arrivals in this month?
My Book Store Agent: We have three new arrivals this month. Would you like to come and pick up your next book?
Caller: Yes. I want to come and pick up tomorrow.
My Book Store Agent: Sure. What time would you like to come tomorrow?
Caller: 10 a.m.
My Book Store Agent: Got it. Your appointment is scheduled for Dec 30th 2019 at 10 a.m. See you soon.

In this case, the user intends to visit the store and mentions the day. The DialogFlow agent needs to be configured to extract meaning from user expressions. It is an easy task for human beings to extract meaningful information from a conversation; however, the machine (DialogFlow) agent needs to be trained for specific conversation flows. To schedule appointments for a store visit, let's create a new intent named Store_Visit:

1. Create a new intent named Store_Visit.
2. In the training phrases section, add the following phrases:
   3 pm today.
   Today.
   Yes. I want to come and pick up tomorrow.

Once you enter these training phrases, note that two parameters appear in the Actions and parameters section. These parameters are mapped to the @sys.date and @sys.time system entities. The following screenshot shows the training phrases and configured actions and parameters:


Figure 6.11: Customized intent with actions and parameters
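To make the parameter extraction concrete: for a matched Store_Visit expression, DialogFlow returns the captured @sys.date and @sys.time values inside queryResult.parameters as ISO-8601 timestamp strings. The sketch below formats such values into the confirmation response from the conversation above (the parameter payload itself is invented for illustration):

```python
from datetime import datetime

# Illustrative parameter payload; the exact values here are invented,
# mirroring the ISO-8601 shape in which DialogFlow returns @sys.date
# and @sys.time parameter values.
parameters = {
    "date": "2019-12-30T12:00:00+05:30",
    "time": "2019-12-30T10:00:00+05:30",
}

visit_date = datetime.fromisoformat(parameters["date"]).date()
visit_time = datetime.fromisoformat(parameters["time"]).time()

# Build the confirmation phrase used by the agent in the dialogue above.
confirmation = (f"Got it. Your appointment is scheduled for "
                f"{visit_date:%b %d %Y} at {visit_time:%I %p}. See you soon.")
print(confirmation)
```

In a fulfillment webhook, this string would be returned as the fulfillmentText of the response.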

With this setup, the My Book Store agent is able to book an appointment for the caller based on the specific training phrases when they contain the date and time information; however, in a real-world conversation, we cannot expect the user to provide all of the required information during the initial exchange. To get around this, we need to use a feature called slot filling: we set the identified parameters as REQUIRED (refer to Figure 6.11) by checking the box in the first column for the date and time parameters.


Once that is done, the agent will require specific date and time information during the conversation and will prompt the user for the date and time at which the appointment needs to be scheduled. The sequence of the required parameters indicates the order in which the DialogFlow agent will seek information about the required fields. In this case, the agent will prompt the user to set the appointment date before proceeding with booking at a specific time. Once the appointment date is obtained, the agent prompts the user to set the appointment time. In the PROMPTS column, we can configure various expressions that prompt for a specific required parameter. The following screenshot shows the prompts configuration for the $time parameter:

Figure 6.11: Slot filling and prompts configuration

Two distinct steps are involved in making the conversation more meaningful and natural:

Slot filling, with the use of prompts for gathering the values of the required parameters
Configuring various prompts for gathering the time of the appointment


The conversation sounds increasingly natural as we add more training phrases and responses and fill the slots and prompts with natural conversational styles within the context of our application. Let's test the MyBookStore agent with the test interface provided by DialogFlow:

Figure 6.12: Testing the MyBookStore agent

As shown in the preceding screenshot, the agent seamlessly navigates the conversation through the various intents and slot filling. At this point, we have merely provided a response saying that the appointment is booked. The appointment is not yet actually booked within the backend system, and no calendar entries have been made. In order to actually create a calendar entry for the user, we need to use the fulfillment process.

Use cases supported by DialogFlow

The DialogFlow engine can be effectively used for building conversational applications for any industry or business where there is a requirement for a human agent to answer client queries or perform a set of preconfigured actions. With voice interaction and a strong NLP engine, the conversations sound natural, and if they are configured extensively, it is difficult for the caller to differentiate between a human agent and a DialogFlow agent.


DialogFlow agents offer serverless access to ML models that are internally trained during the development and configuration process. All the nonfunctional aspects and features supported by GCP are inherently available to the DialogFlow agent. Some of the significant advantages are scalability and availability. If the service encounters a large volume of traffic and API requests, the cluster is autoscaled to fulfill the increase in demand for computational resources. The platform also ensures 99.9% availability because of the use of the underlying GCP infrastructure. This is helpful for seamlessly providing 24/7 customer support for services. For example, in the case of an airline agent, the customers can inquire about flight schedules, book flights, or perform web check-ins with the conversational interface provided by DialogFlow. The agent can be available 24/7 for these services, which improves service levels and reduces the cost of operation significantly. The agent can be integrated with the client's CRM system in order to handle certain requests that the agent is not trained to handle. As the service is utilized, the logs are analyzed, and a feedback loop is established, the quality of the conversational agents improves over a period of time.

Performing audio sentiment analysis using DialogFlow

DialogFlow provides a feature for performing sentiment analysis on each of the user expressions. This feature is useful in the context of a call center, when the users of a product or service call for assistance. DialogFlow leverages the sentiment analysis engine from Cloud Natural Language. Sentiment analysis can be enabled from the DialogFlow settings menu by navigating to the Advanced settings and clicking Enable sentiment analysis for the current query. This feature is available in the Enterprise Edition of DialogFlow.

Each user conversation is a stateful interaction and is uniquely identified by a session ID within DialogFlow. It is recommended that you use the same session ID within the API calls for a continuous conversation. Here is a code snippet that uses the DialogFlow API to perform sentiment analysis on the user expressions in the conversation (the parameter names have been normalized to lowercase so that the session path uses the values passed in, and the response is returned to the caller):

def get_sentiment(project_id, session_id, text, language_code):
    import dialogflow_v2 as dialogflow
    session_client = dialogflow.SessionsClient()
    session_path = session_client.session_path(project_id, session_id)
    text_input = dialogflow.types.TextInput(text=text,
                                            language_code=language_code)
    query_input = dialogflow.types.QueryInput(text=text_input)
    sentiment_config = dialogflow.types.SentimentAnalysisRequestConfig(
        analyze_query_text_sentiment=True)
    query_params = dialogflow.types.QueryParameters(
        sentiment_analysis_request_config=sentiment_config)
    response = session_client.detect_intent(session=session_path,
                                            query_input=query_input,
                                            query_params=query_params)
    return response

The sentiment score for the user expression is encapsulated in the response object. This API is handy when there is a need for a smart fulfillment agent that integrates with external systems and services, providing value-added services through the intelligent conversational agent. The service can integrate with external data sources to make the conversations more meaningful and useful for the caller; this integration makes it possible to build intelligent agents with access to large volumes of external information and services. The platform also provides a closed feedback loop for improving conversations over a period of time as the agent is used for natural conversations, as well as seamless integration with the Natural Language engine for performing sentiment analysis on each of the user expressions encountered by the DialogFlow agent. There are enormous possibilities and use cases that can easily be catered for by leveraging DialogFlow. With DialogFlow, machine intelligence and human-like conversation are available to functional teams.
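To show where the score lives in that response object, here is a minimal mock of the relevant slice of a detect_intent result (plain dicts stand in for the protobuf messages, and the example utterance and scores are invented; with the real client library, the equivalent read is response.query_result.sentiment_analysis_result.query_text_sentiment):

```python
# Mock of the relevant slice of a detect_intent response.
mock_response = {
    "query_result": {
        "query_text": "I waited an hour and nobody helped me",
        "sentiment_analysis_result": {
            "query_text_sentiment": {"score": -0.7, "magnitude": 0.7}
        },
    }
}

sentiment = (mock_response["query_result"]
             ["sentiment_analysis_result"]["query_text_sentiment"])

# Score ranges from -1.0 (negative) to +1.0 (positive); magnitude is the
# overall strength of emotion, regardless of sign.
label = "negative" if sentiment["score"] < 0 else "non-negative"
print(sentiment["score"], label)
```

A fulfillment handler could branch on this score, for example escalating strongly negative calls to a human operator.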

Summary In this chapter, we have learned about DialogFlow, which is a service for building conversational applications. A conversational application requires a strong NLP engine, the infrastructure to train the models for analyzing user expressions, and a rules engine, as well as an intelligent agent for providing responses in natural language. Independently building these functional blocks requires coding the components. There is also a challenge of scalability and availability.


We have seen how the DialogFlow engine on GCP takes care of all the building blocks and allows developers to focus on the business scenario, providing an easy user interface as well as an API layer to leverage the services. There is no need for coding in order to build simple conversational agents with static content, so the development team need not have high-end programming skills. Simple conversational agents can be built by business analysts or anyone who understands the functional aspects of the service. The integration with external services (email, phone, social media, and so on) can also be done by the fulfillment agent. In the next chapter, we will take a deep dive into Cloud TPUs, which are fundamental building blocks for building high-performing ML operations.


Section 3: TensorFlow on Google Cloud Platform

Tensor Processing Units (TPUs) are fundamental building blocks of high-performing AI applications on Google Cloud Platform (GCP). In this section, we will focus on TensorFlow on GCP. This section contains three chapters. We will cover Cloud TPUs in depth and how they can be leveraged to build significant AI applications. We will also implement TensorFlow models using Cloud ML Engine and build prediction applications by leveraging Cloud TPUs. This section comprises the following chapters:

Chapter 7, Understanding Cloud TPUs
Chapter 8, Implementing TensorFlow Models Using Cloud ML Engine
Chapter 9, Building Prediction Applications

Understanding Cloud TPUs

Any platform is only as good as how it is leveraged. The most important aspect of leveraging a platform such as Google Cloud Platform (GCP) is to operationalize it for day-to-day workload processing for business. In this chapter, we will see some of the best practices and practical tips for operationalizing artificial intelligence (AI) on GCP. We will understand Tensor Processing Units (TPUs) and how TPUs function internally to facilitate massively parallel computation requirements, in order to build various services that leverage Machine Learning (ML) models. This chapter will cover the following topics:

Introducing Cloud TPUs and their organization
Mapping of software and hardware architecture
Best practices of model development using TPUs
Training your model using TPUEstimator
Setting up TensorBoard for analyzing TPU performance
Performance guide
Understanding preemptible TPUs

Introducing Cloud TPUs and their organization

TPUs are the fundamental building blocks for building various services and ML models on GCP. In order to make the best use of the platform, we need to understand the core concepts of TPUs. These core concepts will help us in optimizing performance and will allow us to take maximum advantage of the computation resources allocated for the account.


TPUs have been developed by Google to accelerate ML workflows. Cloud TPUs enable users to run their ML workflows on Google's Cloud TPU hardware using TensorFlow. Users can get maximum performance gains using TPUs, specifically for linear algebraic algorithms. TensorFlow compute clusters can be built leveraging Central Processing Units (CPUs), Graphics Processing Units (GPUs), and TPUs. Later in the chapter, we will discuss in detail the areas in which TPUs can be utilized and can provide great benefits.

Let's quickly check the regions in which Cloud TPUs are available as of now. This will help you to decide the nearest region for deploying your model:

Availability in the US region: At the time of writing this chapter, TPUs are generally available in the US and Europe regions. General availability in the US region is listed in the table shown in the following screenshot:

TFRC is short for TensorFlow Research Cloud.

Availability in Europe region: General availability in the Europe region is listed in the table shown in the following screenshot:


Availability in Asia region: Additionally, GCP is also expanding in the Asia region, with limited availability at the time of writing this chapter, as shown in the following screenshot. It is expected that the Asia region will also see increased availability as GCP adoption increases across the geographic region:

Applications can access TPU nodes from instances, containers, and AI services via a Virtual Private Cloud (VPC) network. The means of accessing TPU nodes on GCP are outlined in the following list:

Cloud TPU on Compute Engine is good for users who need to manage their own Cloud TPU services; in general, it is recommended for users who are new to Google Cloud, as the CTPU utility does the basic background work for users, such as setting up virtual machines, cloud storage services, and TPUs.

Cloud TPU on Kubernetes Engine can be used when you need auto-scaling in your application, flexibility to change your hardware (between CPUs, GPUs, and TPUs), auto-management of virtual machines and Classless Inter-Domain Routing (CIDR) block ranges, and, more importantly, a fault-tolerant application.


Cloud TPU on AI services should be used by experienced ML programmers who can take advantage of the managed AI services provided by GCP. AI services manage many activities in the AI workflow for the user, such as training an ML model on the user's data, deploying the model, prediction monitoring, and model management.

Now that we have a basic understanding of TPUs, let's look at the areas of usage and the advantages of TPUs. The following screenshot describes the areas of hardware usage, that is, CPUs, GPUs, and TPUs:

It is evident from the preceding screenshot that we need to think in the context of the use case and the problem at hand before making a choice between CPUs, GPUs, and TPUs. Very large models with a massive amount of training and evaluation data dominated by matrix computation are most suitable to be trained on TPUs.

Advantages of using TPUs

Linear algebraic computations are generally used in ML algorithms. TPUs maximize performance for such computations; models that take a significant amount of time on GPUs can be trained within a very short time on TPUs. Moreover, for Convolutional Neural Network (CNN) models, the time to accuracy is significantly reduced by Cloud TPU. If you have an application that is suitable for Cloud TPU, you can get the maximum output in minimum time, at a very low cost.

A quick comparison of GPUs and TPUs for training a ResNet-50 model is shown in the following screenshot. ResNet-50 is a CNN model that has been trained on more than a million images from the ImageNet database. It has been observed that, comparing eight V100 GPUs with one full Cloud TPU v2 pod, both training duration and cost reduce significantly: training completes 27 times faster, and the cost is reduced by 38% as well.


The following screenshot compares the cost of training on a GPU and a TPU:

Fig. 7.1 Training cost comparison

It is evident from the preceding screenshot that TPUs result in significant cost savings for complex model training jobs. Over a period of time and multiple deployments, the cost savings become more significant, especially in the production deployment scenario, where the model needs to be trained frequently.

Mapping of software and hardware architecture

In order to further accelerate mathematical computations, dedicated hardware is created as an Application-Specific Integrated Circuit (ASIC). With this, the computation power, especially for n-dimensional mathematics, sees a multifold increase. In this section, we will cover the details of the software and hardware architecture of a TPU and its related components.


Available TPU versions

Each TPU version consists of a few major components that form the capacity of the TPU. While discussing TPU versions, we will look at the architecture of a TPU core and the high bandwidth memory (HBM) of a TPU core. We will understand how cores on each TPU device are interconnected and how the networking interface is available for inter-device communication.

The following list shows the characteristics of TPU v2:

TPU v2 has an HBM of 8 GB for each TPU core.
Each core has one Matrix Unit (MXU) of 128 * 128.
One TPU pod can have up to 512 cores and 4 TB of total memory.

The TPU v2 organization is shown in the following diagram:

Fig. 7.2 TPU V2 Characteristics

The following list shows the characteristics of TPU v3:

TPU v3 has an HBM of 16 GB for each TPU core.
Each core has two MXUs of 128 * 128.
One TPU pod can have up to 2,048 cores and 32 TB of total memory.

The TPU v3 organization is shown in the following diagram:


Fig. 7.3 TPU Organization
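The pod-level memory figures quoted above follow directly from the per-core HBM sizes; a quick sanity check in Python:

```python
# Per-core HBM (GB) and maximum pod core counts, as listed above.
v2_pod_memory_tb = 8 * 512 / 1024    # 8 GB/core x 512 cores
v3_pod_memory_tb = 16 * 2048 / 1024  # 16 GB/core x 2,048 cores

print(v2_pod_memory_tb)  # 4.0 (TB)
print(v3_pod_memory_tb)  # 32.0 (TB)
```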

Each TPU core consists of scalar units, vector units, and MXUs. TPU chips are backed by MXU compute power. In one cycle, an MXU can perform 16,000 multiply-accumulate (MAC) operations. MXU inputs and outputs are 32-bit floating-point values, but it performs multiplications at bfloat16 precision for better accuracy. Each core can perform user operations independently and, through high-bandwidth interconnects, it can communicate with other chips. For large ML workloads, multiple TPU devices can be interconnected with high-speed network interfaces to get a large pool of TPU cores and memory.
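The per-cycle MAC count quoted above comes straight from the systolic-array dimensions (the text rounds 16,384 down to "16,000"):

```python
MXU_DIM = 128
macs_per_mxu_per_cycle = MXU_DIM * MXU_DIM  # 16,384 multiply-accumulates

# A TPU v3 core carries two MXUs (see the characteristics listed above).
macs_per_v3_core_per_cycle = 2 * macs_per_mxu_per_cycle

print(macs_per_mxu_per_cycle, macs_per_v3_core_per_cycle)
```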

Performance benefits of TPU v3 over TPU v2

As we have seen earlier in the book, TPU v3 has more TeraFLOPS (TFLOPS) and memory than TPU v2. So, there are definitely a few areas where TPU v3 can get better results than TPU v2. To determine which TPU version to use, you can run your model on the available versions and check the performance by using TensorBoard.

Here are a few areas where TPU v3 can perform better:

Models that are compute-bound gain significant benefits on TPU v3.
Cases where the data does not fit in TPU v2 memory but does fit in TPU v3 memory gain benefits.
New models with a batch size that does not fit in TPU v2 can again gain performance benefits.

In some cases, a model is input-bound or memory-bound; there, you may not see such performance gains. So, before deciding on the TPU version, do performance benchmarking and cost-value analysis in the context of the intended use case.


Available TPU configurations

Let's discuss the different types of TPU configurations provided by Google. Google provides two different types of configurations that users can take advantage of in both TPU versions. They are as follows:

Single-device TPUs: These are individual devices that perform all the operations. These devices are not interconnected with other TPU devices over the network; a single TPU device cannot be connected to another device over the network.
TPU pods: TPU pods are clusters in which multiple TPU devices are interconnected with one another over a high-speed network.

The following diagram depicts a single TPU device, which does not require high-speed network bandwidth as it is not connected to another TPU device. The TPU node connects to just one TPU device. Chips in a TPU are already interconnected and do not require a host CPU or host networking resources:

Fig. 7.4 Single-device TPU


The device in the preceding diagram is using Cloud TPU v3, but while setting up the node, you can specify any TPU version. We will discuss the flow in the Software architecture section. The following screenshot depicts a TPU pod, where multiple TPU devices are connected over a high-speed network connection:

Fig. 7.5 TPU pod

In the preceding diagram, take note of the following:

The host distributes the ML workflow over the different TPU devices.
Chips in a TPU are already interconnected and do not require a host CPU or host networking resources.
As the TPU devices are connected over a high-bandwidth network, they do not require a host CPU or host networking resources either.

The device in the preceding diagram is using Cloud TPU v2, but while setting up the node, you can specify any TPU version. While creating a TPU node, we can also specify whether to occupy either a full or a partial TPU pod. TPU management for the TPU nodes can be automated using the Cloud TPU API, and this helps enormously in scaling your cluster and managing workloads.


Software architecture

In this section, we will see what happens on the software side when you run an application. The flow of the TPU software architecture is as follows:

1. A computational graph is generated by TensorFlow and sent to the TPU node over gRPC Remote Procedure Calls (gRPC).
2. Based on the type of TPU you selected and the number of devices available for your workload, the TPU node compiles the computational graph just in time and sends the binary to one, or more, available TPU devices.

The following diagram depicts the software component blocks of the TPU software architecture. It consists of an ML model, TPUEstimator, TensorFlow Client, TensorFlow Server, and the XLA Just In Time (JIT) Compiler:

Fig. 7.6 Software Component Blocks of a TPU


Let's discuss each component in detail, as follows:

TPUEstimator: TPUEstimator simplifies model building for Cloud TPU to extract maximum TPU performance. TPUEstimator is a high-level API built on Estimator. It translates the ML program into TensorFlow operations. TPUEstimator should definitely be used for ML models that use Cloud TPU.

TensorFlow Client: TensorFlow operations are converted into a computational graph by the TensorFlow client and are then sent to the TensorFlow server via gRPC.

TensorFlow Server: The TensorFlow server runs on the Cloud TPU server. When it receives a computational graph from the TensorFlow client, it loads the input from the required storage. It partitions the graph into the chunks that should run on a TPU and those that should run on a CPU. It generates Accelerated Linear Algebra (XLA) operations for running the sub-graphs on Cloud TPU and invokes the XLA compiler.

XLA Compiler: XLA is a just-in-time (JIT) compiler. The TensorFlow server produces operations that are taken as input by the XLA compiler. XLA generates the binary code that runs on Cloud TPU, including the orchestration of data from on-chip memory to the hardware execution units and the inter-chip communication. Cloud TPU uses Peripheral Component Interconnect Express (PCIe) connectivity between the Cloud TPU server and Cloud TPU to load the binary code, which is then launched for execution.

Best practices of model development using TPUs

In this section, we will discuss how to develop models on Cloud TPUs to maximize model performance and utilize the hardware optimally. Let's quickly look at the TPU chip configuration. A single TPU chip contains two cores, and each core has multiple MXUs. As we have discussed, MXUs are well suited for programs performing dense matrix multiplication and convolutions, so for such programs we should definitely consider a TPU in order to utilize its hardware. Programs that do not perform matrix multiplication, and instead perform operations such as add, will not use the MXUs efficiently. In the following subsection, we will go through some guidelines that will help you decide whether you should use a TPU and how you can develop your model to get the best possible performance out of it.


Guiding principles for model development on a TPU To fully utilize a hardware program, you should make use of all the available cores on it as this will enhance the timing of the model's training (each TPU device contains four chips and eight cores). This can be achieved using TPUEstimator. It provides a graph operator, which helps in building and running replicas. Each replica is a training graph that runs on each core, and it is essentially one-eighth of the batch size. Layout and shapes are very important aspects of achieving performance gains. When the XLA compiler transforms the code, it includes tiling a matrix into multiple small blocks. This increases the efficient use of the MXU. As the MXU is of 128 * 128, it prefers the tiling to be in multiples of 8. There are some matrices that are suitable for tiling, while some require reshaping; these are memory-bound operations. Models that have a constant shape are suitable for a TPU, while models whose shapes change are not suitable for Cloud TPU, as recompiling their shape slows the process down. For a high-performing Cloud TPU program, a dense compute should be easily divided into multiples of 128 * 128. Padding is an option used by the XLA compiler, which pads tensors with zeros when the MXU is not completely utilized. You can see the padding applied by the XLA compiler by means of op_profile. While generally, padding helps in performance gains, there are a couple of drawbacks as well. Applying padding means underutilization of the MXU, and as it increases the on-chip storage required for a tensor, it can sometimes lead to an out-of-memory error. So, choosing the right dimension becomes very important to minimize/avoid padding. The tensor dimension should be chosen very carefully, as it plays a significant role in extracting maximum performance out of the MXU. The XLA compiler either uses batch size or dimension to get the maximum performance out of the MXU; so, either one should be a multiple of 128. 
If neither is a multiple of 128, the compiler pads one of them up to 128. The MXU delivers its best performance when both the batch size and the feature dimensions are multiples of 8.
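The padding arithmetic described above can be sketched in plain Python. This is purely illustrative (not part of the TPU toolchain): the helper names are made up, and the multiples (8 for one dimension, 128 for the other) come from the tiling guidance above:

```python
# Illustrative sketch: estimating the zero padding the XLA compiler would
# add to reach MXU-friendly tile multiples. Helper names are hypothetical.

def round_up(value, multiple):
    """Round value up to the nearest multiple."""
    return ((value + multiple - 1) // multiple) * multiple

def padded_shape(batch, features):
    """Pad the batch dimension to a multiple of 8 and the feature
    dimension to a multiple of 128, as the tiling rules prefer."""
    return round_up(batch, 8), round_up(features, 128)

def padding_overhead(batch, features):
    """Fraction of MXU work wasted on zero padding."""
    pb, pf = padded_shape(batch, features)
    return 1 - (batch * features) / (pb * pf)

print(padded_shape(100, 200))                     # (104, 256)
print(round(padding_overhead(100, 200), 3))       # 0.249
print(round(padding_overhead(128, 128), 3))       # 0.0 -- no padding needed
```

Note how a shape of (128, 128) incurs no padding at all, which is why dimensions that are already multiples of the tile sizes extract the most performance from the MXU.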

Training your model using TPUEstimator

In this section, we will discuss the usage and advantages of TPUEstimator while training your model.


TPUEstimator absorbs a lot of low-level, hardware-specific details, and simplifies the model running on Cloud TPU. TPUEstimator performs numerous optimizations internally to enhance the performance of the model. A model written using TPUEstimator can run across different hardware such as a CPU, a GPU, TPU pods, and single TPU devices, mostly with no code change.

Standard TensorFlow Estimator API

TensorFlow Estimator provides an API for training, evaluating, running, and exporting models for serving. For an estimator, you should write the model_fn and input_fn functions, which correspond to the model and input portions of the TensorFlow graph. Let's have a look at the following screenshot:

Fig. 7.7 TensorFlow API

In addition to the aforementioned functionality, TensorFlow Estimator includes other common functionalities, such as training-job management, checkpointing, and so on.

TPUEstimator programming model

GCP provides a consistent programming model for TPUEstimator. Here are the details of the model: TPUEstimator uses model_fn, which wraps the computation and replicates it across all the Cloud TPU cores. In order to achieve maximum performance, the learning rate should be tuned together with the batch size. The replication and distribution of the computation across the TPU are done by model_fn, and you have to make sure that the computation only contains Cloud TPU-supported operations.


The input pipeline, which runs on the remote host CPU, is modeled by the input_fn function. Input operations should be programmed using tf.data, and every invocation handles the input of the global batch onto one device. The per-shard batch size is retrieved from params['batch_size'], and you should make sure to return a dataset instead of tensors for optimal performance. Application code always runs on the client, while the worker executes the TPU computation. To obtain a well-performing input pipeline, input operations are always placed on the remote worker, and only tf.data supports this. Furthermore, to amortize the TPU launch cost, the model training step is wrapped in tf.while_loop and, as of now, only tf.data can be wrapped by tf.while_loop. For precisely these two reasons, tf.data must be used.
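The batch arithmetic above can be illustrated without TensorFlow. Assuming a single TPU device (four chips, eight cores, as stated earlier), each replica's input_fn receives one-eighth of the global batch; the helper below is a hypothetical sketch of that split:

```python
# Hypothetical sketch of how the global batch is split across TPU cores:
# each of the eight cores on a single device receives one shard, and the
# shard size is what input_fn sees via params['batch_size'].

NUM_CORES = 8  # one Cloud TPU device: four chips, eight cores

def shard_batch_size(global_batch_size, num_cores=NUM_CORES):
    """Per-core batch size handed to each replica."""
    if global_batch_size % num_cores != 0:
        raise ValueError("global batch size must divide evenly across cores")
    return global_batch_size // num_cores

print(shard_batch_size(1024))  # 128
```

This is why global batch sizes are normally chosen as multiples of the core count: an uneven split cannot be distributed across the replicas.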

TPUEstimator concepts

Basically, TensorFlow programs can use in-graph replication or between-graph replication. TPUEstimator uses in-graph replication to run TensorFlow programs (we will go through the difference between in-graph and between-graph replication later). The TensorFlow session master does not run locally in TPUEstimator. Our program creates a single graph that is replicated to all the available cores of the Cloud TPU, and the TensorFlow session master is set to be the first worker. The input pipeline is placed on the remote hosts so that training examples can be fed quickly to the Cloud TPU. Operation between the Cloud TPU workers is synchronous; that is, each worker performs the same step at the same time.

Converting from TensorFlow Estimator to TPUEstimator

Whenever you convert to a new tool, make sure you start with a small example and then work up to a complex one; this helps you become familiar with the basic concepts of the tool. To convert a tf.estimator.Estimator class to use tf.contrib.tpu.TPUEstimator, you will need to follow these steps:

1. Change tf.estimator.RunConfig to tf.contrib.tpu.RunConfig.


2. Set TPUConfig to specify iterations_per_loop. Cloud TPU performs the specified number of iterations of the training loop before returning to the host; checkpoints and summaries are not saved until all the Cloud TPU iterations have run.
3. In model_fn, use tf.contrib.tpu.CrossShardOptimizer to wrap your optimizer, as shown in the following code. You will also have to change tf.estimator.Estimator to tf.contrib.tpu.TPUEstimator for converting to TPUEstimator:

optimizer = tf.contrib.tpu.CrossShardOptimizer(
    tf.train.GradientDescentOptimizer(learning_rate=learning_rate))

The default RunConfig saves summaries for TensorBoard after every 100 steps and writes checkpoints every 10 minutes.

Setting up TensorBoard for analyzing TPU performance

Analyzing the performance of any application is extremely important, and TensorBoard helps in visualizing and analyzing the performance of Cloud TPU. Using TensorBoard, you can not only monitor your application but also improve its performance by applying the suggestions provided by TensorBoard. After setting up Cloud TPU, you should install the latest version of the Cloud TPU profiler to get the capture_tpu_profile script. The following are the steps to run TensorBoard:

1. Open a new Cloud Shell to start TensorBoard.
2. Run the following commands to set the required environment variables for your cloud storage bucket and model directory. The model directory variable (MODEL_DIR) contains the name of the GCP directory where checkpoints, summaries, and the TensorBoard output are stored during model training:

$ ctpu up --name=[YOUR TPU NAME] --zone=[YOUR TPU ZONE]
$ export STORAGE_BUCKET=gs://[YOUR STORAGE BUCKET NAME]
$ export MODEL_DIR=${STORAGE_BUCKET}/[MODEL DIRECTORY]


A TensorBoard trace can be viewed in two ways: via the static trace viewer or the streaming trace viewer. If you need more than one million events per TPU, you will have to use the streaming trace viewer.

Let's check how to enable the static trace viewer.

3. Run the following command in the same Cloud Shell that was used to set the environment variables in the preceding step:

$ tensorboard --logdir=${MODEL_DIR} &

4. In the same Cloud Shell, at the top, click on Web Preview and open port 8080 to view the TensorBoard output. To capture output from the command line, run the following command: $ capture_tpu_profile --tpu=[YOUR TPU NAME] --logdir=${MODEL_DIR}

TensorBoard provides various options to visualize and analyze performance. You can visualize graphs and use the Profiler to improve the performance of your application. The XLA structure graph and the TPU compatibility graph are very useful for analysis. There are also several profiler views, such as the overview page, the input pipeline analyzer, the XLA Op profile, the trace viewer (Chrome browser only), the memory viewer, the pod viewer, and the streaming trace viewer (Chrome browser only). These are very useful for analyzing performance and tuning your applications.


Performance guide

While developing a model, it is very important to tune it so that you get good performance out of it. In this section, we will go through some tips that will help you enhance the performance of your model on Cloud TPU. Cloud TPU, as such, gives good performance, but we can enhance it further by setting the right configuration for the application. In the following subsections, we will discuss the areas of focus for improving performance.

XLA compiler performance

The XLA compiler is a part of the standard TensorFlow code base. It is a machine learning compiler and can produce binaries for CPUs, GPUs, TPUs, and a few other platforms. TensorFlow models for Cloud TPU are translated into an XLA graph, which the XLA compiler then translates into a TPU executable. One thing to note here is that TPU hardware is different from CPU and GPU hardware: a CPU has a low number of high-performing threads, while a GPU has a high number of low-performing threads. A Cloud TPU, with its 128 * 128 MXU, can be viewed either as a single very high-performing thread that can perform 16,000 operations per cycle, or as 128 * 128 tiny threads that are connected in a pipeline fashion.

Consequences of tiling

Arrays in Cloud TPU are tiled. This demands padding one of the dimensions to a multiple of 8, and a different dimension to a multiple of 128. Data layout transformations are performed by XLA to arrange data in memory for efficient usage. These transformations are driven by heuristics, which work well for most models but can go wrong sometimes. To achieve maximum performance, different configurations should be experimented with. Two very important things that will give a maximum performance boost are the following: the cost of padding should be minimized, and values for the batch and feature dimensions should be picked very efficiently.


Fusion

If there are multiple operations that can be executed in combination, the XLA compiler uses a fusion technique to optimize the program. A fused operation is a combination of multiple operations that can be executed together. For example, consider the following series of operations:

tf_add = tf.add(x, y)
result = tf.multiply(tf_add, z)

In steps, this code will be executed like so:

1. First, a loop will sequentially access the elements and perform the addition operation, and the result will be saved in tf_add, which is stored in temporary memory, as shown in the following code block:

for (i = 0; i < cnt; i++) {
    tf_add[i] = x[i] + y[i];
}

2. Now, the addition result will be accessed again and multiplied, as shown here:

for (i = 0; i < cnt; i++) {
    result[i] = tf_add[i] * z[i];
}

With fusion, both array accesses happen at the same time, as can be seen in the following code block:

// In fusion, the addition and multiplication operations are performed together.
for (i = 0; i < cnt; i++) {
    result[i] = (x[i] + y[i]) * z[i];
}

In this example, the number of memory round trips is reduced, and XLA does not need to allocate any space for tf_add. Fusion brings a lot of benefits for Cloud TPU: memory transfers are reduced, the hardware is utilized optimally, and so on.
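The effect can be imitated in plain Python, purely as an illustration (XLA performs this at the compiled-graph level): the fused version produces identical results while never allocating the temporary tf_add buffer:

```python
# Illustrative only: comparing a two-pass (unfused) elementwise computation
# with a single fused pass. The fused loop needs no temporary array.

x, y, z = [1, 2, 3], [10, 20, 30], [2, 2, 2]

# Unfused: one loop writes a temporary, a second loop reads it back.
tmp = [xi + yi for xi, yi in zip(x, y)]          # extra buffer + memory trips
unfused = [t * zi for t, zi in zip(tmp, z)]

# Fused: one loop, no temporary allocation.
fused = [(xi + yi) * zi for xi, yi, zi in zip(x, y, z)]

print(unfused)  # [22, 44, 66]
print(fused)    # [22, 44, 66]
```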


Broadcasting happens implicitly when two tensors of different shapes are combined, but be aware that forcing a broadcast to be materialized can lead to poor performance.

Understanding preemptible TPUs

A preemptible TPU is a low-cost TPU that gives the same performance as an on-demand TPU. The catch here is that Google can terminate it at any time when it needs the resources for other purposes. Let's check how we can create a preemptible TPU from the console.

Steps for creating a preemptible TPU from the console

GCP provides an easy interface for creating preemptible TPUs. Here are the steps involved:

1. On the GCP console, under Compute Engine, select TPUs. If the API is not enabled, enable it by clicking ENABLE, as shown in the following screenshot:

Fig. 7.8 Enabling Cloud TPU API


2. Click on Create TPU node, as shown in the following screenshot:

Fig. 7.9 Creation of TPU node

3. Fill in the required details, select the Preemptibility option, and click on the Networking, description, labels link, as shown in the following screenshot:

Fig. 7.10 Create a Cloud TPU


4. Fill in the other details, and click on Create at the bottom of the screen, as shown in the following screenshot, and your TPU is ready:

Fig. 7.11 Additional details for creation of Cloud TPU

Preemptible TPU pricing

A preemptible TPU costs roughly 30% of the price of an on-demand TPU. A point to note here is that the quota for preemptible TPUs is higher than for normal TPUs, and preemptible TPUs have a separate quota. The following screenshot shows a pricing-comparison sample for v2-8 and v3-8 in the us-central1 region:

Fig. 7.12 Price comparison of v2-8 and v3-8


Preemptible TPU detection

If a TPU has already been created and we need to check whether it is preemptible, there is a provision for that. The following commands are used to detect whether a TPU is preemptible:

The ctpu command: Run the following command to check the details of the TPU you have created. The value printed against TPU preemptible indicates whether the TPU is preemptible, while the state shows whether it has been preempted: READY means the TPU has not been preempted, whereas PREEMPTED means it has been preempted:

$ ctpu status

The gcloud command: This command uses the value of the compute/zone property in the current configuration if the zone is not specified. Run the following command to retrieve the list of Cloud TPUs available in your project:

$ gcloud compute tpus list

Before deciding on the TPU option, please check if your use case can move ahead with a preemptible TPU, as it will save a lot of costs.

Summary

In this chapter, we acquired all the knowledge required to create a TPU and write models on it. We went through the basics of Cloud TPUs and their organization, and Cloud TPU software and hardware architecture. We created TPUs and preemptible TPUs on Google Cloud, wrote models and trained them using TPUEstimator, and profiled Cloud TPU using TensorBoard. Besides all this, we went through plenty of tips for writing optimized models. In the next chapter, we will go through the best and proven practices for implementing TensorFlow models on GCP, based on experience of working on real projects.


8
Implementing TensorFlow Models Using Cloud ML Engine

Cloud ML Engine on Google Cloud Platform (GCP) is a serverless way in which a machine learning pipeline can be built. The engine leverages underlying platform components and eliminates the need for configuration and maintenance of the infrastructure. Data scientists can focus on the data, model, and predictions. This is an ideal and quick way to get models up and running in a production environment. The platform inherently provides storage and compute elasticity and a virtually unlimited scale for training the models and using the deployed models for real-time predictions.

In this chapter, we will take a deep dive into Cloud ML Engine and understand the various building blocks, as well as experiment with the machine learning pipeline using a TensorFlow model. This chapter will cover the following main topics:

Understanding the components of Cloud ML Engine
Steps involved in training and utilizing a TensorFlow model
Packaging and deploying your training application in Cloud ML Engine
Choosing the right compute options for your training job
Monitoring your TensorFlow training model jobs


Understanding the components of Cloud ML Engine

To begin with, let's understand which machine learning workflow units are fulfilled by Cloud ML Engine. Cloud ML Engine can be used for the following purposes:

Training a machine learning model
Deploying the trained model
Predicting using the deployed model
Monitoring the model usage through various parameters and KPIs
Model management, along with versioning

Cloud ML Engine has various components, each of which performs a unique operation and plays a role within the machine learning pipeline. The components are services that leverage underlying platform components and utilize the required amount of storage and compute, depending on model complexity and data volume. Here are the components of Cloud ML Engine.

Training service

The training service offers some pre-defined algorithms that are readily available for training without having to write any code. The algorithms can be used with the training data, provided that the data conforms to the schema intended by the chosen algorithm on the platform.

Using the built-in algorithms

At the time of writing this chapter, the following algorithms are supported by Cloud ML Engine:

Linear Learner: This algorithm uses the TensorFlow estimators LinearClassifier and LinearRegressor, and can be used for classification and regression problems. It supports Graphics Processing Unit (GPU) acceleration, along with the default Central Processing Unit (CPU) accelerator.
Wide and Deep: This algorithm aims for an optimum balance between memorizing the training data and generalizing to new inputs. It is useful for classification, regression, and ranking problems. It supports GPU acceleration, along with the default CPU accelerator.


XGBoost: We have already seen this algorithm in detail in Chapter 3, Machine Learning Applications with XGBoost. Cloud ML Engine provides a built-in wrapper over the algorithm and facilitates both stages (preprocessing and training) in a parallelized manner. During the preprocessing stage, Cloud ML Engine transforms the categorical and numerical data into a unified dataset that is entirely represented in a numerical format. This algorithm can be readily used for use cases such as click-through rate prediction. It is not supported on a GPU accelerator and can be used with a CPU only.

Once a suitable algorithm is identified, the input (training, evaluation, and production) data needs to be formatted to match the schema intended by the built-in algorithm. The data needs to be submitted in a Comma-Separated Values (CSV) format without headers, and the first column should represent the target variable. We need to specify the storage bucket on GCP for the built-in algorithm to store the training output.

Cloud ML Engine provides limited customization for training jobs, such as the use of specific machine types for training; only a primitive set of machine types can be used for built-in algorithms. We can also define the region in which a job needs to be run; by default, the platform automatically selects an appropriate region for running the training job. Job-specific customization can also be done by setting the required learning rate and batch size. We can set goal thresholds for hyperparameter tuning in order to achieve maximum accuracy and minimize the value of the loss function.

The built-in algorithms are available within containers in a shared space on GCP and can be used with specific Uniform Resource Identifiers (URIs), as shown in the following table:

Algorithm        Container URI
Linear Learner   gcr.io/cloud-ml-algos/linear_learner_cpu:latest
                 gcr.io/cloud-ml-algos/linear_learner_gpu:latest
Wide and Deep    gcr.io/cloud-ml-algos/wide_deep_learner_cpu:latest
                 gcr.io/cloud-ml-algos/wide_deep_learner_gpu:latest
XGBoost          gcr.io/cloud-ml-algos/boosted_trees:latest
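As a quick illustration of the required input layout for the built-in algorithms (no header row, target variable in the first column), the following sketch writes such a file; the filename and the data values are made up:

```python
import csv

# Hypothetical training rows: (target, feature). The target (for example,
# a GPA score) must be the first column, and the file must contain no
# header row, per the built-in algorithm's input schema described above.
rows = [
    (3.6, 1340),
    (2.9, 1120),
    (3.2, 1250),
]

with open("training_data.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(rows)  # note: no header row is written
```

The resulting file can then be uploaded to the Cloud Storage bucket that the training job reads from.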

Let's use the built-in Linear Learner algorithm for a simple regression problem. For example, there is a known correlation between the Scholastic Aptitude Test (SAT) scores and Grade Point Average (GPA) scores of students. We will use a sample CSV file that contains two columns.


The first column contains the GPA scores (the output variable), and the second column contains the SAT scores. Let's first get the CSV file uploaded to the storage bucket. The following are the simple steps to take to use Cloud ML Engine to train a model that predicts GPA scores based on SAT scores:

1. Enter the GCP console, open AI Platform from the navigation menu, and go to the Jobs side menu. You will see all the jobs (running and completed) in a tabular format. Click on the NEW TRAINING JOB button present in the header menu.
2. There are two options for creating a training job: Built-in algorithm training or Custom code training. In this case, select the Built-in algorithm training option.
3. Select the Linear Learner algorithm.
4. Click on the NEXT button.

All the preceding steps are visually represented in the following screenshot. Let's have a look:

Fig 8.1 - Steps involved in model training on Cloud ML Engine (1)

5. Provide the fully qualified path to the CSV file on Google Storage. This CSV file contains training data without a header row, and the target attribute is present in the first column.


6. Provide a numeric value for the percentage of training data that is to be used for validation. By default, this value is 20, and this figure can be changed to any reasonable value based on the use case context. 7. Provide a numeric value (optional) for the percentage of training data that is to be used for testing. A recommended value is 20, and it can be set as per the requirements of the use case and the characteristics of the machine learning problem. 8. Provide a fully qualified path to the output directory where the model file is to be stored. This needs to be a valid location on Google Cloud Storage. 9. Click on the NEXT button to provide the runtime arguments to the algorithm. All the preceding steps are visually represented in the following screenshot. Let's have a look:

Fig 8.2 - Steps involved in model training on Cloud Engine ML (2)


10. Select the type of model that needs to be trained. A regression model is used when the expected output is a continuous variable, while a classification model is used when the output is expected to be a discrete class.
11. Select the maximum number of steps carefully in order to ensure that the entire training data is fully represented in the generated output model. At the same time, setting a very high value for this parameter increases the computation cost of training the model.
12. The Learning Rate is a numeric value used when the model is trained using the Gradient Descent algorithm. In principle, this attribute indicates the size of the step between two iterations of the learning algorithm.

These are the basic arguments that need to be set. All the preceding steps are visually represented in the following screenshot. Let's have a look:

Fig 8.3 - Passing arguments to the model training
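To build intuition for the Learning Rate argument described above, here is a toy one-dimensional gradient descent (not Cloud ML Engine code; the function and values are illustrative). A small step size steadily approaches the minimum of f(x) = x^2, while an overly large one diverges:

```python
# Toy gradient descent on f(x) = x**2 (gradient 2*x), illustrating the
# role of the learning-rate argument. Not Cloud ML Engine code.

def gradient_descent(start, learning_rate, steps):
    x = start
    for _ in range(steps):
        x -= learning_rate * 2 * x  # step along the negative gradient
    return x

print(gradient_descent(10.0, 0.1, 50))  # converges towards 0
print(gradient_descent(10.0, 1.1, 10))  # step too large: diverges
```

This is also why the HyperTune option for Learning Rate, mentioned below Fig 8.3, asks for a minimum and maximum: the tuner searches that range for a step size that converges quickly without overshooting.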

Apart from these basic arguments, the model can be hyper-tuned for further optimization and increased accuracy. When the HyperTune checkbox adjacent to Max Steps is checked, the minimum and maximum steps need to be set; and when the HyperTune checkbox adjacent to Learning Rate is checked, the minimum and maximum learning rate need to be set.


In addition to the basic arguments, there are some advanced settings that are available as configurable parameters for further tuning of the model training, as shown in the following screenshot:

Fig 8.4 - Advanced model training configurations


The additional parameters available for tuning are as follows:

Evaluation Steps: This is an optional field that indicates the number of batches for which the evaluation needs to be run. If the parameter is not specified, the evaluation is run on the entire dataset.
Batch Size: This is the number of data rows that are processed in one evaluation step.
Optimizer type: There are three possible optimizers to choose from, differentiated by the specifics of their implementation of the Gradient Descent algorithm: AdamOptimizer, FTRLOptimizer (the default), and the Stochastic Gradient Descent (SGD) optimizer.
L1 regularization strength: This is a numeric value representing a type of regularization that penalizes weights in proportion to the sum of the absolute values of the weights.
L2 regularization strength: This is a numeric value representing a type of regularization that penalizes weights in proportion to the sum of the squares of the weights.
L2 shrinkage regularization strength: This parameter is applicable to FtrlOptimizer and denotes a magnitude penalty. The value must be greater than or equal to 0.

Once the algorithm arguments are set, the last step is to provide the job ID, region, and resource size in the next workflow step, as follows:

13. Job ID: This is an alphanumeric field that needs to be unique within the project, and it cannot be changed once set. It is recommended to use project-specific context in the job ID, as this makes it easier to monitor and troubleshoot in the production environment.


14. Region: This is the geographical region where the servers used for training the models are located. It is recommended to use the same region for the storage of the training data and for model training.
15. Scale tier: This defines the resources that need to be allocated to AI Platform for the training job. There are various tiers with pre-configured levels of resources, along with the possibility of customization:

BASIC: This tier provisions a single worker instance. As the name suggests, this is the basic configuration level and cannot be considered for production loads.
STANDARD_1: This tier provisions many workers and a limited set of parameter servers.
PREMIUM_1: This tier provisions a large number of workers and parameter servers.
BASIC_GPU: This tier provisions a single worker instance with a GPU. This accelerates the training, but it is good for experimentation only.
BASIC_TPU: This tier provisions a single worker instance with a Cloud TPU.
CUSTOM: When this configuration is used, the scale of the cluster used for training can be fully configured. The following parameters need to be set for a custom configuration:

Master type: The type of virtual machine that needs to be used for the master node. This is a required configuration field.
Worker count: This defines the count of workers to be used for training. If the number of workers is set to 1 or more, the worker type also needs to be set.
Parameter server count: This defines the count of parameter servers to be used for training. If the number of parameter servers is 1 or more, the parameter server type also needs to be set.


All the preceding steps are visually represented in the following screenshot. Let's have a look:

Fig 8.5 - Job settings for the model training
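The L1 and L2 regularization strengths described earlier translate into penalty terms that can be computed directly. The following plain-Python illustration shows the arithmetic only (it is not the engine's internal implementation, and the strength and weight values are made up):

```python
# Illustration of the L1/L2 penalty terms controlled by the
# regularization-strength arguments above. Values are made up.

def l1_penalty(weights, strength):
    """Penalty proportional to the sum of absolute weight values."""
    return strength * sum(abs(w) for w in weights)

def l2_penalty(weights, strength):
    """Penalty proportional to the sum of squared weight values."""
    return strength * sum(w * w for w in weights)

weights = [0.5, -1.5, 2.0]
print(round(l1_penalty(weights, 0.1), 4))  # 0.1 * (0.5 + 1.5 + 2.0) = 0.4
print(round(l2_penalty(weights, 0.1), 4))  # 0.1 * (0.25 + 2.25 + 4.0) = 0.65
```

Because L1 penalizes absolute values, it tends to drive small weights to exactly zero (sparser models), whereas L2 shrinks all weights smoothly; this is the practical difference between the two strength settings.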

16. Once the job settings are done, the model starts to train, and the job can be tracked in the console. The following screenshot shows ongoing, successful, and failed jobs in the console:

Fig 8.6 - Model training jobs console


17. Once the model training is completed, it can be deployed from the console. The model is available to be invoked with a new dataset via the Representational State Transfer (REST) API. A unique Model Name needs to be provided along with the Region in which the model needs to be deployed, as well as an optional Description about the model, as shown in the following screenshot:

Fig 8.7 - Model deployment

At this point, the model is trained and deployed for consumption. In the subsequent section, we will look at utilizing the model and building a prediction service.


Using a custom training application

The built-in algorithms cover some of the most commonly used algorithms. However, you will be required to train custom models for most real-life use cases. The AI Platform provides a standard, consistent framework for training custom models. The generic process is depicted in the following screenshot:

Fig 8.8 - Steps in training a custom model on the AI Platform

The AI Platform works in sync with and complements a typical machine learning development life cycle. The model training code accesses the training data from a local folder or a shared location. We need to access the training data from the Cloud Storage location. Once the data is fetched, each individual data instance needs to be processed in various batches. These batches can be decided based on the use case context and volume of the data. The evaluation data is used for testing the accuracy of the model. The trained model gets exported as a binary file and is stored on a local drive or in a Cloud Storage location.


Additionally, the models can be trained in a distributed manner. The AI Platform makes this a seamless process: details about the multiple machines are passed as parameters for the training cluster, and the training service allocates resources based on the machine type and selected capacity. Each training job running on an individual node is called a replica. Each replica performs a single role and operates on a specific set of training data. The details of the distribution of the workload are managed by the platform, and no specific user configuration or involvement is required.

There are three types of entities that contribute to distributed model training, as follows:

Master node: The AI Platform designates one replica as the master node. The distributed training is scheduled on the other available nodes, and the master node keeps track of the progress. The overall status of the training job is the same as the status of the master node.
Workers: The nodes available in the cluster play the role of workers. An individual worker performs its task and reports the status back to the master node.
Parameter server: One of the replica nodes is designated as the parameter server and performs the task of coordinating the shared state of the model between the nodes.

A fundamental and easy strategy for distributed training involves chunking the data into various segments, whereby each individual data segment is used on a node for training the model. In this case, the parameter server keeps track of the individual gradient values and consolidates them into the end model state.

Once the application is built with one of the available frameworks, it needs to be packaged for deployment on the platform. The packaging can be done with the gcloud command-line interface (CLI), and it is recommended to use the gcloud tool to package the application.
Although the package can also be built manually by using standard packaging and build tools, the gcloud tool packages the application and submits the training job in a single step, as shown in the following code block:

gcloud ai-platform jobs submit training $JOB_NAME \
    --staging-bucket $PACKAGE_STAGING_PATH \
    --job-dir $JOB_DIR \
    --package-path $TRAINER_PACKAGE_PATH \
    --module-name $MAIN_TRAINER_MODULE \
    --region $REGION


Here is a brief explanation of the parameters we need to pass to the script:

--staging-bucket: This is the Cloud Storage location where the training package and dependencies are staged. The GCP project needs to have access to this bucket, and it is recommended that the bucket is in the same region in which the training job is intended to run.
--job-dir: This is the Cloud Storage location where the output files from the training job are stored. This location needs to be in the same region in which the training job is intended to run.
--package-path: This is the local path of the parent directory where the application artifacts are present. The AI Platform packages the contents of this path into a tar.gz file and uploads it to Cloud Storage. The training job is executed by unpacking the contents of the archive.
--module-name: This is the name of the application module to run.
--region: This is the geographical region in which the training job is executed.

Prediction service

The machine learning model that was trained in the previous section can be hosted on the cloud for consumption and prediction based on new data. The prediction service is primarily responsible for managing the storage and compute resources required for the predictions. As part of the generic process, the training artifacts need to be exported. The export process is the same for models trained on the AI Platform and for models trained outside it. The maximum model size that can be exported is 250 MB; if a larger custom model needs to be exported, a quota-increase form needs to be submitted. The AI Platform limits the use of the compute infrastructure so that the platform can be used at a reasonable level, which is sufficient for most general-purpose use cases. However, compute-intensive and large models may require a quota increase.

Machine learning model prediction is an iterative process whereby multiple versions of a model need to be trained. The AI Platform treats a model as a container for the various versions of a specific machine learning pipeline. A model may contain various versions of the pipeline, and a specific version can be invoked by the API. Once the model version is available, the data that needs to be sent to the prediction service should be formatted and made available to the API call for prediction.
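Before calling the service, the input data must be serialized. For online prediction, the REST request body wraps the rows in an "instances" list; this is a sketch only, as the exact encoding of each instance depends on the model's serving signature, and the data values here are made up:

```python
import json

# Hypothetical new data points (SAT scores) to be scored by a deployed
# model. The {"instances": [...]} body shape follows the online-prediction
# REST convention; the instance encoding depends on the serving signature.
new_rows = [[1340], [1120], [1250]]

request_body = json.dumps({"instances": new_rows})
print(request_body)  # {"instances": [[1340], [1120], [1250]]}
```

This body would then be POSTed to the model version's predict endpoint; for batch prediction, inputs are instead written to files in a Cloud Storage bucket.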


Implementing TensorFlow Models Using Cloud ML Engine

Chapter 8

There are two modes in which predictions can be requested, as follows:

Online prediction: The service invokes the model version with the data supplied in the API call and returns the predictions in the response. In the background, the model version is deployed at runtime in the region specified in the request. The platform may cache a model version that is used frequently, for quicker response times.

Batch prediction: Batch prediction is an asynchronous service call that can be used when the use case requires predictions in bulk that can be processed independently while the service produces them. When batch prediction is requested, the prediction service allocates resources on the AI Platform for running the job. This may involve one or more prediction nodes. The model graph is restored on each of the allocated nodes. Once the nodes are allocated, the input data is distributed by the master for distributed prediction. Each individual node stores its prediction output in the Cloud Storage location specified when requesting the prediction service.

There are fundamental differences between online and batch predictions in terms of their premise, possible use cases, and, hence, their storage and compute requirements. The goal of online prediction is to minimize the latency (time to respond) of the prediction service, and the predictions are returned in the message body of the response. Batch prediction, on the other hand, aims to handle a large volume of instances with large datasets and complex models. The prediction output is stored in a Cloud Storage bucket instead of being sent in the response body. Online prediction is typically used in use cases that work on real-time data and require predictions in a timely manner for the system to take action. Batch prediction can be used when the predictions need to operate on a large volume of historical data.
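As a sketch of the batch mode, the request body for a batch prediction job can be assembled as a plain dictionary. The project, model, and bucket names are hypothetical, and the field names mirror the PredictionInput structure discussed later in this chapter:

```python
# Minimal sketch of a batch prediction job body; all names are placeholders.
def batch_prediction_body(job_id, model, input_paths, output_path, region):
    return {
        "jobId": job_id,
        "predictionInput": {
            "dataFormat": "JSON",       # format of the input data file
            "inputPaths": input_paths,  # Cloud Storage locations of the input
            "outputPath": output_path,  # results land in Cloud Storage, not the response
            "region": region,
            "modelName": model,         # the model's default version is used
        },
    }

body = batch_prediction_body(
    job_id="census_batch_01",
    model="projects/demo-project/models/census",
    input_paths=["gs://demo-bucket/batch/input.json"],
    output_path="gs://demo-bucket/batch/output/",
    region="us-central1",
)
```

Note how the body carries an outputPath rather than expecting predictions in the response, which is the defining difference from the online mode described above.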
If we try to run a small prediction load (a small data volume and a simple algorithm) in the batch prediction mode, it takes longer than the online prediction mode. This is because compute and storage resources are provisioned when the request is sent, and the job priorities are lower than those of online prediction jobs. It is very important to select the right prediction mode for the specific use case. When creating a model for online prediction, the user needs to decide which region to use for running the predictions and whether to enable online prediction logging. Enabling logging is useful for troubleshooting or testing. However, additional costs are incurred when logging is enabled. These costs need to be considered up front before making the online model prediction request.


The application developer also needs to decide the runtime version to use, along with the version of Python and the machine type to be used for online prediction. The machine type can be decided based on the volume of data and the complexity of the model. There are three levels of logging that can be enabled for the prediction service, as follows:

Access logging: This logging helps in analyzing the number of requests made to the prediction service and tracks the timestamps for the request start and render times. An analysis can be done based on the latency values as well as the usage pattern of the prediction service.

Stream logging: Standard error and standard output are written to Stackdriver Logging. This setting should be enabled carefully and only for debugging purposes, or else there is a risk of incurring high costs. This logging can only be enabled when creating the model resource.

Request-response logging: This level logs online prediction requests and responses to a BigQuery table.

These logging levels can be enabled with the gcloud command line as well as the REST API. In order to enable access logging with gcloud, the --enable-logging argument needs to be passed, as follows:

gcloud ai-platform models create model_name --regions us-central1 --enable-logging

As shown in the following code block, the --enable-console-logging argument needs to be passed to enable stream logging:

gcloud ai-platform models create model_name --regions us-central1 --enable-console-logging

Request-response logging cannot be enabled with gcloud. It needs to be enabled with the use of the REST API. The logging levels can be set up at the time of creating the model and versions. The projects.models.create method is used for the creation of the model, and it is invoked with the following HTTP URI: POST https://ml.googleapis.com/v1/{parent=projects/*}/models

The parent is a required string URL parameter that represents the project name. In order for the request to be successfully authenticated, we need the ml.models.create Google Identity and Access Management (IAM) permission on the project. The request body is a representation of the model object. Here is the schematic representation of the model object:

{
  "name": string,
  "description": string,
  "defaultVersion": { object (Version) },
  "regions": [ string ],
  "onlinePredictionLogging": boolean,
  "onlinePredictionConsoleLogging": boolean,
  "labels": { string: string, ... },
  "etag": string
}

Let's have a look at the list of model parameters and their descriptions in the following table:

Parameter Name | Type | Required | Description
name | String | Yes | This is the name of the model. The model name needs to be unique within the project.
description | String | No | This is a description of the model when it is created.
defaultVersion | Object | Yes | This is the model version that will be used when the version information is not sent within the request. The default version keeps changing as the model evolves and becomes more usable. The default version can be changed by using the projects.models.versions.setDefault method.
regions[] | Array of strings | No | This is a list of all the regions on which the model is deployed. This is a provision made for future releases of the AI Platform. At the time of writing this chapter, only one region is supported by the platform. The default is us-central1.
onlinePredictionLogging | Boolean | No | If set to true, the online prediction logs are sent to Stackdriver Logging. These logs are similar to server access logs, where the request timestamps and response times are recorded for all the requests. By default, this parameter's value is false, and it needs to be explicitly set to true.
onlinePredictionConsoleLogging | Boolean | No | If set to true, the online prediction logs are sent to Stackdriver Logging in a more verbose manner and contain an entire trail of standard output as well as standard error log messages. This log is helpful for debugging but needs to be used carefully in order to save costs.
labels | Key-value pair map | No | Labels help in organizing various resources on the AI Platform. These labels are key-value strings that can have arbitrary values. Labeled resources are easy to monitor and group together. We can supply a maximum of 64 labels. The keys and values can have a maximum of 63 characters. The keys must start with a letter and can contain alphanumeric characters, underscores, and dashes.
etag | String | Yes | This flag is used for preventing race conditions between two independent model updates. The etag parameter needs to be used in a chaining manner between various requests and responses to ensure sequential updates to the models. It is recommended to make effective use of etag to avoid inconsistent model versions.
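As a sketch, a request body matching this schema can be assembled and sanity-checked in Python before calling projects.models.create; the model name and label values below are hypothetical:

```python
# Build a create-model request body matching the schema above.
# The model name and labels are made-up placeholders.
def make_model_body(name, description=None, regions=None,
                    online_logging=False, console_logging=False, labels=None):
    body = {
        "name": name,
        "onlinePredictionLogging": online_logging,
        "onlinePredictionConsoleLogging": console_logging,
    }
    if description:
        body["description"] = description
    if regions:
        body["regions"] = regions
    if labels:
        # Enforce the label limits stated in the table above.
        assert len(labels) <= 64, "a maximum of 64 labels is allowed"
        for key, value in labels.items():
            assert len(key) <= 63 and len(value) <= 63, \
                "label keys and values are limited to 63 characters"
        body["labels"] = labels
    return body

body = make_model_body("census", description="Income classifier",
                       regions=["us-central1"], labels={"team": "ml"})
```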

Apart from creating a model, the same request body object can be used for the following functions:

delete: Deletes the model
get: Gets all the information about the model, including the versions
getIamPolicy: Gets the access control policy of a resource
list: Provides a list of all the models present in the project
patch: Updates a model resource
setIamPolicy: Sets the access control for a specific resource within the AI Platform
testIamPermissions: Lists all the permissions the calling user account has on the specific resource

In order to perform online predictions, the input data needs to be structured and formatted in the form of a list of values or as a JSON object. The following is an example of an input tensor that can be passed to the TensorFlow model on the AI Platform:

{"values": ["one", "two", "three"], "key": 123}


The object needs to be formatted as follows for sending to the REST API endpoint:

{"instances": [
  {"values": ["one", "two", "three"], "key": 1},
  {"values": ["five", "six", "seven"], "key": 2}
]}

The following Python function sends such a payload to the prediction service. The GOOGLE_APPLICATION_CREDENTIALS environment variable must point to a valid service account key file before the API client is built:

import googleapiclient.discovery

def predict(project, model, instances, version=None):
    # Build a client for the AI Platform (ml) API, v1.
    service = googleapiclient.discovery.build('ml', 'v1')
    name = 'projects/{}/models/{}'.format(project, model)
    if version is not None:
        # Target a specific model version; otherwise the default version is used.
        name += '/versions/{}'.format(version)
    response = service.projects().predict(
        name=name,
        body={'instances': instances}
    ).execute()
    if 'error' in response:
        raise RuntimeError(response['error'])
    return response['predictions']

As we can see, the online prediction can be obtained by using a reference to the project, the model, and the JSON-formatted structured input data. We create an instance of the machine learning service using googleapiclient and call the predict method on the service instance.

Notebooks

The AI Platform provides an intuitive way of working with notebooks. These are pre-packaged online interfaces that can be used for effective collaboration among team members and can be quickly configured to have the latest versions of Python libraries in the backend. The notebooks enable developers to create and manage virtual machines on GCP that can utilize the TensorFlow and PyTorch frameworks, along with R and Python deep learning packages.


The workflows and pipelines can be configured to utilize CPUs as well as GPUs out of the box in a truly serverless manner. The images that can be used with the notebooks are tested and optimized for best performance and ease of use. The authentication layer of GCP is utilized for access to the notebooks, and IAM policies can be configured in the same way as for any other GCP resource. There is seamless integration with the GitHub repositories hosted on GCP. The AI Platform notebooks support the following runtimes and packages:

Fig 8.9 - AI Platform-supported runtimes and packages

In order to work with the notebooks on the AI Platform, a project needs to be selected, and the Compute Engine API needs to be enabled to navigate to the notebooks page. Let's create a new instance of the notebook:

1. Go to the Notebooks side menu within the AI Platform from the navigation menu.
2. Click on the NEW INSTANCE link on the top menu bar.
3. Select from the available options for the creation of the instance, or click on Customize instance for granular control of the various parameters and capacities for the new instance.
4. Select an instance with or without a GPU. If the instance is created with a GPU, select the option to install the GPU driver automatically. The GPU count can be modified at a later point, once the instance is created.

The steps for creating a new instance of the notebook are depicted in the following screenshot:


Fig 8.10 - Steps in the creation of a new notebook instance (1)

Let's create a new notebook instance with TensorFlow 2.0 without GPUs:

5. Instance name: The AI Platform allocates a default instance name. This name can be modified as per the use case context. It is recommended to include a date timestamp in the instance name for better maintainability.
6. Environment: The AI Platform creates an image with TensorFlow 2.0 with Intel MKL-DNN and CUDA 10.0, and includes Python 2/3, scikit-learn, pandas, and NLTK by default.
7. Machine configuration: The AI Platform creates the machine in a default zone (us-west1-b), with 4 vCPUs, 15 GB of RAM, and a 100 GB boot disk.
8. Networking: The subnetwork is default (10.138.0.0/20).
9. Estimated cost: The AI Platform provides a ballpark cost estimate based on a sustained use discount.
10. Click on the CREATE button to allocate the notebook instance.


The following screenshot shows the steps to create a new notebook instance with TensorFlow 2.0 without GPUs:

Fig 8.11 - Steps in the creation of a new notebook instance (2)

11. Once the notebook is created with the set configuration, you can open the JupyterLab interface by clicking the OPEN JUPYTERLAB hyperlink, as shown in the following screenshot:


Fig 8.12 - Opening the Jupyter Notebook

Data Labeling Service

The AI Platform provides a Data Labeling Service that makes it easy and efficient to annotate training data with the help of human labelers. A large volume of data is required for model training and, at times, it is impossible to get access to human effort for labeling the training and evaluation data. The Data Labeling Service can be leveraged for continuous evaluation, which helps in evolving the model for greater accuracy based on new input data.

A generic process for using the Data Labeling Service starts with the creation of a dataset containing samples that act as a guide for the human labelers. Along with the sample dataset, an annotation specification set needs to be provided that contains all possible classification categories for the training data within the use case context. We can also provide an additional set of instructions to the labelers for performing the actual labeling. Once the prerequisites are met, a service request can be generated based on the samples, the annotation specification set, and the instructions; once the human labelers have annotated the data, we export the training data for use in training and evaluation of the models.


The following screenshot shows the generic process for utilizing the Data Labeling Service:

Fig 8.13 - Steps involved in utilizing the Data Labeling Service

Deep learning containers

These containers provide an abstract application layer that runs across various environments and works seamlessly with the underlying operating system. This helps developers to focus on application development, since all the dependencies are managed by the container irrespective of the environment in which the application is deployed. The AI Platform provides out-of-the-box deep learning containers that include key data science frameworks, libraries, and tools. The deep learning containers on the AI Platform contain the TensorFlow, PyTorch, scikit-learn, and R frameworks, along with Python and R runtimes. The most commonly used Python packages are also included in the containers.


Steps involved in training and utilizing a TensorFlow model

In this section, we will walk through all the steps involved in training a TensorFlow model on the AI Platform. The core components of a machine learning pipeline are similar on a local machine and on the AI Platform; hence, it is very easy for application developers to follow a familiar process for building and deploying models on the AI Platform. The flowchart in the following screenshot represents the generic steps in training and utilizing a TensorFlow model on the AI Platform:

Fig 8.14 - Generic steps in training and utilizing a TensorFlow model on the AI Platform

Prerequisites

For training, deploying, and utilizing TensorFlow models on the AI Platform, we need to consider the costs incurred for the following components:

AI Platform: Training/deployment/prediction
Cloud Storage: Input data for training/staging the application package/training artifacts

The primary requirement for training and deploying a TensorFlow model is to create a project. Once the project is created and billing is enabled, we need to enable the AI Platform and Compute Engine APIs. At this point, install the Cloud Software Development Kit (SDK) and initialize it with authentication and authorization.


With this, the platform-related prerequisites are complete. We now need to set up the environment for the application to run. From the GCP console, click on the Activate Cloud Shell button on the top menu bar. This will open the Cloud Shell web interface. Here are the steps we need to take from Cloud Shell:

1. List all the models present on the AI Platform.
2. Update all the components on the AI Platform.
3. Install or upgrade TensorFlow.

You can see the preceding steps highlighted in the following screenshot of the console:

Fig 8.15 - AI Platform setup requirements through the GCP console

At this point, the environment setup is complete. Now, we can develop and validate the training application locally. Before running the application on the cloud, it is recommended to run it locally for quick debugging and to ensure all the pieces are working as expected. There are no charges incurred for the cloud resources when the applications are run locally.


Creating a TensorFlow application and running it locally

The training application is structured based on the developer's preferences and the context of the project. However, there is a recommended project structure, which ensures consistency across various projects and eliminates the need for contextual developer training.

Project structure recommendation

Let's set up the project structure:

1. Create a main project directory that contains all the code for the application.
2. Create a setup.py file inside the main project directory. The setup.py file ensures that all the subdirectories are included in the archive package that is used for application distribution. The following code block shows a typical implementation of setup.py:

from setuptools import setup, find_packages

NAME = 'preprocessing'
VERSION = 'x.y'
REQUIRED_PACKAGES = [COMMA_SEPARATED_LIST_OF_PACKAGES]

setup(
    name=NAME,
    version=VERSION,
    packages=find_packages(),
    install_requires=REQUIRED_PACKAGES,
)

3. Create a subdirectory named trainer. The trainer directory contains the application module code and usually contains the following files:

task.py: This is the main application module; it contains the application logic that is responsible for the orchestration of the training job.
model.py: This file contains the model logic and the various attributes and configuration parameters of the model.

4. Create the various subdirectories that are required for the application to be modular and logically readable.


5. It is recommended to create an __init__.py file in every subdirectory. These are typically used by Setuptools as markers for packaging the application.
6. When the gcloud command is called for training, the --package-path parameter is set to the trainer directory. The runtime searches for the setup.py file in the parent directory and trains the model based on the code in the task.py and model.py files within the trainer directory.
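The recommended layout above can be sketched on disk with a few lines of Python; the project name and the file contents are hypothetical placeholders:

```python
import os
import tempfile

# Create the recommended project skeleton in a scratch directory.
# "census_trainer" is a made-up project name for illustration.
root = os.path.join(tempfile.mkdtemp(), "census_trainer")
os.makedirs(os.path.join(root, "trainer"))

files = {
    "setup.py": "# packaging entry point (see the template above)\n",
    "trainer/__init__.py": "",  # marks trainer/ as a package for Setuptools
    "trainer/task.py": "# orchestration logic for the training job\n",
    "trainer/model.py": "# model definition and configuration parameters\n",
}
for rel_path, content in files.items():
    with open(os.path.join(root, rel_path), "w") as f:
        f.write(content)

print(sorted(os.listdir(os.path.join(root, "trainer"))))
# → ['__init__.py', 'model.py', 'task.py']
```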

Training data

For this illustration, we will use the Census Income Data, which is one of the publicly available datasets. Here are the characteristics of the dataset:

Summary: A dataset to predict whether the income of a person is less than or greater than 50,000 US dollars, based on 14 attributes.
Type: Multivariate.
Number of instances: 48,842.
Area: Social.
Attribute types: Categorical/Integer.

Let's proceed with model training:

1. Open Cloud Shell and download the sample code with the help of the following commands:

wget https://github.com/GoogleCloudPlatform/cloudml-samples/archive/master.zip
unzip master.zip
cd cloudml-samples-master/census/estimator

2. Get the training data with the help of the following commands:

mkdir data
gsutil -m cp gs://cloud-samples-data/ai-platform/census/data/* data/
TRAIN_DATA=$(pwd)/data/adult.data.csv
EVAL_DATA=$(pwd)/data/adult.test.csv


3. Create an output directory for storing the models and intermediate files.
4. In order to run the model locally, we need to use the following command:

gcloud ai-platform local train \
    --module-name trainer.task \
    --package-path trainer/ \
    --job-dir $MODEL_DIR \
    -- \
    --train-files $TRAIN_DATA \
    --eval-files $EVAL_DATA \
    --train-steps 1000 \
    --eval-steps 100

Here is a screenshot of the model output directory:

Fig 8.16 - Model output directory

Once the model is trained and evaluated, the various model training parameters can be analyzed using TensorBoard. TensorBoard is a visualization toolkit packaged with TensorFlow. It helps in visualizing the model graph and analyzing loss and accuracy. The following command is used for starting TensorBoard as a web application (on port 8080, in this case):

tensorboard --logdir=$MODEL_DIR --port=8080


The preceding command will return the following output. Let's have a look:

Fig 8.17 Model view with TensorBoard

As we can see from the preceding screenshot, TensorBoard makes it easy to analyze the model performance with intuitive visualizations. In the next section, let's look at the steps involved in packaging and deploying the training application.

Packaging and deploying your training application in Cloud ML Engine

It is important to understand the right way of packaging and deploying an application in ML Engine. In this section, we will talk about some of the recommended ways and best practices in the packaging and deployment of machine learning applications. We can use the gcloud command-line tool for packaging and uploading the applications. The easiest way is to use the following command for packaging and uploading the application and, at the same time, submitting a training job:

gcloud ai-platform jobs submit training


Let's define the environment variables that are required globally for packaging and deploying the applications, as follows:

PATH_TRAINER_PACKAGE: A fully qualified path to all the source code files that are required by the training job.
TRAINER_MODULE: The name of the main application module (for example, trainer.task), which contains the application logic responsible for the orchestration of the training job.
STAGING_BUCKET: A path on Google Cloud Storage that is used by the training job runtime for storing intermediate results as well as temporary files.

In addition to these, we need to set variables representing the job name, job directory, and region. The job name is useful in tracing the job-related data and lineage points; the job directory stores the intermediate and final results of the training job; and the region is required to run the training job in an appropriate location, in order to optimize computation and storage costs and minimize overheads. Here is a command that takes care of packaging and deploying an application with the gcloud ai-platform command:

gcloud ai-platform jobs submit training $JOB_NAME \
    --staging-bucket $STAGING_BUCKET \
    --job-dir $JOB_DIR \
    --package-path $PATH_TRAINER_PACKAGE \
    --module-name $TRAINER_MODULE \
    --region $REGION \
    -- \
    --user_first_arg=first_arg_value \
    --user_second_arg=second_arg_value

In addition to the mandatory parameters for running the command, we can pass a number of user-defined and application-specific parameters to this script. The parameter values are made available to the application at runtime for implementing application-specific logic.


The training job may also require some dependencies in order to run successfully. There are two types of dependencies that need to be resolved:

Standard Python dependencies: These are the standard Python packages available on PyPI. The standard dependencies are installed with the pip install command by the AI Platform. This is similar to the dependency resolution for standalone applications. The standard way of defining the dependencies on the AI Platform is to mention them in the setup.py file. The setup.py file needs to be placed in the root directory of the application. The template for the setup.py file can be seen in the following code snippet:

from setuptools import find_packages
from setuptools import setup

REQUIRED_PACKAGES = ['comma separated list of required packages']

setup(
    name='trainer',
    version='0.1',
    install_requires=REQUIRED_PACKAGES,
    packages=find_packages(),
    include_package_data=True,
    description='Setup details and required packages for the training application'
)

User-defined and custom dependencies: These are the user-defined packages that are required by the application at runtime. The AI Platform can resolve these dependencies with the pip install command. The custom dependency packages need to be accessible to the application at runtime and, hence, a fully qualified URI for each package needs to be provided as a parameter to the script. It is recommended that the package files are stored in an accessible Cloud Storage location. When using the gcloud command, the dependencies can be placed on the local machine as well as on Cloud Storage. The AI Platform stages these dependencies in the same sequence in which they appear in the command. Multiple dependencies need to be specified as a comma-separated list. In the next section, we will learn how to optimize the training job by choosing the right compute options and runtime parameters.


Choosing the right compute options for your training job

It is important to choose the right compute options for a training job in order to optimally utilize the platform resources. This results in minimizing the training time and the cost. We need to set the runtime attributes for the training job. A training job is a standard object on the AI Platform. The structure of the training job is as follows (please find the complete configuration at this link: https://github.com/PacktPublishing/Hands-On-Artificial-Intelligence-on-Google-Cloud-Platform):

{
  "jobId": string,           // Required: user-defined identifier for the job
  "createTime": string,      // Output parameter: indicates when a job was created
  "labels": {                // Optional input parameter: recommended to be used for
    string: string,          // organizing and troubleshooting the runtime jobs
    ...
  },
  "trainingInput": {         // Required: specifies the input parameters for the training job
    object (TrainingInput)
  },
  "predictionInput": {       // Required: specifies the input parameters for the prediction job
    object (PredictionInput)
  }
}

Specifically, we need to fill the TrainingInput or PredictionInput resource for a runtime job configuration. These are mutually exclusive for a specific request, and only one of these input parameters needs to be used at runtime. Let's have a look at the following JSON structure of the TrainingInput parameter in detail (please find the complete configuration at this link: https://github.com/PacktPublishing/Hands-On-Artificial-Intelligence-on-Google-Cloud-Platform):

{
  "scaleTier": enum (ScaleTier),  // Required: specifies machine types, count of replicas,
                                  // workers, and parameter servers
  "packageUris": [                // Required: the Google Cloud Storage locations of the
    string                        // packages containing the training program along with
  ],                              // additional dependencies
  "pythonModule": string,         // Required: the Python module to run after importing all
                                  // the packages and resolving the dependencies
  "args": [
    string
  ],
  "hyperparameters": {            // Optional: the set of hyperparameters to be tuned
    object (HyperparameterSpec)
  },
  "region": string                // Required: the Compute Engine region on which the
}                                 // training job will run

We will take a look at the ScaleTier and HyperparameterSpec objects in detail. Before that, let's understand the JSON structure of the PredictionInput object that is used while submitting the prediction job, as shown in the following code block:

{
  "dataFormat": enum (DataFormat),        // Required: format of the input data file (JSON, TEXT, and so on)
  "outputDataFormat": enum (DataFormat),  // Optional: format of the output data file (default: JSON)
  "inputPaths": [                         // Required: Cloud Storage location of the input data files
    string
  ],
  "maxWorkerCount": string,               // Optional: maximum number of workers (default: 10)
  "region": string,                       // Required: Google Compute Engine region
  "runtimeVersion": string,               // Optional: AI Platform runtime version
  "batchSize": string,                    // Optional: number of records per batch (default: 64)
  "signatureName": string,                // Optional: name of the signature defined in the saved model
  "modelName": string,
  "versionName": string,
  "uri": string,
  "outputPath": string
}


The training and prediction performance is optimized a great deal when the right parameters are chosen. We need to understand the various scale tiers and hyperparameters in order to further optimize performance and cost. It is important to select the appropriate scale tier based on the volume of the training data and the complexity of the algorithm. The idea is to use just the right amount of resources for training and prediction. This helps in minimizing the cost of training units. The advanced tiers have additional capacity in terms of the number of CPU cores and the utilization of GPUs, as desired. However, the cost also increases along with the advancement of the tier. There are various scale tiers available on GCP, as follows:

BASIC: Provides a single worker instance and is suitable for learning and experimenting. This can also be used for a small-size proof of concept (POC).
STANDARD_1: Provides a number of worker nodes and only a few parameter servers.
PREMIUM_1: Provides a large number of workers, with many parameter servers.
BASIC_GPU: Provides a single worker instance with a GPU.
CUSTOM: This tier allows for the setting of custom values for the master type, worker count, parameter server count, and parameter server type. These parameters from TrainingInput become mandatory when the CUSTOM scale tier is selected.

Apart from the scale tier, we need to carefully select the hyperparameter values to further optimize training performance.
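As a sketch, a TrainingInput fragment for the CUSTOM tier might look as follows; the machine types, counts, and bucket names are illustrative assumptions, not recommendations from this chapter:

```python
# Illustrative TrainingInput for the CUSTOM scale tier; values are placeholders.
training_input = {
    "scaleTier": "CUSTOM",
    "masterType": "n1-highcpu-16",   # machine fields become mandatory with CUSTOM
    "workerType": "n1-highcpu-16",
    "workerCount": "4",
    "parameterServerType": "n1-highmem-8",
    "parameterServerCount": "2",
    "packageUris": ["gs://demo-bucket/packages/trainer-0.1.tar.gz"],
    "pythonModule": "trainer.task",
    "region": "us-central1",
}

# Sanity check: the CUSTOM tier requires the extra machine fields to be present.
required_for_custom = ["masterType", "workerCount",
                       "parameterServerCount", "parameterServerType"]
assert all(field in training_input for field in required_for_custom)
```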

Choosing the hyperparameters for the training job

The hyperparameters are controlled by means of the HyperparameterSpec object within the TrainingInput object. The following code block shows the structure of the HyperparameterSpec object:

{
  "goal": enum (GoalType),            // Required: The type of goal used for tuning [MAXIMIZE/MINIMIZE]
  "params": [                         // Required: The set of parameters to be tuned
    { object (ParameterSpec) }
  ],
  "maxTrials": number,                // Optional: Number of trials to be attempted (default: 1)
  "maxParallelTrials": number,        // Optional: Number of parallel trials. May reduce quality of optimization
  "maxFailedTrials": number,          // Optional: Number of failed trials before the hyperparameter tuning job is failed
  "hyperparameterMetricTag": string,  // Optional: TensorFlow summary tag name for optimizing trials
  "resumePreviousJobId": string,
  "enableTrialEarlyStopping": boolean,
  "algorithm": enum (Algorithm)       // Optional: Search algorithm used by the hyperparameter tuning job: ALGORITHM_UNSPECIFIED / GRID_SEARCH / RANDOM_SEARCH
}
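As a concrete illustration of that structure, a tuning spec that maximizes a summary metric over a single learning-rate parameter might look as follows, expressed as the Python dictionary that would be serialized into the job request. The metric tag, parameter name, ranges, and trial counts are hypothetical values:

```python
# Hypothetical HyperparameterSpec: tune learning_rate to maximize accuracy.
# Field names follow the HyperparameterSpec/ParameterSpec structure; the
# values are placeholders for illustration only.
hyperparameters = {
    "goal": "MAXIMIZE",
    "hyperparameterMetricTag": "accuracy",  # TensorFlow summary tag to optimize
    "maxTrials": 20,
    "maxParallelTrials": 4,
    "enableTrialEarlyStopping": True,
    "algorithm": "RANDOM_SEARCH",
    "params": [
        {
            "parameterName": "learning_rate",
            "type": "DOUBLE",
            "minValue": 0.0001,
            "maxValue": 0.1,
            "scaleType": "UNIT_LOG_SCALE",  # sample the range on a log scale
        }
    ],
}
```

This dictionary would be nested under the hyperparameters key of the TrainingInput when the job is submitted.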

While cost is one of the fundamental considerations when choosing compute resources, we also need to understand that the platform imposes limitations on the usage of training resources, set by quotas for various operations. Let's briefly look at the quota limits imposed by the AI Platform. Due to the inherently multi-tenant nature of a cloud platform, the resources used by a specific user and project need to be restricted and governed by quotas in order to prevent accidental over-utilization. The AI Platform also imposes some quota limits based on the service request. Any user account allocated to a project can only initiate a certain number of individual API requests every minute. The limit is applicable to a particular API or a group of APIs, as follows:

Job creation requests: A maximum of 60 requests can be made within 1 minute.
Prediction requests: A maximum of 6,000 requests can be made within 1 minute.
Combined number of requests for the following resource management sub-API calls: a maximum of 300 requests per minute:
    list/get (projects.jobs, projects.models, projects.models.versions, projects.operations)
    delete/create (projects.models, projects.models.versions)
    cancel (projects.jobs, projects.operations)
    setDefault

In addition to the service requests, there is a maximum limit of 100 models per project, and each model can have a maximum of 200 versions.
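Quota violations surface as errors at request time, so client code that issues many online predictions may want simple self-throttling. A minimal sliding-window limiter is sketched below; it is illustrative only, and the 6,000-per-minute default mirrors the prediction quota quoted above:

```python
import time
from collections import deque

class RequestThrottle:
    """Client-side sliding-window throttle to stay under a per-minute quota."""

    def __init__(self, max_per_minute=6000):
        self.max_per_minute = max_per_minute
        self.stamps = deque()  # timestamps of requests in the last 60 seconds

    def allow(self, now=None):
        """Return True if another request may be sent right now."""
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the 60-second window.
        while self.stamps and now - self.stamps[0] >= 60:
            self.stamps.popleft()
        if len(self.stamps) < self.max_per_minute:
            self.stamps.append(now)
            return True
        return False
```

A caller would check throttle.allow() before each prediction request and back off briefly when it returns False.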


Implementing TensorFlow Models Using Cloud ML Engine

Chapter 8

In certain cases, it is not possible to create a production version of the AI application within these limits. GCP allows a quota increase to be requested via the administration console. Additionally, the AI Platform restricts the concurrent usage of virtual machines, as follows:

Number of concurrent prediction nodes: 72
Number of concurrent vCPUs running in prediction nodes: 450

The limits on GPUs for training and prediction are as follows:

GPU type      Concurrent GPUs (training)    Concurrent GPUs (prediction)
Tesla K80     30                            30
Tesla P4      8                             2
Tesla P100    30                            30
Tesla V100    8                             8
Tesla T4      6                             6

We need to choose from the available options carefully in order to optimize performance and cost. In the next section, we will take a look at how to monitor TensorFlow model jobs on GCP.

Monitoring your TensorFlow training model jobs

The time required for model training jobs is proportional to the volume of training data and the complexity of the models being trained. It is imperative that applications are able to report the status of training jobs. The AI Platform provides three primary ways to monitor training jobs: the GCP console provides a user interface that lists the training jobs, while the gcloud CLI and custom Python code can be used to get the status of the training jobs programmatically. Here are the gcloud commands for monitoring training jobs:

gcloud ai-platform jobs describe job_name

This command returns the status of the currently running job indicated by the job_name parameter. The next command returns a list of the jobs currently running on the platform for the project:

gcloud ai-platform jobs list --limit=5
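The same status check can be scripted from Python. One common approach uses the google-api-python-client library against the ml/v1 REST API; the library import is kept inside the function so that the pure helper for building the job resource name works even where the client library is not installed. The project and job names below are placeholders:

```python
def job_resource_name(project_id, job_name):
    """Build the fully qualified resource name the ML API expects."""
    return "projects/{}/jobs/{}".format(project_id, job_name)

def get_job_state(project_id, job_name):
    """Fetch a training job's state via the AI Platform (ml/v1) REST API.

    Requires the google-api-python-client package and application-default
    credentials; returns the job state string (for example, 'RUNNING').
    """
    from googleapiclient import discovery  # imported here: optional dependency
    ml = discovery.build("ml", "v1")
    request = ml.projects().jobs().get(
        name=job_resource_name(project_id, job_name))
    return request.execute().get("state")
```

Calling get_job_state("my-project", "my_training_job") would return the same state field that gcloud ai-platform jobs describe prints.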


Summary

In this chapter, we have seen how to leverage serverless machine learning on GCP with the help of Cloud ML Engine (AI Platform). We have seen the various ways in which the AI Platform can be accessed, along with building a TensorFlow application and packaging and deploying the models. We also looked at best practices for organizing applications on the AI Platform, optimizing performance with the right level of infrastructure while saving costs. Finally, we learned how to monitor applications with the command-line tool. In the next chapter, we will build a prediction application using TensorFlow models and take a practical hands-on approach to AI application development on GCP.


Chapter 9: Building Prediction Applications

Prediction in the cloud is about developing a machine learning prediction model, deploying the finalized version of that model, and using the deployed model to predict target values for newly arriving data using cloud-native services or infrastructure. Simply speaking, the cloud manages the infrastructure and a high-level abstraction of the machine learning framework that can easily be used to train or serve your machine learning models. Like any other cloud provider, Google Cloud provides features to run machine learning-based model prediction using its native services. In this chapter, we will look into the steps involved in performing prediction using Google Cloud services, as follows:

Overview of machine-based intelligent predictions
Maintaining models and their versions
Taking a deep dive into saved models
Deploying the models on Google Cloud Platform (GCP)
Model training example
Performing prediction with service endpoints

Overview of machine-based intelligent predictions

Predictive analytics is a big data enabler: organizations gather large amounts of real-time customer data, and predictive analytics uses this historical data, combined with consumer intuition, to predict future events. Predictive analytics allows organizations to use data (both historical and real-time) to shift from a historical view to a forward-looking view of the customer. Predictive analytics allows companies to become proactive and forward-looking, predicting outcomes and actions based on data rather than assumptions.


Prescriptive analytics is the next step, recommending actions that make use of the forecast and also offering choices for decision making, so as to benefit from the predictions and their consequences. Predictions can be made with services deployed on the cloud. These services can be exposed as APIs, which are easy to use and make it easy for analysts to consume the prediction service without needing to fully understand the details of the underlying algorithms. GCP components make it even easier to build, deploy, and leverage prediction services with less effort and cost.

Understanding the prediction process

The following diagram depicts the high-level steps involved in using prediction on GCP (these steps are explained in detail in the following paragraphs and sections):

Figure 9.1: Steps in using prediction on GCP

The first step in the prediction process is to export models as artifacts after you have finished training them. You are required to export your trained machine learning model as one or several artifacts in order to serve predictions from the Google Cloud AI Platform. This chapter will explain how trained prediction models can be exported on GCP. There are multiple options for exporting your models, depending upon the type of machine learning framework you used to build them. For example, if you have used TensorFlow to train your machine learning model, then you can use the tf.saved_model API to save the model first, and then deploy it on the Google Cloud AI Platform.


The next step is to deploy the models. The Google Cloud AI Platform will host the models to provide you with cloud predictions. Model deployment is the method for hosting a saved model file. The cloud prediction provider handles your model infrastructure and makes it open to requests for online and batch prediction. Model deployment is the most important aspect of the prediction process, and understanding it in detail is mandatory. We will cover model deployment aspects in detail in the next sections. Once your model is deployed, you use your models for online prediction or batch prediction. Batch prediction is helpful if you want to produce predictions for a set of observations at a time and then act on a certain number or percentage of the observations. Normally, for such a request, you have no low-latency requirement. These forecasts are then stored in a database and can be accessed by developers or end users. Batch inferences can sometimes use big data technologies—such as Spark—to produce forecasts. The batch inference technology specifications are simpler than those for online inference. For example, data scientists can simply deserialize a trained model on the machine that performs the batch inference job and not expose a trained model via a Representational State Transfer (REST) API. The predictions made during batch inference can also be analyzed and processed afterward before end stakeholders see them. The following simple diagram represents how batch prediction works:

Figure 9.2: Batch predictions


Online predictions are the method of creating predictions for machine learning in real time on request. This is also called real-time or dynamic inference. These predictions are typically generated by a single data observation at runtime. Online inference predictions can be produced at any time of day. Online inference allows us to use machine learning models in real time. It opens up a completely new technology field that can take advantage of machine learning. Instead of waiting hours or days for batch predictions, we can produce predictions as soon as they are required, and serve them immediately. Online inference also helps us to quickly analyze new data without delay. In general, online inference is more challenging than batch inference. Online inference tends to be more complex because of the additional tools and systems needed to satisfy latency requirements. A system that needs a forecast to be produced within a few milliseconds is much tougher to build than a system with a 24-hour service-level agreement (SLA). The following diagram represents how online prediction works:

Figure 9.3: Online predictions

If you use a simple model and a small number of input cases, the time it takes to complete an online forecast versus a batch forecast for similar prediction requests can differ significantly. Predictions that an online application returns almost instantly could take a long time to complete as a batch job. This is a side effect of the specific infrastructures used by the two prediction approaches: the AI Platform assigns and initializes batch prediction resources only once you submit the request.


In general, an online prediction is ready for processing on request. The following table summarizes the differences between batch and online prediction:

Batch predictions:
    Suitable for voluminous, big data-size volumes with complex compute requirements.
    The output is a file format with more than one prediction outcome.
    Asynchronous request and response.
    Can use models that are stored in external storage, such as a Google Cloud Storage bucket, or deployed using the Google Cloud AI Platform.
    Limited to certain instance types only, if using models deployed via the Google Cloud AI Platform.

Online predictions:
    Suitable for one record at a time, with low latency and less complex models.
    The output is one response at a time, mostly in JSON format.
    Synchronous, real-time request and response.
    Can only use models that are deployed using the Google Cloud AI Platform.
    Provisions exist to use different types of compute instances for running predictions.

Maintaining models and their versions

The Google Cloud AI Platform lets you organize your machine learning solutions as models and versions. A model is the name given to your machine learning solution, and a version uniquely identifies a deployed model artifact. For example, you can create a model called ProductCategoryIdentification that represents a machine learning solution for classifying products. In machine learning, a model is a solution to the problem to be solved; in other words, it is the recipe for predicting a data value. On the Google Cloud AI Platform, a model is a conceptual container for each iteration of this AI/machine learning solution. For instance, suppose the problem you want to address is to forecast the selling price of homes, given a set of data relating to previous sales. You construct an AI Platform model called housing_prices and attempt to solve the problem by means of several machine learning techniques. You can deploy a version of that model at each stage. Each version can differ completely from the others, but if it suits your workflow, you can arrange them under the same model.


The following screenshot represents how a model can be created on the Google Cloud AI Platform:

Figure 9.4: New model creation


As you can see, the model creation user interface is consistent with other services on the GCP with which the user is familiar. We need to provide a case-sensitive model name and region as mandatory attributes. It is recommended to provide the optional description for better maintainability of the models. This is especially important as we build and deploy a large number of models for various use cases. The following screenshot shows a list of models that are available on the platform to be used via the user interface or the API:

Figure 9.5: Model listing

Now, after the model is created, you can create versions of it. A version represents the machine learning code and framework that was used to train the model. Sometimes, data scientists use different frameworks, such as XGBoost or TensorFlow, to devise solutions to the same problem, and create different versions to identify the artifacts. So, a model version, or just a version, is an instance of a machine learning solution stored in the AI Platform model service. You create a version by transferring a serialized trained model (such as a SavedModel) to the service. You can also provide custom code (still in the beta phase at the time of writing) to manage forecasts when creating a version. Every model with a minimum of one version has a default version; the default is set when the first version is generated. If you request predictions specifying just a model name, the AI Platform uses the default version of that model.


The following screenshot shows how to manage the versions:

Figure 9.6: Creating a model version

Some of the rules for naming models and versions are as follows:

The name should contain only letters (mixed case), numbers, and underscores.
The name should begin with a letter.
The name should contain a maximum of 128 characters.
The name should be unique within a project (if it is a model name) and unique within a model (if it is a version name).
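These rules are easy to enforce client-side before calling the API. A small validator is sketched below; it is illustrative only, as the API performs its own checks server-side (and uniqueness can only be checked against the service):

```python
import re

# Pattern per the naming rules above: starts with a letter, followed by
# letters, digits, or underscores, 128 characters in total at most.
_NAME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9_]{0,127}$")

def is_valid_resource_name(name):
    """Return True if `name` satisfies the model/version naming rules."""
    return bool(_NAME_RE.match(name))
```

Running names through such a check in a deployment script fails fast, before any quota-counted API request is made.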


You can also add custom packages required for custom built-in code, as shown in the following screenshot:

Figure 9.7: Adding custom code and packages


The final output screen would look like the one in the following screenshot:

Figure 9.8: Model versions

As a good practice, model names should be concise and distinctive, as you will need to select them in logs or reports from lists of several names. Short and simple version names are best for maintenance.

Taking a deep dive into saved models

You are required to export (or save) your trained machine learning model as one or more artifacts in order to perform Google Cloud AI Platform predictions. This section explains the various ways to export trained models for prediction deployment. Depending upon the type of machine learning framework that you use, the Google Cloud AI Platform gives you multiple ways to export those models; these methods are specific to the framework that was used to train the model. We will cover some aspects of the TensorFlow SavedModel in this section. To support predictions, you must first export your trained model in the format of a TensorFlow SavedModel. A TensorFlow SavedModel actually contains a TensorFlow program, with weights and computations. It does not point to the code used to build the model, but to the code required to make predictions with the derived mathematical parameters. The SavedModel is the recommended format for deploying trained TensorFlow models on the Google Cloud AI Platform.


Exporting your trained model as a SavedModel saves the training graph in a format that the Google Cloud AI Platform can restore and use for forecasts, along with its metadata. SavedModel offers a language-neutral format for saving recoverable and hermetic machine-learned models. It helps higher-level systems and tools to generate, consume, and transform TensorFlow models. The following are some of the important features of a SavedModel:

A single SavedModel can hold multiple graphs that share a single set of variables and assets. Each graph is associated with a set of tags for identification when loading or restoring. Graphs are stored in a protocol buffers format.
SavedModel supports SignatureDef (the technical name for a protocol buffer message). SavedModel uses this to provide support for signatures that are stored with the graphs. Such graphs, used for machine learning prediction, typically contain a set of data inputs and data outputs, called signatures. A signature has the following structure:
    inputs: The corresponding data inputs used by the TensorFlow model, stored as a map of strings to tensors.
    outputs: The corresponding data outputs produced by the TensorFlow model, stored as a map of strings to tensors.
    method_name: A supported method name used in the TensorFlow framework.
SavedModel supports assets as well. SavedModel uses assets when operations depend on external initialization files, such as a vocabulary. Assets are copied to the SavedModel directory and read when a particular meta graph def is loaded.
It supports clearing devices before the SavedModel is generated.

Although these important features are supported by SavedModel, some features are not. These are as follows:

It does not support implicit versioning.
It does not support garbage collection by itself; external tools that use these models may support this.
It does not support atomic writes to a SavedModel.


The SavedModel directory has the following structure:

assets/
assets.extra/
variables/
    variables.data-?????-of-?????
    variables.index
saved_model.pb

The saved_model.pb (or saved_model.pbtxt) file is the protocol buffer file that includes all the graph definitions as a MetaGraphDef protocol buffer message. The assets subfolder contains supporting auxiliary files, such as text vocabulary files. The assets.extra subfolder contains user-added assets that can co-exist with the model but are not loaded at runtime; the user or end developer has to manage these themselves, as the TensorFlow libraries do not. The variables subfolder contains the output from the TensorFlow Saver.
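Given that layout, a quick sanity check that an export directory at least looks like a SavedModel can be written in a few lines. This is only a heuristic used for illustration; TensorFlow's own loader is the real authority on validity:

```python
import os

def looks_like_saved_model(export_dir):
    """Heuristic check for the minimal SavedModel layout described above:
    a saved_model.pb (or saved_model.pbtxt) graph file plus a variables/
    subfolder."""
    has_graph = any(
        os.path.isfile(os.path.join(export_dir, f))
        for f in ("saved_model.pb", "saved_model.pbtxt")
    )
    has_variables = os.path.isdir(os.path.join(export_dir, "variables"))
    return has_graph and has_variables
```

Such a check is handy in a deployment script right before the gsutil upload step, catching a wrong path early.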

SignatureDef in the TensorFlow SavedModel

A SignatureDef determines the signature of a computation supported by a TensorFlow graph. SignatureDefs aim to provide generic support for defining the inputs and outputs of a function. TF-Exporter and SessionBundle used signatures, which were similar in concept, but allowed users to distinguish between named signatures and default signatures so that they could be correctly retrieved upon loading. For those who previously used TF-Exporter/SessionBundle, signatures in TF-Exporter are replaced by SignatureDefs in SavedModel. TensorFlow Serving offers high-level APIs for inference. To enable these APIs, the models should include one or more SignatureDefs, which describe the exact TensorFlow nodes to use for input and output. Refer to the following examples of SignatureDefs that TensorFlow Serving supports for each API.


Classification SignatureDefs support standardized calls to the TensorFlow Serving Classification API. These specify that there must be an inputs Tensor, and that there are two optional output Tensors, classes and scores, at least one of which must be present. The following screenshot shows the structure of classification SignatureDefs:

Figure 9.9: Classification message API payload


Predict SignatureDefs support calls to the TensorFlow Serving Predict API. Such signatures allow you to support arbitrarily many input and output tensors. In the following example, the predict signature has a logical Tensor input, images, that is mapped to the actual Tensor x:0 in your graph. Predict SignatureDefs enable model-to-model portability. This means you can deal with different underlying Tensor names in different SavedModels (for example, you may have an alternative model with a Tensor z:0 instead of x:0), so your clients can continue to test old and new models over the network without client-side changes. Predict SignatureDefs also allow you to add additional tensors to the outputs, which you can request explicitly. Let's say that, in addition to the output under the scores key, you also want to fetch a pooling layer for debugging or other purposes. In this case, just add an additional tensor with a key such as pool and a suitable value. The following screenshot shows the structure of Predict SignatureDefs:

Figure 9.10: Structure of Predict SignatureDefs


Regression SignatureDefs support structured calls to the TensorFlow Serving Regression API, which requires exactly one tensor input and one tensor output. The following screenshot shows the structure of Regression SignatureDefs:

Figure 9.11: Structure of Regression SignatureDefs


TensorFlow SavedModel APIs

As shown in the following diagram, there are two major APIs associated with the TensorFlow SavedModel. One is the Builder API, and the other is the Loader API. Once the Loader API has loaded the SavedModel, it can be used for prediction:

Figure 9.12: TensorFlow SavedModel APIs

The SavedModelBuilder class allows multiple meta graph definitions, associated variables, and assets to be saved. The first meta graph must be saved with variables in order to build a SavedModel; subsequent meta graphs are simply saved with their graph descriptions. When assets must be saved and written or copied to disk, they can be supplied when the meta graph def is added. If several meta graph definitions are associated with an asset of the same name, only the first version is retained. Each SavedModel meta graph must be annotated with tags that reflect the meta graph's capabilities and case-specific user tags. Such tags typically describe a meta graph's intended use (for example, serving or training) and possibly hardware-specific aspects, such as GPU. The meta graph def whose tag set exactly matches that specified in the Loader API is the one that the loader loads. If no meta graph def matches the listed tags, an error is returned. For example, a loader requiring a GPU for serving could load only a meta graph annotated with tags=serve,gpu by specifying the tags in tensorflow::LoadSavedModel(...). The code in the following screenshot represents how the Builder API can be used:


Figure 9.13: API Builder code

With the SavedModelBuilder package, users can choose whether default-valued attributes must be stripped from the NodeDefs while adding a meta graph to the SavedModel bundle. Both the SavedModelBuilder.add_meta_graph_and_variables and SavedModelBuilder.add_meta_graph methods accept a strip_default_attrs Boolean flag that controls this behavior. Model producers can strip any default-valued attributes in the NodeDefs by setting strip_default_attrs to True. This helps to ensure that new attributes with defaults do not cause older model consumers to fail to load models that have been regenerated with newer training binaries. The Loader APIs are implemented in C++ and Python. The Python version of the SavedModel loader provides SavedModel loading and restore capability. The load function takes the session in which the graph definition and variables are to be restored, the tags used to identify the meta graph def to load, and the location of the SavedModel. Upon loading, the subset of variables and assets supplied in the specific meta graph def is restored into the supplied session. Representative Python code for the Loader API would look like the following:

Figure 9.14: API Loader code


The C++ version of the SavedModel loader provides an API for loading a SavedModel from a path, allowing SessionOptions and RunOptions. Compared to the Python version, the C++ version requires the tags associated with the graph to be loaded to be defined. The loaded SavedModel is called a SavedModelBundle and includes the meta graph def and the session into which it was loaded. Representative C++ code for the Loader API would appear as follows:

Figure 9.15: C++ implementation of Loader API code

SavedModel provides flexibility for building and loading TensorFlow graphs for a variety of applications. SavedModel APIs provide a number of constants in Python and C++ that are easy to use and share consistently across tools for the most common expected applications.

Deploying the models on GCP

To use your machine learning models after you have exported them, you have to deploy the exported models. The first step in deploying your models is to store them in a Google Cloud Storage bucket. In general, a dedicated Cloud Storage bucket in the same project is easier to use with the Google Cloud AI Platform. If you use a bucket from a different project, you need to make sure that your Google Cloud AI Platform service account has access to your model in Cloud Storage; without the required permissions, your attempt to build a Google Cloud AI Platform model version will fail. Let's begin by looking into how to create a Google Cloud Storage bucket. An existing bucket can be used, but it must be in the same region where you plan to work on the Google Cloud AI Platform. The following code can help you create a new bucket:

PROJECT_ID=$(gcloud config list project --format "value(core.project)")
BUCKET_NAME=${PROJECT_ID}-mlengine
echo $BUCKET_NAME
REGION=us-east2
gsutil mb -l $REGION gs://$BUCKET_NAME


The preceding code determines your project ID and derives the new bucket name from it. Then, you specify the name of the region in which you want to create your bucket. Finally, you use the gsutil command-line interface (CLI) to create the bucket. The complete reference to the gsutil tool can be found at the following link: https://cloud.google.com/storage/docs/gsutil. The following screenshot provides a guide on how to create buckets with the console UI:

Figure 9.16: GCP console – Storage and Browse


From the console menu, we need to navigate to the Storage section and click on the Browser section to locate all the data files stored in the bucket. If the user account has access to create multi-region buckets, all the data files across the regional buckets can be seen in one place with this console UI, as shown in the following screenshot:

Figure 9.17: Storage buckets list

The GCP console provides an easy interface to create a new storage bucket. Here is a screenshot of the initial page that opens when we click on the CREATE BUCKET button:

Figure 9.18: Creation of a new bucket


We need to provide a name for the bucket that is unique across the project. Additionally, we need to provide the region and the default storage class and define the access levels (the user groups and users who can access the bucket) while creating the bucket in the GCP. As the details are provided, the GCP provides an easy interface to view the monthly cost estimate for acquiring and holding data in the bucket that is being created. This helps in selecting appropriate options while creating the bucket, based on the use case and the context of the application. The GCP makes it easy to avoid mistakes that can incur high costs if the configuration is not correctly done.

Uploading saved models to a Google Cloud Storage bucket

The next step is to upload your models to a Google Cloud Storage bucket. If you are using TensorFlow SavedModels, you can use the following code to upload your models:

SAVED_MODEL_DIR=$(ls ./your-export-dir-base | tail -1)
gsutil cp -r $SAVED_MODEL_DIR gs://your-bucket

If you export a SavedModel from tf.keras or a TensorFlow Estimator, the model is saved in a timestamped subdirectory of the export base directory you choose, such as your-export-dir-base/123201202301; this example shows how the directory is named with the current timestamp. If you have built your SavedModel in another way, it can be placed elsewhere on your local filesystem. If you have exported your model using scikit-learn or XGBoost, then you can use the following code to upload your models in the .joblib, .pkl, or .bst formats:

gsutil cp ./model.joblib gs://your-bucket/model.joblib
gsutil cp ./model.pkl gs://your-bucket/model.pkl
gsutil cp ./model.bst gs://your-bucket/model.bst


If you have a custom prediction routine (beta), you can also upload some additional custom objects to your application directory. Your model directory must have a total file size of 500 MB or less if you use a legacy (MLS1) machine type, or 2 GB or less if you use a Compute Engine (N1) machine type (beta). As you build subsequent iterations of your model, store them in separate directories within your Cloud Storage bucket. If you deploy a scikit-learn pipeline with custom code, or a custom prediction routine, you must also upload the source distribution package containing your custom code. You can use the following code for this:

gsutil cp dist/google_cloud_ai_custom_code-1.0.tar.gz gs://ai-ml-bucket/google_cloud_ai_custom_code-1.0.tar.gz
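Since the limit differs by machine type, a deployment script may want to verify the artifact size up front. A minimal sketch follows; the byte ceilings mirror the figures quoted above, and the family key names (legacy_mls1, compute_engine_n1) are our own labels, not API identifiers:

```python
import os

# Size ceilings quoted above: 500 MB for legacy (MLS1) machine types,
# 2 GB for Compute Engine (N1) machine types. Key names are illustrative.
MODEL_SIZE_LIMITS = {
    "legacy_mls1": 500 * 1024 * 1024,
    "compute_engine_n1": 2 * 1024 * 1024 * 1024,
}

def directory_size_bytes(path):
    """Total size of all files under `path`, in bytes."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

def fits_machine_type(path, machine_family):
    """True if the model directory fits the quoted limit for the family."""
    return directory_size_bytes(path) <= MODEL_SIZE_LIMITS[machine_family]
```

Running this before the gsutil upload avoids a version-creation failure after the (slower) transfer has already completed.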

Testing machine learning models

The next step is to test the machine learning models. The field of machine learning offers tools that automatically take decisions from data in order to accomplish some goal or requirement. Some problems resist a manually specified solution; machine learning is important because it offers solutions to such complex problems, and promises automatic, quicker, and more precise solutions than a manually specified one. However, machine learning apps are never 100% accurate and will always be partially incorrect. There are a number of reasons why testers cannot neglect thinking about machine learning and deep learning. The main reason is that these programs learn from the data used to construct their algorithms. Because machine learning apps handle human activities almost every day, a mistake can lead to severe losses. In light of these facts, machine learning model testing is very important.

There are multiple aspects to testing machine learning solutions. One such aspect is ensuring the quality of the data being used for training. Your testing should ensure that no datasets can skew the results. Quality assurance (QA) of training data creates check protocols to validate whether the data used for training has been sanitized. Checks must also be performed to determine whether data poisoning attacks have occurred, accidentally or purposely. Tests pertaining to data statistics (median, average, mode, and so on) describe high-level data and relationships. You should create scripted tests to check statistics and correlations, and perform them at regular intervals; the listed parameters must be tracked at regular intervals and verified by teams before each release.


Building Prediction Applications

Chapter 9

You should also perform testing in relation to the features used in machine learning model training. Often, one or more features are redundant or irrelevant and, in effect, affect the predictive error rates. QA/testing procedures need to be in place to proactively evaluate design techniques such as dimensionality reduction and feature selection. Last, but not least, you should also ensure the quality of your model training algorithm. Evolving datasets can lead to increased prediction error rates, for example as a result of data poisoning attacks. When prediction error rates increase after retraining, the machine learning models should be re-evaluated to detect new algorithms that are more accurate than the existing ones. Retrain all models with new datasets and track the model output at regular intervals. Raise a defect if another model is more reliable or more successful than the existing model.

You can use the local predict command on the Google Cloud AI Platform to check how your model predicts before you use it for online prediction. The command uses local dependencies to predict and returns results in the same format that the Google Cloud AI Platform returns after performing online predictions. Testing local predictions allows you to spot errors before incurring costs for online prediction requests. For the --model-dir argument, you can specify a directory containing your exported machine learning model, either on your local machine or in Cloud Storage. Also, specify tensorflow, sklearn, or xgboost for the --framework argument. You cannot use the local predict command with a custom prediction routine. The following command is used for local model testing:

gcloud ai-platform local predict --model-dir local-or-cloud-storage-path-to-model-directory/ \
  --json-instances local-path-to-prediction-input.json \
  --framework name-of-framework
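The file passed via --json-instances contains one JSON instance per line (newline-delimited JSON). A small hypothetical helper for producing such a file; the instance shape is just an example:

```python
# Write prediction instances in the newline-delimited JSON format that
# `gcloud ai-platform local predict --json-instances` expects.
import json

def write_prediction_input(instances, path):
    """Write one JSON object per line."""
    with open(path, "w") as f:
        for instance in instances:
            f.write(json.dumps(instance) + "\n")

instances = [
    {"values": [1, 2, 3, 4], "key": 1},
    {"values": [5, 6, 7, 8], "key": 2},
]
write_prediction_input(instances, "prediction_input.json")
```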

Deploying models and their versions

The Google Cloud AI Platform uses model and version resources to organize your trained models. An AI Platform model is a container for the versions of your machine learning model. To deploy a model, you create a model resource in the AI Platform, create a version of that model, and then link the model version to the model file stored in Cloud Storage. You can use the gcloud tool to create a model resource, filling in your preferred model name without the enclosing brackets, as shown here:

gcloud ai-platform models create "[YOUR-MODEL-NAME]"


You can also use the REST API to create a model resource. The following are the steps for the same:

1. Format the request by inserting the model object into the request body. You must at least give your model a name. Fill in your model name without the brackets, as follows:

{"name": "[YOUR-MODEL-NAME]"}

2. Send the following POST request to the REST API, replacing [ VALUES IN BRACKETS ] with the correct values, as follows:

POST https://ml.googleapis.com/v1/projects/[YOUR-PROJECT-ID]/models/

curl -X POST -H "Content-Type: application/json" \
  -d '{"name": "[YOUR-MODEL-NAME]"}' \
  -H "Authorization: Bearer `gcloud auth print-access-token`" \
  "https://ml.googleapis.com/v1/projects/[YOUR-PROJECT-ID]/models"

The sample JSON output of the preceding command would be like that shown in the following screenshot:

Figure 9.19: JSON output of a REST API call
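The same call can also be assembled from a script. The sketch below only builds the URL, headers, and request body shown in the steps above; obtaining a bearer token (for example, from gcloud auth print-access-token) and sending the request are left to the caller, and the project and model names are placeholders:

```python
# Assemble the pieces of the models.create REST call shown above.
def build_create_model_request(project_id, model_name):
    url = "https://ml.googleapis.com/v1/projects/{}/models".format(project_id)
    headers = {"Content-Type": "application/json"}
    body = {"name": model_name}
    return url, headers, body

url, headers, body = build_create_model_request("my-project", "census")
print(url)  # https://ml.googleapis.com/v1/projects/my-project/models
```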

Once your model resource is created, you create a model version. We will see how this can be done here. Using the Google Cloud console, you can create versions, as shown in the following screenshot:

Figure 9.20: Newly created model and the model list page


As shown in the preceding screenshot, on the Models tab, choose the name of the model resource you want to build your version on. This takes you to the Model Details page. At the top of the Model Details page, click the NEW VERSION button. This takes you to the version creation page, as shown in the following screenshot:

Figure 9.21: New model version development screen


One important thing to note while creating model versions is that you can choose between two scaling options: manual and auto scaling. If you choose Auto scaling, the optional Minimum number of nodes field is shown. You can enter the minimum number of nodes to keep running even when the service has scaled down. The default is 0. If you choose Manual scaling, you need to enter the number of nodes you want to run at all times. Refer to the following screenshot:

Figure 9.22: Manual scaling configuration

The Manual scaling option needs to be selected carefully. The number of nodes configured (a mandatory field in the case of manual scaling) is provisioned by GCP irrespective of the load on the cluster. This option is meant for mission-critical use cases that require model access at fluctuating levels with high frequency, which does not give the Google Cloud AI Platform enough time to provision the required nodes. In such cases, it is recommended to provision the minimum number of nodes based on the mean traffic level. The provisioned nodes incur costs even if no requests are sent for model access, as shown in the following screenshot:


Figure 9.23: Auto scaling configuration
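The cost trade-off between the two scaling modes can be illustrated with a rough node-hour calculation. This is only a sketch: the hourly load profile below is invented, and real billing depends on the machine type and pricing:

```python
# Compare node-hours billed under manual scaling (nodes always
# provisioned) versus auto scaling (nodes follow load, with an optional
# minimum). The load profile is made up for illustration.
def manual_node_hours(nodes, hours):
    return nodes * hours

def auto_node_hours(load_per_hour, min_nodes=0):
    return sum(max(load, min_nodes) for load in load_per_hour)

# A day with traffic only during business hours (nodes needed per hour).
load = [0] * 8 + [3, 4, 4, 5, 5, 4, 4, 3] + [0] * 8
print(manual_node_hours(5, 24))            # 120 node-hours
print(auto_node_hours(load))               # 32 node-hours
print(auto_node_hours(load, min_nodes=1))  # 48 node-hours
```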

Another way to achieve the same is by using the gcloud CLI. If you are using gcloud, you start by setting environment variables to store your Cloud Storage directory path, your model name, your version name, and your choice of framework. When creating a version with the gcloud tool, you can spell the framework name with capital letters and underscores (for example, SCIKIT_LEARN) or with lowercase letters and hyphens (for example, scikit-learn). Both options lead to the same behavior. The following code block shows how to set the relevant environment variables in the script:

MODEL_DIR="gs://your_bucket_name/"
VERSION_NAME="[YOUR-VERSION-NAME]"
MODEL_NAME="[YOUR-MODEL-NAME]"
FRAMEWORK="[YOUR-FRAMEWORK-NAME]"

For a scikit-learn pipeline with custom code, set an additional variable with the path to your custom code tarball, such as the following:

MODEL_DIR="gs://your_bucket_name/"
VERSION_NAME="[YOUR-VERSION-NAME]"
MODEL_NAME="[YOUR-MODEL-NAME]"
FRAMEWORK="scikit-learn"
CUSTOM_CODE_PATH="gs://your_bucket_name/ai_platform_custom_code-0.1.tar.gz"

For a custom prediction routine, omit the FRAMEWORK variable and define the path to the custom code tarball and the name of your predictor class. Check the following code:

MODEL_DIR="gs://your_bucket_name/"
VERSION_NAME="[YOUR-VERSION-NAME]"
MODEL_NAME="[YOUR-MODEL-NAME]"
CUSTOM_CODE_PATH="gs://your_bucket_name/ai_platform_custom_code-0.1.tar.gz"
PREDICTOR_CLASS="[MODULE_NAME].[CLASS_NAME]"


Once you have set your relevant environment variables, you can use the CLI to create the version using the following code:

gcloud ai-platform versions create $VERSION_NAME \
  --model $MODEL_NAME \
  --origin $MODEL_DIR \
  --runtime-version=1.15 \
  --framework $FRAMEWORK \
  --python-version=3.7

Use the gcloud beta component for a scikit-learn pipeline (beta) and make sure you set the --package-uris flag, as follows:

gcloud components install beta
gcloud beta ai-platform versions create $VERSION_NAME \
  --model $MODEL_NAME \
  --origin $MODEL_DIR \
  --runtime-version=1.15 \
  --framework $FRAMEWORK \
  --python-version=3.7 \
  --package-uris=$CUSTOM_CODE_PATH

Use the gcloud beta component for a custom prediction routine, remove the --framework flag, and set the --package-uris and --prediction-class flags, as follows:

gcloud components install beta
gcloud beta ai-platform versions create $VERSION_NAME \
  --model $MODEL_NAME \
  --origin $MODEL_DIR \
  --runtime-version=1.15 \
  --python-version=3.7 \
  --package-uris=$CUSTOM_CODE_PATH \
  --prediction-class=$PREDICTOR_CLASS
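The three command variants above differ only in a few flags. As an illustrative sketch (not an official gcloud wrapper), their argument lists can be assembled from a single helper; the predictor class name in the example is a placeholder:

```python
# Assemble the argument list for `gcloud [beta] ai-platform versions
# create`, covering the three cases shown above: plain framework,
# scikit-learn with custom code, and a custom prediction routine.
def versions_create_args(version, model, model_dir, framework=None,
                         custom_code_path=None, predictor_class=None,
                         runtime="1.15", python="3.7"):
    beta = custom_code_path is not None  # custom code needs the beta component
    args = ["gcloud"] + (["beta"] if beta else []) + [
        "ai-platform", "versions", "create", version,
        "--model", model, "--origin", model_dir,
        "--runtime-version=" + runtime, "--python-version=" + python,
    ]
    if framework:
        args += ["--framework", framework]
    if custom_code_path:
        args += ["--package-uris=" + custom_code_path]
    if predictor_class:  # custom prediction routine: no --framework
        args += ["--prediction-class=" + predictor_class]
    return args

routine = versions_create_args(
    "v1", "census", "gs://your_bucket_name/",
    custom_code_path="gs://your_bucket_name/ai_platform_custom_code-0.1.tar.gz",
    predictor_class="predictor.MyPredictor")
print(" ".join(routine))
```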

You can check the model version, as follows:

gcloud ai-platform versions describe $VERSION_NAME \
  --model $MODEL_NAME

/*****Output*****/
createTime: '2018-02-28T16:30:45Z'
deploymentUri: gs://your_bucket_name
framework: [YOUR-FRAMEWORK-NAME]
machineType: mls1-c1-m2
name: projects/[YOUR-PROJECT-ID]/models/[YOUR-MODEL-NAME]/versions/[YOUR-VERSION-NAME]


pythonVersion: '3.7'
runtimeVersion: '1.15'
state: READY

By default, a model version runs as a Google-managed service account that has the Cloud ML Service Agent Identity and Access Management (IAM) role. For most situations, this default service account is adequate. However, if you are using a custom prediction routine and need your model version to have a different set of permissions, you can add another service account for it to use. For example, if your model version needs access to a Cloud Storage bucket from a specific Google Cloud project, you can define a service account with read authorization on that bucket. The sample Python code to create a service account is shown in the following code block:

import os

from google.oauth2 import service_account
import googleapiclient.discovery


def create_service_account(project_id, name, display_name):
    """Creates a service account."""
    credentials = service_account.Credentials.from_service_account_file(
        filename=os.environ['GOOGLE_APPLICATION_CREDENTIALS'],
        scopes=['https://www.googleapis.com/auth/cloud-platform'])
    service = googleapiclient.discovery.build(
        'iam', 'v1', credentials=credentials)
    my_service_account = service.projects().serviceAccounts().create(
        name='projects/' + project_id,
        body={
            'accountId': name,
            'serviceAccount': {
                'displayName': display_name
            }
        }).execute()
    print('Created service account: ' + my_service_account['email'])
    return my_service_account


The user who is deploying the model version must have the Service Account Token Creator role on the previously created service account. Specify the service account name of your model version in the serviceAccount field. When using the gcloud tool, you can use the --service-account flag, as shown in the following code sample:

gcloud components install beta
gcloud beta ai-platform versions create your-version-name \
  --service-account [YOUR-SERVICE-ACCOUNT-EMAIL]

Model training example

In this section, we will look into how to train your models for prediction using the Google Cloud Platform. The focus is on how to use the platform to train models and the steps involved. The entire training code is taken from the Google Cloud sample examples. Please refer to the following link (https://github.com/GoogleCloudPlatform/cloudml-samples/archive/master.zip) for the training code. You can also download the data from Google Cloud public storage (gs://cloud-samples-data/ai-platform/census/data/*):

1. You can create a data directory and use the gsutil copy command to copy data from the Google Cloud bucket to the local directory. This is required as you have to train and test the model locally first. The following code does this:

mkdir data
gsutil -m cp gs://cloud-samples-data/ai-platform/census/data/* data/

The data folder would have the structure shown in the following screenshot:

Figure 9.24: Test and training data files

So, with the assumption that you have your training code and training data, and that the relevant Python environment for the Google Cloud SDK is set up, we can now look into the steps for training the model using the Google Cloud AI Platform. You first start with local training jobs.


2. The following code sets up the MODEL_DIR environment variable and then uses the Google Cloud AI Platform command to train models locally, as follows:

MODEL_DIR=output
rm -rf $MODEL_DIR/*
gcloud ai-platform local train \
  --module-name trainer.task \
  --package-path trainer/ \
  --job-dir $MODEL_DIR \
  -- \
  --train-files $TRAIN_DATA \
  --eval-files $EVAL_DATA \
  --train-steps 1000 \
  --eval-steps 100

The output of the preceding code would look like that shown in the following screenshot:

Figure 9.25: Model training output


The output directory would have the following contents:

Figure 9.26: Output directory contents

To show the results of the local model training test, you can use the TensorBoard visualization tool. With TensorBoard, you can view your TensorFlow graph, plot quantitative metrics about how your model runs, and show additional data, such as images passing through the graph. TensorBoard is available as part of the TensorFlow installation.

3. Run the following command to start TensorBoard:

tensorboard --logdir=$MODEL_DIR --port=8080

The output of the preceding command would look like that in the following screenshot:

Figure 9.27: TensorBoard command output


The following screenshot provides a glimpse of what a TensorBoard looks like:

Figure 9.28: TensorBoard showing the model training graph

Click on the Download PNG option available in the menu on the left side of the screen for a clearer image of the preceding graph.


As you can see in this screenshot, the model training goes through various stages, and TensorFlow creates the lineage with efficient logging. This lineage is easy to trace with the graphical user interface (GUI). It is possible to trace the input values as they pass through the training stages and produce intermediate results. The model can be tuned based on the training graph by passing appropriate values at runtime:

Figure 9.29: Model stats in TensorBoard

4. Now, after local testing, you need to test the model training in distributed local mode to ensure that the model can be trained in distributed mode, which would be the case if the Google Cloud AI Platform were used to train the model. The following code can be used for the same:

MODEL_DIR=output-dist
rm -rf $MODEL_DIR/*
gcloud ai-platform local train \
  --module-name trainer.task \


  --package-path trainer/ \
  --job-dir $MODEL_DIR \
  --distributed \
  -- \
  --train-files $TRAIN_DATA \
  --eval-files $EVAL_DATA \
  --train-steps 1000 \
  --eval-steps 100

The output would look like that in the following screenshot:

Figure 9.30: Local training model output


The output model directory has the following contents. The checkpoint and log folders enable the graphical view on the TensorBoard, as shown in the following screenshot:

Figure 9.31: Model training output directory

5. The preceding steps are about running model training jobs for local testing. After this, for actual production-grade deployment, you need to run model training on the cloud. For that, you have to start by creating buckets (or you can use existing ones as well). The following code will create buckets for you:

BUCKET_NAME_PREFIX="ai_ml_book"
PROJECT_ID=$(gcloud config list project --format "value(core.project)")
BUCKET_NAME=${PROJECT_ID}-${BUCKET_NAME_PREFIX}
echo $BUCKET_NAME
REGION=us-central1
gsutil mb -l $REGION gs://$BUCKET_NAME

The output of the preceding code would look like the following:

Figure 9.32: Bucket creation output

6. Once the bucket is created, you upload the artifacts to the bucket using the following code. This code also sets some variables to correct values so that the next set of commands can run:

gsutil cp -r data gs://$BUCKET_NAME/data
TRAIN_DATA=gs://$BUCKET_NAME/data/adult.data.csv
EVAL_DATA=gs://$BUCKET_NAME/data/adult.test.csv
gsutil cp ../test.json gs://$BUCKET_NAME/data/test.json
TEST_JSON=gs://$BUCKET_NAME/data/test.json


The output would look like the following:

Figure 9.33: Bucket upload output

7. You are now ready to run training jobs in the cloud, in both single-instance and distributed mode. You will begin by submitting a single-instance training job. Use the default BASIC scale tier to perform a training job with a single instance. The initial job request can take a couple of minutes to start, but subsequent jobs run faster. This helps you iterate easily as you improve and validate your training application. The following is the code for the same:

JOB_NAME=ai_book_model_single_1
OUTPUT_PATH=gs://$BUCKET_NAME/$JOB_NAME
gcloud ai-platform jobs submit training $JOB_NAME \
  --job-dir $OUTPUT_PATH \
  --runtime-version 1.14 \
  --module-name trainer.task \
  --package-path trainer/ \
  --region $REGION \
  -- \
  --train-files $TRAIN_DATA \
  --eval-files $EVAL_DATA \
  --train-steps 1000 \
  --eval-steps 100 \
  --verbosity DEBUG


The output of the preceding command would look like the following:

Figure 9.34: Model training output
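Job IDs must be unique within a project, which is why the samples append suffixes such as _1. A hypothetical helper that stamps names with the submission time instead, keeping them within the character set accepted for job IDs (letters, digits, and underscores, starting with a letter):

```python
# Generate a unique job ID by stamping the submission time, validated
# against the allowed pattern for AI Platform job names.
import re
import time

def unique_job_name(prefix, stamp=None):
    stamp = int(time.time()) if stamp is None else stamp
    name = "{}_{}".format(prefix, stamp)
    if not re.fullmatch(r"[a-zA-Z][a-zA-Z0-9_]*", name):
        raise ValueError("invalid job name: " + name)
    return name

print(unique_job_name("ai_book_model_single"))
```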

8. Upon running the gcloud ai-platform jobs describe ai_book_model_single_1 command, the following output would be seen:

Figure 9.35: Model training output (2)


You can view the job status and logs from the Google Cloud console user interface as well. The following screenshot represents the same:

Figure 9.36: Model training status on the Google Cloud console

9. While the model training is in progress, we see a progress bar in the model list user interface. The logs can be accessed in real time to understand the model training progress as well as the intermediate results. This level of logging is useful in model tuning and can be seen in the following screenshot:

Figure 9.37: Model training logs


The output stored in the Google Cloud bucket looks just like the output of our locally trained model, as shown in the following screenshot:

Figure 9.38: Output on the Google Cloud bucket

10. By starting TensorBoard and pointing it at the summary logs produced during training, you can inspect the behavior of your training job both during and after the run. Since the training programs write the summaries to a Cloud Storage location, TensorBoard can read them without you having to copy event files manually. The following screenshot shows this:


Figure 9.39: Model training graph on TensorBoard

Click on the Download PNG option available in the menu on the left side of the screen for a clearer image of the preceding graph.

11. You should configure your training job in distributed mode to take advantage of Google's flexible platform when carrying out training jobs. To run this model as a distributed process on the AI Platform, no code changes are required. Set --scale-tier to any tier above BASIC to perform a distributed job. The following is the example code for the same:

JOB_NAME=census_dist_1
OUTPUT_PATH=gs://$BUCKET_NAME/$JOB_NAME
gcloud ai-platform jobs submit training $JOB_NAME \
  --job-dir $OUTPUT_PATH \
  --runtime-version 1.14 \
  --module-name trainer.task \
  --package-path trainer/ \
  --region $REGION \
  --scale-tier STANDARD_1 \
  -- \
  --train-files $TRAIN_DATA \
  --eval-files $EVAL_DATA \
  --train-steps 1000 \
  --verbosity DEBUG


The output for the preceding code is shown in the following screenshot:

Figure 9.40: Training job output with a scale

12. You can stream logs using the gcloud ai-platform jobs stream-logs $JOB_NAME command. The following screenshot represents the output of the stream logs:

Figure 9.41: Streaming a job

13. The final step of model training is to deploy the model. The following code will deploy the model with a version:

MODEL_NAME=census
gcloud ai-platform models create $MODEL_NAME --regions=$REGION
OUTPUT_PATH=gs://$BUCKET_NAME/census_dist_1
gsutil ls -r $OUTPUT_PATH/export
MODEL_BINARIES=gs://$BUCKET_NAME/census_dist_1/export/census/1578466652
gcloud ai-platform versions create v1 \
  --model $MODEL_NAME \
  --origin $MODEL_BINARIES \
  --runtime-version 1.14


The following is the output of the preceding command:

Figure 9.42: Model deployment output
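The MODEL_BINARIES path above hardcodes one timestamped export directory taken from the gsutil ls output. A hypothetical helper that picks the most recent export from such a listing instead:

```python
# Pick the most recent timestamped SavedModel export directory from a
# `gsutil ls` listing. The listing below is illustrative.
def latest_export(paths):
    exports = [p.rstrip("/") for p in paths
               if p.rstrip("/").rsplit("/", 1)[-1].isdigit()]
    if not exports:
        raise ValueError("no timestamped export directories found")
    return max(exports, key=lambda p: int(p.rsplit("/", 1)[-1]))

listing = [
    "gs://bucket/census_dist_1/export/census/1578466000/",
    "gs://bucket/census_dist_1/export/census/1578466652/",
    "gs://bucket/census_dist_1/export/census/checkpoint",
]
print(latest_export(listing))  # gs://bucket/census_dist_1/export/census/1578466652
```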

You can also view the deployed model version on the Google Cloud console user interface, shown in the following screenshot:

Figure 9.43: Deployed model view from the Google Cloud console

In the next section, we will take a look at how we can use the deployed census model for both online and batch prediction.


Performing prediction with service endpoints

This step generally comes after you have trained your machine learning model and deployed it, along with a version, on the Google Cloud AI Platform. The following diagram represents how online prediction works in the Google Cloud AI Platform:

Figure 9.44: Online prediction workflow

You can expose models deployed in Google Cloud Storage buckets as service endpoints, and those service endpoints can be consumed by applications using the REST API or the gcloud tool itself. The gcloud utility is useful for quickly testing online prediction. In an actual production case, you use scripts written in Python, or the cURL utility, to invoke the REST API that exposes prediction functionality on the deployed model version.


The following code shows how to use the gcloud tool for online prediction. The example JSON input would look like that shown here:

{"values": [1, 2, 3, 4], "key": 1}
{"values": [5, 6, 7, 8], "key": 2}

The code would look like this:

MODEL_NAME="census"
INPUT_DATA_FILE="instances.json"
VERSION_NAME="v1"
gcloud ai-platform predict --model $MODEL_NAME \
  --version $VERSION_NAME \
  --json-instances $INPUT_DATA_FILE

With this code, we are able to use the deployed model and predict the outcome based on the training data.
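From a Python script, the equivalent REST call is a POST to the version's :predict endpoint with an instances array in the request body. The sketch below only builds the URL and body; authentication and the HTTP call itself are omitted, and the project name is a placeholder:

```python
# Build the online prediction REST request for a deployed model version.
def build_predict_request(project, model, version, instances):
    name = "projects/{}/models/{}/versions/{}".format(project, model, version)
    url = "https://ml.googleapis.com/v1/{}:predict".format(name)
    return url, {"instances": instances}

url, body = build_predict_request(
    "my-project", "census", "v1",
    [{"values": [1, 2, 3, 4], "key": 1}])
print(url)  # https://ml.googleapis.com/v1/projects/my-project/models/census/versions/v1:predict
```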

Summary

In this chapter, we have covered a fundamental aspect of AI that enables us to predict outcomes based on historical data. We understood the generic process of predictive analytics and took a deep dive into the process with an example on GCP. The model building (training and evaluation) and deployment process on GCP is facilitated with command-line as well as user interface tools on the Google Cloud console. We also took a look at how to version the models and utilize the appropriate model for predictions. The prediction service is also exposed as an API that can be called in a language-agnostic manner. In the last chapter of this book, we will utilize all the components studied so far to build an AI application.


Section 4: Building Applications and Upcoming Features

This section summarizes all the knowledge acquired from previous chapters. It comprises a single chapter. We will build an end-to-end AI application using various components on Google Cloud Platform (GCP). This chapter provides the general process of quickly building production-ready applications with GCP.

This section comprises the following chapter:

Chapter 10, Building an AI Application

Building an AI Application

We have studied the fundamental building blocks of artificial intelligence in the context of Google Cloud Platform (GCP) in the last nine chapters of this book. In this chapter, we will recap the various tools and techniques that we have learned in this book and apply them to build a sample AI application. We will also take a look at the upcoming features of GCP that will further enhance its ability to build intelligent applications by democratizing AI and ML.

The following topics will be covered in this chapter:

A step-by-step approach to developing AI applications
Overview of a use case – automated invoice processing (AIP)
Upcoming features

A step-by-step approach to developing AI applications

A platform facilitates real-world applications and use cases that solve problems by complementing human intelligence. As artificial intelligence becomes mainstream and is adopted by day-to-day applications and software platforms, it is imperative that the developer community, as well as large organizations, leverage platforms such as GCP to build powerful applications with minimal investment of time and human resources. The AI services of GCP also make it easy for us to carry out various experiments and test a hypothesis before creating a production application.

Data management and ML are provided as services on GCP. This eliminates the need for managing infrastructure and software. The development team can focus on solving the real problem and does not need to worry about platform and infrastructure management. To facilitate this, the sophisticated toolkit offered by GCP is available to the developer community without needing an upfront investment. We are going to see a major shift in application turnaround time and ease of development.


This facilitates cutting-edge innovation and solves some of the real-world problems whose solutions were once thought impossible. The development of AI applications, in general, follows a step-by-step approach. At a high level, there are six stages of AI application development and operationalization, as shown in the following diagram. In this section, we will take a look at these generic stages and learn about fundamental ML theory, and in the subsequent sections, we will learn with an example:

Figure 10.1: AI application development stages

Problem classification

The problems that can be solved by AI can be broadly classified into the following categories based on the characteristics of the inputs and expected outcomes.


Classification

A classification problem typically involves the classification of an outcome into predefined categories. The outcome is called the dependent variable. The outcome depends on a set of input variables or features, which are called independent variables. Classification problems train the model to mathematically map a combination of independent variables to the dependent variable. The output is one of the values within a predefined set. For example, a fruit image, when passed through a classification algorithm, is classified as an apple or an orange. Typically, the algorithms produce the probability of the image belonging to each class. The class with the maximum probability is the classification based on the training data.
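The "class with the maximum probability" rule from the paragraph above can be expressed in a line of code; the probabilities here are illustrative:

```python
# Pick the class with the highest predicted probability.
def classify(probabilities):
    return max(probabilities, key=probabilities.get)

print(classify({"apple": 0.83, "orange": 0.17}))  # apple
```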

Regression

Unlike classification problems, regression problems expect a continuous outcome instead of a discrete class. For example, a model that predicts rainfall for a particular demographic area can predict the rainfall in millimeters based on various input parameters.

Clustering

Clustering problems are typically addressed by unsupervised learning methods. The variable space is segregated into a number of clusters based on a similarity index. This is mathematically modeled based on the distance of a data point from the centroid of each cluster. For example, biological cells can be clustered into infected or normal categories by a clustering algorithm.
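The distance-to-centroid assignment described above can be sketched as follows; the centroid coordinates and points are invented for illustration:

```python
# Assign a point to the cluster whose centroid is closest
# (Euclidean distance).
import math

def assign_cluster(point, centroids):
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))

centroids = {"infected": (8.0, 1.0), "normal": (2.0, 5.0)}
print(assign_cluster((7.5, 1.5), centroids))  # infected
print(assign_cluster((2.5, 4.0), centroids))  # normal
```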

Optimization

Optimization problems are used to improve the performance of a particular computational activity. For example, if we need to compress a file, various algorithms can be tried and the best algorithm is selected based on the type of data. Another example of an optimization problem is finding the best route for a vehicle based on real-time traffic and other road conditions. The algorithm can assist in optimizing the route for the shortest travel time from point A to point B.


Anomaly detection

Anomaly detection is a type of problem where extreme conditions within the data need to be detected. A very prominent use case of anomaly detection is cybersecurity, where massive volumes of network logs are generated and it is impossible to detect the outliers without smart sampling and an intelligent algorithm.

Ranking

Ranking problems require the outcome space to be sorted via stacked ranking based on the specific use case. The rules that govern the ranking act as the input variables, and the sequential ranking is the output. These problems can be solved with supervised as well as unsupervised learning methods. A typical use of ranking algorithms is in recommendation systems: in the case of retail, products are ranked by relevance based on various user preferences and presented to the user.

Data preparation

Data preparation problems are those where the algorithm is expected to generate data from a historical pattern. An example use case is the creation of human faces based on neural networks, where a large volume of facial images is fed to the algorithm (using a generative adversarial network) to generate new images. Such algorithms can create realistic data for testing and experimentation purposes.

Data acquisition

An AI application gets better the more data it gets for training its underlying models. The quality and quantity of data are of the utmost importance for an AI application to really complement human intelligence. This means that data acquisition from various sources is an important step in the process. The data is typically acquired in a batch or near-real-time manner. Typically, historical data is used in batch mode for training and tuning the models, and real-time data is used in streaming mode for further processing. In certain use cases, there is an expectation of a minimum time lag between the event time and the action time for the AI application. In such cases, we need to consider using a high-end computation infrastructure that ensures sub-millisecond latency in data acquisition and processing.


Data processing

We are generating more and more data as more devices connect to the internet; this growing volume of data is called big data. It is imperative that we use as much data as possible to build AI applications, since data in all shapes contains intelligence that can be leveraged to build intelligent applications. With the ability to process unstructured data, big data now includes, for example, images, videos, audio files, text, and wireless data. Big data processing also involves computer vision, natural language processing (NLP), social networking, speech recognition, internet of vehicles (IoV) data analysis, real-time internet of things (IoT) data analysis, and wireless big data processing. Recently, large-scale heterogeneous data-processing technologies powered by artificial intelligence (AI), focused on pattern recognition, ML, and deep learning, have been implemented intensively. Nevertheless, the development of AI-driven big data processing remains challenging. More and more databases and data streams are being distributed and processed in computer vision and image processing. One of the biggest challenges in the massive analysis of image/video data is building energy-efficient and real-time methods to extract useful information from the huge amount of data produced every second. A lot of progress has also been made in speech processing, benefiting from big data and new AI technology.

Problem modeling

We need to understand the specific problem and use an appropriate model to solve it. Based on the category of the problem, we need to experiment with various algorithms before deciding to use a particular model in production. Within a specific category of algorithms, it is possible to deploy an incorrect model that does not produce accurate results on new datasets. We need to understand the data pattern over a period of time before finally choosing the model to deploy. This is especially important in the case of mission-critical AI applications that augment human capabilities.

[ 285 ]

Building an AI application

Chapter 10

Validation and execution

The trained model needs to be thoroughly validated before it is used in AI applications. Since the idea of an AI application is to complement human intelligence, it becomes even more important to ensure that the model is validated on heterogeneous data samples (evaluation datasets) before deployment. The model needs to pass a high threshold not only on the predefined evaluation sample but also on new datasets that the model has never seen. We can evaluate the model using two primary categories of validation: holdout and cross-validation. Both approaches use a test set to assess the model output (that is, data that is not seen by the algorithm). Using the data that we used to develop the model to test it is not recommended: the model will remember the entire training set, and thus will predict the correct label at any point in the training set. This is known as overfitting. Because of its speed, simplicity, and versatility, the holdout approach is useful. Nonetheless, this approach is often associated with high uncertainty, as variations in the training and test datasets can lead to significant differences in accuracy estimates.

Holdout

The fundamental idea of the holdout approach is to avoid overfitting by exposing the model to a new dataset compared to the one that was used for training. Initially, it results in a model accuracy that is below the minimum threshold; however, the method provides unbiased learning performance and an accurate estimate. In this method, the dataset is randomly broken down into three subsets:

Training set: This is the subset of the dataset used for building the predictive model.
Validation set: This is the subset of the dataset used to evaluate the performance of the model built in the training phase. It provides a test platform to fine-tune the parameters of a model and select the best-performing model. Not every modeling algorithm requires a validation set.
Test set: This is the subset of the dataset used to assess the likely future performance of the model.
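The three-way split described above can be sketched in plain Python. This is a minimal illustration, not code from the book's repository; the 60/20/20 split ratios and the fixed shuffling seed are assumptions made for the example:

```python
import random

def holdout_split(data, train_frac=0.6, val_frac=0.2, seed=42):
    """Randomly partition a dataset into training, validation, and test subsets."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # shuffle so the split is random
    n = len(shuffled)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder becomes the test set
    return train, val, test

train, val, test = holdout_split(range(100))
```

Because the subsets are disjoint, the test set contains only records the model has never seen, which is exactly what the holdout approach requires.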


Cross-validation

This is a technique for when we have access to a limited dataset for training and evaluation. With this technique, we break down the data into various chunks. These independent chunks are used in random order for training and evaluation through multiple iterations over the course of the evaluation. This ensures that the model is trained on a variety of samples and that evaluation is likewise performed on different subsets of the data. This ensures that there is enough variation in the model structure to avoid underfitting as well as overfitting; however, it is recommended that you have as much data as possible for training and evaluation, even with cross-validation methods. The accuracy reported by these methods is an average over the various iterations used for training and evaluation.
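The chunk-and-rotate idea can be illustrated with a minimal k-fold index generator. This is an illustrative sketch, not the book's code; in practice a library routine such as scikit-learn's KFold would typically be used:

```python
def kfold_indices(n_samples, k=5):
    """Yield (train_indices, eval_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    fold_size = n_samples // k
    for i in range(k):
        # The i-th chunk is held out for evaluation; the rest is used for training.
        start = i * fold_size
        stop = (i + 1) * fold_size if i < k - 1 else n_samples
        eval_idx = indices[start:stop]
        train_idx = indices[:start] + indices[stop:]
        yield train_idx, eval_idx

folds = list(kfold_indices(10, k=5))
```

The model is trained and evaluated once per fold, and the reported accuracy is the average over the k iterations, as described above.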

Model evaluation parameters (metrics)

Model evaluation metrics are needed to measure model efficiency. The choice of assessment metrics depends on the particular ML task (such as classification, regression, ranking, clustering, or topic modeling, among others). Certain metrics, such as precision and recall, are useful for multiple tasks. Supervised ML models that learn from experience (historical data), such as classification and regression models, form the majority of ML applications.

Classification metrics

Let's look at some of the commonly used classification metrics:

Classification accuracy: For classification problems, accuracy is a standard measurement metric. It is typically measured as the ratio of the number of correct predictions to the total number of predictions. For example, if we have a use case for classifying animals based on images, we measure the number of correct classifications by comparing the predictions against the known labels and taking a ratio with the total number of classification attempts. We need to run multiple tests by splitting the sample data into various evaluation sets in order to avoid over- and underfitting the data. We can also deploy cross-validation to optimize performance by comparing various models instead of running multiple tests with different random samples on the same model. With these methods, we can improve classification accuracy in an iterative manner.
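The accuracy calculation reduces to a single ratio: correct predictions divided by total predictions. A minimal sketch (the animal labels below are made up for illustration):

```python
def classification_accuracy(y_true, y_pred):
    """Ratio of the number of correct predictions to the total number of predictions."""
    correct = sum(1 for true, pred in zip(y_true, y_pred) if true == pred)
    return correct / len(y_true)

# Three of the four image classifications below are correct.
acc = classification_accuracy(["cat", "dog", "cat", "bird"],
                              ["cat", "dog", "bird", "bird"])
```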


Confusion matrix: Information about incorrect classifications is also important for improving the overall reliability of the model. A confusion matrix is a useful technique for understanding overall model efficiency.
Logarithmic loss: This is an important metric for understanding model performance when the input is a probability value between 0 and 1. It is preferable to have a minimal log loss for the ML model to be useful. The threshold typically set for this metric is less than 0.1.
Area under curve (AUC): AUC offers an aggregate performance metric across all possible classification thresholds.
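As a first-principles sketch (not GCP-specific code), a confusion matrix is simply a count of (actual, predicted) label pairs, and binary log loss penalizes confidently wrong probabilities; binary 0/1 labels are assumed here:

```python
import math
from collections import Counter

def confusion_matrix(y_true, y_pred):
    """Count (actual, predicted) label pairs; off-diagonal entries are misclassifications."""
    return Counter(zip(y_true, y_pred))

def log_loss(y_true, y_prob, eps=1e-15):
    """Average negative log-likelihood for binary labels and predicted probabilities."""
    total = 0.0
    for label, prob in zip(y_true, y_prob):
        prob = min(max(prob, eps), 1 - eps)  # clip to avoid log(0)
        total += -(label * math.log(prob) + (1 - label) * math.log(1 - prob))
    return total / len(y_true)

cm = confusion_matrix([1, 0, 1, 1], [1, 0, 0, 1])
loss = log_loss([1, 0, 1], [0.9, 0.1, 0.8])
```

A model that assigns high probability to the correct class achieves a low log loss, which matches the "less than 0.1" guidance above.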

Model deployment

Deploying AI and ML models is an important step in fully operationalizing them. GCP makes it easy and seamless to deploy multiple versions of a model. Here are the generic steps we need to follow to deploy a model in production:

1. Make the model file available in a Cloud Storage bucket. It is recommended that you use a dedicated Cloud Storage bucket in the same project you are using for the AI Platform. If you need to use a bucket in a different project, you must ensure that your AI Platform service account is able to access your Cloud Storage model; without the required permissions, your attempt to build an AI Platform version of the model will fail. Here are the steps involved in setting up the Google Cloud bucket:
1. Set a unique name for the bucket to distinguish it from all the other buckets in your project's Cloud Storage. Typically, you can generate the bucket name by appending -mlengine to the end of the project ID. This is a recommended best practice to ensure consistent deployment across your projects. For example, if the project name is AIGCP, the bucket name for storing the model files will be AIGCP-mlengine.
2. Set the region for the bucket and set the environment variable value. It is recommended that you use the same region in which the AI Platform jobs are intended to be run.


3. Upload the model to the newly created bucket by using the following commands for the TensorFlow SavedModel:

SAVED_MODEL_DIR=$(ls ./your-export-dir-base | tail -1)
gsutil cp -r $SAVED_MODEL_DIR gs://AIGCP-mlengine

4. Test the model. Use the gcloud ai-platform local predict command to check how your model handles predictions before you use it with AI Platform Prediction. The command uses dependencies in your local environment to make predictions, and produces output in the same way as gcloud ai-platform predict does when it conducts predictions online. It is recommended that you check predictions locally to spot errors before sending online prediction requests, in order to save costs. Here is a sample command for testing the model:

gcloud ai-platform local predict --model-dir model_storage_path/ \
    --json-instances path-for-prediction-input.json \
    --framework tensorflow
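The -mlengine naming convention from the bucket setup steps can be captured in a tiny helper. This is an illustration, not code from the book; note that Cloud Storage bucket names must be lowercase, so the sketch lowercases the project ID:

```python
def model_bucket_name(project_id):
    """Derive the model-storage bucket name by appending -mlengine to the project ID."""
    return f"{project_id.lower()}-mlengine"

bucket = model_bucket_name("AIGCP")
```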

In this section, we have seen the stages of building an AI application. Here is a simplified view of the high-level building blocks for developing an AI application:

Figure 10.1: AI application high-level building blocks


As we can see in the diagram, there are primarily three areas in an AI application pipeline: data acquisition, data processing (feature engineering), and model execution and tuning. In the subsequent sections of this chapter, we will consider a use case and see how easy it is to build a working application with the AI toolkit on GCP.

Overview of the use case – automated invoice processing (AIP)

Invoice processing is a common, repetitive process that follows a typical workflow. At this time, most organizations require a great deal of manual intervention to manage the various aspects of invoice processing. The GCP tools that we have seen in this book can automate the entire invoice processing system for an organization. In this section, we will explore a use case by going through the automated invoicing process step by step. As an example, let's consider a software development firm that provides IT consulting services to its customers. Typically, for the project's time and material costs, an invoice is sent to the client at the end of the month (or a predefined invoice period). The invoice contains all the details about the services provided by the vendor to the client, with details of the individual services, as well as the details required for payment processing (bank account details, for example). In our example, let's assume that the invoice file is in PDF format and is received via email. Here is a sample invoice for our example:

Figure 10.2: Sample invoice for the example use case


Generally, the invoice contains the following sections: 1. The client name, address, and phone number. 2. Invoice header information, such as the invoice number, date, SOW#, project ID, and so on. 3. Invoice amount breakdown and total. 4. The invoice amount in words. 5. Bank account details for remittance. Once the invoice is received by the accounts payable team, they validate the individual billable entries against the time-sheet system and schedule the invoice for payment as per the vendor's payment term. This workflow is depicted in the following flowchart:

Figure 10.3: Invoice process flow with representative GCP components

We are going to use the GCP components to automate this process. In the next section, we will take a look at the design for the AIP system on GCP.


Designing AIP with AI platform tools on GCP

We will design the AIP application with some of the GCP components that we have already studied earlier in this book. The following diagram highlights four sections that individually contribute to building the application. We will use the raw invoice PDF files as the source of information. Data will be stored in Cloud SQL and passed to the AI toolkit with the help of Cloud Functions for further processing. We will then leverage the AI toolkit on GCP to build intelligence into the application. Primarily, we require ML capabilities, a natural language interface, the Vision API, and the Speech API to enable the conversational interface. Finally, we will leverage outbound services, specifically Dialogflow. Here is a high-level component interaction for the AIP application using GCP components:

Figure 10.4: Component design for the AIP application


By integrating various GCP components, we can realize a fully functional AIP application. The following list summarizes the capabilities of the various GCP components that are involved in the application. The following components can be used as datastores in GCP; in our application, we will utilize Cloud SQL:

Cloud SQL: This is a fully managed database that can serve the purpose of a relational database on GCP. At the time of writing, Cloud SQL can be used with MySQL, PostgreSQL, and SQL Server.
Cloud Bigtable: This is a NoSQL database service on GCP. Bigtable provides a low-latency, massively scalable interface to datasets. This storage is ideal for ML applications, and it is easy to integrate with various open source big data frameworks.
Cloud Spanner: This is a horizontally scalable relational database service that is highly consistent across instances and nodes. It combines the best features of relational and nonrelational databases, and its fully managed nature makes it an attractive option for massively large database deployments with minimal time investment.
Cloud Memorystore: This is a fully managed, in-memory datastore service that can be used for building application caches, and it provides extremely low latency for data access. Memorystore leverages the Redis protocol, which makes it consistent across deployments, so porting the cache is easy.
Cloud Firestore: This is a handy NoSQL datastore that is used to keep data in sync between the client and server side. It is typically used for mobile, web, and server development, and offers support for building responsive, cross-device applications.
Cloud Datastore: This is another fully managed NoSQL database that handles sharding and replication, which ensures high availability across deployment zones. It supports ACID transactions and SQL-like queries and indexes. The datastore provides a REST API that makes it easy to integrate external applications by allowing data access over a multilevel secure connection. It is easy to make changes to the underlying data structure, and it provides an easy-to-use query language.


The following components can be used for computing and processing; in our application, we will utilize Cloud Functions:

The Cloud SDK: This is a set of tools and libraries for developing with GCP. It is a command-line interface to all the GCP services, such as virtual machine orchestration, Compute Engine, networks, and disk storage. It is typically used for automating various application management tasks. In the case of the AIP application, we can use the Cloud SDK to manage development, test, and deployment pipelines, and to automate the management of the entire application.
Cloud Functions: This is a serverless, event-driven computing platform. It is easy to scale up and down, depending on the workload and based on predefined events. In the case of the AIP application, we can use Cloud Functions to trigger the automated process of reading and processing the invoice once it lands in the source location. It provides seamless support for interoperability across various languages and application interface protocols. It works on the core security principle of least privilege and facilitates secure access to events and functions based on the role allocated to the user.

Here are the components from the AI toolkit that can be leveraged; in our application, we will utilize Cloud Machine Learning, the Natural Language API, the Vision API, the Translation API, the Speech API, and the Cloud Video Intelligence API:

Cloud Machine Learning: This is a comprehensive platform that leverages the capabilities of Google Cloud and makes it easy to train and deploy ML models that can work on any type and size of data. The models trained on the platform are immediately available for use and can be accessed via a secure API, which makes it easy to develop quick prototypes of applications with minimal capital and time investment.
The Natural Language API: The Natural Language API is useful for performing text analytics at web scale. The API integrates with AutoML and allows users to train, test, and deploy their models based on unstructured data. There are pretrained models that can be easily leveraged by applications across various functional domains. The platform makes it easy to collaborate on and share the models, which are constantly upgraded by the platform, and thereby provides a consistent and reliable means of creating an application that deals with inputs in natural language form. This enables human-like interaction with data via the Speech API.


The Vision API: This is handy for performing analytics based on visual inputs, such as images. There is a large number of use cases that we can apply this API to when the machine can complement human vision, such as transcribing text from a PDF or Word document. With this, a large number of documents can be classified in an efficient and cost-effective manner. The service is available for consumption without the need for extensive setup, which makes it easy and seamless to start leveraging it once the service is enabled for the project and the user account. The powerful API layer makes it easy to integrate with third-party applications in a secure manner.
The Translation API: This is a handy service that can be used without setting up translation rules. The API allows translation between various languages based on pretrained and constantly evolving models. The capabilities are easy to use, opening up possibilities for language interoperability and integration among applications that are built in various languages.
The Speech API: This is an important aspect of an intelligent machine that enables a human-like interface with applications. Primarily, the API enables text-to-speech and speech-to-text conversion. These features can be used for creating conversational applications. Dialogflow internally leverages the Speech API to facilitate conversations.
The Cloud Video Intelligence API: This API enables us to leverage information within video inputs. There are pretrained models available on GCP that can be leveraged for the classification and identification of specific objects within video frames. It is one of the building blocks of autonomous vehicles, and the API is efficient and scalable enough to be used in mission-critical applications.
With this overview of the various components available on GCP for building end-to-end AI applications, let's take a look in the next sections at how to build the automated invoice processing application on GCP.


Performing optical character recognition using the Vision API

The first step after receiving the invoice in PDF format is to interpret its contents. We are going to use the Vision API to perform optical character recognition (OCR) by going through the following steps:

1. Create a new project on GCP. It is recommended that you use a new project to experiment with OCR. Once the documents are transcribed, the project can be deleted, if desired, without affecting any other application.
2. Enable the Cloud Vision API, Cloud Pub/Sub, Cloud Functions, and Cloud Storage from the GCP console.
3. From Cloud Shell (in the GCP console), ensure that all the components are updated with the following command:

gcloud components update

4. Create two buckets on Cloud Storage as follows:

Figure 10.5: gsutil commands for creating buckets

5. The aigcp bucket is created to store the invoice PDF files and the aigcp-text bucket is used to store the transcribed text file (JSON formatted). 6. Copy the invoice PDF file to the aigcp bucket (either from Cloud Shell or from the GCP console). Here is a quick snapshot of the bucket contents from the GCP console:


Figure 10.6: Invoice file in the Google Cloud Storage bucket

7. Run the following command to read the invoice PDF file with the Vision API and transcribe it into the text:

Figure 10.7: Command for OCR and text detection using the Vision API

8. The OCR can also be performed via any of the programming interfaces. GCP provides APIs for C#, Go, Java, Node.js, PHP, Python, and Ruby. Let's take a look at a small snippet of the Python implementation of the OCR. The complete implementation is available on GitHub at https://github.com/PacktPublishing/Hands-On-Artificial-Intelligence-on-Google-Cloud-Platform:

def async_detect_document(source_path_gcs, destination_path_gcs):
    client = vision.ImageAnnotatorClient()
    feature = vision.types.Feature(
        type=vision.enums.Feature.Type.DOCUMENT_TEXT_DETECTION)
    gcs_source = vision.types.GcsSource(uri=source_path_gcs)
    # The source is a PDF file, so we set the MIME type accordingly.
    input_config = vision.types.InputConfig(
        gcs_source=gcs_source, mime_type='application/pdf')
    gcs_destination = vision.types.GcsDestination(uri=destination_path_gcs)
    output_config = vision.types.OutputConfig(
        gcs_destination=gcs_destination, batch_size=1)
    async_request = vision.types.AsyncAnnotateFileRequest(
        features=[feature], input_config=input_config,
        output_config=output_config)
    operation = client.async_batch_annotate_files(
        requests=[async_request])


9. The transcribed text file in JSON format is now stored in the aigcp-text bucket, as shown in the following screenshot:

Figure 10.8: Output JSON file in Google Cloud Storage

10. In order to extract meaningful information from the output JSON file, we need to understand its format (output-1-to-1.json):

Figure 10.9: JSON header in the output file


This is the header information within the output file. The Vision API automatically detects the language used within the PDF document. The actual character data is encapsulated within the hierarchy of blocks > paragraphs > words > symbols, as shown in the following diagram:

Figure 10.10: Output JSON structure
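The blocks > paragraphs > words > symbols hierarchy can be walked programmatically to reassemble the recognized text. The following is an illustrative sketch over a hand-built sample of that structure (the sample data is made up for the example, not actual Vision API output):

```python
def assemble_text(page):
    """Concatenate symbols by walking blocks > paragraphs > words > symbols."""
    words = []
    for block in page["blocks"]:
        for paragraph in block["paragraphs"]:
            for word in paragraph["words"]:
                # Each word is a list of single-character symbols.
                words.append("".join(s["text"] for s in word["symbols"]))
    return " ".join(words)

sample_page = {
    "blocks": [{
        "paragraphs": [{
            "words": [
                {"symbols": [{"text": "I"}, {"text": "n"}, {"text": "v"},
                             {"text": "o"}, {"text": "i"}, {"text": "c"},
                             {"text": "e"}]},
                {"symbols": [{"text": "#"}, {"text": "0"}, {"text": "3"},
                             {"text": "0"}]},
            ]
        }]
    }]
}
text = assemble_text(sample_page)
```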


The Vision API also collects all the symbols found within the PDF file into a text field. This helps us quickly gather the meaningful content from the invoice. Here is the snippet from the output JSON that contains the text field:

Figure 10.11: Text from the invoice file

Refer to Figure 10.2 for the original content within the invoice and the text extracted by the Vision API. All the fields within the invoice are available in the output JSON file within the text field. At this point, we have utilized the Vision API to extract meaningful information from the invoice for it to be processed automatically. In the subsequent sections, we will store the relevant data, perform validation, and schedule the invoice for payment by using various GCP tools.

Storing the invoice with Cloud SQL

In this section, we will walk through the complete process of extracting text from the JSON file generated by the OCR output and pushing it into Cloud SQL through Cloud Functions. First, we will go through the steps for getting Cloud SQL and Cloud Functions ready on GCP, and then we will explain the code that extracts the text, converts it into the required structured format, and inserts it into the table.

Creating a Cloud SQL instance

First, let's create a Cloud SQL instance by going through the following steps:

1. From the left panel, select SQL from the Storage section:


2. Click on CREATE INSTANCE:

3. Click on Choose MySQL:


4. Give the instance a name, set an appropriate password, select your region, keep the default database version, and click on Create. Remember the password that you set for the root user, as it will be useful later:

Setting up the database and tables

Now let's set up the database and tables with the help of the following steps:

1. Click on the created instance:


2. Click on Connect using Cloud Shell and connect to MySQL by entering the password for the root user that was set when we created the instance:

3. Run the following command to create the database: create database aip_db;

4. Run the following command to create the invoice table. The SQL file aip.sql, which creates the database-related artifacts, is also available at https://github.com/vss-vikram/Hands-On-Artificial-Intelligence-on-Google-Cloud-Platform:

create table invoice (
    Company_Name VARCHAR(50),
    Client_Name VARCHAR(50),
    Client_Address VARCHAR(50),
    SOW_Number VARCHAR(50),
    Project_ID VARCHAR(50),
    Invoice_Number VARCHAR(50),
    Invoice_Date DATE,
    Billing_Period DATE,
    Developer VARCHAR(255),
    Rate VARCHAR(50),
    Hours INT,
    Subtotal VARCHAR(50),
    Balance_Due VARCHAR(50),
    Bank_Account_Number INT,
    Bank_Name VARCHAR(50)
);


Enabling the Cloud SQL API

Now let's enable the Cloud SQL API:

1. Search for the Cloud SQL Admin API from the console:

2. Click on ENABLE:

Enabling the Cloud Functions API

Now let's enable the Cloud Functions API:

1. Search for the Cloud Functions API from the console:

2. To enable the API, click on ENABLE:


3. The following Cloud SQL details are needed to establish a connection from Cloud Functions to Cloud SQL:

Instance name
Database name
User name: root
Password of the root user that we set up when we created the instance

Creating a Cloud Function

Now let's create a Cloud Function:

1. Select Cloud Functions from the left panel under the COMPUTE section:


2. Click on CREATE FUNCTION:

3. Enter the desired function name. The allocated memory can be 256 MB. Select the Trigger as Cloud Storage and the Event Type as Finalize/Create. The bucket name should be the name of the bucket where the PDF-to-text output is stored. The Source code should be Inline editor:


4. Copy the code from aip_cloud_function.py and replace the required details based on the bucket name and the pub/sub topic that you have created. One more replacement is needed to connect to the Cloud SQL database: replace the line of code mentioned in step 6 with your Cloud SQL details in the downloaded code.
5. Download the Cloud Function code from https://github.com/vss-vikram/Hands-On-Artificial-Intelligence-on-Google-Cloud-Platform.
6. Provide the MySQL connection details as shown in the following line. You will find the same line in the Cloud Function code that you downloaded in the preceding step. Replace the database user, password, database name, and Cloud SQL instance connection name with the appropriate details:

# mysql+pymysql://<user>:<password>@/<database>?unix_socket=/cloudsql/<instance_connection_name>

7. Once you have replaced the line, copy and paste the code, as shown in the following screenshot:


8. For the dependency setup, copy the contents of the requirements.txt file from https://github.com/vss-vikram/Hands-On-Artificial-Intelligence-on-Google-Cloud-Platform and paste them into the requirements file of the Cloud Function editor.

9. Click on Create Function. 10. Once the function is created, click on the function, and go to the General tab. Check the service account that is used to run the function:


Providing the Cloud SQL Admin role

Next, let's provide the Cloud SQL Admin role to the service account from which the Cloud Function will run:

1. Go to the IAM section:

2. Select the service account that you will use to run the Cloud Function:


3. Assign the Cloud SQL Admin role to the account:

Now let's walk through the complete code of the Cloud Function for processing the invoice after the PDF is converted into text. The following is a partial code snippet; the full code is available at https://shorturl.at/yADV9:

def validate_aip(event, context):
    """Triggered by a change to a Cloud Storage bucket.
    Args:
        event (dict): Event payload.
        context (google.cloud.functions.Context): Metadata for the event.
    """
    # Code block to read the file from GCS and load the data in JSON format.


    client = storage.Client()
    bucket = client.get_bucket(bucket)
    blob = bucket.get_blob(file_path)
    contents = blob.download_as_string()
    contents = contents.decode("utf-8")
    data = json.loads(contents)

    # Read the required text field from the data.
    output = data['responses'][0]['fullTextAnnotation']['text']
    output = output[:-1]

    # Convert the output data from the JSON file into the required format
    # for loading into the invoice table.

    df_invoice = pd.DataFrame.from_dict(output_dict)
    df_invoice[['Developer', 'Rate', 'Hours', 'Subtotal']] = \
        df_invoice['Developer Rate Hours Subtotal'].str.split(expand=True)
    df_invoice = df_invoice.drop(columns=['Developer Rate Hours Subtotal'])

    # Establish a connection with Cloud SQL.
    db_con = sqlalchemy.create_engine(
        'mysql+pymysql://<user>:<password>@/<database>'
        '?unix_socket=/cloudsql/<instance_connection_name>')

In this section of the code, we imported the required dependencies and used GCS-specific libraries to load the data from the files stored in GCS. Once the file was loaded, the specific text containing all the information we discussed earlier in the chapter was extracted from the JSON file. A regular expression was applied to segregate the text into individual columns, and its output was used to populate the invoice table in Cloud SQL.
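The repository's exact regular expressions are not reproduced here, but the field-segregation idea can be sketched as follows. The "Label: value" layout and the label names below are assumptions about the OCR text, made only for illustration:

```python
import re

def extract_field(text, label):
    """Pull the value that follows a 'Label: value' line in the OCR text.
    The label names are assumed for illustration, not taken from the repo's code."""
    match = re.search(rf"{re.escape(label)}\s*:\s*(.+)", text)
    return match.group(1).strip() if match else None

ocr_text = "Invoice Number: 030\nSOW Number: 001\nProject ID: 002"
invoice_number = extract_field(ocr_text, "Invoice Number")
```

Each extracted value would then populate the corresponding column of the invoice table.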

Validating the invoice with Cloud Functions

In this section, we will go through the code that we use to validate the processing of the invoice. You will need to load the data into the timesheet table to cross-verify it against the invoice table (the invoice table was populated using the PDF file).


The following are the steps to create the timesheet table in the database and load data into it:

1. Use the Cloud Shell instance to connect to MySQL, as we discussed in step 2 of the Setting up the database and tables section, and run the following query to create the timesheet table:

create table timesheet (
    Company_Name VARCHAR(50),
    SOW_Number VARCHAR(50),
    Project_ID VARCHAR(50),
    Invoice_Number VARCHAR(50),
    Invoice_Date DATE,
    Billing_Period DATE,
    Developer VARCHAR(255),
    Rate VARCHAR(50),
    Hours INT,
    Bank_Account_Number INT,
    Bank_Name VARCHAR(50)
);

2. Run the following queries to insert data into the timesheet table. As mentioned earlier, you can use aip.sql to get these queries:

insert into timesheet (Company_Name, SOW_Number, Project_ID, Invoice_Number,
    Invoice_Date, Billing_Period, Developer, Rate, Hours,
    Bank_Account_Number, Bank_Name)
values ('Vsquare Systems', '001', '002', '030', '2020-01-31', '2020-01-31',
    'Developer1', '$185', 160, 000000001, 'Payment Bank');

insert into timesheet (Company_Name, SOW_Number, Project_ID, Invoice_Number,
    Invoice_Date, Billing_Period, Developer, Rate, Hours,
    Bank_Account_Number, Bank_Name)
values ('Vsquare Systems', '001', '002', '030', '2020-01-31', '2020-01-31',
    'Developer2', '$150', 152, 000000001, 'Payment Bank');

insert into timesheet (Company_Name, SOW_Number, Project_ID, Invoice_Number,
    Invoice_Date, Billing_Period, Developer, Rate, Hours,
    Bank_Account_Number, Bank_Name)
values ('Vsquare Systems', '001', '002', '030', '2020-01-31', '2020-01-31',
    'Developer3', '$140', 168, 000000001, 'Payment Bank');


Now that we have the data ready, let's take a deep dive into the validation process.

3. Let's walk through the code that we need for data validation. In the following code, we extract the data from the timesheet table that we created earlier and join it with the invoice data frame on Company_Name, SOW_Number, Project_ID, Invoice_Number, Invoice_Date, and Developer:

# Read the data from the timesheet table.
df_timesheet = pd.read_sql('SELECT * FROM timesheet', con=db_con)
joined_df = pd.merge(df_invoice, df_timesheet,
                     on=['Company_Name', 'SOW_Number', 'Project_ID',
                         'Invoice_Number', 'Invoice_Date', 'Developer'])

In the following code, we match the invoice and timesheet data frames based on the rate, the hours, and the bank account number. If all the records match, then we can move on; if there is a discrepancy, then the invoice will not be processed:

# Match the data from both tables.
matched = []
for index, row in joined_df.iterrows():
    if (row['Rate_x'] == row['Rate_y']
            and row['Hours_x'] == row['Hours_y']
            and row['Bank_Account_Number_x'] == row['Bank_Account_Number_y']):
        matched.append(True)
    else:
        matched.append(False)

Scheduling the invoice for the payment queue (Pub/Sub)

In this section, we will learn how to push a message to a Pub/Sub topic for the invoice processing system to process the invoice. We will cover the process of creating a Pub/Sub topic, as well as how to publish and subscribe to messages from the topic.


Let's create a topic and test it: 1. To enable the Cloud Pub/Sub API, go through the following steps: 1. Search for the Pub/Sub API from the home page. 2. Enable the API. 2. To create a pub/sub topic and subscription, go through the following steps: 1. Select Pub/Sub from the left panel. 2. Click on Create Topic. 3. From the left panel, go to Subscriptions and create a subscription for the created topic. 3. From the option shown in the following screenshot, push the message into the topic:

4. The following code demonstrates that, once the data is matched, we push a message to the Pub/Sub topic, which will be consumed by the payment processing system to process the invoice:

# Pushing a success message into Pub/Sub.
if False not in matched:
    data = "Invoice Matched"
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path('project_id', 'topic_name')
    bdata = data.encode("utf-8")
    future = publisher.publish(topic_path, data=bdata)
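To see the publish/consume round trip end to end without GCP credentials, the flow can be mimicked locally with a queue.Queue standing in for the topic; the real application uses the google-cloud-pubsub client (pubsub_v1) shown above, so this sketch only illustrates the encode/publish/decode pattern:

```python
# Local stand-in for the Pub/Sub round trip: a queue.Queue mimics the
# topic, and the payment system's subscriber is simulated by a get().
import queue

topic = queue.Queue()
matched = [True, True, True]        # illustrative validation results

# Publisher side (invoice validation):
if False not in matched:
    data = "Invoice Matched"
    topic.put(data.encode("utf-8"))  # stands in for publisher.publish(...)

# Subscriber side (payment processing system):
message = topic.get_nowait().decode("utf-8")
print(message)  # Invoice Matched
```

In production, the subscriber would be attached to the subscription created in step 2, and Pub/Sub handles delivery, retries, and acknowledgments.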


Notifying the vendor and AP team about the payment completion

At the time of writing, GCP uses SendGrid to send emails. SendGrid is a trusted email delivery service that is used across the globe to send transactional and marketing emails. It is a powerful cloud-based solution that removes the complexity of maintaining an email infrastructure. In this section, we will learn how to send emails using SendGrid. It comes with a free tier of 12,000 emails. The following are the steps to enable the SendGrid API:

1. Search for Cloud Messaging from the home page.
2. Enable the SendGrid API.

Now let's see how to configure and send emails from SendGrid. The following is sample Python code that can be used to send emails from SendGrid. The code is simple and self-explanatory; you just have to make sure that you are using the right API key:

# Using SendGrid's Python library: https://github.com/sendgrid/sendgrid-python
import sendgrid

sg = sendgrid.SendGridClient("YOUR_SENDGRID_API_KEY")
message = sendgrid.Mail()

message.add_to("[email protected]")
message.set_from("[email protected]")
message.set_subject("Sending with SendGrid is Fun")
message.set_html("and easy to do anywhere, even with Python")
sg.send(message)
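For comparison, the same notification can be assembled with Python's standard library email.message module; the addresses and wording below are placeholders of our own, and actual delivery would still go through SendGrid or an SMTP server (for example, via smtplib):

```python
# Assembling a payment-completion notification with the standard library.
# Addresses and body text are placeholders; delivery is out of scope here.
from email.message import EmailMessage

msg = EmailMessage()
msg["To"] = "vendor@example.com"
msg["From"] = "ap-team@example.com"
msg["Subject"] = "Invoice payment completed"
msg.set_content("Invoice 030 has been validated and scheduled for payment.")

print(msg["Subject"])  # Invoice payment completed
```

This separates message construction from the delivery mechanism, which makes the notification content easy to unit test before wiring in any email provider.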

The preceding SendGrid code configures the client, recipient, sender, subject, and the message in the email body. Now you have an understanding of how to process an invoice in an automated fashion. Next, we will have a quick look at how to create a conversational interface for AIP.


Creating a conversational interface for AIP

Once the application is built with the data stored in Cloud SQL, we can enable a conversational interface by enabling the Dialogflow API. We can create the application context and train the agent to handle various user queries related to the invoices. We need an external integration service that connects to the data storage to answer user queries based on the application context. We covered the detailed steps for enabling a conversational interface with Dialogflow in Chapter 6, Smart Conversational Applications Using Dialogflow. With a Dialogflow agent enabled, the automated invoice processing use case can fully complement human intelligence, avoid manual intervention and errors, and process large volumes of invoices faster and more efficiently.

In this chapter, we have seen a practical application of various GCP tools in building an intelligent application for automated invoice processing. As we have seen, developing capabilities that complement human intelligence is easy and seamless with GCP tools. The platform itself is constantly evolving, moving toward the complete democratization of high-end capabilities that leverage fully managed services for storage, compute, ML, and advanced analytics. In the final section of this chapter (and book), we will explore some of the upcoming features of GCP. We anticipate that, with constant innovation and Google's experience of working with the entire world's data, the upcoming tools, frameworks, and applications will make it even easier, quicker, and more cost-effective to build intelligent applications on GCP.

Upcoming features
GCP is certain to evolve further with the addition of new features. Here is a representative list of additions that we think will come to GCP in the future:

GCP will have greater integration and availability of cross-platform products. For example, IBM Power Systems is now available on GCP. That way, the investments that enterprises have already made in large production systems can be utilized by migrating entire platforms onto GCP. This will result in implementation and infrastructure cost savings for enterprises.


GCP will be enabled with ready-to-use AI and ML models. As the marketplace matures, GCP will host ever-increasing numbers of AI and ML models. These models will be available via predefined APIs with inherent interoperability, and they will be constantly trained and tuned by GCP, producing increasingly better results over time. The marketplace will mature with increased usage.

Signing up and pricing will be simplified to the extent that developers at all levels of experience (including entry level) will be able to quickly build their enterprise applications.

GCP will provide a drag-and-drop user interface for building entire AI pipelines, from problem classification to model deployment. At that point, the power of AI will be fully in the hands of business teams, with less dependence on IT and development teams. The simplification and democratization of the platform will drive further innovation, and we will see intelligent applications that are not only used but also built by everyone.

GCP will enable industry- and business-specific AI toolkits for improved profitability and innovation for enterprises of all sizes. For example, Google is already helping retailers to accelerate their digital and multichannel revenue growth. Google is also helping retailers to become fully data-driven and to make data-based suggestions for improvements in operational efficiency. This is possible by leveraging the AI tools on GCP, Chrome Enterprise, and Android, as well as the entire connected toolkit.

The AI toolkit on GCP will also facilitate research projects that require high volumes of data and computation power, as well as a process and interface for building the AI pipeline. For example, Google is helping FDA MyStudies to leverage real-world data for biological research. Google Cloud is working with the FDA on the MyStudies application, with better, adaptable protection and configurable privacy policies. The goal is to give research organizations the ability to automatically identify and secure personally identifiable information. Google Cloud will continue to invest in various studies and research programs to bring general improvements to the platform, expand the number of assessments supported, and allow integration with downstream analytics and visualization tools.


AutoML Tables enables your entire data science team to automatically build and deploy ML models on structured data at greatly increased speed and scale. It comes with strong feature engineering and model training capabilities. When training starts, AutoML Tables automatically performs feature engineering tasks such as normalizing the inputs, bucketing numeric features into ranges for better model reliability, normalizing date-time input parameters, performing basic text processing cleanup and stop-word removal, and creating one-hot encodings and embeddings for categorical variables. AutoML Tables performs parallel testing on linear models, feed-forward deep neural networks, gradient-boosted decision trees, AdaNet, and ensembles of various model architectures to determine the best architecture for your dataset. Users will be able to view the AutoML Tables structure using Stackdriver Logging and will be able to export test data as well.

AI Hub is another very useful feature that is coming to GCP. AI Hub is a one-stop facility for building even the most complex ML pipelines. It is possible to use preconfigured notebooks to build AI applications using some of the pretrained models, as well as to easily train new models. AI Hub also ensures access to relevant datasets in a consistent manner. It is also possible to collaborate on model development and leverage models built on common frameworks, such as TensorFlow. This greatly facilitates the training and deployment of models on GCP.

AI Platform Notebooks will make it easy to manage JupyterLab instances through a protected, publicly available notebook instance URL. It will enable you to create and manage virtual machine instances that come prepackaged with JupyterLab. AI Platform Notebooks instances will support the PyTorch and TensorFlow frameworks, will be protected by GCP authentication and authorization, and will come with a lot of commonly used software preinstalled.
AI Platform deep learning containers are a unique way in which GCP provides access to an array of pretrained models that can be quickly prototyped and used with the help of highly optimized and consistent environments. This helps in building workflows quickly and facilitates experimentation with a minimal entry barrier and cost. This is a great leap toward fully democratizing AI development capabilities.


The AI Platform data labeling service is a great way to leverage human intelligence to label data points at internet scale. An organization can request this service from Google to label datasets manually. This is helpful for gathering training and evaluation data when a new use case is being considered and no initial dataset is available. There is a consistent effort from Google to crowdsource the process of dataset labeling on the internet. The labeling service is also handy when we need to label highly secure data. The interface with the labeling service is a secure and efficient way of getting data labeled.

Summary In this chapter, we learned a step-by-step approach to creating an AI application. We discussed the design and development of an automated invoice processing application. We used OCR with the Vision API to convert an invoice PDF into text, and then we used invoice data to validate timesheet data. After successful validation, we processed the invoice. We learned how to develop an end-to-end application on GCP using multiple Google services. Finally, we briefly discussed upcoming features of GCP.


Other Books You May Enjoy If you enjoyed this book, you may be interested in these other books by Packt:

Hands-On Machine Learning on Google Cloud Platform
Giuseppe Ciaburro, V Kishore Ayyadevara, et al.
ISBN: 978-1-78839-348-5

Learn how to clean your data and ready it for analysis
Use Google Cloud Platform to build data-based applications for dashboards, web, and mobile
Create, train, and optimize deep learning models for various data science problems on big data
Learn how to leverage BigQuery to explore big datasets
Use Google's pre-trained TensorFlow models for NLP, image, video, and much more
Create models and architectures for time series, reinforcement learning, and generative models
Create, evaluate, and optimize TensorFlow and Keras models for a wide range of applications


Cloud Analytics with Google Cloud Platform
Sanket Thodge
ISBN: 978-1-78883-968-6

Explore the basics of cloud analytics and the major cloud solutions
Learn how organizations are using cloud analytics to improve the ROI
Explore the design considerations while adopting cloud services
Work with the ingestion and storage tools of GCP, such as Cloud Pub/Sub
Process your data with tools such as Cloud Dataproc, BigQuery, etc.
Use over 70 GCP tools to build an analytics engine for cloud analytics
Implement machine learning and other AI techniques on GCP



Leave a review - let other readers know what you think Please share your thoughts on this book with others by leaving a review on the site that you bought it from. If you purchased the book from Amazon, please leave us an honest review on this book's Amazon page. This is vital so that other potential readers can see and use your unbiased opinion to make purchasing decisions, we can understand what our customers think about our products, and our authors can see your feedback on the title that they have worked with Packt to create. It will only take a few minutes of your time, but is valuable to other potential customers, our authors, and Packt. Thank you!


Index A Accelerated Linear Algebra (XLA) 185 access control lists (ACLs) 32 Advanced-analytics-as-a-Service (AAaaS) 9 AI applications data acquisition 284 data processing 285 developing 281 model deployment 288, 290 problem classification 282 problem modeling 285 validation and execution 286 AI building blocks about 16 data 17 handling and control 20 image generation 21 information processing and reasoning 19 machine vision 19 Natural Language Processing (NLP) 18 navigation and movement 20 planning and exploring 20 speech generation 21 speech recognition 19 AI Notebooks execution automating 129, 130, 131, 132, 133 AI Platform Prediction 120 AI Platform training 120 AI tools, GCP about 21 conversation 24, 25 language 23 sight 21, 23 AIP application component design 293 anti-patterns, Cloud First strategy

downtime 12 security and privacy 12 APIs, for linguistic information and intelligence Cloud Natural Language API 23 Cloud Translation API 23 App Engine about 29 environments 29 using, for AI applications 29 Application-Specific Integrated Circuit (ASIC) 179 Artificial Intelligence (AI) 11 asynchronous batch prediction with Cloud Machine Learning Engine 145, 146, 147, 148, 149 audio sentiment analysis performing, with DialogFlow 171, 172 automated invoice processing (AIP) conversational interface, creating 316 designing, with AI platform tools on GCP 292, 295 overview 290, 291 AutoML Natural Language APIs used, for performing sentiment analysis 113, 114, 116, 117 AutoML Natural Language traditional machine learning approach, for document classification 79 used, for document classification 78 AutoML Vision APIs used, for image classification 90 AutoML working 74

B base-learning algorithm 54 bias 55 BigQuery

about 40, 41 data, loading 46, 47 usage 41 used, for training model 48, 49 using, for AI applications 41 built-in algorithms Linear Learner 198 Wide and Deep 198 XGBoost 199

C Central Processing Unit (CPU) 198 classification metrics Area under curve (AUC) 288 classification accuracy 287 confusion matrix 288 logarithmic loss 288 Classless Inter-Domain Routing (CIDR) 177 Cloud AutoML about 72 advantages 73 operations API 78 overview 75 REST source, used for model evaluation 77 REST source, used for pointing to model locations 75, 77 used, for document classification 79 Cloud Bigtable about 33 using, for AI applications 34 Cloud Dataflow about 42 using, with AI applications 43 Cloud Dataproc about 41, 42 using, with AI applications 42 Cloud Datastore about 34 using, for AI applications 35 Cloud Filestore about 35, 36, 39 using, for AI applications 36, 40 Cloud First strategy advantages 10, 11 anti-patterns 12

for advanced data analytics 9 Cloud Functions about 30, 294 used, for validating invoice 311 using, with AI applications 30 Cloud Machine Learning Engine about 120 asynchronous batch prediction 145, 146, 147, 148, 149 real-time prediction 150 using 121 Cloud Memorystore about 38, 39 using, for AI applications 39 Cloud ML Engine components 198 Data Labeling Service 219 deep learning containers 220 notebooks 215, 218 prediction service 210 training application, deploying 226, 228 training application, packaging 226, 228 training service model 198 Cloud SDK 294 Cloud Spanner about 37 using, for AI applications 38 Cloud SQL about 36, 37 used, for storing invoice 300 using, for AI applications 37 Cloud Storage about 32 data, loading 45, 46 using, for AI applications 33 Cloud TPUs about 175, 176, 178 arrays, tiling 191 fusion technique 192 model development, guiding principles 186 organization 175 performance analyzing, by setting up TensorBoard 189, 190 performance guide 191 usage, advantages 178, 179


used, for best practices of model development 185 XLA compiler performance 191 Cognitive Toolkit (CNTK) 135 Comma-Separated Values (CSV) 199 command-line interface (CLI) 15, 209, 253 components, AI toolkit Cloud Machine Learning 294 Cloud Video Intelligence API 295 Natural language API 294 Speech API 295 Translation API 295 Vision API 294 Compute Engine about 27, 28 using, with AI applications 28 compute options about 27 App Engine 29 Cloud Functions 30 Compute Engine 27, 28 Kubernetes Engine 30 selecting, for training job 229, 231 Computer Vision (CV) 73 conversational interface Cloud Speech-to-Text API 24 Cloud Text-to-Speech API 24 Dialogflow Enterprise Edition 24 Convolutional Neural Network (CNN) 178 cross-validation technique 287 custom images 122

D data capacity actions 18 processing 18 storage 17 Data Labeling Service 219 data loading, flow design 44 loading, to BigQuery 46, 47 loading, to Cloud Storage 45, 46 dataset test set 286 training set 286

validation set 286 deep neural networks (DNNs) 18 DialogFlow agent about 154 building 159, 161 configuring 162, 163, 164, 166, 167, 169 DialogFlow console reference link 159 DialogFlow context about 156 input context 156 output context 156 DialogFlow entity 155 DialogFlow events 157 DialogFlow fulfillment 158 DialogFlow intent about 154 action 155 parameters 155 responses 155 training phrases 154 DialogFlow about 153 audio sentiment analysis, performing 171, 172 fundamentals 153 supported use cases 170 document classification, with AutoML about 79 dataset, creating 81, 82 mode, used for predictions 88 model, evaluating 84 navigating, to AutoML Natural Language interface 80 training data, labeling 82

E end-user expression 154 ensemble learning about 54 deciding, on optimal predictive model 55 Everything-as-a-Service (XaaS) 15 eXtreme Gradient Boosting (XGBoost) features 60


F fallback intent 160 FLAC (Free Lossless Audio Codec) 105 follow-up intents 156

G GCP additional features 316, 319 AI tools 21 models, deploying 252, 254, 255 overview 15, 16 Google AI Platform deep learning images 121 model, training 141, 142, 143, 144 Google Cloud AI Platform Notebooks 121 Google Cloud Storage bucket saved models, uploading 255 Google data centers about 13, 14 multi-region resources 14 regional resources 14 zonal resources 14 Google Platform AI Notebooks creating 123, 124, 125, 126, 127 using 127, 128 Grade Point Average (GPA) 200 gradient boosting 57, 58 graphical user interface (GUI) 268 Graphics Processing Unit (GPU) 198 gRPC Remote Procedure Call (gRPC) 184 gsutil tool reference link 253

H high bandwidth memory (HBM) 180 holdout approach 286 hyperparameters selecting, for training job 231, 233

I Identity and Access Management (IAM) 212 image 122 image classification, AutoML Vision APIs about 91

command-line interface 100 dataset, creating 91 model, evaluating 98 model, testing 101 model, training 96, 98 Python code 100 Python code, for testing 102 training images, collecting 91 training images, labeling 93 training images, uploading 93, 95 image family 122 Infrastructure-as-a-Service (IaaS) 9 intent classification 154 Internet of Things (IoT) 38 internet of vehicles (IoV) 285 invoice, storing with Cloud SQL Cloud Function, creating 305, 307 Cloud Functions API, enabling 304 Cloud SQL Admin role, providing 309, 311 Cloud SQL API, enabling 304 Cloud SQL instance, creating 300 database and tables, setting up 302 irreducible errors 56

J JupyterLab 121 just-in-time (JIT) compiler 185

K Keras framework model, training 137, 138, 139, 140 Keras about 134 features 134 using 135, 136 Kubernetes Engine about 30 using, for AI applications 31

L logging, prediction service 212 loss function 57


M machine learning 119 machine learning models testing 256, 257 machine learning solution deploying 257 deploying, on GCP 252, 255 maintaining 239, 244 naming, rules 242 training, example 264, 266, 268, 270, 272, 274, 277 versions 239, 244, 257, 260, 262, 264 versions, naming 242 machine learning, on cloud advantages 119, 120 machine-based intelligent predictions overview 235 process 236, 237, 239 master node 209 Matplotlib 121 Matrix Unit (MXU) 180 mean square error (MSE) 57 ML pipeline building 43 model development best practices, with Cloud TPUs 185 model evaluation parameters (metrics) 287 model evaluation AuPRC (Area under Precision-Recall Curve) 84 command line 85 Confusion Matrix 84 Java code 86 metrics 84 Node.js implementation 88 Precision and Recall curves 84 Python code snippet 86 model predictions about 88 Python code for model predictions 90 web interface 89 model training parameters 99 model evaluating 49 testing 50

training, with BigQuery 48, 49 training, with Google AI Platform 144 multiply-accumulate (MAC) 181

N Natural Language Processing (NLP) 73 natural language understanding (NLU) 153 neural networks training, with Google AI Platform 141, 142, 143 training, with Keras framework 137, 138, 139, 140 training, with TPUEstimator 187 Not a Number (NaN) 69

O online prediction reference link 150 versus batch prediction 239 optical character recognition (OCR) performing, with vision API 296, 299, 300 overfitting 286

P parameter server 209 payment completion notification, sending 315 payment queue (pub/sub) invoice, scheduling 313 Peripheral Component Interconnect Express (PCIe) 185 Platform-as-a-Service (PaaS) 15 prediction service about 210, 215 batch prediction 211 online prediction 211 prediction performing, with service endpoints 278, 279 preemptible TPU about 193 creating, from console 193, 194, 195 detecting 196 pricing 195 problem categories, AI applications anomaly detection 284 classification 283


clustering 283 data preparation 284 optimization 283 ranking 284 regression 283 processing options 40 proof of concept (POC) 231 public images 122 Python client library reference link 67

Quality assurance (QA) 256

Streaming Recognition 104 Synchronous Recognition 104 used, for performing speech-to-text conversion 103 speech-to-text conversion asynchronous requests 111 performing, with Speech-to-Text API 103 streaming requests 111, 113 synchronous requests 104, 106, 108, 111 standard Python dependencies 228 Standard TensorFlow Estimator API 187 Stochastic Gradient Descent (SGD) 204 storage options 31

R

T

Q real-time prediction with Cloud Machine Learning Engine 150 recommendation system building, with XGBoost library 68, 69 reducible errors 55 Relational Database Management System (RDBMS) 31 replica 209 Representational State Transfer (REST) 207, 237

S saved models exploring 244 Scholastic Aptitude Test (SAT) 199 SciPy 121 Sendgrid 315 sentiment analysis performing, with AutoML Natural Language APIs 113, 115, 117 service endpoints used, for performing prediction 278, 279 Service Level Agreement (SLA) 38 software component blocks, TPU software architecture TensorFlow Client 184 TensorFlow Server 184 XLA Compiler 184 Software Development Kit (SDK) 221 Speech-to-Text API Asynchronous Recognition 104

Tensor Processing Units (TPUs) 11 TensorBoard setting up, for analyzing Cloud TPUs performance 189, 190 TensorFlow application creating 223 data, training 224, 226 project structure recommendation 223 TensorFlow Client 185 TensorFlow Estimator converting, to TPUEstimator 188 TensorFlow model prerequisites 221, 222 training 221 utilizing 221 TensorFlow SavedModel APIs 250, 251, 252 TensorFlow SavedModel SignatureDef 246, 248, 249 TensorFlow Server 185 TensorFlow training model jobs monitoring 233 TeraFlops (TFLOPS) 181 total error 56 TPU configuration 182, 183 TPU Estimator 185 TPU hardware architecture mapping 179 TPU software architecture about 184 mapping 179


TPU v2 TPU v3, performance benefits 181 TPU v3 performance benefits, over TPU v2 181 TPU version 180, 181 TPUEstimator concepts 188 programming model 187 Standard TensorFlow Estimator API 187 used, for training model 186 training service model about 198 built-in algorithms, using 198, 201, 202, 203, 205, 206, 207 custom training application, using 208, 210 types, TPU configuration single-device TPU 182 TPU pods 182

U Uniform Resource Identifiers (URIs) 199 user-defined and custom dependencies 228

V variance 55 visual information and intelligence APIs, GCP AutoML Vision 23 Cloud Video Intelligence API 22 Cloud Vision API 21

W welcome intent 159 workers 209

X XGBoost library overview 54 recommendation system, building 68, 69 XGBoost machine learning models storing 61, 62, 64 training 61, 62, 64 using 64, 65, 66 XGBoost recommendation system model creating 70 testing 70 XLA Compiler 185