Machine Learning with Python for PC, Raspberry Pi, and Maixduino
ISBN 3895765023, 9783895765025


Most people are increasingly confronted with the applications of Artificial Intelligence (AI). Music or video recommendations, navigation systems, shopping suggestions, etc. are based on methods that can be attributed to this field.

The term Artificial Intelligence was coined in 1956 at an international conference known as the Dartmouth Summer Research Project. One basic approach was to model the functioning of the human brain and to construct advanced computer systems based on this. Soon it would be clear how the human mind works; transferring it to a machine was considered only a small step. This notion proved to be a bit too optimistic. Nevertheless, the progress of modern AI, or rather its subspecialty called Machine Learning (ML), can no longer be denied.

In this book, several different systems will be used to get to know the methods of machine learning in more detail. In addition to the PC, both the Raspberry Pi and the Maixduino will demonstrate their capabilities in the individual projects. In addition to applications such as object and facial recognition, practical systems such as bottle detectors, person counters, or a "talking eye" will also be created.

The latter is capable of acoustically describing objects or faces that are detected automatically. For example, if a vehicle is in the field of view of the connected camera, the information "I see a car!" is output via electronically generated speech. Such devices are highly interesting examples of how, for example, blind or severely visually impaired people can also benefit from AI systems.

Dr. Günter Spanner has been working in the field of electronics development and physical technology for various large corporations for more than 20 years. In addition to his work as a lecturer, he has published successful technical articles and books on electronics, semiconductor technology and microcontrollers, and has created courses and learning packages on these topics.

Elektor Verlag GmbH / Elektor International Media b.v.
www.elektor.de / www.elektor.com


Machine Learning with Python For PC, Raspberry Pi, and MaixDuino

● Dr. Günter Spanner


● This is an Elektor Publication. Elektor is the media brand of Elektor International Media B.V.

PO Box 11, NL-6114-ZG Susteren, The Netherlands Phone: +31 46 4389444

● All rights reserved. No part of this book may be reproduced in any material form, including photocopying, or storing in any medium by electronic means and whether or not transiently or incidentally to some other use of this publication, without the written permission of the copyright holder except in accordance with the provisions of the Copyright Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency Ltd., 90 Tottenham Court Road, London, England W1P 9HE. Applications for the copyright holder's permission to reproduce any part of the publication should be addressed to the publishers.

● Declaration

The Author and Publisher have used their best efforts in ensuring the correctness of the information contained in this book. They do not assume, and hereby disclaim, any liability to any party for any loss or damage caused by errors or omissions in this book, whether such errors or omissions result from negligence, accident, or any other cause. All the programs given in the book are Copyright of the Author and Elektor International Media. These programs may only be used for educational purposes. Written permission from the Author or Elektor must be obtained before any of these programs can be used for commercial purposes.

● British Library Cataloguing in Publication Data

A catalogue record for this book is available from the British Library

● ISBN 978-3-89576-502-5 (Print)
● ISBN 978-3-89576-503-2 (eBook)

● © Copyright 2022: Elektor International Media B.V. Editor: Jan Buiting

Prepress Production: D-Vision, Julian van den Berg

Elektor is part of EIM, the world's leading source of essential technical information and electronics products for pro engineers, electronics designers, and the companies seeking to engage them. Each day, our international team develops and delivers high-quality content - via a variety of media channels (including magazines, video, digital media, and social media) in several languages - relating to electronics design and DIY electronics. www.elektormagazine.com


Contents

Cautionary Notices
Program Downloads
Chapter 1 • Introduction
1.1 "Super Intelligence" in three steps?
1.2 How machines can learn
Chapter 2 • A Brief History of ML and AI
Chapter 3 • Learning from "Big Data"
3.1 Machine Learning and Artificial Intelligence
Chapter 4 • The Hardware Base
Chapter 5 • The PC as Universal AI Machine
5.1 The computer as a programming center
Chapter 6 • The Raspberry Pi
6.1 The Remote Desktop
6.2 Using smartphones and tablets as displays
6.3 FileZilla
6.4 Pimp my Pi
Chapter 7 • Sipeed Maix, aka "MaixDuino"
7.1 Small but mighty: the performance figures of the MaixDuino
7.2 A wealth of applications
7.3 Initial start-up and functional test
7.4 Power supply and stand-alone operation
Chapter 8 • Programming and Development Environments
8.1 Thonny — a Python IDE for beginners and intermediates
8.2 Thonny as a universal IDE for RPi and MaixDuino
8.3 Working with files
8.4 Thonny on the Raspberry Pi
8.5 Tips for troubleshooting the Thonny IDE
8.6 The MaixPy IDE
8.7 A MicroPython interpreter for MaixDuino
8.8 The Flash tool in action
8.9 Machine Learning and interactive Python
8.10 Anaconda
8.11 Jupyter
8.12 Installation and Start-Up
8.13 Using MicroPython Kernels in Jupyter
8.14 Communication setup to the MaixDuino
8.15 Kernels
8.16 Working with Notebooks
8.17 All libraries available?
8.18 Using Spyder for Python Programming
8.19 Who's programming who?
Chapter 9 • Python in a Nutshell
9.1 Comments make your life easier
9.2 The print() statement
9.3 Output to the display
9.4 Indentations and Blocks
9.5 Time Control and Sleep
9.6 Hardware under control: digital inputs and outputs
9.7 For vital values: variables and constants
9.8 Numbers and variable types
9.9 Converting number types
9.10 Arrays as a basis for neural networks
9.11 Operators
9.12 Conditions, branches and loops
9.13 Trial and error — try and except
Chapter 10 • Useful Assistants: Libraries!
10.1 MatPlotLib as a graphics artist
10.2 The math genius: Numpy
10.3 Data-mining using Pandas
10.4 Learning and visualization using Scikit, Scipy, SkImage & Co.
10.5 Machine Vision using OpenCV
10.6 Brainiacs: KERAS and TensorFlow
10.7 Knowledge transfer: sharing the learning achievements
10.8 Graphical representation of network structures
10.9 Solution of the XOR problem using KERAS
10.10 Virtual environments
Chapter 11 • Practical Machine Learning Applications
11.1 Transfer functions and multilayer networks
11.2 Flowers and data
11.3 Graphical representations of data sets
11.4 A net for iris flowers
11.5 Training and testing
11.6 What's blossoming here?
11.7 Test and learning behavior
Chapter 12 • Recognition of Handwritten Numbers
12.1 "Hello ML" — the MNIST data set
12.2 A neural network reads digits
12.3 Training, tests and predictions
12.4 Live recognition of digits
12.5 KERAS can do even better!
12.6 Convolutional networks
12.7 Power training
12.8 Quality control — an absolute must!
12.9 Recognizing live images
12.10 Batch sizes and epochs
12.11 MaixDuino also reads digits
Chapter 13 • How Machines Learn to See: Object Recognition
13.1 TensorFlow for Raspberry Pi
13.2 Virtual environments in action
13.3 Using a Universal TFlite Model
13.4 Ideal for sloths: clothes-sorting
13.5 Construction and training of the model
13.6 MaixDuino recognizes 20 objects
13.7 Recognizing, counting and sorting objects
Chapter 14 • Machines Learn to Listen and Speak
14.1 Talk to me!
14.2 RPi Learns to talk
14.3 Talking instruments
14.4 Sorry, didn't get you ...
14.5 RPi as a ChatBot
14.6 From ELIZA to ChatterBots
14.7 The Talking Eye
14.8 An AI Bat
Chapter 15 • Facial Recognition and Identification
15.1 The right to your own image
15.2 Machines recognize people and faces
15.3 MaixDuino as a Door Viewer
15.4 How many guests were at the party?
15.5 Person-detection alarm
15.6 Social minefields? — face identification
15.7 Big Brother RPi: face identification in practice
15.8 Smile, please ;-)
15.9 Photo Training
15.10 "Know thyself!" … and others
15.11 A Biometric scanner as a door opener
15.12 Recognizing gender and age
Chapter 16 • Train Your Own Models
16.1 Creation of a model for the MaixDuino
16.2 Electronic parts recognition with the MaixDuino
16.3 Performance of the trained network
16.4 Field test
16.5 Outlook: Multi-object detectors
Chapter 17 • Dreams of the Future: from KPU to Neuromorphic Chips
Chapter 18 • Electronic Components
18.1 Breadboards
18.2 Wires and jumpers
18.3 Resistors
18.4 Light-emitting diodes (LEDs)
18.5 Transistors
18.6 Sensors
18.7 Ultrasound range finder
Chapter 19 • Troubleshooting
Chapter 20 • Buyers Guide
Chapter 21 • References; Bibliography
Index


Cautionary Notices

1. The circuits and boards (Raspberry Pi, MaixDuino) in this book may only be operated with approved, double-insulated AC power supplies. Insulation faults in a simple power supply unit can result in life-threatening voltages on non-insulated components.
2. Powerful LEDs can cause eye damage. Never look directly into an LED!
3. Neither the Author nor the Publisher assume any liability for any damage, loss, or injury caused by, or as a result of, setting up the projects described.
4. Electronic circuits can emit electromagnetic interference. Neither the Publisher nor the Author have any control over the technical implementations of the user. Consequently, the user is responsible for compliance with the relevant emission limit values.

● 10

Machine Learning with Python 220224 UK.indd 10

23/03/2022 11:52


Program Downloads

The programs discussed in this book can be downloaded free of charge from the Elektor Store website: www.elektor.com. Once at the Elektor Store, enter the book title in the Search box. On the book's resources and info page, click the Downloads tab, download the .zip archive file, save it locally on your computer, and extract it. If a program is not identical to the one described in the book, use the version from the download package, as it is the most current one.


Chapter 1 • Introduction

We are increasingly confronted with the applications of "Artificial Intelligence" (AI): recommendations for music or video, navigation systems, shopping suggestions and so on are based on procedures that can be more or less assigned to this area. However, there is often a lack of clarity regarding the terms

• Artificial Intelligence (AI)
• Machine Learning (ML)
• Deep Learning (DL)
• Neural Networks (NN)

One of the most frequent queries to popular search engines is therefore: "What is the actual difference between AI and Machine Learning?" Although the terms are closely related, they are by no means synonymous.

Artificial Intelligence is, in principle, the general knowledge base. It forms the foundation, just as mathematics is the basis for physics or electrical engineering. AI tries to solve problems without rigid algorithms using highly developed machines.

Machine Learning is a branch of artificial intelligence. The focus is on automatic learning from experience, i.e., without a special program being created for it. There are various methods for this, for example neural networks or general statistical methods.

Deep Learning or Deep Neural Learning is again a subset of machine learning. Here, with the help of neural networks, methods and systems are developed that are modeled on biological brains.

The term Artificial Intelligence was coined in 1956 at an international conference called The Dartmouth Summer Research Project. A fundamental idea was to model the human brain and to construct more advanced computer systems based on this knowledge. It was believed that it would soon transpire how the human mind works. The transfer to a machine should then only be a small step. This turned out to be slightly overoptimistic.

Nevertheless, some important findings were gathered at the conference. It was stated, for example, that the key factors for an intelligent machine should be

• independent learning;
• natural language processing;
• creativity.

Wisdom, creativity and social skills are terms that hardly apply to machines up to now. Rather, the tasks that AI scientists are currently concentrating on are:


• speech recognition and machine language translation;
• medical diagnosis, such as evaluation of X-rays or computer tomography scans;
• Internet search engines;
• optimization of sales strategies;
• scientific and technical applications, in astronomy, geophysics, etc.;
• forecasts of stock prices and foreign exchange rates;
• handwriting recognition;
• personal identification using face and fingerprint recognition.

All of these applications require a certain number of skills originally reserved for humans. But whether this already constitutes "intelligence" is often questioned. It therefore makes sense to take a look at what the term intelligence should actually include.

1.1 "Super Intelligence" in three steps? First of all, artificial intelligence can be divided into two areas: weak and strong AI. The "strong" or general AI is essentially the attempt to create artificial human beings. The goal would be machines or robots that have all the mental powers of humans, including a consciousness [12]. A "weak" AI, on the other hand, can perform only certain tasks very well, but does not even come close to human capabilities in other areas. Examples of weak AI include computers that can beat people at chess — not only average players but also world chess champions. Weak artificial intelligence is nowadays widespread in science, business, and healthcare. For example, AI enterprise systems contain powerful analytical tools and can provide recommendations and insights for business development. They are used for risk management and planning. Strong AI begins as soon as machines become "human", make their own decisions, or learn new skills without human help. These systems are not only able to solve classical mathematical or logical problems, but also have certain emotional abilities. It is quite easy to program machines to perform some "emotional" or "verbal" actions in response to certain stimuli. ChatBots and virtual assistants are for example already quite good at having a conversation. Some experiments in which robots learn to read human emotions have also been carried out successfully. However, these efforts were more concerned with producing emotional responses than real mental competences. The next step would be a so-called "super intelligence". This is a name given to systems or machines that are superior to humans in many or even all areas of intelligence. It includes all, or at least many, areas of the intellect, in particular, creative and problem-solving skills or emotional and social skills. Self-confidence or a comprehensive memory are also frequently considered prerequisites for real super intelligence. It does not matter whether these functions are based on biological or technical principles. However, considerations about a Super Intelligence quickly led to dystopian scenarios. Once an artificial intelligence has been created that can continue to learn completely independently, the step to a superhuman level of intelligence is ultimately only a matter of


The controllability of such an AI could ultimately become an existential question for all of humanity. Scientists who work in this area and make a good living from it are downplaying this danger. This is certainly all too human. It is therefore important that as many people as possible deal with these topics. Only in this way will a realistic assessment of the potentials and risks be possible in the end.

1.2 How machines can learn

Machine learning is part of the broad field of artificial intelligence that focuses on how to teach systems to learn without being programmed to perform specific tasks. The key idea behind ML is that it is possible to develop machines that independently learn from data, generate knowledge and make useful predictions. To get closer to this goal, three requirements must be met:

1. Availability of extensive data sets. These can consist of numbers, images, text, audio samples or other information. Creating the data sets is often the most extensive part of creating an AI system.

2. Formulation of a learning objective. In so-called supervised learning, the training data contains "correct" and "incorrect" solutions, which are marked accordingly. During the learning process, the system learns how to make correct assessments. In the case of unsupervised learning, on the other hand, the system must decide by itself into which categories a data record can be divided. A typical example is a data set with pictures of people: various systems have already succeeded in independently recognizing that the pictures can be divided into men and women.

3. An adaptive system. In order to achieve the best performance, different procedures are often combined. At the beginning of the learning phase, the ML system only contains a large number of default parameters. With an appropriate algorithm, the parameters are varied until the system delivers results that are as correct as possible. Although neural networks are a widespread variant here, they are not the only possibility for the development of adaptive systems.

Deep learning is a machine-learning method that was inspired by the structure of the human brain. With deep learning algorithms, complex, multilayered neural networks can be trained efficiently. In particular, the "back propagation" algorithm is considered to be one of the most promising approaches. In neural networks, information is transmitted via virtual connection channels. During a learning or "training" session, the connection strengths of these channels are optimized. Using a huge amount of data, a large number of parameters are determined, which ultimately form a ready-to-use system.
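The principle of such an adaptive system can be illustrated in a few lines of Python. The following sketch is purely illustrative and not part of the book's download package: a single artificial neuron starts with default parameters, which are then varied until it reproduces the labeled training data — here, a toy data set for the logical AND function.

# A minimal "adaptive system": a single neuron whose parameters (weights and
# bias) are varied until its outputs match the labeled training data.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # training data
y = np.array([0, 0, 0, 1])                      # labels: the AND function

w = np.zeros(2)   # default parameters at the start of the learning phase
b = 0.0
lr = 0.1          # learning rate: how strongly each error adjusts the system

for epoch in range(20):                            # repeated training runs
    for xi, target in zip(X, y):
        out = 1 if xi @ w + b > 0 else 0           # current prediction
        error = target - out                       # deviation from the label
        w += lr * error * xi                       # vary the parameters ...
        b += lr * error                            # ... to reduce the error

print([1 if xi @ w + b > 0 else 0 for xi in X])    # -> [0, 0, 0, 1]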


Not only numerical or visual data can be processed: speech recognition systems can also be implemented in this way. The soundwaves can be displayed as spectrograms to which a neural network can assign certain words.

However, there is currently no perfect, universal algorithm that works equally well for all tasks. There are actually four main groups of ML algorithms. In addition to supervised and unsupervised learning, a distinction is made between semi-supervised and "reinforcement" learning. Each variant has its own strengths and weaknesses.

Supervised learning is commonly used for classification and regression. Application examples are spam filtering, speech recognition, computer vision, image recognition and classification.

In the case of unsupervised learning, the program is initially not given any pre-classification. The system automatically divides the samples into groups. Unsupervised learning is often used to break down data based on certain similarities. The method is therefore well suited for complex data analysis. This enables machines to find patterns that humans cannot recognize due to their inability to process huge amounts of data. This variant is suitable, among other things, for discovering fraudulent financial transactions, forecasting sales or analyzing customer preferences. The classic applications are data segmentation, detection of anomalies, customer recommendation systems, risk management and the detection of fake data or images.

In semi-supervised learning, there is a mixture of labeled and unlabeled data. In principle, certain prediction results are known, but the model can and may also find its own patterns in order to discover a certain data structure and optimize predictions.

Reinforcement learning comes closest to its human counterpart. People don't need constant supervision to learn effectively. Rather, new insights are drawn from positive or negative experiences. Touching a hot saucepan or riding a bicycle over a pin or nail leads to a corresponding learning success. Training with ready-made data sets is no longer necessary. Instead, the system can learn in dynamic, real-world environments. As a first step, simulated worlds are often used in reinforcement learning research. These offer ideal, data-rich environments. The achievable scores in video games are ideally suited to train reward-motivated behaviors. The main applications are autopilot systems, self-driving cars and trains, or autonomous robots [12].

In some areas these methods have already been used to develop systems that are superior to humans. For example, when diagnosing X-ray images, certain types of cancer are already better recognized by ML systems than by trained X-ray specialists. In addition, unlike the doctors, these machines can carry out routine evaluations 24 hours a day, 7 days a week.
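The contrast between the first two groups can be made concrete with a few lines of Python. The following sketch is illustrative only; it assumes the scikit-learn library (introduced in Chapter 10) and uses a synthetic toy data set:

# Supervised vs. unsupervised learning on the same toy data set.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Two artificial point clouds; y contains the "correct" labels.
X, y = make_blobs(n_samples=100, centers=2, random_state=1)

# Supervised: the classifier is trained with the labeled examples.
clf = LogisticRegression().fit(X, y)
print("classification:", clf.predict(X[:5]))

# Unsupervised: KMeans never sees y and must divide the samples
# into groups by itself, inventing its own cluster numbers.
km = KMeans(n_clusters=2, n_init=10, random_state=1).fit(X)
print("clustering:   ", km.labels_[:5])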


Chapter 2 • A Brief History of ML and AI

Although the term artificial intelligence first appeared at Dartmouth, the question of whether machines can really think is much older. In a famous paper entitled As We May Think, a system was proposed as early as 1945 that would model knowledge processing on human ways of thinking. A little later, Alan Turing published a technical article about how machines could take on tasks that require a high degree of "intelligence". In this context, chess was also explicitly mentioned.

Nowadays, the exorbitant computing power of modern computers is undisputed. Nevertheless, it is doubtful whether a machine can really "think intelligently". Even the lack of an exact definition of the term "intelligence" is problematic. Moreover, the most important advances in the field of AI are not even recognizable to many people. The new methods are often used in very subtle ways. For example, when it comes to investigating customer behavior and, subsequently, influencing purchasing decisions, there is no interest in public education in this regard.

Furthermore, there is a certain tendency to constantly redefine what "intelligent" means. When machines have solved a problem, it is quickly regarded afterwards as trivial or as plain computing power. For example, chess — the "Game of Kings" — was considered for centuries to be a strategy game requiring massive intelligence. After the chess world champion was defeated, the game was considered nothing more than "computing power".

The future of AI becomes easier to understand if we look at some important points in its history. After the Dartmouth Conference, one of the next milestones was a computer program capable of communicating with humans through a keyboard and a screen. ELIZA surprised many people with the illusion of a human conversation partner. Expert systems were finally used for the first medical applications. Highly-developed computer programs supported medics in diagnoses and routine analyses. From 1986 onward, computers slowly learned to speak. Using predefined phonetic sequences, machines were able to pronounce whole words and sentences intelligibly for the first time.

Further technological developments paved the way for artificial intelligence in everyday life. Powerful processors in smartphones and tablets offer consumers extensive AI applications. Apple's Siri, Microsoft's Cortana software, and Amazon's Alexa are conquering the markets. Since 2018, AI systems have been discussing space travel and arranging hairdresser's appointments.

Finally, a system called AlphaGo managed to beat the reigning world champion in the game of Go. This Asian board game has significantly more variants than chess. With the pure pre-calculation of possible moves, the chances of success are therefore extremely low. With the help of AI methods, however, the seemingly impossible was achieved. A very special move attracted particular attention. Much later, it became clear how brilliant it was. Many experts were already talking about creativity and intuition on the part of the machine.


A little later, a successor system learned the game without any human support. This means that self-learning machines have reached a new dimension. The extensive training with human partners was not required anymore. The new system learned the basics and intricacies of the game, and even developed its own strategies.

Both in chess and Go, all information is available to everyone involved. Poker, on the other hand, is a game with incomplete or hidden information. This gives "intuition" and the assessment of competing players a special meaning. Nevertheless, the machines achieved decisive success here as well, by winning important poker tournaments. AI systems are now also able to learn the game of poker independently of human training partners. Some even discovered the ability to bluff by themselves.

The knowledge gained in poker is also of interest in other areas such as medicine or professional negotiation techniques. Here too, intuition often plays an important role. Therefore, medical students are usually inferior to experienced medics in diagnosing diseases. Experience and intuition are also required in finance and investment. This is why there are increasing applications in these areas for the results gained from the AI's poker victories.

The development of AI was by no means free from setbacks and crises. Among other topics, this was demonstrated by the so-called XOR problem. The XOR or "exclusive or" gate is a classic problem in digital technology. The task is to construct a network element that emulates an XOR logic gate. This gate delivers a logic one at the output if the two inputs are not equal. If, on the other hand, the inputs are equal, a logic zero occurs at the output. In tabular form:

Input 1   Input 2   Output
   0         0         0
   0         1         1
   1         0         1
   1         1         0

At first glance, the XOR gate seems to be a very simple problem. However, it turned out that this functionality poses a major problem for neural network architectures.

So-called "perceptrons" are elementary units that correspond to biological neurons. They therefore form the basis of artificial neural networks. Each perceptron can receive an input from other units and calculate a certain output. Typically, the weighted sum of all received values is formed and a decision is made as to whether a signal is to be passed on to other units. However, the XOR problem cannot be linearly separated. This can best be demonstrated when the XOR input values are plotted on a graph. As can be seen from Figure 2.1, there is no way to separate the predictions by a single straight classification line. A perceptron, however, can only separate classes that can be divided by straight lines.


Figure 2.1: The XOR problem.

Hence, the XOR problem cannot be solved with a perceptron. The other logic operators AND, OR and NOT could be emulated without any problems, but not the XOR function! This realization plunged AI development into a deep crisis. It took several years until research groups seriously addressed the topic of neural networks and machine learning again.

The solution to the problem is to go beyond the single-layer architecture. Additional layers of "neurons", in particular so-called hidden layers, finally made it possible to solve the XOR problem. How this solution looks in detail is explained in section 10.9; a minimal preview sketch follows at the end of this chapter.

Despite its long, colorful history and considerable success rate, artificial intelligence is still not mature in many areas. Many applications need to become much more dependable or fault-tolerant. Sensitive areas such as autonomous driving or medical diagnoses require the highest degree of reliability. In addition, AI systems must become increasingly transparent. All decisions must remain comprehensible to human beings. Otherwise, the AI-based decision on granting a loan or applying a certain medical therapy will never really be accepted in everyday life.
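As a preview of section 10.9, the following sketch shows how a hidden layer makes XOR learnable. It is illustrative only (not the book's own listing) and assumes a working TensorFlow/KERAS installation, as described in Chapter 10:

# XOR with a multilayer network: one hidden layer solves what a
# single perceptron cannot.
import numpy as np
from tensorflow import keras

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype="float32")
y = np.array([[0], [1], [1], [0]], dtype="float32")

model = keras.Sequential([
    keras.Input(shape=(2,)),
    keras.layers.Dense(8, activation="tanh"),    # hidden layer
    keras.layers.Dense(1, activation="sigmoid")  # output layer
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.1),
              loss="binary_crossentropy")
model.fit(X, y, epochs=500, verbose=0)

# Typically prints [0. 1. 1. 0.]; results can vary with the random
# initialization of the network weights.
print(model.predict(X, verbose=0).round().flatten())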


Chapter 3 • Learning from "Big Data"

One of the important tasks in the research area of Machine Learning is to analyze, recognize or interpret all kinds of data. In most cases, a large accumulation of randomly distributed data is more or less useless. Only the categorization and division into meaningful groups, statistical evaluations or trend analyses, etc., give the data a certain practical use. One goal of machine learning is therefore to transform data into knowledge and draw useful conclusions from it.

For a long time, the application of ML methods was reserved for a small group of experts. The algorithms used were complex and hardly understandable for laymen. It was only through the development of open-source libraries that non-experts were given the opportunity to deal with AI topics and machine learning in more detail. This made AI procedures and data structures accessible to a wide range of users. The Python programming language in particular made it possible to embed extensive functions in simple "wrappers". With the help of uncomplicated Python instructions, today nearly anyone can use the powerful algorithms. No supercomputers are required. Rather, even small systems can recognize patterns in large amounts of data or derive predictions about future events and developments [2].

In the next section, the basic structure of artificial intelligence and machine learning will be presented. In addition, the associated terminology will be explained in more detail. This will provide the basics for solving practical applications. In particular, the following topics will play a major role:

• Different variants of machine learning and AI.
• Learning systems, neural networks and training methods.
• Installation of a development environment for the programming language "Python" or its interactive variant IPython.
• Creating a software basis for ML methods and procedures.

3.1 Machine Learning and Artificial Intelligence

The research fields of artificial intelligence are closely related. It is often difficult to separate the individual sub-areas from each other. In addition to machine learning methods, there are other areas such as classical expert systems or so-called evolutionary algorithms. However, many of these areas have lost some of their importance in recent years, as current developments are increasingly focusing on neural networks and deep learning.


Figure 3.1: Different areas of artificial intelligence.

Machine learning as a subfield of artificial intelligence is increasingly becoming the central core of current research. Independent "learning" from extensive amounts of data replaces personnel-intensive and problem-specific, explicit programming. The processing of data with known correlations and the "learning" of structures through so-called training play a central role. Later, the extracted rules can also be applied to new data and initially unknown contexts.

The new methods analyze known data, establish certain rules, and finally make decisions based on them. Based on various mathematical methods, data sets are divided into hierarchical levels. This can eventually lead to practical applications such as image recognition or categorization. With the help of Python, the complex mathematics of neural networks can be reduced to simple functions. This allows users to quickly implement practical projects and yields relevant results in a short period of time.

So-called deep learning can in turn be considered a special field of machine-learning methods. The goal is to imitate human learning behavior as effectively as possible. The basis for this is the use of large amounts of data as "learning material". Deep-learning algorithms for neural networks currently deliver the best results in the area of image or pattern recognition, etc. The neural networks used for this purpose usually consist of several intermediate layers that serve as a connection between the input and output nodes.


Figure 3.2: A neural network.

The layers contain so-called artificial neurons. The function of these neurons is emulated by mathematical operations (see Chapter 11). The aim of this development is to simulate the structure of the human brain with its network of neurons, axons, dendrites and synapses as efficiently as possible. Deep learning is characterized by the use of multiple layers in the neural networks. The input layer processes either the direct raw data, for example the individual pixels of an image, or preprocessed or filtered data streams. The inner or "hidden" layers take on the further processing and data reduction. The output layer finally delivers the results. The typical structure of such a neural network is shown in Figure 3.2.

Deep learning methods can also be used to model complex relationships. The large number of network levels also makes it possible to map highly non-linear functions. Without this structure, even simple logical gate functions would be unsolvable, as can be seen for example with the XOR gate (see Chapter 2). In contrast, deep learning algorithms are able to solve comparatively complex tasks. Through multiple "training runs", the learning result can be improved with each calculation step. Deep learning has thus developed into one of the central development drivers in the field of artificial intelligence [11].
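How data flows through such a layered network can be written out in a few lines of NumPy. The following sketch is purely illustrative: the weights are random placeholders standing in for the values that a training session would otherwise optimize.

# Forward pass through a small network like the one in Figure 3.2:
# an input layer, one hidden layer and an output layer.
import numpy as np

def sigmoid(z):
    # a typical transfer function of an artificial neuron
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)

x = rng.random(4)              # input layer: four raw or preprocessed values
W1 = rng.normal(size=(5, 4))   # connections: input -> hidden layer (5 neurons)
W2 = rng.normal(size=(2, 5))   # connections: hidden -> output layer (2 neurons)

hidden = sigmoid(W1 @ x)       # each neuron: weighted sum, then activation
output = sigmoid(W2 @ hidden)  # the output layer delivers the results
print(output)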


The main differences between machine learning in general and deep learning are summarized in the following table:

                                  Machine Learning    Deep Learning
Learning data                     structured          no special requirements
Size of the required data sets    all sizes           extensive
Necessary hardware structure      simple              challenging
Required training times           short               days to weeks
Interpretation of decisions       simple              almost impossible

An important advantage of deep learning compared to general machine learning is the ability to process unstructured data. Real information such as images, sounds and videos can be used here more or less directly as input data. Other machine-learning algorithms, such as decision-tree methods, do not have this capability. If, for example, images are used as input data, complex and special program adjustments must always be carried out by real people.

Examples of AI processes that work without neural networks are the so-called evolutionary algorithms. These programs are also based on a natural learning process. However, it is not the brain that is modeled, but biological evolution. Here, the algorithms follow one another in different generations. According to Darwin's principle of "survival of the fittest", every possible solution is evaluated and the best version is selected in each case. The "mutated" variants thus lead to an optimized algorithm that is better adapted to the given problem than the previous one (see the sketch at the end of this chapter).

The last row of the table indicates a possibly serious problem with AI algorithms. In the case of complex neural networks, it is usually no longer possible to understand exactly how decisions are made. In the case of massive wrong decisions by AI systems, this can lead to considerable legal or social problems. Ultimately, the acceptance of such decisions will depend to a large extent on their comprehensibility. AI applications that do not have a minimum level of transparency will therefore hardly be able to establish themselves in (vital) areas.
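The principle of such an evolutionary algorithm can be sketched in a few lines of Python. The example below is illustrative only: a toy fitness function is maximized by repeatedly generating mutated offspring and selecting the fittest candidate as parent of the next generation.

# A minimal evolutionary algorithm: mutation plus "survival of the fittest".
import random

def fitness(x):
    # toy problem: the maximum lies at x = 3
    return -(x - 3.0) ** 2

best = random.uniform(-10, 10)          # first generation: a random candidate
for generation in range(100):
    # mutation: produce offspring scattered around the current best solution
    offspring = [best + random.gauss(0, 0.5) for _ in range(20)]
    # selection: the fittest candidate survives into the next generation
    best = max(offspring + [best], key=fitness)

print(round(best, 2))                   # converges toward 3.0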


Chapter 4 • The Hardware Base

Just a few years ago, the prevailing opinion was that machine learning or AI could only run on server farms or supercomputers. The high-performance machines that won chess games, became "Jeopardy" masters, or beat human Go specialists shaped the public image of AI. However, with advances in hardware and software, the situation has changed dramatically. Even a mid-range PC now has several gigabytes of RAM, and small single-board systems like a Raspberry Pi 4 are available with up to 8 GB of memory. Even on microcontrollers like the ESP32, ML applications are now performing amazingly. In addition, special chips such as the Kendryte K210 are increasingly seen in the market. The hardware structure of these chips is already specifically designed for neural structures.

Several different systems will therefore be used in this book. In addition to the classic universal PC, both the Raspberry Pi and the MaixDuino will prove their capabilities in the individual projects. The following table provides an overview of the projects related to the hardware. This should not exclude porting a solution to a different hardware base. Unfortunately, it is relatively time-consuming, for example, to control peripheral devices directly with a PC, since it often lacks the appropriate interfaces. With the Raspberry Pi or the MaixDuino, on the other hand, LEDs or relays can easily be controlled via the existing input/output lines and pins. Speech recognition and synthesis, on the other hand, would also be feasible on a PC. However, the implementation in Python is much easier on a Raspberry Pi system.

                                 PC, Laptop      Raspberry Pi    Sipeed MaixDuino
Basics                           X               X
Iris classification              X               X
Handwriting recognition          X               X               X
Clothes sorting                  X               X
Speech recognition / synthesis                   X
Object recognition                               X               X
Face recognition                                 X               X
Home automation                                  X               X
Size (approx. cm)                30 × 40 × 15    9 × 6 × 1.5     7 × 6 × 1
Price (approx. €)                500             50              30
Power consumption (W)            50              10              5


Chapter 5 • The PC as Universal AI Machine

A modern PC or laptop can do a good job in the field of machine learning. However, some requirements should be met for this. The following table provides an overview of the minimum requirements:

CPU:   quad-core with at least 3 GHz
RAM:   minimum 16 GB
HDD:   1 TB

It is certainly possible to implement some entry-level projects with less powerful equipment. In that case, however, the training times for neural networks become tediously long, even for simpler applications.
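Whether a given machine meets these requirements can be checked directly from Python, for example with the third-party psutil package (install it with pip if needed; the values simply mirror the table above):

import psutil  # third-party package: pip3 install psutil

print("physical CPU cores:", psutil.cpu_count(logical=False))
print("RAM: %.1f GB" % (psutil.virtual_memory().total / 2**30))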

5.1 The computer as a programming center

The PC is not only used as a basis for getting started with ML projects. Rather, it is also required for other purposes. For example, the MaixDuino cannot be programmed directly; the PC is required here as a host computer which programs the Maix board via the USB interface. An active USB hub can do a good job here. It has the particular advantage of offering a certain protection for the USB port of the PC or laptop used. It is important that it is actually an active hub, meaning the device must have its own 5 V supply via a separate power supply unit. This is the only way to ensure that the computer port is protected in the event of a short-circuit behind the hub.

Figure 5.1: An active USB hub.

Of course, even an active hub does not offer absolute protection. However, it is very unlikely that a short-circuit will break through an active hub and reach the computer's USB port.


If you want to use the Maix board without a PC, i.e., in stand-alone mode, an external 5 V power supply unit is required, too. Ideally, the hub's power supply unit can be used for this. After programming, the board can then be used independently of a USB port. It is important for the power supply to provide sufficient current: about 2000 milliamps (2 A) nominal current should be the minimum. If the power supply only has a micro-USB connector, an adapter to USB-C is required (see Figure 5.2). Further information on stand-alone operation can be found in Section 7.4.

Figure 5.2: A USB-C adapter.

Figure 5.3: A USB power supply.

At this point, it should be mentioned that the MaixDuino has an integrated voltage regulator, just like the Arduino. This means that the board can also be supplied via any power supply with an output voltage of 6 V to 12 V. The unit must be able to supply a current of at least 2 A. For the connection to the board, the power supply unit must have a standard coaxial power connector.


In various applications, the Maix board runs somewhat more stably when it is operated via the coaxial power connector. Specifically, when an SD card is used and additional hardware is connected to the pins of the MaixDuino, the USB supply seems to reach its limits. Section 7.4 provides further information on this topic.


Chapter 6 • The Raspberry Pi

The Raspberry Pi has become one of the most popular controller boards in recent years. Although it is often used in hardware-related projects, it can also be employed successfully for machine-learning applications. In particular, the newer generation, the Raspberry Pi 4, already provides sufficient computing power for this. However, the available RAM size is of crucial importance: with a Pi 4 with 8 GB of RAM, you are well-equipped. With the faster CPU, the new GPU, 4K support, USB 3.0, USB-C, Bluetooth 5.0 and Gigabit Ethernet, even more challenging projects can be implemented. The Raspberry Pi 4 thus sets another milestone in terms of performance and features. The Pi 4 is three times faster than its predecessor and offers significantly better multimedia performance, which is particularly advantageous for image processing. Overall, the Pi 4 thus approaches the performance of an x86-based PC in many respects.

Figure 6.1: The Raspberry Pi 4.

The following table summarizes the most important parameters of the Pi 4:

Processor:   64-bit quad-core ARM Cortex-A72 (4× 1.5 GHz)
Video:       dual-display support with resolutions up to 4K via two micro-HDMI ports (up to 4Kp60), hardware video decoder with up to 4Kp60, 2-channel MIPI DSI port (display), 2-channel MIPI CSI port (camera)
Audio:       4-wire stereo audio
RAM:         up to 8 GB LPDDR4
WLAN:        dual-band 2.4 GHz and 5 GHz IEEE 802.11b/g/n/ac wireless LAN
Bluetooth:   5.0, BLE
LAN:         Gigabit Ethernet


Interfaces:     2× USB 3.0, 2× USB 2.0
GPIO:           standard 40-pin GPIO header (compatible with older boards)
SD card:        microSD (for operating system and data storage)
Power supply:   5 V / 3 A (via USB-C)

6.1 The Remote Desktop

The Raspberry Pi has four USB ports and one or, in the case of the RPi 4, even two HDMI interfaces. Consequently, the mini-computer can be equipped with a keyboard, mouse and screen. Alternatively, it can also be controlled via the Windows Remote Desktop. This variant offers several advantages. The keyboard, mouse and screen can be shared with the PC, so there is no need for separate equipment for the RPi. This is particularly advantageous in confined spaces: you don't need an additional monitor, nor do you have to switch back and forth between the two systems.

A remote-desktop connection to the Raspberry Pi allows complete control of the RPi via LAN or WLAN. In principle, the Raspberry Pi can also be controlled almost entirely via the ASCII console, but a graphical interface is usually advantageous, specifically for ML applications. The remote-desktop client is already preinstalled on all recent Windows systems and is therefore ideal for controlling the RPi. So, if you want to control the RPi remotely, the remote desktop connection is a very efficient option, also in terms of data volume. On the RPi's side, only a single package is required. This can be installed by entering:

sudo apt-get install xrdp

All of the important settings are already predefined. You can therefore log on to the RPi immediately after completing the installation. On a Windows PC, the required program can be found as Remote Desktop Connection in the Start menu.

Figure 6.2: Start window of the Remote Desktop utility.

In the start window, either the IP address of the Pi or the name of the Raspberry (default: raspberrypi) is entered as the host name. After that, the login screen of the Pi is displayed.


Figure 6.3: Login window for the Raspberry Pi.

Here you enter the login information (the same as via SSH, i.e., default user name "pi", default password "raspberry"). The Raspberry desktop then appears in a separate window.

Figure 6.4: The Raspberry Pi desktop appears in MS Windows.


The following keyboard shortcuts have proven useful for working with the Remote Desktop:

• Ctrl+Alt+Break toggles the Remote Desktop between full-screen mode and window mode.
• Alt+Tab can be used to switch between individual applications in full-screen mode.
• Alt+Insert works in window mode similar to Alt+Tab.
• Alt+PgDown and Alt+PgUp can also be used to switch between applications in window mode.

In addition, the copy & paste function can be used across the PC and Raspberry systems.

6.2 Using smartphones and tablets as displays

The remote-desktop functionality is not restricted to MS Windows PCs. Remote Desktop apps can also be installed on Android smartphones or tablets. These devices can subsequently be used as wireless displays. Since this application makes no particularly high demands in terms of memory capacity or CPU power, even older Android-based devices can be used; some older tablets or smartphones may thus find a useful application once again. The Remote Desktop app from Microsoft can easily be installed via the Play Store. After starting the app, the Raspberry, for example, can be called up via the + sign in the app. Figure 6.5 shows how an older tablet can be used as a compact screen for the Pi.

Figure 6.5: A tablet PC acting as a wireless display (via WLAN) for the Raspberry Pi.

6.3 FileZilla

If you want to exchange data not only via copy & paste, but also transfer entire files between a PC and an RPi, then FileZilla is the right choice. This program is available free of charge as server and client software for file transfer via FTP and SFTP. With the FileZilla


client, the user can connect to an FTP/SFTP server and then upload and download files. The program can be downloaded from the Internet (see LINKS.txt in the download package for the book). After installing and starting the program, a connection can be established directly from the PC to the Raspberry Pi. The following entries are required for this:

Server:      IP address of the Raspberry Pi
User name:   user name on the Raspberry (default: pi)
Password:    password for user pi (default: raspberry)
Port:        22

Figure 6.6: FileZilla in action.

Now you can exchange, copy or move files between the PC and the active Pi in the same way as with the Explorer in Windows.

6.4 Pimp my Pi

Since the Raspberry works hard when training an extensive neural network or running complex image-processing applications, you should ensure that the processor is not subjected to too much thermal stress. Immediate destruction is unlikely, as various internal protective mechanisms take effect. Nevertheless, an increased chip temperature always has a negative effect on the lifespan of the component in question. A first measure is to attach a heatsink to the main processor:


Figure 6.7: Broadcom processor with attached heatsink.

However, active cooling with a fan is even more effective. This significantly reduces the processor temperature. It makes the Pi a real "number cruncher", not easily upset by a high computing load.

Figure 6.8: Raspberry Pi with cooling fan.

With this simple measure, the CPU temperature can easily be reduced by 15 °C or more (Figure 6.9).
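The effect can be monitored with a few lines of Python on the Pi itself. The following sketch reads the CPU temperature from the standard sysfs interface of Raspberry Pi OS (the path shown is the usual one on current releases; adjust it if your system differs):

import time

def cpu_temperature():
    # Raspberry Pi OS reports the CPU temperature in millidegrees Celsius
    with open("/sys/class/thermal/thermal_zone0/temp") as f:
        return int(f.read()) / 1000.0

while True:
    print("CPU temperature: %.1f °C" % cpu_temperature())
    time.sleep(2)

Running the script before and after mounting the fan makes the temperature drop shown in Figure 6.9 directly visible.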


Figure 6.9: RPi CPU temperature reduction with the help of a fan.


Chapter 7 • Sipeed Maix, aka "MaixDuino"

With the MaixDuino, a low-cost system is available that is particularly well suited to running artificial-intelligence algorithms. With Python, it is no problem to get AI algorithms running on a PC or a workstation. However, single-board computers such as the MaixDuino also allow equipping small and inexpensive hardware systems with appropriate functions and processes [10,11]. In addition to selecting a system with sufficient computing power, there is always the question of which additional hardware components such as cameras, displays, microphones, actuators or sensors should be used for data acquisition. Available for around 30 euros, the MaixDuino (Figure 7.1) solves this problem in a convenient way. In addition to a camera and display interface, the board also provides several I/O pins. These are arranged in such a way that the connectors are compatible with the familiar Arduino UNO board. Thus, it is in principle possible to operate so-called Arduino shields with the MaixDuino as well. And not only the hardware is based on the Arduino system; there are also strong similarities on the software side. The MaixDuino can also be programmed via the Arduino IDE. For ML applications, however, it is advantageous to use Python as the programming language.

Figure 7.1: The MaixDuino.

Since the MaixDuino is currently less widespread than the Raspberry Pi, the following sections present the most important features of the "Maix" in more detail.

7.1 Small but mighty: the performance figures of the MaixDuino

Among other well-known chips, the Kendryte K210 processor works on the MaixDuino board. Figure 7.2 shows the most important functional units of this chip.


Figure 7.2: Internal structure of the Kendryte K210.

Two 64-bit processor cores (CPUs) clocked at 400 MHz form the core of the system. Instead of the familiar ARM architecture, the open-source RISC-V ISA is used here; the license fees saved in this way certainly contribute to the low price of the processor. Each core is also assigned an FPU (floating-point unit).

The most interesting functional unit is the KPU with its CNN accelerator. KPU stands for Knowledge Processing Unit. This central function block is able to drastically accelerate the mathematical operations that occur frequently in special neural networks (e.g., convolutional neural networks). KPUs or "AI accelerators" were developed to speed up artificial-intelligence applications. In particular, artificial neural networks, machine learning and computer vision can work much more effectively with them. Typical applications can be found in robotics or IoT projects [12].

In addition, an audio processor (APU) ensures the rapid processing of audio signals. Other modules are also available, for example for fast Fourier transforms (FFT). This makes the board particularly well-equipped for image-processing projects and audio analysis. In this way, intelligent applications in the AIoT area (Artificial Intelligence of Things) can be implemented extremely fast and cost-effectively. In addition, controller-specific units such as

• GPIO
• UART, SPI, I2C
• Timer
• PWM


are available. The following overview summarizes the most important features of the MaixDuino:

Main processor:   K210 RISC-V dual-core 64-bit with FPU @ 400 MHz, neural network processor
Flash:            16 MB
RAM:              6 MB + 2 MB exclusively for the KPU
Co-controller:    ESP32
Interfaces:       pin headers compatible with Arduino UNO, USB Type-C, 24P LCD connector, 24P camera connector, SD card slot, speaker output
Power supply:     USB Type-C or DC 6–12 V (converted internally to 5 V / 1.2 A)
Audio:            onboard MEMS microphone with bidirectional I2S digital output, 3 W speaker output (DAC + internal amplifier)
WLAN:             2.4 GHz 802.11b/g/n
Bluetooth:        4.2
Buttons:          Reset and Boot
Peripherals:      I2C, SPI, I2S, WDT, TIMER, RTC, UART, GPIO

The board supports direct video output with a resolution of 320 × 240 pixels (QVGA) at a frame rate of 60 frames per second (fps) or VGA (640 × 480) at 30 fps. The MaixDuino is available as a single board or as a starter set (see also the Materials section at the end of the book). In addition to the board itself, the latter contains two other components:

• a 2.4'' TFT display
• a suitable camera module
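With the camera and display from the starter kit attached, a live camera image can be shown with just a few lines of MicroPython. The following sketch uses the standard MaixPy sensor and lcd modules; it is a typical minimal example, and details may vary with the firmware version:

import sensor
import lcd

lcd.init()                            # initialize the TFT display
sensor.reset()                        # initialize the camera
sensor.set_pixformat(sensor.RGB565)   # 16-bit color
sensor.set_framesize(sensor.QVGA)     # 320 x 240 pixels
sensor.run(1)                         # start image acquisition

while True:
    img = sensor.snapshot()           # grab a frame ...
    lcd.display(img)                  # ... and show it on the display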

Figure 7.3: MaixDuino Starter Kit.

It is not possible to add more memory to the built-in 8 MB of SRAM. The Kendryte K210 is primarily intended for IoT devices that (pre-)process image or sound signals. Up to eight microphones, to be used as a microphone array, can be connected directly. The digitized signals can then be processed with the FFT and AI units [10,11].


The AES and SHA units are primarily intended to run encrypted and signed firmware on the K210.

Figure 7.4: MaixDuino viewed from front.

Figure 7.5: MaixDuino viewed from rear.

7.2 A wealth of applications

There are various application areas for the small board. The focus is of course on machine learning, image acquisition and all variations of neural-network applications. Thanks to the high-performance audio processor, acoustic signal acquisition and recognition processes can also be implemented. The resulting applications range from simple projects to almost professional areas of application. The following list gives a first impression of the possibilities:


Smart home applications:
• Cleaning robots
• Intelligent speakers
• Smart door openers

Medical technology:
• Medical image recognition
• Automatic medical emergency alerts

Industrial applications:
• Intelligent sorting
• Monitoring electrical devices

Educational applications:
• Learning robots
• Intelligent interactive platforms

Agricultural applications:
• Monitoring plant growth
• Detection of pests and plant diseases
• Automated control of harvesting machines

The use of neural networks and other methods from the field of machine learning and AI opens up completely new perspectives in the evaluation of image data. In contrast to conventional techniques, however, this requires comparatively high computing capability and thus powerful hardware. Mobile applications, or devices without internet access, were therefore difficult or even impossible to realize for a long time. In order to make AI systems more universally usable and to increase their popularity, several specialized chip sets have been developed in recent years. The aim was to offer sufficient computing power even with a small size and low power consumption. "Sufficient" is to be understood here as sufficient for the task at hand: these chips are usually not really suitable for complex training procedures. The training of the neural networks should therefore take place on powerful computers. The finalized models can then be transferred to the smaller boards and used there in practical applications.

Kendryte's K210 chip belongs to this new type of chip. With its two RISC cores and the specialized neural-network unit, its computing power is quite impressive in comparison to other chips. This is all the more true when you consider that the K210 gets by with a low power consumption. This even brings battery operation into the realm of possibilities (see Section 7.4).

7.3 Initial start-up and functional test

For an initial test, the MaixDuino can simply be connected to any USB port. Please note that a USB-C cable is required for this connection. As with all open boards, however, some safety instructions should be observed:


• The board should never be operated on a conductive surface. Due to the open solder joints, short circuits could occur, which could destroy the MaixDuino.
• If available, an active USB hub should be connected between the PC and the board. In the event of a short circuit, this can in most cases at least prevent damage to the USB port of the PC, since the hub blocks overcurrents on the PC side. In addition, a hub usually provides more power than a PC USB port. This prevents undesired voltage drops in the supply of the MaixDuino, especially when using laptops.

When the cable is connected, the red power LED should light up. If it does not, the board must be disconnected from the port immediately. In this way, it may be possible to prevent a potential short-circuit from causing extensive damage. Only then should troubleshooting be started. Helpful hints for troubleshooting can be found in the corresponding chapter at the end of the book.

If the power LED lights up properly, you can connect the display to the MaixDuino. To do this, you should first disconnect the Maix once again from the supply voltage (USB or coaxial power connector). Then the ribbon cable of the display is connected to the socket labeled "LCD". After reconnecting the USB cable, the MaixDuino reports with a red start screen and the white characters "Welcome to MaixPy". The version of the loaded firmware may also be shown on the display. If a newer or different firmware is installed on the board, the start screen may look a little different.
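Independently of the display, the installed firmware version can also be queried directly in the MicroPython shell (assuming the standard MicroPython os module is present, as in the usual MaixPy builds):

>>> import os
>>> os.uname()   # prints sysname, release and version of the firmware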

Figure 7.6: Start-up screen on the MaixDuino display.

The Maix can be powered via the USB-C socket as well as via a coaxial power connector. A direct voltage of 6 to 12 volts is required for the coaxial power connector. The source should be able to deliver a current of up to 2 amps. Although the Maix can also be operated with just one supply at a time, it has proven beneficial to use both sources in parallel. This leads to significantly better operational reliability, particularly in the case of computationally intensive applications.


7.4 Power supply and stand-alone operation

In many cases, the MaixDuino and the Raspberry Pi get their supply voltages via the USB port. However, when a project is completed, it is often desirable to operate the boards independently of a PC or laptop. In this case you can either use external power supplies or (rechargeable) batteries.

Since the Raspberry does not have an integrated voltage regulator, powering the board is only possible via the USB-C connector. For this purpose, special power supplies are available that deliver a voltage of 5.1 V at a maximum current of around 3 A. The slightly higher voltage ensures that the Pi works reliably even under extended loads. With standard 5-V power supplies, the undervoltage symbol (a lightning bolt in the upper right segment of the screen) sometimes appears. In this case unexpected errors or even system crashes can occur.

With the MaixDuino, two alternative power-supply variants are available. Either the USB-C or the coaxial power connector can be used as a voltage input. In the second case, a direct voltage between 6 and 12 volts with a maximum current of approx. 1.5 A is required. The internal regulator of the MaixDuino generates the lower voltages required for safe operation. As already mentioned, in certain situations it is advisable to use both sources at the same time. Particularly when operating with an SD card, or when various external loads such as LEDs or sensors are connected to the Maix, problems occur more frequently when using the USB supply alone.

If you want to operate your MaixDuino completely independently of the mains, common USB power banks offer a good solution. These allow the boards to be operated on the move or in remote locations for many hours.

Figure 7.7: MaixDuino with a USB powerbank connected.


Chapter 8 • Programming and Development Environments

Since Python has developed into a quasi-standard among programming languages for machine learning and AI, various Integrated Development Environments (IDEs) have emerged over time. Each of these variants has its own specific advantages and disadvantages. The most important IDEs for Python programming are:

• Thonny
• MaixPy IDE
• Anaconda
• Spyder

Thonny is especially popular because it comes preinstalled on the Raspberry Pi by default. This made the IDE the classic programming tool in the maker scene. With MaixPy, an IDE specially tailored to the MaixDuino is available. It allows the specific performance features of this board to be used most efficiently; in particular, the video output of this IDE can be used. Anaconda and Spyder are the classic tools when it comes to implementing machine-learning projects in Python. They offer installation packages for all common operating systems, ranging from Windows, Linux and macOS to Raspberry Pi OS.

The individual IDEs are described in more detail in the following sections, which also explain what advantages and disadvantages they offer for the respective applications and systems. After working through these sections, it should no longer be a problem to select the optimal IDE for a particular project.

8.1 Thonny — a Python IDE for beginners and intermediates

Thonny is the standard IDE for Python and MicroPython programming. It enjoys continuous updating and ongoing development and is therefore quite future-proof. Thonny is also available for all common operating systems; it is even preinstalled as standard on the current Raspberry Pi OS. In general, the installation is quite easy, so no problems should occur during the installation process. In this book, Thonny 3.3.10 in the version for PCs under Windows 10 was used. In principle, future versions should be compatible. However, if unexpected problems occur, consider reverting to this version. The appropriate download package can be found on the Internet (see LINKS.txt). When the download is finished, the installation file can be started. Now you just have to follow the wizard until the installation process is complete and the Thonny IDE can be opened.


Figure 8.1: The Thonny IDE after launching.

Now, a Python-enabled board like the MaixDuino can be connected to the computer. To test the installation, Thonny must be configured for the MicroPython interpreter. In addition, the correct board must be selected. The following steps are necessary for this:

1. Via Run → Select Interpreter the window shown in Figure 8.2 will open.

Figure 8.2: The Options window in Thonny.

2. MicroPython (ESP32) must be selected in the first selection window.
3. The COM interface of the board must be entered under Port or WebREPL. The Maix provides two serial interfaces. Usually, Thonny has to be connected to the COM port with the lower port number. If in doubt, both ports must be tested (see Figure 8.3).


Figure 8.3: MaixDuino's COM ports.

The Thonny IDE is now connected to the board and the prompt ">>>" appears in the Shell window. Alternatively, the option Try automatic detection can be selected. However, this does not work reliably with all boards. If the command help() is entered in the Shell, a welcome message as well as some information is given:

Welcome to MicroPython on the Sipeed Maix!
For generic online docs please visit https://maixpy.sipeed.com
Official website : http://www.sipeed.com

Control commands:
CTRL-A -- on a blank line, enter raw REPL mode
CTRL-B -- on a blank line, enter normal REPL mode
CTRL-C -- interrupt a running program
CTRL-D -- on a blank line, do a soft reset of the board
CTRL-E -- on a blank line, enter paste mode

For further help on a specific object, type help(obj)
For a list of available modules, type help('modules')

The installation of Thonny is now successfully completed and the board can be controlled via Shell instructions. The MaixDuino is usually delivered with a preinstalled MicroPython interpreter. If not, the interpreter can easily be installed later on. Details concerning this task can be found in Section 8.7.


Now, the MaixDuino is connected to the PC and the first ML projects can be started. First, however, you should familiarize yourself more closely with the Thonny IDE. In the next few sections, working with Thonny to program the Maix will be explained in more detail.

8.2 Thonny as a universal IDE for RPi and MaixDuino

There are several different sections in the Thonny IDE, including the editor and the MicroPython shell or terminal:

• Code is created and edited in the editor area. Multiple files can be opened, with a new tab available for each file.
• In the MicroPython shell, commands that are to be executed immediately are entered. The terminal also provides information about the status of a running program and displays errors related to the upload, syntax errors, etc.

There are also other useful tabs available. These can be configured in the View menu. The Variables tab, in particular, can often be very useful. It shows all variables of a program and their current values.

In order to become familiar with writing programs and executing code on the MaixDuino, a script can be developed that makes the integrated LED on the Maix flash. The MaixDuino has a multi-color (Red-Green-Blue or RGB) LED on its component side. This can be used for simple test applications. First, a main.py file is created on the board for this purpose:

1. When Thonny starts for the first time, the editor displays an untitled file. This file is saved as main.py: via File → Save as, it is stored on the board ("MicroPython device") itself.
2. Now a tab called "main.py" is available.
3. The following code is entered here:

from Maix import GPIO
from fpioa_manager import fm
from board import board_info
import time

# Map I/O pin 14 to GPIO0 (12: green, 13: red, 14: blue)
fm.register(14, fm.fpioa.GPIO0)
led = GPIO(GPIO.GPIO0, GPIO.OUT)

# Flash the LED ten times
for i in range(10):
    led.value(1)
    time.sleep(.1)
    led.value(0)
    time.sleep(.1)

The code is available in the download package for this book as blink_10.py and can be transferred via copy & paste. Later on, it will be explained how files can be copied directly from the PC to the controller. Via the green arrow, via Run → Run current script, or using the F5 function key, the code is transferred to the controller. Now the RGB LED on the Maix should flash exactly 10 times in quick succession (Figure 8.4).

Figure 8.4: The RGB LED of the MaixDuino in action.

8.3 Working with files

To create a file with a unique name using the Thonny IDE on the Maix, the following steps are required:

• Create the new file itself.
• Save the file under the desired name, for example, blink_10.py.

Via the menu File → New the file can be opened as a new tab. You can then use the menu File → Save as


to save the file on the MaixDuino under its name (blink_10.py). In the query (Figure 8.5), the second option (MicroPython device) must be selected.

Figure 8.5: Query when saving.

The file is now uploaded to the board and appears in the Files sub-window. From there it can now be started using the green arrow or the F5 key. Alternatively, it is also possible to load files from the chip onto the computer by selecting This Computer in the query. Further commands for deleting or renaming files etc. can also be found in the File menu.

A complete file system can be accommodated on the MaixDuino. Therefore, when programming in MicroPython, several programs can be stored on the board. These can then be processed directly by the interpreter that is also available on the system. The file system can be managed directly with Thonny. Similar to many other programming tools, the development environment contains the following components (Figure 8.1):

1. Folders and files
2. Editor
3. MicroPython Shell / Terminal
4. Tools

In the left-hand sub-window (Folders and files), the files currently stored on the board are visible in the device folder ("MicroPython device"). As soon as the board is connected to Thonny via a serial connection, all stored files are displayed when opening the device folder. Directly after installing the Python interpreter, only a main.py file is visible here. In order to execute the application code, a boot.py file should also be created. To do this, again follow File → New, and a new file called "untitled" is created. This file can be saved locally on the board under the name boot.py using the Disk symbol in the Tools window.


Figure 8.6: Creating a new boot.py file next to main.py.

The following two files are now available in the device folder:

boot.py:   executed every time the board is rebooted
main.py:   main script for the application code
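As a simple illustration, boot.py can be used to show a greeting on the attached display at every start-up. This is only a hypothetical example based on the standard MaixPy lcd module; any initialization code can go here:

# boot.py -- executed automatically at every start-up
import lcd

lcd.init()
lcd.draw_string(10, 10, "MaixDuino is booting ...")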

Below the device folder, the SD folder can be found. This is intended for accessing files that are stored on a µSD card. If such a card is inserted in the corresponding slot of the MaixDuino (Figure 7.4), the files on it appear in the SD folder.

In the editor area, the code for the .py application program is created. The editor opens a new tab for each file. The section below the editor area is the MicroPython Shell/Terminal. All commands entered here are immediately executed by the interpreter. In addition, the terminal also displays information about the status of a running program. Any syntax errors in the current program or error messages during uploading, etc., also appear here.

The icons in the Tools area at the top left-hand side of the main window can be used to execute tasks quickly and directly. The buttons have the following functions:

- New File: creates a new file in the editor
- Open File: opens a file on the computer
- Save File: saves a file
- Run (green circle with white arrow): runs the code
- Stop: stops the code execution; this corresponds to entering CTRL+C in the shell

In addition, there are other symbols available for debugging which are not necessarily needed in the context of this book. To quickly test the connection between PC and board, a print command can be entered.


This will immediately show whether the communication is working correctly:

>>> print('Test')

If the answer

Test
>>>

appears, the Maix is ready for use.
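The file system can be inspected in the same way from the shell. Using the standard MicroPython os module, the files stored on the board can be listed; on MaixPy boards, the flash memory and the SD card typically appear as /flash and /sd:

>>> import os
>>> os.listdir('/')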

8.4 Thonny on the Raspberry Pi

Thonny is also available by default on the Raspberry Pi. It is launched via Start (Raspberry) → Development → Thonny Python IDE.

Figure 8.7: Starting Thonny on the Raspberry Pi.

The appearance of Thonny on the Raspberry is nearly identical to the PC version (see Figure 8.8).


Figure 8.8: Thonny running on the Raspberry Pi.

If you want to use Python locally, i.e., on the Pi itself, you have to open the Thonny Options via Run → Select interpreter and choose the option "The same interpreter which runs Thonny", as shown in Figure 8.9.

Figure 8.9: Selection of the interpreter.

However, it is also possible to program the MaixDuino via the Raspberry Pi! To do this, the interpreter option MicroPython (ESP32) must be selected in the Thonny IDE on the RPi. The port must be set to:

Sipeed-Debug (/dev/ttyUSB0)

Figure 8.10: Selection of the MaixDuino ports.

Subsequently, the Maix can be programmed via the Raspberry Pi in the same way as from a PC (see Section 8.2 ff.). In this case, no Windows PC is required at all.

Figure 8.11: Raspberry Pi controls MaixDuino!

8.5 Tips for troubleshooting the Thonny IDE

Below, some error messages of the Thonny IDE are discussed. The corresponding problems are usually relatively easy to solve:

• In many cases, restarting the MaixDuino with the integrated Boot/RST button solves the issue.


• Within the Thonny IDE, pressing the Stop/Restart Backend button (or CTRL-F2) can often solve communication problems between PC and board.

Otherwise, the following hints may help:

Error 1: No connection to the board established.
In this case, error messages like the following are printed:

========================= RESTART =========================
Unable to connect to COM14
Error: could not open port 'COM14': FileNotFoundError(2, 'The system cannot find the file specified.', None, 2)

or:

========================= RESTART =========================
Could not connect to REPL.
Make sure your device has suitable firmware and is not in boot-loader mode!
Disconnecting.

or:

========================= RESTART =========================
Lost connection to the device (EOF).

Here, it is often helpful to interrupt and then re-establish the USB connection to the module. You should also check whether the correct serial port is selected under Run → Select Interpreter. This error could also indicate that the serial port is already in use by another program, such as a serial terminal or the Arduino IDE. In that case, make sure that all programs potentially establishing serial communication are closed. The Thonny IDE should then be restarted.

Error 2: The Thonny IDE does not respond or issues an internal error.
After closing and reopening the active window, you should be able to continue working as usual. If crashes occur repeatedly, the entire Thonny IDE should be restarted.

Error 3: The Thonny IDE no longer responds to the Stop/Restart Backend button.
After pressing the Stop/Restart Backend button, you should wait a few seconds. The MaixDuino needs time to restart and to re-establish the serial communication with Thonny. If the Stop button is clicked several times in rapid succession, the board does not have enough time to restart properly. This may cause the Thonny IDE to crash.

Error 4: Problems when restarting the MaixDuino, executing a new script, or opening the serial interface.


If the error message

Brownout detector was triggered

appears, or continual reboots occur, or information similar to

ets Jun 8 2016 00:22:57
rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0018,len:4
load:0x3fff001c,len:4732
load:0x40078000,len:7496
load:0x40080400,len:5512

is displayed, this may indicate a hardware problem. Often, the cause is one of the following issues:

• poor-quality USB cable;
• USB cable too long;
• the board is defective (e.g., a bad solder joint);
• faulty computer USB connection;
• the computer's USB port does not supply enough power.

In these cases, using a high-quality USB cable that is as short as possible will help. Switching to a different USB socket on the PC can also be helpful. For laptops, an active USB hub with its own external power supply should be used. In this way, you are no longer dependent on the capacity of the laptop's internal USB power supply. If the problems persist or other strange error messages appear, it is advisable to update the board with the latest version of the MicroPython firmware. This at least rules out that errors which have already been fixed cause further difficulties.

Error 5: No connection between PC and MaixDuino possible.
Sometimes the Maix is too busy to establish a USB connection. Clicking the Stop/Restart Backend button several times may help. However, this repeated clicking should be done at reasonable time intervals (see above). If a script is executed that uses Wi-Fi or the energy-saving mode, or executes several tasks in parallel, it is recommended to try three or four times to establish the communication. If this is still not possible, the board should be flashed again with the most current MicroPython firmware.


8.6 The MaixPy IDE

Especially for working with the MaixDuino, the MaixPy IDE can also be used. This IDE also fully supports the MicroPython syntax. The installer can be downloaded from the Sipeed web page (see LINKS.txt). The IDE enables you to:

• edit scripts on the PC;
• upload programs to the Maix;
• execute scripts directly on the MaixDuino;
• save and manage files on the controller board;
• view camera images on the computer in real time (Figure 8.12).

Figure 8.12: MaixPy IDE in action.

The last feature is a significant advantage of the MaixPy IDE over other systems such as Thonny or Jupyter. In addition to the live camera image, a three-part live histogram of the RGB color space is also displayed. Other histogram versions can be selected via the small triangle symbol next to the diagram header. You should be aware that this IDE consumes some resources for the compression and transmission of the camera image, which is at the expense of the computing power of the controller board.

After installation, the IDE can be started for a first test run. Using Tools → Select Board, the correct controller board (Sipeed MaixDuino; Figure 8.13) is selected.


Figure 8.13: Selection of the MaixDuino board.

To connect to the MaixPy board, click on the chain symbol at the bottom left. Then the correct COM interface must be selected. If in doubt, all options must be tested here again. After the connection has been successfully established, the color of the Connect button changes from green to red. Below the Connect button is the Execute button, which runs the Python file currently opened in the editor. Clicking the button again stops the running program. Options for uploading files can be found in the corresponding drop-down menu.

Please note that only one serial port may be open at a time. It must therefore be ensured that all previous connections have been closed before opening new ones. If the connection fails, try to update the firmware or the IDE; not all IDE versions are compatible with all firmware variants. However, the following combination was tested successfully:

• MaixPy IDE 0.2.5
• maixpy_v0.5.0_125_gd4bdb25.bin

Other combinations should be checked before use. Any errors are displayed in a message box. Unfortunately, the error information is not always complete. Further information on the possible sources of errors can be found in the terminal output. For this purpose, the IDE can be closed and a serial terminal can be started. The terminal printout will then often contain further hints for troubleshooting.

Although the MaixPy IDE is hardly used in the context of this book, it can certainly be utilized for control and test purposes. In particular, if the display or camera do not work correctly, this IDE can provide valuable error information. Functions such as the live image analysis or the color-space diagram can also be helpful in various ML projects.

8.7 A MicroPython interpreter for MaixDuino

Usually, the MaixDuino comes with a preinstalled Python interpreter. However, it may be necessary to load a new interpreter for several reasons:


• The Maix has been programmed with the Arduino IDE in the meantime. In this case, the Python interpreter has been deleted.
• The interpreter is out of date and requires an update.
• The memory content of the board has been damaged due to voltage dips or improper shutdown.

In addition, you can never completely rule out that the hardware is delivered without a preinstalled interpreter. For uploading the interpreter, a so-called kflash GUI is required. The tool can be downloaded from the internet (see LINKS.txt). After downloading and unpacking, the file kflash_gui.exe can be started (Figure 8.14).

Figure 8.14: Kflash GUI interface.

The firmware itself can be found on the Sipeed website or on GitHub (see LINKS.txt). The file names look like maixpy_v0.6.2_53_g4fe0efb56.bin or similar. The file is loaded into the kflash GUI via "Open File". Then the correct parameters are selected:

Board:         Sipeed MaixDuino
Burn To:       Flash
Serial Port:   COM port of the Maix (e.g., COM13)
Baud rate:     115200
Speed mode:    Slow mode (recommended)

The firmware can be sent to the MaixDuino via the Download button. Again, make sure that the COM port is not occupied by another program (Arduino, MaixPy-IDE, etc.).

Figure 8.15: Download of the firmware to the MaixDuino.

Downloading and burning the firmware can take several minutes. Afterwards, the Maix is available again for new Python projects. For the upload speed (Speed mode), the option Slow mode is recommended. This works fine in most cases. Fast mode should only be used with highly reliable connections and short, high-quality USB cables.

8.8 The Flash tool in action

The kflash tool can not only be used for uploading the Python interpreter. Rather, it is a universal tool that allows loading any kind of data into the flash memory of the MaixDuino. When working with ML projects, in many cases various models are available for testing. These models usually contain fully trained neural-network data. A large number of ready-made models are available for the MaixDuino. These models come in two different variants:

1. as a .kfpkg file
2. as a .kmodel

In the first case, the .kfpkg file (e.g., face_model_at_0x300000.kfpkg) can be loaded directly to a specific hex address using the Flash tool. A kmodel (e.g., facedetect.kmodel), on the other hand, can be written to the Maix's SD card via Thonny.


Of course, the correct variant must be taken into account in the Python program. In the first case, the model is read from the hex address, for example, via:

task = kpu.load(0x200000)

In the second case, the file, including the path, must be specified, for example, using:

task = kpu.load("/sd/MNIST.kmodel")

A .kfpkg file is just a .zip archive whose extension has been renamed to "kfpkg". The files can be opened with a ZIP tool. You get two files:

1. the kmodel itself;
2. a .json file.

The JSON part tells the kflash_gui to which address the kmodel should be written. For this reason, the file name of a kfpkg usually contains the hex address (e.g., 0x300000) to which it is written. In this way it is possible to load two (or more) files, for example the Python interpreter and a model, without overlapping memory areas.
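In a MaixPy script, both variants thus follow the same pattern; only the argument of kpu.load() changes. A minimal sketch (the address and file name are placeholders taken from the examples above; the actual inference calls depend on the model used):

import KPU as kpu

# Load the model -- either from a flash address (kfpkg variant) ...
task = kpu.load(0x300000)
# ... or from the SD card (kmodel variant):
# task = kpu.load("/sd/facedetect.kmodel")

# ... here the model would be applied to camera images,
# e.g., with kpu.forward() or kpu.run_yolo2() ...

kpu.deinit(task)   # release the KPU resources when done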

8.9 Machine Learning and interactive Python

The term "interactive" goes back to the Latin expression inter agere. The verb agere means "to act"; the component "inter" points to something in between. In this sense, one can say that an interactive shell sits between the acting user and the operating system. An interactive system therefore waits for commands from the user and carries them out immediately. It returns the result of the execution straightaway, and then the shell waits for the next input.

Python offers a convenient command-line interface, also called the Python Interactive Shell. Python thus has two basic modes:

• script mode;
• interactive mode.

In script mode, complete .py files are executed by the interpreter. The interactive mode, on the other hand, offers a command-line shell that provides immediate feedback for each instruction. The interactive mode is thus well suited for testing new program constructions.

Interactive Python or "IPython" offers a comprehensive architecture for interactive computing:

• a powerful interactive shell;
• support for data visualization and use of GUI toolkits.
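A short session shows the principle: every input is evaluated as soon as it is entered, and the result is printed immediately:

>>> 2 + 3
5
>>> x = 10
>>> x * 2
20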


Over time, various IPython variants have evolved. A few years ago, these were finally combined into a development system called Jupyter. Jupyter (formerly: IPython Notebook) is an open-source project in which Markdown text and executable Python source code can easily be combined. Markdown provides a simple commenting language that can be formatted without further conversion and is easy to read.

To work with Jupyter notebooks, an Anaconda environment can be used. Once this is installed, code cells can be created and executed immediately. In addition to working on the local computer (e.g., under Windows), a connection can also be made to a remote Jupyter server. Code cells can then be processed on this system, and Python files or Jupyter notebooks can be exported. As a result, the MaixDuino can also be addressed via Jupyter. The Jupyter system is also ideally suited for use on the Raspberry Pi itself; the Pi 4 in particular provides sufficient resources to allow fluent working. This makes IPython and Jupyter a universal system for a wide variety of hardware versions, including those used in this book. It is therefore no huge surprise that Jupyter notebooks have become a universal and extremely popular tool for machine learning and AI programming.

Anaconda, on the other hand, is a fully equipped data-science system that allows easy installation of an IPython IDE. This development environment will therefore be examined in more detail in the next sections.

8.10 Anaconda

The Anaconda Navigator provides a graphical user interface which is easy to install under Windows as well as Linux. Anaconda is available as a free download for private applications (link in the download package). It is possible to download the latest version of Anaconda for the current Python version. The appropriate operating system (Windows, macOS or Linux, each in a 32- or 64-bit version) must be selected. The installation of Anaconda usually runs smoothly if you follow the instructions on the download page exactly. For use in this book, the following options should be selected:

1. For all users.
2. Add Anaconda to PATH environment.
3. Register Anaconda as default Python.

Experience has shown that with these options, there are few problems if, for example, the MicroPython kernel is to be integrated into Jupyter later. However, one should be aware that Anaconda is a very complex and advanced application. This means that you can never guarantee that everything will always run according to a plug-&-play scheme. Anaconda is therefore not recommended for absolute beginners, and even the more experienced user will have to fix errors in some cases. Search engines or relevant forums usually do a good job here. If Anaconda has been installed successfully, the program can be started


via the Desktop icon or the entry in the Start menu. It should be noted that the program launch can take some time even with a high-performance system. After completing the start procedure, the navigator appears:

Figure 8.16: Anaconda Navigator.

At first glance, this offers an almost unmanageable variety of possible applications. The most important ones in the further course of the book are Jupyter and Spyder. The spelling with "y" is generally believed to go back to the "y" in Python in both cases.

Jupyter is one of the most popular applications for data analysis, machine learning and neural networks. It provides a so-called IPython notebook ("interactive Python", see above). There, each code block can be executed separately, and diagrams can be displayed in each block. Subsequently, further code can be processed and the data can be displayed in a new diagram, and so on. Furthermore, functions such as %timeit are provided, which allow checking the runtime requirements of the code.

Spyder, on the other hand, is an integrated development environment (IDE) for Python, similar to Thonny. It is mainly used to develop complete Python programs. Often, the Jupyter notebook is preferred for analyzing data in the field of AI applications, testing experimental code, or evaluating different code variations. For classical Python programming, on the other hand, Spyder is preferred.

The "Environments" selection in the Anaconda Navigator can be used to create special isolated programming environments. Libraries installed here are only available within this environment. In this way, different versions of a library can be used in different environments without interfering with each other. Further information on this topic can be found in Section 10.10.

Figure 8.17: Environments in Anaconda.

New environments can be created with the + Create icon. For example, it is a good idea to create a new environment with the name "tf" for TensorFlow. The desired libraries can then be installed via Search. For new libraries, the selection must be set from Installed to All. You can then return to the main window (Home). Using the selection Applications On, the desired environment can be chosen. After an initial installation ("Install"), Jupyter can then be started ("Launch"). If a library is not available in a later project, it can easily be installed at any time via Environments.
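Whether a library is actually available in the active environment can be checked with two lines of Python, here using TensorFlow in the "tf" environment as an example:

import tensorflow as tf
print(tf.__version__)   # prints the installed TensorFlow version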

8.11 Jupyter

Jupyter Notebook is a very powerful tool for the interactive development of data-based projects. In the field of AI and machine learning it has established itself as an important development tool. This chapter shows how to set up and use Jupyter for data-science projects on a PC or laptop computer. The "notebook" integrates code and output into a single document. This makes it easy to combine graphics, text, mathematical equations, etc. The result is therefore a single document which

- executes code;
- displays output.


Additionally,

- explanations;
- formulas;
- diagrams

can be inserted and displayed. Working with Python thus becomes more transparent, understandable and reproducible. In addition, the joint development and use of code and programs in a work group is greatly simplified. Notebooks have been used for several years in many companies around the world. They allow for improved communication and rapid sharing of results. Although it is possible to use many different programming languages in Jupyter notebooks, this chapter focuses on Python, because this language is particularly well suited for use in a notebook. The easiest way for beginners to get started with Jupyter notebooks is to install Anaconda (see the previous section). Anaconda provides a full-featured machine-learning environment without the need to manage multiple installations. Operating-system-specific installation problems are also largely avoided with Anaconda.

8.12 Installation and Start-Up

Advanced users who have already installed Python and prefer to manage their packages manually can also install Jupyter via pip:

pip3 install jupyter

This variant is also often used on Linux systems. On small systems like the Raspberry Pi, Anaconda is often not used and Jupyter is installed directly. On Windows, Jupyter can be started using the shortcut that was added to the Start menu by Anaconda. This opens a new tab in the standard web browser (see Figure 8.18).

Figure 8.18: Jupyter Notebook after start-up.


This is a so-called notebook dashboard, which was specially developed for the management of individual notebooks. It serves as a launchpad for editing and creating notebooks. The dashboard can only be used to access the files and sub-folders that are visible in the Jupyter start directory. When Jupyter Notebook is open in the browser, the dashboard is displayed in the URL bar as

http://localhost:8888/tree

This indicates that the content is provided by the local computer and shows that notebooks are web applications: Jupyter starts a local Python server that serves the apps to the web browser. This makes them essentially platform-independent and allows easy sharing on the web.

To create the first notebook, select "New" in the drop-down button at the top right and then "Python 3" (see the figure above). This opens the first Jupyter notebook in a new tab. A new file "Untitled.ipynb" appears in the browser. A green-bordered text field indicates that the notebook is actively running. Each notebook uses its own tab, so that several notebooks can be opened at the same time. The .ipynb file (interactive Python Jupyter notebook) is a text file describing the contents of a notebook in the so-called JSON format. Each cell, including graphs, comments or formulas etc., is stored in it, together with the associated metadata.

When working with Jupyter, the terms "cell" and "kernel" play a central role:

- A kernel is a "calculating machine" that executes the code contained in the notebook.
- A cell is a container for text to be displayed in the notebook or for code to be executed by the notebook's kernel.

In a new notebook (Figure 8.19), the field with the green border is called a cell. There are two main cell types:

- Code cells contain program instructions to be executed by the kernel. When the code is executed, the notebook displays the output under the code cell that generated it.
- A Markdown cell contains formatted text and displays its output directly when the cell is executed.

Jupyter can also be tested with the classic "Hello World" example. Entering

print('Hello World!')


into the first cell and clicking the "Run" button (or pressing the key combination Ctrl+Enter) should give the following result:

Figure 8.19: "Hello World" in Jupyter.

If the cell has been successfully executed, the output is displayed directly below it and the label to the left changes from In [ ] to In [1]. The output of a code cell is also part of the document. The difference between code cells and Markdown cells can be seen in the fact that code cells have the In [ ] label on the left, whereas Markdown cells do not. The "In" part of the label stands for "Input", while the number indicates the order in which the cell was executed on the kernel; in this case, this cell was the first to be executed. If the cell is executed again, In [1] changes to In [2], as the cell is now the second to be executed on the kernel. The following code:

import time
time.sleep(5)

does not produce any output but takes five seconds to execute. Jupyter indicates that the cell is currently active by displaying In [*] on the left.

In a Jupyter notebook there is always an "active" cell, highlighted with a border whose color indicates the current mode:
- a green outline means that the cell is in "edit mode";
- a blue outline indicates "command mode".

Cells in command mode can be executed with Ctrl+Enter. There are also a number of other commands that can be applied to cells with a blue outline. Various keyboard shortcuts are available for this purpose, allowing a quick cell-based workflow. Some important examples are listed below:


- Esc or Enter switches between edit and command mode.
- In command mode:
  ◦ scrolling is possible using the "up" and "down" keys;
  ◦ pressing A or B inserts a new cell above or below the active cell;
  ◦ M converts the active cell into a Markdown cell;
  ◦ Y converts the active cell to a code cell;
  ◦ D + D (i.e., pressing D twice) deletes the active cell;
  ◦ Z undoes the deletion of cells.
- In edit mode, Ctrl+Shift+"-" divides the active cell at the cursor position.

With "Markdown", Jupyter provides an easy-to-learn markup language for formatting simple text; its elements correspond to HTML tags. In the context of this book, however, it will hardly be used, as comprehensive formatting is mainly required for scientific publications or presentations.

8.13 Using MicroPython Kernels in Jupyter

To interact with a MicroPython-enabled controller, a special kernel must be installed. If Python 3 is already installed on the host computer, the following repository can be loaded via the shell:

git clone https://github.com/goatchurchprime/jupyter_microp...

Subsequently, the library is installed via the corresponding shell commands:

pip3 install jupyter_micropython_kernel

python -m jupyter_micropython_kernel.install

After (re-)starting Jupyter Notebook, a new MicroPython USB kernel is now available:

Figure 8.20: MicroPython USB Kernel in Jupyter


8.14 Communication setup to the MaixDuino

Now you can communicate via Jupyter with an external board, such as the MaixDuino. To establish a connection to the controller, you can display all available ports by entering

%serialconnect

in the first Jupyter cell. Then select the correct port and set the baud rate. Usually, 115200 baud works without problems:

%serialconnect to --port=COM14 --baud=115200

Now you can write directly to the screen of the MaixDuino via a notebook:

import lcd
lcd.init()
lcd.draw_string(100, 100, "Hello MaixDuino!", lcd.WHITE, lcd.BLACK)

In the notebook, the entire process looks like this:

Figure 8.21: MaixDuino communicates with Jupyter. After starting the cell, the message appears on the display of the MaixDuino.


8.15 Kernels

When program statements are executed in a cell, the associated code is processed in a kernel. Each output of the kernel is returned to the corresponding cell. The state of the kernel persists in this way, since it refers to the entire document and not to individual cells. If, for example, libraries have been imported or variables declared in a cell, these are also available in other cells. The following example illustrates this with the definition of a function:

def pythago(x,y):
    return x*x + y*y

Once the above cell has been executed, pythago() can be used in any other cell:

pythago(3,4)
25

This also works regardless of the order of the cells within the notebook. As soon as a cell has been executed, all variables declared in it and all imported libraries are available in all other cells.

Jupyter offers the possibility to change the kernel. This option was already used when activating the MicroPython kernel in the last section. There are several different options for changing the kernel. By selecting the Python version for a new notebook, the kernel to be used is chosen automatically. Kernels are available for different versions of Python and also for over one hundred other languages, including Java, C and even FORTRAN. So it is also possible, for example, to install a MicroPython kernel in Jupyter in addition to the classic Python 3 kernel and use it to program external controllers such as the ESP32, an ESP-Eye or the MaixDuino.

8.16 Working with Notebooks

Before a new project is started, the current notebook should be given a meaningful name. To do this, click on the name "Untitled" and choose a suitable new name. Saving is done by clicking on the disk symbol.

Closing the notebook tab in the browser is different from closing a document in a traditional application such as Word or Excel. The notebook's kernel continues to run in the background and must be shut down separately before the notebook is actually closed. This has the advantage that the kernel does not have to be restarted if a tab or the browser is accidentally closed. Only when the kernel has been shut down can the tab be closed without any problems. The easiest way to do this is by choosing

File

Close and Halt


from the notebook menu. Alternatively, the kernel can also be terminated using

Kernel

Shutdown

It is generally recommended that you save your current work regularly. By pressing Ctrl+S the notebook is saved and a so-called checkpoint is created. With each new notebook, a checkpoint file is created along with the notebook file itself. The checkpoint file is located in a hidden sub-directory named .ipynb_checkpoints. By default, Jupyter automatically saves the current notebook to this checkpoint file every 120 seconds, without changing the primary notebook file. Via Ctrl+S, both the notebook and the checkpoint file are updated. Therefore, unsaved work can be restored from the checkpoint in the event of an unexpected problem. Using the menu

File

Revert to Checkpoint

it is possible to return to the checkpoint.

8.17 All libraries available?

Specifically in the area of machine learning, neural networks and artificial intelligence, many highly powerful "libs" have been developed in recent years. Depending on your needs, these can be reloaded using the Anaconda Navigator. For a quick overview of which library is available in which version, Library_test_1V0.ipynb (see download package) can be used in a Jupyter notebook. If the corresponding libs are loaded, the following information should be displayed after executing the first cell:

Figure 8.22: Library-Test in Jupyter.


You can see that in this special case, TensorFlow version 2.3.0 is available. The libs "skimage" and "seaborn," on the other hand, are not installed, but could of course be loaded via Anaconda if necessary. For standard Python, the program Library_test_1V0.py can be used to check the availability of libraries. In Thonny, missing libraries can be installed via:

Tools

Manage packages…

For the Raspberry Pi, special installation procedures are usually available for each Lib. Further details on the individual libraries can be found in Chapter 10.
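If you prefer not to use the prepared notebook, a minimal version check along the same lines might look like this (a sketch written for this overview, not the book's file; the module list is an arbitrary selection):

import importlib

# probe a few typical ML libraries and report their versions
for name in ["numpy", "matplotlib", "pandas", "tensorflow", "skimage", "seaborn"]:
    try:
        module = importlib.import_module(name)
        print(name, getattr(module, "__version__", "version unknown"))
    except ImportError:
        print(name, "is not installed")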

8.18 Using Spyder for Python Programming

Spyder is an integrated development environment (IDE) for scientific computing and programming. Here, too, the focus is on the Python programming language. Spyder offers, among other features:
• an editor for writing code;
• a console for evaluating and displaying results;
• a variable manager.

Spyder has rightfully gained its place in the scientific community. With non-professional users, on the other hand, it tends to remain in the shade: the competition from the widespread Thonny is obviously too strong here. Since Thonny is available as standard on the Raspberry Pi, it has become increasingly popular with hobby users and "makers". Nevertheless, Spyder will also be briefly presented here. Which of the two IDEs the user ultimately prefers is a matter of taste; both have their specific advantages and disadvantages.

The easiest way to start Spyder is to use the Anaconda Navigator. Similar to Jupyter, it is available after the initial installation.


Figure 8.23: Starting Spyder via Anaconda.

A typical Spyder window looks like this:

Figure 8.24: Spyder in action.

The editor can be found on the left-hand side. Here, the Python code is entered as usual, e.g.:

print("Hello World")


The program is started with the green arrow and delivers its result. The interactive IPython console can be found at the lower right. The code from the editor sends its output, including complex graphics etc., directly to the IPython console. Interactive inputs too, such as print(3 * 8), deliver their output to the console after pressing Shift+Enter.

The variable manager at the top right shows the names and contents of all available variables. If, for example, a variable a = 42 was declared in the code, its name, type, size and value are displayed in the variable explorer. This is especially useful when a large number of variables have been declared in the program. Alternatively, a file manager can also be displayed in the variable manager window; the selection is carried out via the tabs below the window. In addition, Spyder provides the usual tools and a few other interesting features which will not be discussed here any further.

8.19 Who's programming who?

This book refers to just three hardware systems:
• PC (Windows 10, possibly Linux or Ubuntu);
• Raspberry Pi (Raspberry Pi OS);
• MaixDuino (MaixPy);

and four programming environments:
• MaixPy;
• Thonny;
• Jupyter;
• Spyder.

Nevertheless, the number of programming options quickly becomes confusing. If you take into account that the Raspberry Pi, for example, can also be controlled via the remote desktop, an extensive number of variations comes up. For example, the MaixDuino can be programmed via a Raspberry Pi using the Thonny IDE; the Raspberry Pi, in turn, could be controlled from a PC via a remote desktop session over WLAN... The following figure shows a summary of these variants, without claiming to be exhaustive.


Figure 8.25: Possible connections between ML systems.


Chapter 9 • Python in a Nutshell

For several years now, Python has been one of the most widely used programming languages. Due to its simple structure and ease of learning, Python has established itself among professionals as well as makers [8,9]. A complete introduction to Python or MicroPython is beyond the scope of this book; moreover, extensive literature is readily available on this topic (see Bibliography). Nevertheless, the most important basics will be recapitulated here. The focus is, of course, on the commands and instructions that are of particular importance in the machine learning area. In particular, the specifics of the MaixDuino board will be discussed.

The development of MicroPython made the programming of microcontroller systems simple and straightforward. This makes the programming language well suited for mainframe systems as well as for the world of embedded systems. Python programs can be found in the hobby sector as well as in education or scientific applications. Commercial developers, too, are increasingly working with Python. In the IT industry, numerous market leaders such as Google or Amazon have used Python for their software developments for a long time [1]. In addition, freely available modules and libraries such as MatPlotLib, NumPy, SciKit or SciPy provide extensive possibilities. Thus, the applications range from simple LED control and scientific data analysis to machine learning and artificial intelligence.

MicroPython was developed as a lean version of Python 3. Because the language is interpreted, it is generally slower than compiled alternatives. MicroPython was designed to work as efficiently as possible on small, embedded systems. This makes it well suited to microcontrollers, which have much slower clock speeds and significantly less memory than typical personal computers. This variant is therefore a good match for small systems such as the MaixDuino. The disadvantage that classic Python programming is difficult to implement on low-level controllers is eliminated with the "micro version" of Python. Following the standard, this version is also strongly based on Python 3 in terms of syntax. Virtual machines and extensive libraries ensure that controller programming becomes as simple as software development for classic computers.

A comparison of the two most popular programming languages in the microcontroller environment shows that Python is increasingly preferred over C/C++. In the rankings of the most popular programming languages, Python can now usually be found in the top position, while its competitor C/C++ is increasingly relegated to the back ranks. The reason for this development is mainly due to the following advantages of Python:
• the simple language structure makes Python very beginner-friendly;
• various internet forums support the programmer with tutorials and sample code;
• extensive libraries are available.


Beginners can usually find solutions to their problems very quickly in various forums. With other programming languages, this form of mutual support is less developed.

In C, programming is carried out using control registers, pointers and other structures that are often difficult to understand. The firmware for the target controller must be programmed, compiled and finally transferred to the controller chip using a programming device. MicroPython integrates all of these steps. With a simple mouse click, users can control low-level hardware such as LEDs, displays or motors. The acquisition of analog voltage values or working with SD cards becomes very simple with the available libraries. Integrated garbage collection and dynamic allocation enable efficient memory management in Python. There is therefore hardly any need to use pointers or similar constructs that are usually difficult for beginners to grasp. The often cryptic C symbols such as x++, as well as the complex variable declarations, represent a hurdle for beginners that should not be underestimated. Python, in contrast, is known for its simplicity and the excellent legibility of its code.

Since MicroPython was developed as a "light version" for microcontroller applications, not all libraries and functions available in standard Python are supported. Nevertheless, you can easily switch to the micro version if you are already familiar with Python. There are just a few syntactic structures or instructions that are not available or applicable in MicroPython.

Python is interpreted. This means that the program code is executed directly by an interpreter; a separate compilation step is not required. Python therefore offers the possibility of executing code on a wide variety of systems. All you need is an up-to-date interpreter. One of the greatest advantages of Python code is its broad compatibility and portability. Python programs can be executed on classic computers running Windows, macOS or Linux as well as on small single-board systems such as the Raspberry Pi, MaixDuino or similar micro systems. Above all, the use on the well-known "RPi" has also contributed to the increasing popularity of Python. A universal syntax ensures that programs can be developed easily on small as well as large systems. This makes Python extremely scalable. The possible encapsulation of data and program code in clear, reusable modules, i.e., "objects", makes Python an object-oriented programming language.

Nowadays, C++ is generally the first choice in low-level programming. Classical Python variants have so far not been well suited for this; this gap is now closed by MicroPython. In addition to client applications, C++ is also used for powerful server applications as well as device drivers and embedded driver components. Its area of application extends from system software to application programming. Since Python is still a relatively young programming language compared to C, it has not yet found universal use in all sub-areas of information technology.


However, it turns out that Python has its own advantages in many areas. The main disadvantage of Python, on the other hand, is certainly its comparatively low processing speed. This is where compiled languages like C can clearly demonstrate their power. Fast control loops or real-time systems, vehicle controls and safety queries are much easier and safer to implement in C. Since these areas of application hardly play a role for non-professional users, the speed disadvantage is usually not really significant.

Ultimately, it is also the use of an interpreter that enables Python to be used as an interactive language. Environments like Jupyter could be built on this, and the possibility of controlling the MaixDuino interactively via Jupyter shows the potential of this approach. This is how Python has achieved its specific importance in the cutting-edge field of artificial intelligence (AI). With extensive libraries such as NumPy, SciPy etc., and distributions like Anaconda, Python has become by far the most popular programming language in this area. All doors are therefore open to the experienced Python user. From hardware-related controller programming to AI applications: with Python there are no limits to intuition and creativity. Therefore, the basics of Python and MicroPython are explained and demonstrated in this chapter.

9.1 Comments make your life easier

In any programming language, explanatory comments are important. This is the only way to know what a certain program section does. Using comments allows you and others to understand code even months or years later, without having to delve into all the details again and again. It is not necessary to comment on each line of a given program. Experienced coders should be able to understand individual instructions without comment. A single-line comment is only recommended for special constructs or unusual or innovative lines of code. On the other hand, in the case of subroutines or entire enclosed program sections, a brief explanation of how they work should be included.

Simple comments are introduced with the #-sign. They start with # and end with the end of the line:

>>> print("Hello MaixDuino")   # this is a comment
Hello MaixDuino

Multi-line comments can also be marked with triple quotation marks ("""). The same string ends the comment:

"""
first comment line
second line of comment
"""


In practice it can look like this:

'''
This is a multi-line comment.
Prints hello world.
'''
print("hello world")

Alternatively, in Thonny the comment function can also be used. It allows several lines to be marked as comments with the #-symbol at the same time.

Figure 9.1: Comment function in Thonny.

The comment function is also very suitable for "commenting out" certain parts of code. When testing more extensive code, sections that should temporarily be excluded from execution can simply be commented out:

# print("Hello Peter")
print("Hello Tom")

The interpreter then ignores the marked lines. Time-consuming deletion and later re-insertion of the code lines is therefore not necessary. The output can be changed from "Hello Tom" to "Hello Peter" by simply moving the comment character:

print("Hello Peter")
# print("Hello Tom")

Comments also help beginners in particular to gain a better understanding of the code structure. From a technical point of view, the interpreter ignores all comments, i.e., they have no influence whatsoever on the code execution. A typical application is switching models on the MaixDuino:


task = kpu.load(0x500000)
# task = kpu.load("/sd/20class.kmodel")

Here, the model is loaded directly from memory. Using

# task = kpu.load(0x500000)
task = kpu.load("/sd/20class.kmodel")

on the other hand, the Maix accesses the model on the SD card in the /sd directory. This simple option is used in various programs (see, for example, Section 13.7).

9.2 The print() statement

Any information can be sent to the terminal using the print() instruction. The command can be executed directly in the terminal:

Figure 9.2: Print command used in the console section.

In programs, on the other hand, it is used to output text-based information.

9.3 Output to the display

If you do not want to send information to the console but to the display of the MaixDuino, two libraries must be integrated via the import instruction. Further details can be found in Chapter 10. A typical program for printing to the display looks like this:

import lcd
import image

img = image.Image()
img.draw_string(80, 80, "hello", scale=3)
img.draw_string(80, 120, "maixpy", scale=3)
lcd.display(img)

In this way, text is placed on the display using graphics functions:


img.draw_string(x, y, "TEXT", scale=s)

The value pair (x, y) defines the starting position of the text in pixels; the origin (0, 0) is the top left corner of the screen. The text size can be set using "scale". All values starting from scale = 1 are permitted; at s = 22, one letter already fills the entire height of the screen.

Figure 9.3: Printing to the MaixDuino display.

The following figure shows the result on the Maix:

Figure 9.4: Text output on the MaixDuino display.

A practical application of this screen output can be found, for example, in the alarm system for person detection in Section 15.5.


9.4 Indentations and Blocks

Python distinguishes between different program blocks by indentation. It is not necessary to use curly brackets ("{}") or the like. This is one of the most important differences to most other languages like C, Pascal or Basic. The advantage of this method is that you are practically forced to maintain a certain code structure.

if True:
    # block 01
    print("True")
else:
    # block 02
    print("False")

The number of indented spaces is variable, but the same block must always maintain the same number of indented spaces. Otherwise, an error message is issued:

if True:
    print("Answer")
        print("True")    # the different indentation will lead to a runtime error
else:
    print("Answer")
    print("False")

In addition, the indentation depth must remain the same throughout the whole code. Otherwise, the following message appears:

>>> %Run -c $EDITOR_CONTENT
Traceback (most recent call last):
  File "<stdin>", line 6
IndentationError: unexpected indent

Blocks are formed in the same way in conditions and loops.

9.5 Time Control and Sleep

The sleep instruction was already used in the example of the flashing LED (Section 8.2). It is included in the "time" module and accessed via

import time

The instruction

time.sleep(seconds)


sets a fixed delay time in seconds. Alternatively, only the sleep command itself can be imported:

from time import sleep

In this case, the statement can be reduced to

sleep(seconds)

Even though this command can also be used for fractions of a second, it is advisable to use the instruction

time.sleep_ms(milliseconds)

for very short delay times, since this version shows improved precision for small time intervals. The disadvantage of these functions is that they are blocking: the controller cannot perform any other tasks during the waiting time, as it is busy counting processor cycles. The use of interrupts or other programming techniques is an alternative. Since these methods are hardly needed in the context of this book, they will not be pursued further here. If necessary, appropriate explanations can be found in the literature (see Bibliography).

The following two routines are available for time measurements:

time.ticks_ms()
time.ticks_us()

They indicate the current system runtime in milliseconds or microseconds, respectively. An output like the one in Figure 9.5 means that the MaixDuino has been running for 977309 ms = 977.309 s = 16 minutes, 17 seconds, and 309 milliseconds.

Figure 9.5: System runtime.

A classic application of ticks() is the measurement of code runtime. For example, the following code can be used to show that math operations take a certain amount of time:


# runtime.py
import time
import math

while(True):
    start = time.ticks_us()
    time.sleep(1)
    # x = math.exp(math.sin(22.5))
    stop = time.ticks_us()
    print(stop-start)

If the comment symbol in front of the calculation is removed, the displayed processing time increases from approx. 1,000,015 µs to 1,000,025 µs. The calculation time for the formula is therefore approx. 10 µs. In addition to the time library, "math" was also imported here in order to be able to evaluate somewhat more complex mathematical functions.

9.6 Hardware under control: digital inputs and outputs

In contrast to programming a PC or laptop, MicroPython often focuses on direct access to hardware units. In particular, the control of individual input/output pins (I/O pins, or GPIO for General Purpose Input/Output) is quite important [8,9]. In this way, ML or AI applications can also be combined with IoT and hardware control. This leads to completely new areas of application, specifically in robotics. A door lock controlled via face recognition, for example, can also be implemented in this way (see Section 15.11). A simple application of this feature was demonstrated in Section 8.2. Here, a complete control program for the RGB LED of the MaixDuino is presented (see RGB_LED_tst.py in the download package):

from Maix import GPIO
from fpioa_manager import fm
from board import board_info
import time

LED=12    # 12: green, 13: red, 14: blue
fm.register(LED, fm.fpioa.GPIO0)
led=GPIO(GPIO.GPIO0, GPIO.OUT)
led.value(1)
time.sleep(1)
led.value(0)
time.sleep(1)
fm.unregister(LED)


LED=13    # 12: green, 13: red, 14: blue
fm.register(LED, fm.fpioa.GPIO0)
led=GPIO(GPIO.GPIO0, GPIO.OUT)
led.value(1)
time.sleep(1)
led.value(0)
time.sleep(1)
fm.unregister(LED)

LED=14    # 12: green, 13: red, 14: blue
fm.register(LED, fm.fpioa.GPIO0)
led=GPIO(GPIO.GPIO0, GPIO.OUT)
led.value(1)
time.sleep(1)
led.value(0)
time.sleep(1)
fm.unregister(LED)
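Incidentally, the three nearly identical blocks can be collapsed into a loop. A compact variant might look like this (a sketch reusing exactly the calls and imports from the listing above):

# blink green, red and blue in turn
for LED in (12, 13, 14):    # 12: green, 13: red, 14: blue
    fm.register(LED, fm.fpioa.GPIO0)
    led = GPIO(GPIO.GPIO0, GPIO.OUT)
    led.value(1)
    time.sleep(1)
    led.value(0)
    time.sleep(1)
    fm.unregister(LED)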

After starting the program, first the green, then the red and finally the blue LED lights up for one second each.

On a Raspberry Pi, an LED control program could look like this:

import time
import RPi.GPIO as GPIO

GPIO.setmode(GPIO.BCM)
GPIO.setup(23, GPIO.OUT)
try:
    while 1:
        GPIO.output(23, GPIO.HIGH)
        time.sleep(0.5)
        GPIO.output(23, GPIO.LOW)
        time.sleep(0.5)
except KeyboardInterrupt:
    pass
GPIO.cleanup()
print("bye...")

The time library must be integrated here also. Via


import RPi.GPIO as GPIO
GPIO.setmode(GPIO.BCM)
GPIO.setup(23, GPIO.OUT)

the ports of the Raspberry Pi become available. In this case, port 23 is used. Since the Raspberry Pi does not have an integrated test LED, one must be connected externally:

Figure 9.6: Hardware control using a Raspberry Pi.

9.7 For vital values: variables and constants

Variables are one of the most important elements of any programming language. In Python, the declaration of variables is particularly easy, since it is not necessary to specify the data type of the variable during the assignment. This is a major difference to other languages, where variables must always be explicitly initialized with a certain type (e.g., int a = ...). Variables can also be used in the console:

>>> a = 17
>>> print(a)
17

The following rules apply to the assignment of variable names: • The variable name may only contain numbers, letters and underscores. • The first character of a variable must be a letter or an underscore. • Variable names are case-sensitive.


Variables can be assigned values of different types. Types in MicroPython include numbers, strings, lists, dictionaries, tuples, etc. type() can be used to check the data type of variables and constants, for example:

>>> a = 17
>>> print(type(a))
<class 'int'>

>>> a, b, c, d = 17, 1.5, True, 5+7j
>>> print(type(a), type(b), type(c), type(d))
<class 'int'> <class 'float'> <class 'bool'> <class 'complex'>

Numbers like 10 and 100 or strings like "Hello World!" are constants. MicroPython offers the keyword const, which is used to mark a fixed value:

from micropython import const
a = const(33)
print(a)

9.8 Numbers and variable types

MicroPython supports the following number types:
• integer (int);
• floating-point (float);
• boolean (bool);
• complex.

A number object is created whenever a value is specified. With del() the objects can be deleted again. In the console it looks like this:

Figure 9.7: Deleting variables.

It is also possible to assign several variables in one line, by just assigning values separated by commas:

a, b = 1, 2


The result of a division (/) is always a floating-point value, e.g.: 12/3 = 4.0

The integer format corresponds to whole numbers as in mathematics, including signs: 1, 100, -8080, 0, …

Float corresponds to the real numbers. Scientific notation using exponential expressions can also be used:

>>> a = 1.234e3
>>> a
1234.0

An important difference between integer and float is that integer values are stored exactly, while float numbers are subject to numeric rounding. Variables of the Boolean type can only take the two values True and False. Here, too, a distinction is made between upper and lower case.

The variable type complex consists of a real and an imaginary part in the form a + bj, or complex(a, b). Both the real part a and the imaginary part b can be floating-point numbers. This also allows calculations using complex numbers:

>>> a = 1j
>>> b = 1j
>>> a*b
(-1+0j)

Binary (0b ...) and hexadecimal (0x ...) numbers can also be used: >>> a=

0b1010

>>> a >>> 10 >>> a = 0xff >>> a >>> 255

9.9 Converting number types

Numbers and variables can also be converted from one type to another. The following instructions are used for this purpose:
• int(x): converts x to an integer;
• float(x): converts x to a floating-point number;
• complex(x): converts x into a complex number.


In the first case, no rounding is used; the fractional part is simply cut off.

>>> a = 2.0
>>> print(int(a))    # convert float to int
2

>>> x, y = 4.3, 7
>>> c = complex(x, y)
>>> print(c)
(4.3+7j)

9.10 Arrays as a basis for neural networks

Arrays represent a collection of elements of the same type. They usually play an important role in ML projects. The advantage of arrays is that they can hold large amounts of data in a structured form. For example, if a series of measurements is to be recorded, the individual results can be saved in an array. In machine learning projects, almost all data is stored in arrays: images captured by a camera are converted into arrays before further processing, the weights of neural networks are stored in arrays, and so on. Individual values can be easily accessed within an array. In order to be able to work with arrays, the corresponding module must be loaded, e.g.:

import array as arr

Subsequently, an array can be created:

a = arr.array('d', [1.1, 3.5, 4.5])

For the number type used in the array, either 'd' for floating-point numbers (double) or 'i' for integers can be specified. The following code creates an array as a first step; next, the values are printed individually to the console:

# array.py
import array as arr

a = arr.array('d', [1.1, 3.5, 4.5])
for n in range(len(a)):
    print(a[n])

The output shows the individual array values:


>>> %Run array.py
1.1
3.5
4.5
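Individual elements can be read and modified via their index, as usual. A brief sketch continuing the example above:

a[0] = 9.9     # overwrite the first element
print(a[0])    # 9.9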

9.11 Operators

Within Python or MicroPython, the arithmetic operators have the meanings familiar from mathematics:

+    addition          c = a + b
-    subtraction       c = a - b
*    multiplication    c = a * b
/    division          c = a / b

In addition, there are some more special operators:

//   integer division                 11 // 3  ==>  3
%    modulo (remainder of division)   11 % 3   ==>  2
**   exponent                         2 ** 3   ==>  8

The logical operators not, and, and or are also available in (Micro)Python: a

not a

True

False

False

True

a

b

a and b

a or b

True

True

True

True

True

False

False

True

False

True

False

True

False

False

False

False
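A quick check in the console confirms the tables, for example:

>>> a, b = True, False
>>> a and b
False
>>> a or b
True
>>> not a
False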

The bit operators are represented in Python by the following symbols:

&    bitwise AND
|    bitwise OR
^    bitwise XOR
~    bitwise NOT

The shift operators can also be used:

<<   shift bits to the left
>>   shift bits to the right


The following examples illustrate the application:

>>> a = 0b1010
>>> b = 0b1100
>>> bin(a&b)
'0b1000'
>>> a = 0b1010
>>> bin(a<<1)
'0b10100'

The comparison operators deliver Boolean values:

>>> a = 5
>>> b = 7
>>> a < b
True
>>> a > b
False

or can be used directly in a branch:

# a = True
a = False
if a == True:
    print("a!")
else:
    print("NOT a!")

9.12 Conditions, branches and loops

No programming language can do without control structures. Branch and loop instructions are available in (Micro)Python for this purpose. In programming, many identical or similar processes often have to be repeated. Loops offer a much more elegant method for this than simply repeating statements in the code [8,9].


Branch instructions are used when decisions are required. These allow a program to react correctly to different situations. In contrast to other programming languages, Python offers very extensive functionality for loops. In the following, however, only the basic loop instructions are explained briefly. Using loops allows blocks of code to be executed multiple times. The execution continues until a specified condition is met. There are two types of loops available in MicroPython:
• while loops
• for loops

If you want to output the numbers from 1 to 10 to the console, the following while loop can be used:

number = 1
while number <= 10:
    print(number)
    number = number + 1

Branches are constructed using if and else. A small example checks a temperature value for the risk of frost:

Temperature = -3
if Temperature < 0:
    print("Risk of frost")
else:
    print("No danger of frost")

For Temperature = -3, the statement

>>> Risk of frost

appears.

If Temperature = 17 is set instead, the statement

>>> No danger of frost

is printed. The expression after the keyword if is evaluated as a Boolean value: it can be either true or false. If the expression is true, the statements immediately after the if line are executed. These statements must be indented so that it is clear which of them belong to the if block. Note also that the if line must be terminated with a colon. The else statements are only executed if the if-query is false.
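The second loop type was listed above but not shown. A for loop producing the same numbers from 1 to 10 might look like this (a minimal sketch; range(1, 11) yields the values 1 to 10):

for number in range(1, 11):
    print(number)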

9.13 Trial and error — try and except

During the execution of a program, exceptions or errors can occur. Exception handling allows a specific treatment of these situations: the code can continue to be executed without the program being aborted. Some high-level programming languages such as Java or Ruby, but (Micro)Python too, have built-in mechanisms for exception handling. In Python, code containing a risk of an exception is embedded in a try/except block. For example, the following program should calculate reciprocal values:

# reciprocal.py
while True:
    n = int(input("Please enter a number: "))
    print("Number was: ", n)
    print("Reciprocal is: ", 1/n)

As long as values other than zero are entered, the code works fine. If a zero is entered, however, the program terminates with an error message:


Figure 9.8: Program abort with an error message.

Using the try/except construction, the error can be caught:

# try_ecxept.py
while True:
    n = int(input("Please enter a number: "))
    print("Number was: ", n)
    try:
        print("Reciprocal is: ", 1/n)
    except:
        print("error")

If zero is entered now, an error is reported, but the program still continues to run and is available for new entries:

Figure 9.9: An error is intercepted.

This option is often used in ML projects. If a certain neural network element does not deliver a reasonable value, this does not lead to a program crash; instead, a mere error message is printed. If acceptable values are delivered again, the system can continue to work without interruption.
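In practice, it is often better to catch only the exception actually expected, so that genuine programming errors are not silently swallowed. A minimal variation of the example above (a sketch, not the book's program):

# catch only the division-by-zero case
while True:
    n = int(input("Please enter a number: "))
    try:
        print("Reciprocal is: ", 1/n)
    except ZeroDivisionError:
        print("Zero has no reciprocal")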


Chapter 10 • Useful Assistants: Libraries!

Python relies on its libraries. This is certainly true for general programming tasks, but even more so for ML applications. Some simple "libs" like time or math were already used in the previous chapter. While simple libraries could in some cases be created by experienced users themselves, this would be inefficient at best for complex ML libs. Often, ML libs include C routines, as these provide very fast and efficient code. The Python "wrappers" in this case ensure that the number-crunching work can conveniently be used by Python programmers.

Figure 10.1 shows the most important libraries for ML applications. These are closely related and partly build on each other. Python itself forms the foundation of the construction. Since the libs were often developed by different teams, some libraries contain routines that perform almost the same tasks as functions in other libraries. It is up to the user's experience to select the best routines and functions in each case.

Figure 10.1: The most important libraries.

A certain problem with the use of the library system arises from the fact that not every version of a lib is compatible with all versions of another library. In a research area that is developing as quickly as machine learning, it can hardly be avoided that newly developed libraries are not completely compatible with older versions. So-called virtual environments offer a way to avoid complications. They allow the user to create a kind of isolated laboratory in which the required version of a library can be installed without interference from external libs. This concept is explained in more detail in Section 10.10.

The following sections introduce the most important functions of the individual libs. The examples can be evaluated on a PC under Windows or on a Raspberry Pi. Either Thonny or Jupyter can be used as the programming environment.
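As a brief foretaste of Section 10.10: creating such an isolated environment takes only two shell commands (a minimal sketch; the environment name ml-env is arbitrary):

python3 -m venv ml-env
source ml-env/bin/activate    # on Windows: ml-env\Scripts\activate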


10.1 MatPlotLib as a graphics artist

Most people are visually oriented, so a single graph is often preferred to extensive lists of data or numerical tables. In addition, relationships can usually be grasped much faster if they are presented in graphical form. MatPlotLib is a classical Python lib which facilitates the development of graphical representations. Together with Python, MatPlotLib is enjoying increasing popularity. In combination with NumPy and SciPy, it is at least as good as, if not superior to, classical mathematics programs. In addition, MatPlotLib is available free of charge; it is open source and can be programmed in an object-oriented manner.

Using MatPlotLib, diagrams and representations can be created in various forms and formats. The quality that can be achieved easily approaches the requirements for scientific publications. Even beginners can quickly achieve considerable success with MatPlotLib. With just a few lines of code, you can create simple x/y plots, histograms, power spectra, bar charts, error charts, scatter plots and so on. The easiest way to integrate MatPlotLib on a PC is via Anaconda. For the Raspberry Pi, a specific routine is available:

pip3 install matplotlib

The installation takes only a few minutes. After that, the lib can be used for a wide variety of applications. A MatPlotLib graphic consists of the following components:

Figure: This refers to the entire current figure. It can contain one or several diagrams. A "Figure" is to be understood as a kind of canvas that contains the individual diagrams.

Axes: A figure can contain several axes. Each axis has
- a title;
- an X-label;
- a Y-label;
- optionally, a Z-label (for 3-D representations).

Axis: Axis objects are used to provide the axes with scales and diagram boundaries.

Artist: Contains text, line-, 2-D- or 3-D-objects, etc.

One of the most important modules of MatPlotLib is Pyplot. It provides simple functions for lines, circles, pixel images and text. Complete diagrams can be created with just a few lines of programming, as the following example shows (see matplotlib_demo.py or matplotlib_demo.ipynb in the download package):


import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
plt.plot(x, x*x, label='linear')
plt.legend()
plt.show()

Using Jupyter, this generates the following plot:

Figure 10.2: MatPlotLib graphics.

By using subplot(), several diagrams can be inserted into one figure. In Figure 10.3, four subplots were created with this method. subplot() uses three arguments:
• nrows;
• ncols;
• index.

They specify the number of rows, the number of columns and the index number of the subplot. In the example below, the arguments (221) and (2,2,2) etc. are therefore passed to the add_subplot() method. With set_title(), the subplot headings are defined. The code snippet

import numpy as np
from numpy import e, pi, sin, exp, cos
import matplotlib.pyplot as plt

def f(t):
    return sin(2*pi*t)

def g(t):
    return cos(2*pi*t)


fig = plt.figure(figsize=(18, 12))
t = np.arange(-2.0, 2.0, 0.01)

sub1 = fig.add_subplot(221)
sub1.set_title('221: function f')
sub1.plot(t, f(t))

sub2 = fig.add_subplot(222)
sub2.set_title('222: function g')
sub2.plot(t, g(t))

t = np.arange(-2.0, 2.0, 0.01)
sub3 = fig.add_subplot(223)
sub3.set_title('223: g(t)*f(t)')
sub3.plot(t, f(t)*g(t))

t = np.arange(-0.2, 0.2, 0.001)
sub4 = fig.add_subplot(224)
sub4.set_title('224: detail of g')
# sub4.set_xticks([-0.2, -0.1, 0, 0.1, 0.2])
# sub4.set_yticks([-0.15, -0.1, 0, 0.1, 0.15])
sub4.plot(t, g(t))
plt.plot(t, g(t))

plt.tight_layout()
plt.show()

provides the following quadruple plot:

Figure 10.3: Quadruple plot.


The presentation of 3-D graphics is also possible. When developing ML algorithms, scatter diagrams are useful tools to represent data. The following code snippet:

from mpl_toolkits.mplot3d import Axes3D

def randrange(n, vmin, vmax):
    return (vmax - vmin)*np.random.rand(n) + vmin

fig = plt.figure(figsize=(18, 12))
ax = fig.add_subplot(111, projection='3d')
n = 100
for c, m, zlow, zhigh in [('r', 'o', -50, -30), ('b', '^', -20, -5)]:
    xs = randrange(n, 23, 32)
    ys = randrange(n, 0, 100)
    zs = randrange(n, zlow, zhigh)
    ax.scatter(xs, ys, zs, c=c, marker=m)
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')
plt.show()

shows how data dot clouds can be displayed in a 3-D plot in an easily interpretable way:

Figure 10.4: 3-D plot.


By default, graphical outputs in Jupyter or Spyder appear directly in the worksheet or in the console (Figure 8.24). By using the "magic" command

%matplotlib qt

the graphs are sent to a separate window. Using

%matplotlib inline

the internal output is used again.

Figure 10.5: 3-D plot in a separate window.

The magic commands only work in an IPython console or in a notebook, not in a classic Python script. In that case, consider using

from IPython import get_ipython
get_ipython().run_line_magic('matplotlib', 'inline')

for the inline plot, and

get_ipython().run_line_magic('matplotlib', 'qt')

for external windows. In addition, MatPlotLib provides a variety of other display and visualization methods. These are presented and explained within the individual ML and AI applications in the following chapters.

10.2 The math genius: Numpy

The NumPy library was already used in the previous section. Here, this very universal library will be presented in more detail.


NumPy is an acronym for "Numeric Python" or "Numerical Python". It is an extension module for Python, which is mostly written in C. This ensures that the mathematical and numerical functions and routines provide the best possible execution speed. NumPy also supplements the Python programming language with powerful data structures for efficient computation with large arrays and matrices. The implementation aims at vast data structures ("big data"). Furthermore, the module offers a comprehensive selection of mathematical functions that are useful when working with these data. Both NumPy and SciPy (see below) are not usually included in a standard Python installation. However, NumPy and all the other modules mentioned are available in Anaconda and can be included there without any problems. On the Raspberry Pi, the installation is done via

pip3 install numpy

In NumPy, basic mathematical functions operate element-wise on arrays. The library provides powerful multi-dimensional array functions and basic tools for calculating and manipulating arrays. SciPy builds on this and provides a variety of functions that work with NumPy arrays. The available functions are highly useful for various types of scientific and engineering applications. It should be noted that in NumPy the * operator is, contrary to the usual mathematical convention, an element-wise multiplication and not a matrix multiplication. The @ operator is reserved for matrix multiplication in NumPy.

The Jupyter notebook numpy_demo.ipynb contained in the download package provides some interesting examples of using NumPy in practice. An important aspect of NumPy is the processing speed of matrix operations. For example, the example in the fourth cell of numpy_demo.ipynb shows the following result on a quad-core computer with 3-GHz CPU clock:

Python: 614.221 ms
Numpy: 15.332 ms
NumPy is in this example 42 x faster!
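The benchmark cell itself is part of the download package; a comparable timing experiment might look like this (a sketch written for illustration, with an arbitrarily chosen vector length):

import time
import numpy as np

n = 1_000_000
a = list(range(n))
b = list(range(n))

t0 = time.time()
c = [a[i] + b[i] for i in range(n)]     # classic Python loop
t_python = time.time() - t0

xa = np.arange(n)
xb = np.arange(n)
t0 = time.time()
xc = xa + xb                            # element-wise NumPy addition
t_numpy = time.time() - t0

print("Python: %.3f ms" % (t_python * 1000))
print("NumPy:  %.3f ms" % (t_numpy * 1000))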

In the example from the notebook, the element-wise addition of two vectors is executed 42 times faster than with the classic Python variant! Using NumPy, a 3×3 matrix can be defined via:

x = np.array([[1,2,3],[4,5,6],[7,8,9]])    # Create a rank 2 array
print(x)


The printed output can be read as rows (LINE) and columns (COLUMN):

               COLUMN
           1.    2.    3.
           |     |     |
LINE    [[ 1     2     3  ]    --- 1.
        [  4     5     6  ]    --- 2.
        [  7     8     9  ]]   --- 3.

For displaying the matrix as a 2-D graph, the expression

plt.imshow(np.asfarray(x), interpolation="none")

can be used (see program numpy_plot.ipynb):

Figure 10.6: 3×3 matrix rendered in 2-D.

Dark colors represent small values (dark blue = 1), light colors indicate large values (yellow = 9). A 3-D representation is also easily possible (see numpy_plot.ipynb):

from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
from IPython import get_ipython
get_ipython().run_line_magic('matplotlib', 'inline')    # inline or qt

# Data generation
# Note: 'matrix' and 'data' must hold the 3x3 matrix from above;
# these two assignments are added here so the snippet is self-contained.
matrix = np.array([[1,2,3],[4,5,6],[7,8,9]])
data = np.asfarray(matrix)
x = np.linspace(1,3,3)
y = np.linspace(1,3,3)
X, Y = np.meshgrid(y, x)

fig = plt.figure(figsize=(20, 20))
ax = fig.gca(projection='3d')
Xi = X.flatten()
Yi = Y.flatten()
Zi = np.zeros(matrix.size)
dx = .8 * np.ones(matrix.size)
dy = .8 * np.ones(matrix.size)
dz = data.flatten()
ax.bar3d(Xi, Yi, Zi, dx, dy, dz, shade=True)
plt.show()

Figure 10.7: 3-D representation of a matrix.

Using appropriate scaling, even larger amounts of data can be displayed in a suitable way:

Figure 10.8: 3-D representation of a matrix containing 400 elements.


The graphical representation of numpy matrices with matplotlib plays a key role in AI applications and will be used in various examples.

10.3 Data-mining using Pandas

Pandas is derived from the term "panel data (series)". This relates to the observation of facts over longer periods of time. Originally, the library contained data structures and operators for accessing tabular data and time series. In the meantime, however, the Pandas package has become one of the most important tools for data scientists and analysts, and it represents the backbone of many data-processing projects. With the help of Pandas, even large amounts of data can be:
- adjusted or cleaned;
- transformed;
- analyzed.

Pandas is also usually the tool of choice when loading data from files in different formats. For example, if a data set is stored on the computer's hard drive as a CSV file (comma-separated values), Pandas can extract the data and convert it into a Python-readable data set. In addition, Pandas includes statistical methods such as:
- averaging;
- calculation of medians;
- finding maxima/minima;
- determination of correlations;

and more. In addition, missing values in a data set can be added by interpolation, or rows and columns of matrices can be filtered according to certain criteria. Of course, the data can subsequently be visualized with the help of MatPlotLib in the form of bar, line, or bubble charts, etc. Pandas also provides suitable routines for saving the cleaned or transformed data in a new file or database. This makes it easy to capture and edit data structures.

For example, if you want to find out who owns which clothes in a group of friends, Pandas is the tool of choice. A data set can be created very easily via:

data = {
    'shoes': [2, 1, 4, 1],
    'socks': [2, 3, 7, 2]
}

(see pandas_demo.ipynb in the download package). The output can be configured in the form of a formatted table:


Figure 10.9: Tabular output using pandas.

Via:

clothes = pd.DataFrame(data, index=['Albert', 'Berta', 'Charly', 'Donna'])

the indices can be replaced by real names:

Figure 10.10: Table including real names.

In this way, a first clear overview is available with only a few lines of code. Via the instruction

datafile = pd.read_csv('C:/DATA/pandas/clothes.csv')

data can also be read directly from a file on a hard disk. For processing matrices, the usual methods are available. To duplicate each value in a table, the following code line can be used:

datafile = datafile + datafile

The new file can be written to the hard disk using

datafile.to_csv('C:/DATA/pandas/new_clothes.csv')
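The statistical methods mentioned at the beginning of this section can be applied just as directly. A brief sketch using the clothes table from above:

print(clothes['shoes'].mean())      # average number of shoes
print(clothes['socks'].median())    # median number of socks
print(clothes.max())                # column-wise maxima
print(clothes.corr())               # correlation between the columns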

Using Pandas, larger amounts of data can also be downloaded directly from the Internet:

url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pd.read_csv(url, names=names)


In this way, a comprehensive data set for the iris flower species can be obtained, for example:

Figure 10.11: Data set directly loaded from the Internet.

This data set will play a central role in the iris classification example in Section 11.2.

10.4 Learning and visualization using Scikit, Scipy, SkImage & Co.

SciPy (Scientific Python) is often used in combination with NumPy. SciPy extends the power of NumPy with even more useful functions, such as:
- fast minimization/maximization;
- regression;
- (Fast) Fourier Transform (FFT);
- splines;
- filters;

and many others. In addition, SciPy offers some basic functions for working with images. For example, methods for reading images from disk and converting the image data into NumPy arrays are provided. These methods can be used to process the images via matrix operations. In this way, image processing such as rotating, zooming, or changing the image size or resolution is easy to handle. The following figure shows a simple example that illustrates these functions (see scipy_demo.ipynb in the download package):


Figure 10.12: Image processing using Scipy.

Here, in a first step, a section is cut out of the original image. Then, artificial noise is applied to the image. This is useful, for example, for testing ML-based noise filters. Finally, a spline edge filter was used to enhance and intensify the edges in the image. The libs scikit and skimage provide further image-processing methods. Details can be found in various examples in the following chapters.

10.5 Machine Vision using OpenCV

Computer Vision (CV) is one of the most interesting topics of AI. It provides functionalities and routines that allow the Raspberry Pi, for example, to perform amazing tasks using the appropriate libraries. OpenCV is ideally suited to record and process images or videos if a PiCam is connected to the Raspberry Pi. The classic PiCam provides the following features:

Dimensions:        25 mm × 20 mm × 9 mm
Sensor:            5 megapixels with fixed-focus lens
Image resolution:  up to 2592 × 1944 pixels
Video resolution:  1920 × 1080 at 30 frames/s
                   1280 × 720 at 60 frames/s
                   640 × 480 at 60 to 90 frames/s
Connection:        CSI via ribbon cable

The camera is connected via a 15-pin serial CSI interface (Camera Serial Interface) on the Raspberry Pi. The advantage of this interface compared to USB is the direct connection between the camera module and the Broadcom chip.


This means that high frame rates can be achieved even at higher resolutions. The CSI interface is located between the HDMI and the Ethernet socket. To connect the 15-pin ribbon cable of the camera module to the board, pull the upper part of the CSI connector up, then insert the ribbon cable with the blue marking towards the Ethernet connector and press the lock back down. This provides good contact, and the cable is firmly connected to the Raspberry Pi. Now the camera support has to be activated in the Raspberry Pi OS. This can be done using the configuration tool:

Settings

Raspberry Pi

In the configuration menu go to the Interfaces tab and set the camera to Enable. Finally, a reboot is required and the camera can be used.

Figure 10.13: Activation of the PiCam.

The camera module can be addressed via two standard programs:

• raspistill (for images);
• raspivid (for videos).

Both have numerous options. Using the instruction

raspistill -o image.jpg

a test image can be recorded. The picture is saved in the home directory (/home/pi). Once the PiCam is activated successfully, it is possible to test its integration into a Python program. The OpenCV library is well suited for this purpose. The sequence of instructions for installing the lib can be found in the download package. Using the program CamCheck_1V0.py (see also download package), the video function of the camera can be checked. After starting the program:


import cv2

cap = cv2.VideoCapture(0)
cap.set(3, 320)   # set width
cap.set(4, 240)   # set height
while True:
    ret, frame = cap.read()
    # frame = cv2.flip(frame, 1)   # flip camera image
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    cv2.imshow('frame', frame)
    cv2.imshow('gray', gray)
    key = cv2.waitKey(1) & 0xFF
    if key == ord("q"):
        print("bye...")
        break
cap.release()
cv2.destroyAllWindows()

two active video windows open up. One shows the full-color and the other a black-and-white version of the video stream from the connected camera. The size of the video output can be controlled via cap.set(). The video stream is processed in the main loop. The video image orientation can be set via:

frame = cv2.flip(frame, 1)   # flip camera image

In this way it is always possible to display an upright image, even if the camera is mounted in an unfavorable position. The second parameter controls the alignment:

• = 0: mirroring the image on the x-axis (vertical mirroring);
• > 0: mirroring on the y-axis (horizontal mirroring);
• < 0: mirroring on both axes.

If the image orientation does not require any changes, the line can be commented out. The conversion to a black-and-white image is done via:

gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

The video stream is visualized using:

cv2.imshow('frame', frame)

The remaining lines are required for coordinated program termination. With

cap.release()
cv2.destroyAllWindows()

the streaming is terminated and all open video windows are closed.


Alternatively, a simple live video display of the PiCam can be carried out using the following Python program (see OpenCV_demo.py in the download package):

from picamera.array import PiRGBArray
from picamera import PiCamera
import time
import cv2

camera = PiCamera()
camera.resolution = (320, 240)
camera.framerate = 60
rawCapture = PiRGBArray(camera, size=(320, 240))
time.sleep(0.1)

for frame in camera.capture_continuous(rawCapture, format="bgr", use_video_port=True):
    image = frame.array
    cv2.imshow("press 'q' to quit", image)
    key = cv2.waitKey(1) & 0xFF
    rawCapture.truncate(0)
    if key == ord("q"):
        cv2.destroyAllWindows()
        print("bye...")
        break

The real strength of OpenCV lies in its efficient image and video processing. Thanks to optimized algorithms, highly interesting projects can be implemented with the relatively modest computing power of the Pi. Figure 10.14, for example, shows contour detection in a live video stream on the RPi (ContourDetector_1V0.py).
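A minimal sketch of such a contour detector is shown below. It uses a standard Canny-based approach and may differ in detail from the version in the download package; note that cv2.findContours() returns two values in OpenCV 4.x, but three in 3.x:

import cv2

cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)   # reduce noise before edge detection
    edges = cv2.Canny(blurred, 50, 150)           # detect edges
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cv2.drawContours(frame, contours, -1, (0, 255, 0), 2)   # draw contours in green
    cv2.imshow('contours', frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()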


Figure 10.14: Live video contour capture in OpenCV.

It is also possible to operate several cameras simultaneously on a Raspberry. In addition to the PiCam, a USB webcam can also be connected, as the following figure shows:

Figure 10.15: USB webcam and sound card connected to the Raspberry Pi.

If the webcam also includes a microphone, it can even be used as an acoustic input device. This can replace the missing microphone input on the Raspberry Pi. Further details can be found in Chapter 14. The program DoubleCam.py shows how two cameras can be operated simultaneously on an RPi:


import cv2

cap1 = cv2.VideoCapture(0)
cap1.set(3, 320)   # set width
cap1.set(4, 240)   # set height
cap2 = cv2.VideoCapture(1)
cap2.set(3, 320)   # set width
cap2.set(4, 240)   # set height
'''
cap3 = cv2.VideoCapture(2)
cap3.set(3, 320)   # set width
cap3.set(4, 240)   # set height
'''
while True:
    ret1, img1 = cap1.read()
    cv2.imshow('CAM 1', img1)
    ret2, img2 = cap2.read()
    cv2.imshow('CAM 2', img2)
    key = cv2.waitKey(1) & 0xFF
    if key == ord("q"):
        print("bye...")
        break
cap1.release()
cap2.release()
cv2.destroyAllWindows()

By operating two camera modules at the same time, it is even possible to capture objects in 3-D. However, the Pi currently reaches its capacity limits very quickly when extended 3-D models are used for image processing. Due to its efficient processing of video data, OpenCV is also often used in robotics projects (see Bibliography). In Chapter 13, OpenCV is used for object detection and face recognition projects.


Figure 10.16: Two cameras attached to the Pi.

10.6 Brainiacs: KERAS and TensorFlow

KERAS is a deep-learning platform that was developed specifically for Python applications. The lib is mainly based on TensorFlow. The platform was built to enable rapid experimentation. TensorFlow, on the other hand, is a framework for data stream-oriented programming. In recent years, TensorFlow has become one of the most important open-source platforms in the field of machine learning.

The name TensorFlow originates from mathematical elements, so-called tensors, which play a central role in multidimensional algebra. Since artificial neural networks are based on multidimensional data fields, tensors are of fundamental importance in this area. The Google Brain team originally developed TensorFlow for internal Google projects. In 2017, however, it was released under the Apache 2.0 open-source license.

KERAS means horn in Greek (κέρας). This is a reference to ancient Greek mythology: spirits who deceived people with false visions were sent into the mortal world through a gate of horn. The KERAS library provides an interface to machine learning with a focus on deep learning. With KERAS, the cross-platform functions of TensorFlow can be used in an easy way. KERAS models created on powerful mainframes can easily be exported. They can be executed in a browser, on a smartphone, or even on small single-board computers such as the Raspberry Pi.

The KERAS library plays a major role in this book. The details are discussed when the library is used in the various applications. For this reason, only a few basic uses are outlined here. Figure 10.17 shows how the lib is essentially used to create AI models.


Figure 10.17: Construction of a neural network application using KERAS.

The detailed sequence of instructions for installing KERAS on the Raspberry Pi can be found in the download package released for this book. The file MNIST_keras_01.ipynb provides a first impression of KERAS. The following steps are required for a KERAS project:

1. First, the desired data set must be loaded. Under certain circumstances, other libs such as Pandas can also be used beneficially here.

2. Then the data are typically divided into a training data set and a test data set.

3. Now a neural network can be created in the form of a specific model. For this purpose, the individual layers are defined one after the other, for example:

myANN = Sequential()
myANN.add(Convolution2D(32, (3, 3), activation='relu', input_shape=(28,28,1)))
myANN.add(Convolution2D(32, (3, 3), activation='relu'))
myANN.add(MaxPooling2D(pool_size=(2,2)))
myANN.add(Dropout(0.25))
myANN.add(Flatten())
myANN.add(Dense(128, activation='relu'))
myANN.add(Dropout(0.5))
myANN.add(Dense(10, activation='softmax'))

The individual layers can be lined up like the blocks of a construction kit. An overview of a network constructed in this way can be displayed using the instruction:

myANN.summary()


The result looks like this:

Figure 10.18: Neural network constructed from KERAS elements.

4. Now the newly constructed network has to be compiled:

myANN.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

Here, various options can be selected:

- loss=keras.losses.categorical_crossentropy;
- optimizer=keras.optimizers.Adadelta();
- metrics=['accuracy'].

5. Finally, the network can be trained using:

myANN.fit(x_train, y_train,
          batch_size=100, epochs=10,
          verbose=True,
          validation_data=(x_test, y_test))


Figure 10.19: Network training.

6. After successful training, the quality of the network can be evaluated:

from keras.models import load_model

mnist_model = load_model(model_path)
loss_and_metrics = mnist_model.evaluate(x_test, y_test, verbose=1)
print("Test Loss", loss_and_metrics[0])
print("Test Accuracy", loss_and_metrics[1])

This provides information concerning the training speed:

313/313 [==============================] - 32s 101ms/step

and the quality of the training success achieved for the training data:

loss: 0.0016 - accuracy: 0.9905

as well as for the test data:

Test Loss 0.0016376518178731203
Test Accuracy 0.9904999732971191

Moreover, it is possible to show the learning success in so-called learning curves:


Figure 10.20: Learning curves in KERAS.

Further details will be presented in the corresponding practical applications.

7. Finally, the network is available for applications, and individual data, such as images or sounds, can be classified.
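Curves like those in Figure 10.20 can be generated from the return value of fit(). A minimal sketch, assuming the training call above is stored as history = myANN.fit(...); note that the history keys are 'accuracy'/'val_accuracy' in newer KERAS versions, but 'acc'/'val_acc' in older ones:

import matplotlib.pyplot as plt

history = myANN.fit(x_train, y_train, batch_size=100, epochs=10,
                    verbose=True, validation_data=(x_test, y_test))

plt.plot(history.history['accuracy'], label='training accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.grid(True)
plt.show()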

10.7 Knowledge transfer: sharing the learning achievements

The transfer of learning performance is a special feature of machine learning and artificial intelligence. In biological systems, learning achievements are usually not transferable. Human beings as well as animals have to perform all important learning activities individually, again and again. After birth, the brain of a human being is empty in terms of useful knowledge. The memory is present, but the content is almost completely missing. A newborn is not able to feed itself independently. Walking, running, talking: all of these abilities, which are so natural for adults, have to be learned by every newborn human being. Later, reading, writing, riding a bicycle, etc., are added. Learning these skills requires years of practice and training. The fact that the parents have learned these activities is of little use to the child. It has to learn everything all over again in painstaking detail.

In the technical field, however, things are completely different. Once a neural network has mastered a certain task, this learning success can be transferred to other systems in a fraction of a second. As soon as a vehicle is able to move autonomously through a large city, this performance can be transferred to other cars or trucks without any real limit. Unlike human kids, a new autonomously driving car does not need to acquire its "driver's license" afresh. It is perfectly sufficient to copy the weights of the trained network and transfer them to a new system. This can be done millions of times in a very short time.

The possibility of copying acquired knowledge in a fraction of a second is certainly also one of the greatest dangers of the new technology. Not only good, desirable and controllable learning successes can be reproduced in this way. Dangerous developments could also spread at breakneck pace, quite similar to a (computer) virus.


This danger is always present when transferring painstakingly acquired learning achievements from one neural network to another. In KERAS, this procedure is possible without any problems. If necessary, a fully trained model can be saved for later use on an external storage medium using:

myANN.save(model_path)

There, it is available for immediate application on other systems, possibly millions of them. On the other hand, this transfer offers the possibility of carrying out the training on a fast high-performance computer and then transferring the finished weights to a low-cost small system such as the Raspberry Pi or a MaixDuino. You are not even limited to a local computer. Various companies offer training services on high-performance systems. For private applications, computing time is often even made available free of charge.
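A minimal sketch of such a transfer; the path name is an assumption, and myANN refers to the model built above:

from keras.models import load_model

model_path = '/home/pi/models/myANN.h5'

# on the training machine:
myANN.save(model_path)               # stores architecture, weights and optimizer state

# on the target system, e.g., a Raspberry Pi:
myANN_copy = load_model(model_path)  # the copied network is immediately ready for use
predictions = myANN_copy.predict(x_test)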

10.8 Graphical representation of network structures

With increasing complexity, the representation of models in pure text form quickly becomes confusing. A graphical representation can help here. With graphviz (for installation information, see the links in the download package), a useful tool is available for the descriptive representation of network structures. It makes it much easier to keep track of the individual layers of a neural network. The notebook Network_graphics_graphviz.ipynb can be used as an example. In a graphviz representation, the structure and function of the respective layer is given in short form. In addition, the number of input and output parameters, etc., can be displayed alongside the individual layers. Further details and information on this display variant can be found in Section 12.6. Figure 10.21 shows a typical output of graphviz:


Figure 10.21: Neural network in graphviz representation.
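A sketch of how such a diagram can be produced for a model like the one defined above; this assumes that the graphviz and pydot packages are installed:

from tensorflow.keras.utils import plot_model

plot_model(myANN, to_file='myANN.png',
           show_shapes=True,        # display the input/output shapes of each layer
           show_layer_names=True)   # label every layer with its name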

10.9 Solution of the XOR problem using KERAS

Finally, here is a KERAS-driven solution for the XOR problem already presented in the first chapters of the book (Keras_EXOR_1V0.ipynb). After loading the required libs, the desired input and output values are defined:

inputValues = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
outputValues = np.array([[0], [1], [1], [0]])

It is easy to see that the values correspond to the table in Chapter 2. Now, a neural network using two input nodes, two hidden nodes, and one output neuron can be constructed:

num_inner = 2

model = Sequential()
model.add(Dense(num_inner, input_dim=2, activation='sigmoid'))
model.add(Dense(1))


After finalizing the training procedure:

model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
model.fit(x=inputValues, y=outputValues, epochs=10000, verbose=0)
print(model.predict(inputValues))

the following typical result can be obtained:

[[0.00000132]    # Input 0, 0 - Output: 0
 [0.9999364 ]    # Input 0, 1 - Output: 1
 [0.9999851 ]    # Input 1, 0 - Output: 1
 [0.0000268 ]]   # Input 1, 1 - Output: 0

The remaining code lines provide a graphical representation of the result:

Figure 10.22: Solution for the XOR problem.

It can be seen that the network draws two isolated separation lines. Between the lines, values close to one are calculated. In the outer area, the results are close to zero. Thus, the XOR perceptron problem has been successfully solved by a KERAS network.

10.10 Virtual environments

As different modules or libraries such as NumPy, MatPlotLib, KERAS, and TensorFlow are evaluated, they accumulate in the global Python installation. Since machine learning and AI are rapidly evolving research areas, new versions and updates become available frequently. However, the various library versions are not always compatible with each other. This often leads to problems when several libraries are used simultaneously for a particular project. By using virtual environments, conflicts between different library versions can be avoided. If, for example, a special TensorFlow version is installed in its own environment, other versions can remain on the Pi without causing undesirable side effects. Before such a virtual environment can be created, the required package must first be installed:


sudo pip3 install virtualenv

The following steps are then required for the actual creation of the virtual environment:

1. Creation of a project folder using mkdir "folder_name".
2. Changing to this project folder using cd "folder_name".
3. Creation of the actual virtual environment using "python3 -m venv ./venv".
4. Activation of the virtual environment using "source ./venv/bin/activate".

Figure 10.23: Creation of a virtual environment.

As the above screenshot shows, only a few commands are required to create and activate the virtual environment. You can tell that the new environment is actually active by the prefix (venv) in the current terminal line. The following commands can be used to check various properties of the new environment:

- which python: shows the active Python version;
- which pip3: shows the active pip version;
- pip3 list (or pip3 freeze): shows the modules and libraries active in the current virtual environment.

The virtual environment can be closed using the deactivate statement. The following screenshot shows a typical regular Python environment with a large number of installed libraries:


Figure 10.24: Libraries of a typical Python environment.

In comparison, a newly created virtual Python environment contains very few libraries:

Figure 10.25: Just a few libs are visible in a new virtual environment.

So, you can start here with a clean environment and only reinstall the libs in the required versions. Version conflicts are thus excluded as far as possible. An example of a practical application of virtual environments can be found in Section 13.2.


Chapter 11 • Practical Machine Learning Applications

In the first chapters, the basics of machine learning were briefly discussed as a subarea of AI. The following sections delve deeper into practical matters and implement the methods of deep learning using neural networks. Either a Windows PC or a Raspberry Pi can be used for these tasks. In machine learning, the equivalent of the Hello World program is the classification of Iris flower types. The associated data set consists of measured values of sepal and petal lengths and widths for different Iris species.

11.1 Transfer functions and multilayer networks

One of the ways to perform Iris classification is to use multi-layer perceptrons (MLPs). This variant of an artificial neural network can map any numerical input data to given output data. The MLP consists of several neuronal layers. Often, but not always, each layer is fully connected to the following one. The nodes of the layers work with non-linear activation functions. This concept is based on the behavior of biological neurons in highly developed brains, which also communicate via active or inactive action potentials. Interestingly, an intermediate way between pure digital switching and analog signal processing is used here. In machine learning, artificial neural networks replicate this behavior by applying a threshold function.

Figure 11.1: Important transfer functions.

Different variants are used for the threshold or transfer functions (a small numerical sketch follows below):

- step functions;
- ReLu as "rectifier function" (Rectified Linear Unit);
- sigmoid functions such as the logistic or tangent hyperbolic function.

The computational effort is higher in the latter case, but sigmoid functions usually behave less critically in the training phase of a network compared to the step or ReLu functions. With these threshold functions, a neural network can also model non-linear relationships. Only then is it able to effectively solve complex tasks such as image or speech recognition.
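As a minimal sketch, the three function types can be defined and plotted with NumPy as follows:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-5, 5, 200)

step = np.where(x < 0, 0, 1)      # step function
relu = np.maximum(0, x)           # ReLu: max(0, x)
sigmoid = 1 / (1 + np.exp(-x))    # logistic sigmoid

for y, label in [(step, 'step'), (relu, 'ReLu'), (sigmoid, 'sigmoid')]:
    plt.plot(x, y, label=label)
plt.legend()
plt.grid(True)
plt.show()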


Once the transfer function has been defined, individual neurons can be linked to form a complete network capable of learning. Between the input and the output level there are usually one or more inner or hidden layers. The topology, i.e., the exact structure of a network, depends strongly on the respective task. Since there is no standard procedure for determining the number of layers and nodes in a neural network, experience and knowledge are always of great importance at this point. In many cases, it is therefore necessary to try different combinations in order to find the best network structure. For Iris classification, the topology shown in Figure 11.2 is often used:

Figure 11.2: Multi-layer neural network.

11.2 Flowers and data

The Iris data set consists of measured values for three types of Iris (Iris setosa, Iris virginica and Iris versicolor). Four characteristics were recorded from each flower:

• the length and width of the sepals;
• the length and width of the petals;

each in centimeters (Figure 11.3). This data set is ideal for training a classifier. Two Jupyter notebooks are used for classification and graphical data display:

• iris_train_MLP_1V0.ipynb
• iris_graphics_1V1.ipynb

The notebooks can be run on a Windows PC as well as on a Raspberry Pi.


Figure 11.3: The iris flower.

For this, all relevant modules are first loaded from the libraries:

import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
import matplotlib.pyplot as plt
import time

Now the above-mentioned data record can be loaded. The CSV file contains 50 records for each flower. The data are separated by commas (CSV = Comma Separated Values). Each line contains four numerical values, each representing one of the above attributes. The last one is the so-called label, i.e., the real name of the plant. This file is downloaded directly from the internet to the PC or Raspberry Pi and saved in an array:

url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pd.read_csv(url, names=names)

The array has a size of 150 lines and 5 columns (150×5), as the statement print(dataset.shape) confirms. The complete or partial output of the data using print(dataset.head(100)) is shown in Figure 11.4 for the first 100 values in Jupyter:


Figure 11.4: Iris data set excerpt.

11.3 Graphical representations of data sets

This is where MatPlotLib can display its strengths. With just a few instructions, the distribution of data sets can be displayed graphically. This is particularly important for complex data structures. Just as with large tables or lists, it is not very informative to look at the pure numerical data of large arrays. In order to be able to grasp at a glance what the numbers mean, it is better to visualize the data. For graphical representations, the purely numerical data must be extracted:

NumericData = dataset.values[:,0:4]

Subsequently, different graphical representations can be created with the methods of MatPlotLib. The following function is used:

scatter(x, y, c=color, ...)

It shows a scatter plot of the data points in a y/x diagram. For example, this code (iris_graphics_RasPi_1V0.ipynb in the download package):

fig = plt.figure(1)
ax = fig.add_subplot(1,1,1)
ax.scatter(NumericData[0:50,0], NumericData[0:50,1], c='red')
ax.scatter(NumericData[50:100,0], NumericData[50:100,1], c='green')
ax.scatter(NumericData[100:150,0], NumericData[100:150,1], c='blue')
ax.set_xlabel('sepal length (cm)')
ax.set_ylabel('sepal width (cm)')
ax.grid(True)

generates the following scatter plot:


Figure 11.5: Scatter plot of the iris data.

For the graphical representation, the following colors were assigned to the individual Iris types:

Iris virginica: green;  Iris versicolor: blue;  Iris setosa: red.

In addition, different geometric symbols (triangles, crosses, and stars) were used for the different sets of points. With this version of the representation, the setosa, for example, can be distinguished comparatively well from the other two types. Other scatter diagrams show the different cluster formations more clearly.

Figure 11.6: Scatter plots for different databases.

Using the MatPlotLib library, the data can also be displayed in a three-dimensional representation. A minimal sketch is shown below; Figure 11.7 shows this variant:
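This sketch assumes the NumericData array extracted above; the axis choice (the first three columns) is an assumption:

from mpl_toolkits.mplot3d import Axes3D   # enables the 3-D projection
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(NumericData[0:50,0], NumericData[0:50,1], NumericData[0:50,2], c='red')
ax.scatter(NumericData[50:100,0], NumericData[50:100,1], NumericData[50:100,2], c='green')
ax.scatter(NumericData[100:150,0], NumericData[100:150,1], NumericData[100:150,2], c='blue')
ax.set_xlabel('sepal length (cm)')
ax.set_ylabel('sepal width (cm)')
ax.set_zlabel('petal length (cm)')
plt.show()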


Figure 11.7: Scatter plot in 3-dimensional representation.

The figures demonstrate that the iris flowers cannot be identified based on just one or two parameters. Similar to the XOR problem, it is also not possible to clearly separate the data point clouds with straight lines in the 2-D plots, or with flat surfaces in a 3-D representation. We are therefore faced with a "not linearly separable" problem.

11.4 A net for iris flowers

After the graphical analysis of the iris data set, in this section a neural network will be designed that can be used to determine the flower species from the petal data [2,3]. The associated Jupyter notebook (iris_train_MLP_1V0.ipynb) can be loaded from the download package onto either a PC or a Raspberry Pi. Figure 11.8 shows the opened notebook on a Pi.

Figure 11.8: Jupyter Notebook on Raspberry Pi.


As can be seen in Figure 11.4, the IRIS data set contains five columns. The task for an ML algorithm is therefore to predict the class, i.e., the values in the fifth column. The input data are found in the first four columns and correspond to the measures of sepal length, sepal width, petal length, and petal width in centimeters. After including the required libraries:

import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
import matplotlib.pyplot as plt
import time

the Iris data set is loaded directly from the Internet:

url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'species']
data_train = pd.read_csv(url, names=names)

Then the individual columns are given names. Using

print(data_train)

the freshly loaded table can be checked. If the Raspberry Pi does not have an Internet connection, the data (iris_D.csv) could also be loaded as a table, for example, from a USB stick or a similar storage medium. In this case, the conversion into an array can be done via the instruction:

data_train = pd.read_csv('/home/pi/DATA/IRIS/iris_D.csv')

The last column contains the botanical names for the respective category. The data thus consist of alpha-numeric entries. Although these are easily readable by humans, neural networks can process purely numerical data more easily. Therefore, the names are converted into pure numerical values: data_train.loc[data_train['species'] =='Iris-setosa', 'species'] data_train.loc[data_train['species'] =='Iris-versicolor', 'species'] data_train.loc[data_train['species'] =='Iris-virginica', 'species']

=0 =1 =2

The result can be checked again with print(data_train):


     sepal-length  sepal-width  petal-length  petal-width  species
0    5.1           3.5          1.4           0.2          0
1    4.9           3.0          1.4           0.2          0
2    4.7           3.2          1.3           0.2          0
...  ...           ...          ...           ...          ...
147  6.5           3.0          5.2           2.0          2
148  6.2           3.4          5.4           2.3          2
149  5.9           3.0          5.1           1.8          2

11.5 Training and testing

If only one data set is available, it makes sense to divide it into training and test data. The first part is used to train the neural network. The test data are used to evaluate the performance of the neural network independently of the training data. This can reduce the problem of so-called overfitting. This phenomenon can occur when a machine-learning system is over-adapted to the training data. The training data are then evaluated very well, but new and previously unknown data lead to worse results than would be possible in principle. Using the instructions:

data_train_array = data_train.to_numpy()
X_train, X_test, y_train, y_test = train_test_split(data_train_array[:,:4],
                                                    data_train_array[:,4], test_size=0.2)

20% (test_size = 0.2) of the data are cut off. This frees up 150 * 0.2 = 30 test data sets that are not used for training. They will be used later for the independent evaluation of the network. With only two instructions, a neural network can now be trained:

mlp = MLPClassifier(hidden_layer_sizes=(6,), max_iter=1000)
mlp.fit(X_train, y_train)

These two lines of code are sufficient to create a powerful neural network that can take on tasks that are quite relevant in practice. The power of Python libraries like SciKit is impressively demonstrated here. With just a few instructions, a system can be created that would require a whole stack of code pages in other languages. In a first step, the MLPClassifier is initialized using two parameters:

• hidden_layer_sizes
• max_iter

The first parameter (hidden_layer_sizes) is used to define the size of the hidden layers. For example, a layer with six nodes is defined in this case (Figure 11.2).


Using these parameters, the topology of the network can be varied within wide limits. For example:

mlp = MLPClassifier(hidden_layer_sizes=(5, 7, 4), max_iter=1000)

leads to a network with three hidden layers. It provides five neurons in the first, seven in the second, and four neurons in the third inner layer. Python automatically determines the number of input (four leaf sizes) and output neurons (three Iris types) from the format specifications of the data set.

The second parameter in the MLPClassifier (max_iter) determines the maximum number of iterations that the neural network should execute. When the internally specified accuracy or this maximum number of iterations is reached, the training will be terminated. In addition, further optional parameters are available in the MLP classifier, among others [2]:

• activation{'identity', 'logistic', 'tanh', 'relu'}, default='relu'
  Definition of the activation function for the hidden layers. Here, for example, the variants
  - logistic: logistic sigmoid function;
  - relu: rectified linear unit function (max(0, x));
  and others are available.

• solver{'lbfgs', 'sgd', 'adam'}, default='adam'
  The solver determines the function for optimizing the weights. Using adam, a gradient-based optimizer is applied. This is preferred for large data sets. For smaller data sets, lbfgs may converge faster and thus achieve better performance.

• verbose: bool, default=False
  For continuous monitoring of the iteration progress during the training, the variable verbose must be set to True.

The activation function relu and the optimizer adam are set as default values. If necessary, the functions can be adapted to the respective tasks with the help of the activation or solver parameters. Via the instruction


mlp = MLPClassifier(hidden_layer_sizes=(6,85), activation='logistic',
                    solver='lbfgs', max_iter=3000, verbose=True)

the already known network is defined again. Now, however, the following parameters are used:

• Activation function: logistic
• Solver: lbfgs (the limited-memory variant of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm)
• Maximum iterations: 3000
• verbose=True: the iteration progress is monitored continuously

In the second line, the fitting function is used to train the algorithm with the training data generated in the last section (X_train and y_train). Depending on the parameters and the network structure used, the resulting training times can vary greatly. The training time ranges from a few seconds to several minutes. In order to be able to compare the efficiency of the different variants, it is useful to calculate and check the required running times after the training has been completed. This task can be carried out via the instruction

start_proc = time.process_time()

before the start of the training and

end_proc = time.process_time()

after the end; the corresponding output is given by:

print('runtime: {:5.3f}s'.format(end_proc-start_proc))

Several variants are available for this task in the Jupyter script. These can be activated by setting or removing the comment character (#). Typical training times on a Raspberry Pi 4 are between two and 100 seconds. On a powerful PC, this typically takes only fractions of a second (about 0.8 seconds for a standard configuration).

11.6 What's blossoming here?

After the training is finalized, the network is already able to provide some initial statements. In addition, the error rate of the training can be printed using:

print("result training: %5.3f" % mlp.score(X_train, y_train))


Now the test data, previously unknown to the network, are used. An objective accuracy estimation of the network is provided by the instruction:

print("result test: %5.3f" % mlp.score(X_test, y_test))

This value is more meaningful than the error rate of the training data. It is typically around 96.7%. In the test data set with 30 flowers, only one flower was incorrectly categorized:

1 - 1/30 = 0.967 = 96.7%

Depending on the network architecture, the actual results may differ from the values given above. Since the train_test_split function randomly splits the data into training and test sets, the results also change across different training runs, as the network is not always trained or tested with the same data. The predicted values can be calculated using:

predictions = mlp.predict(X_test)

Now, by using

print("No value prediction")
for i in range(0,30):
    print(i, " ", y_test[i], " ", predictions[i])

the output is printed in tabular form. All values, with one exception, should agree with the results given above:

No.  value  prediction
0    1.0    1.0
1    0.0    0.0
2    0.0    0.0
3    0.0    1.0
4    2.0    2.0
5    0.0    0.0
6    0.0    0.0
7    2.0    2.0
8    2.0    2.0
...  ...    ...

The third record, in this example an "Iris setosa" ('species' = 0), was incorrectly classified. One can now also select individual data records and have the Iris species determined for them. For the first entry (record 0):


No   sepal-length  sepal-width  petal-length  petal-width  species
0    5.1           3.5          1.4           0.2          0

the instruction

print(mlp.predict([[5.1,3.5,1.4,0.2]]))

leads to the result 0, i.e., Iris setosa, and thus a correct result. Other data sets are also classified correctly:

print(mlp.predict([[5.1,3.5,1.4,0.2], [5.9,3.,5.1,1.8], [4.9,3.,1.4,0.2], [5.8,2.7,4.1,1.]]))

Result: [0. 2. 0. 1.]

Only one value in the test data set shows the only misclassification:

print(mlp.predict([[5.4,3.9,1.7,0.4]]))

Result: 2, i.e., a "setosa" identified as "virginica".

11.7 Test and learning behavior

"Look before you leap!" is an ancient saying. This ageless wisdom should also be taken to heart with machine-learning algorithms. Various evaluation methods can be used to determine how well the selected algorithm works. The following criteria are available for this purpose: confusion_matrix, precision, recall, and f1-score. The following commands provide a rating for the selected network:

print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions))

This code generates the following result:

1. Confusion matrix:

                   Iris setosa   Iris virginica   Iris versicolor
Iris setosa        [[ 9               0                 0]
Iris virginica      [ 0               6                 0]
Iris versicolor     [ 0               1                14]]

This matrix shows that 9 Iris setosa, 6 Iris virginica, and 14 Iris versicolor were correctly recognized. Only one Iris versicolor was incorrectly classified as Iris virginica (the off-diagonal "1"). It can therefore be seen again that only one of the 30 plants in the test data set was classified incorrectly.


2. Further evaluation criteria: precision, recall, f1-score, and support.

These values provide the ratios of correctly or incorrectly predicted values to the overall possible predictions. The f1-score is of particular importance: it is a measure of the accuracy of the network. An f1 value of 1 means high accuracy; 0 means low accuracy. An f1 value of 0.97 is comparatively good, as only 120 data sets were used for training. All in all, the accuracy should always be better than 90% under the conditions given here.

In addition to these percentage statements, the quality of the learning behavior can also be shown using so-called learning curves. The general statement that someone has a "somewhat flat learning curve" is not exactly a compliment among students. In the field of ML research, on the other hand, learning curves are an important and objective evaluation criterion for a specific network model. Learning curves represent, in graphical form, the relationship between the number of training periods and the learning success of a network. The remaining prediction error is often used as the evaluation criterion. This usually results in a steadily falling curve. Rapid learning success is shown by a steep drop in the graph [3]. There is often a very fast decrease in the error rate at the start of the training. So, in the beginning, the network makes rapid learning progress. In the further course of the curve, a certain flattening can be observed. This is due to the fact that the network has finalized the learning process. Even with further training cycles, no remarkable improvements can be achieved anymore. Consequently, the network has reached its maximum performance. At this point, the training can be stopped.
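As a side note, a simple loss curve for the classifier trained above can be plotted directly: the scikit-learn MLPClassifier stores the training loss per iteration in its loss_curve_ attribute (available for the sgd and adam solvers, but not for lbfgs):

import matplotlib.pyplot as plt

plt.plot(mlp.loss_curve_)
plt.xlabel('training iteration')
plt.ylabel('loss')
plt.grid(True)
plt.show()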

Figure 11.9: Learning curve of a neural network with 3 layers.

The learning curve shown in Figure 11.9 illustrates the typical training course for a network with 4 inputs, 6 inner neurons, and 3 outputs (Figure 11.2). To reach an error rate of 0.2, nearly 2000 training cycles are required. It is possible to optimize the learning curve by varying the network structure and the learning parameters. Usually, the learning curve is improved by adding more layers. However, it has been shown that when using ever more extensive networks, the training time increases significantly. The influence of other parameters such as the activation function


or solver can also be examined in this way. Possible goals here are an optimized learning speed or a minimal prediction error.

Figure 11.10: Learning curve of a 5-layer network.

Figure 11.10 shows the learning curve of a network with a total of 5 layers. In addition to the input and output layers, three inner layers are used. These have 4, 10, and 3 neurons. In total, the network now comprises more than 20 neurons. Accordingly, the learning curve is almost perfect and reaches a specified test error rate of 0.1 after slightly more than 600 iterations.


Chapter 12 • Recognition of Handwritten Numbers

The previous chapter focused on the processing of numerical data. These were obtained manually from the measurement of Iris-flower petal sizes. This task could also be automated. However, the recognition or classification of images requires skills that were long reserved for humans. Modern machine learning methods, however, also allow technical systems to advance into the field of image recognition [4]. This also made machine-reading of handwritten characters and numbers possible.

The ability to capture handwriting by computer systems opens up a multitude of new and important applications. Bank receipts, forms filled out with handwritten information, or postal addresses, etc., can be evaluated and processed automatically. This offers numerous advantages to office workers, authorities, courier services, and other users.

Conventional programming methods, however, were not suitable for this task. For a long time, attempts were made to make manual records readable to computer systems using mathematical procedures such as Fourier analysis or decision trees. None of these procedures was able to prevail. An almost insurmountable problem was the individual variation in handwriting. Figure 12.1 illustrates that even minimal differences affect the recognition of a handwritten word. A typical example is the handwritten word "minimum", once written with, and once without, i-dots. In the first case, a quick glance reveals only a series of curved arcs and spikes. In the second case, a trained reader has no problem reading the word at a brief glance.

Figure 12.1: With handwriting, even the smallest details are crucial.

Artificial neural networks dramatically improved the situation in the field of handwriting recognition. It quickly became apparent that this new approach was much better suited to reading handwriting by computer systems. The detection of handwritten letters and numbers quickly became a standard task for machine learning. The breakthrough came with the reliable machine recognition of postal


codes at the end of the 1990s. Eventually, banks and insurance companies also used the new technology to automatically read bank transfer slips and other documents. Nowadays, character recognition is part of the standard software of any average scanner, and web applications even allow the reliable deciphering of nearly illegible handwriting.

12.1 "Hello ML" — the MNIST data set MNIST stands for "Modified National Institute of Standards and Technology". Various data sets for scientific and technical applications are available from this institution. Many of them are particularly suitable for AI applications of all kinds. The MNIST Handwritten Digit Classification Dataset has gained particular importance (the link to this data set can be found in the download package). The data set contains 60,000 images of handwritten single digits. All digits from 0 to 9 are included with almost equal frequency. The readability of the individual samples ranges from virtually print quality to almost illegible. Figure 12.2 shows the first 12 samples of the data set. Each digit is recorded as a square, grey-scale image with 28 × 28 = 784 pixels (picture elements).

Figure 12.2: The first 12 digits of the MNIST record.

The classic problem for an ML system is to assign a given handwritten number image to one of 10 classes, corresponding to the digits 0 to 9. Various network models achieve a classification accuracy of over 99.5%. The best networks deliver error rates between 0.5% and 0.2%. Even experienced human data typists can hardly surpass these values, since they, too, cannot work completely error-free over a long period.

In the following chapters, the digits of the MNIST data will be classified using a neural network. For this purpose, the following libraries must be installed and imported on the target system, which is either a PC or a Raspberry Pi:

import numpy as npy
import scipy.special
import matplotlib.pyplot as plt
import matplotlib
%matplotlib inline
import time   # for runtime measurements

More information about the installation of the libraries can be found in Chapter 10. The project is once again implemented in a Jupyter notebook. The MNIST data set can be downloaded in CSV format (comma-separated values) from several sources on the internet. One source can be found in the download package (see LINKS.txt). The sources contain a training set,


mnist_train.csv with pictures of 60,000 handwritten numbers including their corresponding nominal values. In addition, the file mnist_test.csv provides a test data set with 10,000 entries. The training data are imported via the instructions:

training_data_file = open(".../DATA/MNIST/mnist_train.csv", 'r')   # 60,000 entries
# training_data_file = open(".../DATA/MNIST/mnist_train_100.csv", 'r')   # subset of 100 entries
training_data_list = training_data_file.readlines()
training_data_file.close()
print("number of training datasets loaded:", len(training_data_list))

and converted to a Python-compatible data list. For

.../DATA/MNIST/mnist_train.csv

the path in which the CSV files are stored on the PC or the Raspberry Pi, e.g.,

C:\DATA\MNIST\mnist_train.csv

or

/home/pi/DATA/MNIST/mnist_train.csv

has to be selected. If the data transfer was successful, the number of data records (here, 60,000) is printed to the current cell. Since two separate data sets are available, a division into training and test data as in the Iris example is not necessary. Due to the size of the data records, however, it might make sense not to use the full amount of data. So, either all 60,000 entries in the MNIST list can be used for training, or just a selection of 100, 500, or 1000, etc., data records. In the latter case, the .csv file must be reduced in size using Excel, LibreOffice, or a similar spreadsheet program. The data are line-oriented, i.e., each line contains a complete digit data record. The entire training list therefore consists of 60,000 lines and 785 columns. The first column holds the nominal value ("label") of the number; the remaining 784 columns contain the gray-level value for each pixel in the image of the corresponding digit. Lines that will not be used can simply be deleted. Sufficient memory should be available for this task, as the data sets to be processed are quite large. A RAM size of at least 4 gigabytes is recommended.
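As a minimal sketch, a single record can be inspected directly from the list loaded above; the first value of each line is the label, the remaining 784 values form the 28 × 28 pixel image:

import numpy as npy
import matplotlib.pyplot as plt

record = training_data_list[0].split(',')
label = int(record[0])
pixels = npy.asarray(record[1:], dtype=float).reshape(28, 28)

plt.imshow(pixels, cmap='gray')
plt.title("label: " + str(label))
plt.show()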


Depending on which hardware is used, the training times can be quite long. The following table shows some examples:

Number of data samples   Raspberry Pi 4 (8 GB RAM)   Dual-Core CPU, 2.8 GHz, 4 GB RAM   Quad-Core CPU, 3.6 GHz, 16 GB RAM
100                      0.5 min                     10 s                               4 s
60,000                   3 h                         1.5 h                              30 min

The values are based on a session with 10 runs per training period. The test data can be loaded in the same way as the training data. If necessary, the full test data record with 10,000 entries can also be reduced to the desired size.

12.2 A neural network reads digits

As soon as the training data are imported into Python or Jupyter, the construction of a neural network can be started. The number of input nodes (28 * 28 = 784) is determined by the number of pixels in each image. The number of output nodes also results directly from the given data set. The so-called 1-out-of-n ("one-hot") categorization requires exactly 10 output nodes for the 10 digits. A number from 0 to 9 is assigned to each node. If, for example, the number 7 has been recognized, the neural network ideally delivers the output:

Number       0  1  2  3  4  5  6  7  8  9
Probability  0  0  0  0  0  0  0  1  0  0

This procedure has the advantage that a certain degree of reliability, the so-called probability for the respective output, also becomes available. If the network could not find a clear assignment, the output may look like this:

Number       0  1  2  3     4  5     6  7  8     9
Probability  0  0  0  0.15  0  0.10  0  0  0.75  0

This means that the model is unable to recognize the digit unambiguously. However, there is a 75% probability that it is an "eight". Such additional statistical information would not be available if the output consisted only of a single node with a value range of 0...9 (a short sketch of the one-hot encoding follows below).

A network with three layers is sufficient for initial categorization. In addition to the input and output layers, only a single intermediate layer ("hidden layer") is required. As already explained in Chapter 11, the exact number of nodes in the hidden layer cannot be calculated exactly. In many applications, however, a good approximation is the mean value of the number of input and output nodes. In this example, this results in a number of 300 to 400 neurons for the intermediate layer. Therefore, the following values are a reasonable start configuration for setting up the network:

• input nodes = 784
• hidden nodes = 300
• output nodes = 10
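A minimal numpy sketch of these dimensions and of a one-hot target vector; the actual notebook may scale the target values slightly differently (e.g., 0.01/0.99 instead of 0/1):

import numpy as npy

input_nodes = 784
hidden_nodes = 300
output_nodes = 10

label = 7                            # nominal value of the training sample
target = npy.zeros(output_nodes)     # one output node per digit class
target[label] = 1                    # mark the correct class
print(target)                        # [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]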


The so-called learning rate (LR) in the range from 0 to 1 defines how much the model is varied with each training run. Small values lead to long training processes. On the other hand, learning rates that are too high lead to sub-optimal training results or even to an entirely unstable training process. A good strategy for setting the learning rate is to start with relatively large values such as LR = 0.7...0.9. If the training runs without any stability problems, the LR value can be reduced and the results can be successively optimized at the expense of longer training times [4].

12.3 Training, tests and predictions

Forecasts are essential in regional or federal elections. Also, in technology and business, reliable predictions are of great importance. For this reason, neural networks are increasingly used for predicting customer decisions or calculating failure rates in technology. In AI, the effective training of neural networks is one of the most important prerequisites for reliable predictions.

After the preparations from the last chapters, the network for digit recognition is ready for training. The notebooks MNIST_neural_network_numpy_1V4.ipynb or MNIST_neural_network_numpy_RasPi_1V3.ipynb in the download package can be used in a Jupyter notebook on a PC or Raspberry Pi for this purpose. After starting the corresponding training cell in Jupyter, the output shows a result like this:

epoch # 0 completed - elapsed time: 1021.123s
epoch # 1 completed - elapsed time: 2041.849s
epoch # 2 completed - elapsed time: 3222.325s
epoch # 3 completed - elapsed time: 4208.134s
epoch # 4 completed - elapsed time: 5194.445s
epoch # 5 completed - elapsed time: 6301.832s
epoch # 6 completed - elapsed time: 7316.159s
epoch # 7 completed - elapsed time: 8374.380s
epoch # 8 completed - elapsed time: 9119.133s
epoch # 9 completed - elapsed time: 10161.501s
runtime: 10161.821s

Depending on the performance of the system in use, the training times may differ slightly. Here, the network was trained with the full data set, i.e., with 60,000 digits and 10 training runs, on a Raspberry Pi 4; this required 10,162 seconds, i.e., almost three hours. On a computer with a quad-core CPU, a clock frequency of 3.5 GHz, and 16 GB of RAM, the training time is reduced to approx. 40 minutes. After the end of the training phase, the quality of the network should be checked. The notebook mentioned above provides the necessary code for this task. After loading the test data:


# test_data_file = open("/home/pi/DATA/MNIST/mnist_test.csv", 'r')   # 10,000 entries
test_data_file = open("/home/pi/DATA/MNIST/mnist_test_10.csv", 'r')  # 10 entries
test_data_list = test_data_file.readlines()
test_data_file.close()
print("number of test datasets loaded:", end=" ")
print(len(test_data_list))

a run with a reduced test data set of 10 or 100 entries can be started. Corresponding partial data sets can be extracted once again from the complete test set using the method mentioned above. The network's prediction performance is defined as the proportion of correctly recognized digits relative to the total size of the complete data set. If the training was carried out with a reduced data set, the performance is usually between 0.5 and 0.8 after 10 to 20 epochs. An epoch corresponds to a complete training cycle (see Section 12.10). This means that about half to one-fifth of all numbers were not correctly recognized. If the complete data set is used, a performance of up to approx. 95% can be achieved. In order to better understand these results, it is useful to examine individual predictions. For this purpose, selected digits and the corresponding predicted values of the network can be displayed graphically:

Figure 12.3: MNIST number prediction.

Figure 12.3 shows various results of the trained MNIST network. The bar graph on the left side of the image indicates the direct network output. The lighter the color of a square in the bar, the higher the rating of the network for the number in question. For the first result, a four is assigned the highest value. However, the seven also has a certain, albeit significantly lower, probability. The result in the middle shows an incorrectly recognized number. Here, a four was recognized as a nine. However, this mistake could certainly also be made by a human accountant. With the last result, the neural network could not find a clear answer. The nominal value ("label") for this figure is "seven". However, the neural network is undecided as to whether it is a three or a nine, since several fields with light colors appear in the bar graph. But here, too, a human reader can certainly understand the unclear decision of the network.


12.4 Live recognition of digits

Working with generally available data sets is an important first step towards an automatic handwriting reader based on ML technology. For a truly universal system, for example to recognize meter readings, house numbers, or handwritten notes, however, it must be possible to reliably read any available digits. The next paragraphs will show that your own handwriting can also be read with the network that has already been trained. Of course, the system can be expanded to include the recording of license plates or identification numbers on rail vehicles, etc., with appropriate training sets.

To recognize your own handwriting, you first have to digitize some handwritten sample digits. Simple scanners or mobile phone cameras can be used for this task. The images must be in .png format. For use with the present neural network, an image resolution of 28 × 28 pixels is required. Image processing programs such as IrfanView for Windows or ImageMagick on the Raspberry Pi can do a good job here (a minimal preprocessing sketch is shown at the end of this section). However, initial tests can also be carried out with the sample images contained in the download package.

Next, self-created images can be analyzed instead of the MNIST data. The following image, for example, shows the reliable detection of a "three" (left), as only the value "three" shows a light color in the column; all other values show dark colors, i.e., low probabilities.

Figure 12.4: Recognition of home-made digits.

The result on the right-hand side shows that the ML algorithm was once again "not quite sure". Besides the correct value of five, the number six also has a certain probability (green coloring). Nevertheless, the correct result was given. It is not only possible to process still images: live images in a video stream can also be recognized with the PiCam. For this purpose, the program MNIST_numpy__PiCam_live_1V4.py from the download package can be used on the Raspberry. After a training phase, a live image of the camera appears. This shows the preprocessed field of view of the PiCam. If the camera is pointed at a digit, the numerical value recognized by the neural network appears in the shell. Figure 12.5 shows an example for the digit "7".


Figure 12.5: The digit "7" is detected correctly.

Using this setup, a live analysis system is now available. Based on this set-up, it is possible to develop various practical applications, such as:

• reading out water, electricity or gas meters;
• detection and evaluation of vehicle license plates;
• reading numbers on traffic signs or house numbers;
• recognition and digitizing of handwritten charts and tables.
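As mentioned above, scaling and converting your own digit images does not necessarily require an image editor. A minimal Python sketch using the Pillow and NumPy libraries could look as follows; the file name digit.png is a hypothetical example:

from PIL import Image
import numpy as np

img = Image.open("digit.png").convert("L")       # load and convert to gray-scale
img = img.resize((28, 28))                       # match the MNIST input size
pixels = np.asarray(img, dtype=np.float32) / 255.0
pixels = 1.0 - pixels     # invert: MNIST uses light digits on a dark background
print(pixels.shape)       # (28, 28), ready to be fed into the network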

12.5 KERAS can do even better!

Up to now, only the NumPy library was used for the evaluation of handwritten digits. However, this library is mainly optimized for general numerical data processing. With the help of libraries especially designed for machine learning, the performance of the application can be improved significantly. However, the use of these libraries also has its downside: working with them requires in-depth knowledge of Raspbian, Python, Jupyter, etc. Specifically, when using the Raspberry Pi, complex installation procedures are required in many cases. For beginners, it can therefore make sense to work with Anaconda on a PC first, since there the necessary libraries can be integrated quite easily (see Chapter 8). Later on, with some basic knowledge available, it will be easier to install the ML libraries on the Raspberry Pi.

In particular, working with ML systems such as TensorFlow or KERAS requires some experience on the RPi, especially when installing components and libraries under Raspberry Pi OS. You should therefore only try to install the ML-specific applications if you are familiar with the basics. If KERAS is to be used for a few short tests only, it is advisable to work with a high-performance Windows or Linux computer first. With Anaconda, the "installation" of KERAS can be carried out by just ticking a box. But once the first hurdles with Pi OS have been overcome and all KERAS components are successfully installed on the Raspberry Pi, KERAS provides a comprehensive base for advanced neural-network applications. Deep-learning frameworks such as TensorFlow or Theano can also be integrated into Python quite easily using KERAS. This also enables the efficient construction and subsequent training of more complex neural networks.


For the time being, everyone can only hope for improved installation procedures for the RPi. However, if you want to develop compact systems that provide an efficient camera connection or even special hardware controls, you cannot avoid boards like the RPi. For this reason, the following paragraphs briefly describe how to install the relevant libraries such as TensorFlow 2.2, KERAS, and OpenCV on the Raspberry Pi.

The installation of TensorFlow is quite time-consuming, because the simple instruction

pip install tensorflow

currently only installs version 1.14. For full compatibility with KERAS and OpenCV, however, (at least) version 2.2 is required. The command sequence for the complete installation can be found in the download package. In this way, the instructions can be copied directly into the terminal. However, it should be noted that the corresponding procedure is subject to rapid changes. In the event of errors, online information on KERAS and TensorFlow may be helpful.

KERAS itself, on the other hand, can be installed simply by using the pip tool, like this:

pip install KERAS

The same applies to OpenCV:

sudo pip install opencv-contrib-python

If you want to work with live images again, you have to install the PiCamera library with NumPy optimizations. This is done via the instruction:

pip install "picamera[array]"

When all libraries have been successfully installed on the Pi, it is recommended to create a backup of the SD card. In this case, a working system image is still available if unexpected problems occur with the subsequent projects.

An important advantage of KERAS is the possibility to export the data of trained networks and store it externally. This makes it possible to train neural-network systems quickly and efficiently on a powerful computer system. In this way, the use of GPUs (see Section 12.2) is possible, too. After the training phase is complete, the data is transferred to a low-cost single-board computer such as the Raspberry Pi or a MaixDuino, where it is available for new, specific applications.

In the following sections, a so-called deep convolutional neural network is trained to recognize the handwritten digits. The data from the learning phase is then used to read in and recognize digits once again via the Pi camera. TensorFlow is accessed via KERAS. However, in order to keep the already complex project from becoming overcomplicated, object localization was not implemented.

The images of the digits must therefore be placed directly in front of the camera, one at a time, for identification by the network.

12.6 Convolutional networks

Convolution is a special mathematical operation that enables the similarity of two functions to be determined. If the two functions under consideration are very different, their convolution integral is almost zero. If there is a certain similarity, the convolution integral takes on a higher value. This property can be put to good use in neural networks. The resulting convolutional networks (ConvNets for short) therefore only partly consist of neurons with trainable weights. Another essential component is network segments with preset values. One variant here is to set up the ConvNet structures in such a way that they are optimized for input data consisting of images. This allows certain properties to be firmly anchored in the architecture. Through the convolution function, it can be determined in advance whether an image contains elements that resemble an arc of a circle, for example, or whether angular objects such as triangles or squares predominate. This makes image evaluation more efficient, and the number of parameters required in the network can be reduced significantly.

In contrast to simple neural networks, ConvNets can process inputs in the form of data matrices. Images displayed as a matrix (image width × image height × number of color channels) can be used directly as input data. A classic neural network in the form of a multi-layer perceptron only accepts one-dimensional vectors as input format. In order to be able to use images as input data, the individual image pixels must be arranged in one long data series, like the numerical values in the Iris flower data set from Chapter 11. Because of this restriction, conventional neural networks are, for example, not able to recognize objects regardless of their orientation in the picture. After a slight shift or rotation, identical objects would have completely different input vectors.

In contrast, convolutional networks are able to recognize structures in the input data. Filters in the input layers are optimized to recognize simple geometric structures such as lines, edges or colored areas. The filters are adjusted automatically by the network. In the next level, more complex elements such as rectangles, circles or ellipses are identified. The level of abstraction of the network increases with each filter level. Ultimately, even entire objects such as tables, chairs or vehicles can be recognized. After appropriate optimization, it is even possible to recognize and identify people or animals. Further details are given in subsequent chapters. Which abstractions ultimately lead to the activation of the higher layers depends on the characteristic features of the given classes. For a more precise understanding of the functionality of a convolutional network, it can therefore be very interesting to visualize the patterns which lead to the activation of the filters on the different levels.

The MNIST data set can also be used to train a convolutional network. The associated program MNIST_keras_Train_1V0.py


is once again included in the download package. The network topology can be displayed using the instruction model.summary(), which produces the following output:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 26, 26, 32)        320
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        18496
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0
_________________________________________________________________
flatten (Flatten)            (None, 1600)              0
_________________________________________________________________
dropout (Dropout)            (None, 1600)              0
_________________________________________________________________
dense (Dense)                (None, 10)                16010
=================================================================
Total params: 34,826
Trainable params: 34,826
Non-trainable params: 0
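The training script itself is part of the download package. As an illustration only, a KERAS model definition that reproduces exactly the summary shown above could look like this (a sketch, not necessarily identical to the original file):

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),           # 28 x 28 gray-scale images
    layers.Conv2D(32, kernel_size=(3, 3), activation='relu'),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Conv2D(64, kernel_size=(3, 3), activation='relu'),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax')       # one output per digit class
])
model.summary()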

The model printout shows that the network consists of more than 34,000 trainable parameters. Using graphviz (see Section 10.8 or LINKS.txt in the download package), a clear graphical representation of the ConvNet can be created (Figure 12.6):


Figure 12.6: Structure of a convolutional neural network.

The graphic once again shows that the network expects data in the first layer in the form of images with 28 × 28 = 784 pixels in black and white, i.e., with a single color channel:

input: (?, 28, 28, 1)

By contrast, the last network level consists of a fully connected ("dense") layer. It provides 10 output values:

output: (?, 10)

which are each assigned to one of the 10 digits from 0 to 9. The entry "None", or the question mark in the graphic, is a placeholder indicating that the network is able to process more than one sample at the same time. An input shape of (28, 28, 1) would indicate that the network could only process a single image at a time. KERAS networks, however, can process several images simultaneously with a selectable batch size (see "batch size" in the training phase). It should be noted that although large batches lead to shorter training times, they also require larger amounts of available memory.

As opposed to the Iris model in the previous chapter, training and application can be separated very easily with KERAS. Therefore, the network outlined above can be trained on a powerful computer (preferably with a GPU). Then the calculated weights of the network can be transferred to a different system and used there to identify digits in a live camera feed. Figure 12.7 shows the corresponding set-up for the Raspberry Pi and a connected PiCam:


Figure 12.7: Setup for live digit recognition using an RPi and a PiCam.

A screenshot shows the details:

Figure 12.8: Live digit recognition with the PiCam.

The unprocessed live image from the PiCam is displayed in the upper right window. The window below it shows the preprocessed and inverted black-and-white image. The Thonny console delivers the evaluated result:

=================================
[[0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]]
reading digit as:  8
=================================

A 1 at position 8 of the output vector (counting from 0) corresponds to a recognized "8".


12.7 Power training

For quick and effective training, the Python file MNIST_keras_Train_1V0.py should be executed on a computer that's as powerful as possible. Using KERAS, a deep neural network model is constructed, compiled, and finally trained in this program. After completion of the training and validation phases, the weights of the network are saved as an external file. The following table shows the training times for different computer systems:

Periods (epochs) | Raspberry Pi 4 (8 GB) | Dual-Core (2.8 GHz) | Quad-Core (3.6 GHz)
 1               | 10 min                | 3 min               | 70 s
12               | 2 h                   | 4 h                 | 15 min

Twelve epochs run for about a quarter of an hour, even on a high-performance PC. By using an NVIDIA graphics card and including the full GPU (Graphics Processing Unit) power, the training time can be reduced significantly, to a few minutes. Therefore, if your computer provides the appropriate resources, do not forget to activate them. All you need to do is install the associated version of TensorFlow and the corresponding CUDA package from NVIDIA. Further details can be found in the respective manuals for the graphics card used.

The calculated weights of the network are saved in an *.h5 file. The file can be copied to the Raspberry Pi and used there to recognize the digits in the live video stream. For copying, a USB stick or FileZilla (Section 6.3) can be used. In principle, the training could also be carried out on the RPi itself. Then, however, training times of several hours are required. The download package contains some .h5 files that can be used for initial tests on your Raspberry Pi.
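The KERAS calls for this train-here, run-there workflow are compact. A minimal sketch, with an assumed file name:

# After training on the fast machine:
model.save('mnist_trained.h5')            # assumed name; stores architecture + weights

# Later, on the Raspberry Pi:
from keras.models import load_model
model = load_model('mnist_trained.h5')    # restored model, ready for model.predict()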

12.8 Quality control — an absolute must!

Just as in the production of food or electronic components, quality assessment is of crucial importance when using machine-learning systems. In particular, complex systems such as autonomous vehicles or medical technology applications require sophisticated test procedures. For this reason, a number of parameters are available for evaluating the quality of a trained neural network. Of particular importance are the prediction accuracy and the loss function. The lower the "loss" value of a network, the better the corresponding model was trained. The loss is calculated for both training and validation and indicates how well the model performs on these two sets. In contrast to accuracy, the loss is not a percentage value. It is a summary of the errors made for each sample in the training or validation set. The main goal while training a model is usually to reduce or minimize the value of the loss function.
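As a rough illustration of how such a per-sample error can be computed, the following sketch evaluates the categorical cross-entropy commonly used for classification tasks; the probability values are invented for the example. The loss is the negative logarithm of the probability the network assigned to the correct class:

import numpy as np

# Assumed network output for one sample (probabilities for the digits 0 to 9):
probs = np.array([0.01, 0.01, 0.88, 0.02, 0.01, 0.02, 0.01, 0.02, 0.01, 0.01])
true_class = 2                       # the sample actually shows a "2"

loss = -np.log(probs[true_class])    # cross-entropy contribution of this sample
print(loss)                          # approx. 0.128; a perfect prediction would give 0.0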


However, there are a few issues to keep in mind when reducing the loss value. For example, the problem of "overfitting" can arise. In this case, the model "remembers" the training examples so precisely that higher error values occur again when the different samples of the test set are used. Overfitting also occurs when a very complex model with an unreasonably high number of free parameters is used. Here, too, the parameters are adjusted so precisely to the training set that they deliver worse results for other values.

The accuracy of a model is determined after the model parameters have been optimized and the learning process has been finalized. Then the test samples are fed into the model, and the number of correctly classified samples is related to the total number of results. If, for example, 1000 results are determined and the model classifies 987 of them correctly, the accuracy of the model is 98.7%.

Figure 12.9: Accuracy and loss curves for a KERAS training session.

Figure 12.9 shows two typical learning curves. As expected, the accuracy increases over the course of the training, and the loss values decrease accordingly. In addition, it can be seen that the values finally converge towards a certain saturation level. Once this point is reached, even longer training times fail to improve the network performance.
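Curves of this kind can be plotted directly from the History object that KERAS returns. A minimal sketch, assuming the training call was stored in a variable named history:

import matplotlib.pyplot as plt

# history = model.fit(..., validation_split=0.1, epochs=30)   (assumed training call)
plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['loss'], label='loss')
plt.xlabel('epoch')
plt.legend()
plt.show()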

12.9 Recognizing live images

If the *.h5 file has been saved successfully, the network can be tested. Both handwritten and printed digits can be used for this. How good the prediction accuracy actually is depends on various factors. In particular, the image illumination, the camera viewing angle, and the image sharpness and resolution play an important role. Of course, it also depends on how legible the digits actually are. Just like a human observer, the network can recognize clearly written digits much more easily.


The Python program MNIST_keras_PiCam_live_1V0.py is again included in the download package and can be started on the Raspberry Pi via Thonny. After loading the parameter set, a live camera image is displayed. When suitable handwritten digits are within the field of view of the camera, a picture can be taken using the "a" key (for "analyze"). Before the data is forwarded to the neural network, various transformation steps, such as the conversion into gray-scale values and a color inversion, are carried out. Finally, the image data is displayed: a preprocessed black-and-white version of the data array is shown beside the original live video image. This serves as an additional check of the image quality.

Figure 12.10: Correctly recognized "8".

After the neural network has calculated the most likely value of the digit, it is printed to the console as a 1-in-10 vector and additionally as the digit value itself (Figure 12.10). The following listing shows the complete Python code for digit recognition:

#!/usr/bin/env python
print("importing libs...")
from skimage import img_as_ubyte
from skimage.color import rgb2gray
import cv2, imutils, time
from imutils.video import VideoStream
from keras.models import load_model

print("load model...")
model = load_model('MNIST_trained_model_RasPi401_30_epochs__001.h5')
model.summary()

vs = VideoStream(usePiCamera=True).start()   # True selects the PiCam module
time.sleep(1.0)


while True:
    frame = vs.read()
    frame = imutils.resize(frame, width=400)
    cv2.imshow("press 'a' to analyze - 'q' to quit", frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord("a"):
        img_gray = rgb2gray(frame)
        img_gray_u8 = img_as_ubyte(img_gray)
        (thresh, img_bw) = cv2.threshold(img_gray_u8, 128, 255,
                                         cv2.THRESH_BINARY | cv2.THRESH_OTSU)
        img_resized = cv2.resize(img_bw, (28, 28))
        img_gray_invert = 255 - img_resized
        cv2.imshow("gray", img_gray_invert)
        img_final = img_gray_invert.reshape(1, 28, 28, 1)
        ans = model.predict(img_final)
        print(ans)
        ans = ans[0].tolist().index(max(ans[0].tolist()))
        print('reading digit as: ', ans)
        print("=================================")
    if key == ord("q"):
        break

cv2.destroyAllWindows()
vs.stop()
print("bye...")

First, the required libraries are loaded. Then the pretrained model from the *.h5 file must be included:

model = load_model('MNIST_trained_model_RasPi401_30_epochs__001.h5')

Finally, the live video stream of the PiCam can be started:

vs = VideoStream(usePiCamera=True).start()

The main loop continuously displays the currently recorded video image in a separate window. If the "a" key is pressed, the program derives an array of floating-point numbers from the original color image, and a gray-scale image is generated from these data. Finally, the floating-point format of the image is converted to 8-bit numbers with values in the range of 0 to 255.

OpenCV is then used to apply a threshold value. The library offers extensive possibilities for this kind of image processing. In this case, the "Otsu method" was used. The method is named after Nobuyuki Otsu and performs a comprehensive image analysis. It returns an automatically calculated intensity threshold, which divides the pixels into two classes. This allows the image to be broken down into an essential foreground and a less important background.


Since the original MNIST set contains white numbers on a dark background, the image is color-inverted, as usually a dark pen is used to write on white paper. The inversion can be omitted if the images to be analyzed also consist of light digits on a dark background. As soon as the preprocessing of the image has been completed, the data is forwarded to the pretrained convolutional network. Here, the actual calculations for predicting the digit captured by the camera are carried out.

As Figure 12.10 shows, a probability array is printed in addition to the prediction value itself. This provides the probability for each of the ten digit classes from 0 to 9. The value with the highest probability is finally selected and presented as the prediction value. In the example in Figure 12.10, the result is unambiguous. The probability vector is:

Digit:       0   1   2   3   4   5   6   7   8   9
Probability: 0.  0.  0.  0.  0.  0.  0.  0.  1.  0.

and thus shows a clear "8". In other cases, the respective values are given with at least one decimal place, and the result is calculated as the maximum value:

ans = ans[0].tolist().index(max(ans[0].tolist()))
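The same selection can be written more compactly with NumPy. A small sketch, assuming ans again holds the raw array returned by model.predict():

import numpy as np

digit = int(np.argmax(ans))   # index of the largest probability = predicted digit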

12.10 Batch sizes and epochs

In the previous sections, two hyper-parameters were used several times:

1. batch size;
2. number of epochs.

In both cases, these are whole-number values that play an essential role in the training phase of a neural network. Nevertheless, the two parameters should be clearly distinguished.

A data record generally consists of several, usually even a huge number of data lines. Each line corresponds to a sample or measurement, for example, a particular flower specimen or, with the MNIST number set, a picture of a handwritten digit. The line of data is also known as a sample or input vector.

The batch size defines the number of data lines or samples to be processed and considered in a single training run. At the end of a batch run, the predictions are compared with the expected outputs and an error is calculated. Based on this error value, an algorithm improves the model performance. Ideally, the number of rows of data can be divided by the batch size without a remainder. For example, if 1000 rows of data are available, 10 batches with a batch size of 100 each can be created. This is the best constellation for most algorithms. If this is not the case, the last batch contains fewer samples than the others. However, unequal batch sizes often lead to a slowdown of the training process. It may then be more beneficial to remove some samples from the data set, or to change the batch size so that the number of samples in the data set can be divided by the batch size without remainder.


The number of epochs, on the other hand, determines how often the learning algorithm runs through the entire training data set. When an epoch is finalized, each sample has contributed once to the learning process. An epoch can therefore comprise one or more batches. In order to train complex networks, the number of epochs usually has to be very large; values of 1,000 or 10,000 and more are not uncommon. Plotting the number of epochs along the x-axis and the network errors along the y-axis leads to the learning curves already presented in Sections 10.6 and 11.7.

In summary, this means:

• The batch size is the number of samples that is processed before the model is updated.
• The number of epochs is the number of complete passes through the training data set.
• The size of a batch must be less than or equal to the number of samples in the training data set.

Again, there are no general rules for configuring these parameters. The optimum values can usually only be found via testing and experience. For example, using

• a data set with 100 samples (data lines);
• a batch size of 5; and
• 1,000 epochs,

means that the data record is divided into 100/5 = 20 batches with five samples each. The model weights are updated after each batch, i.e., after every five samples. Therefore, an epoch includes 20 updates of the model. The model runs through the entire data set 1,000 times; this corresponds to a total of 1,000 × 20 = 20,000 updates during the entire training process, as shown in the sketch below.

In general, the training of a neural network is faster if a larger batch size is used. However, larger batches also require more RAM. Here, depending on the application, you have to find an optimum between the available memory size and the processing speed.
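A brief sketch of how these hyper-parameters appear in a KERAS training call; X_train and y_train are assumed placeholder names for the 100-sample data set described above:

# 100 samples, batch size 5, 1000 epochs -> 20 weight updates per epoch,
# i.e., 20,000 updates in total:
history = model.fit(X_train, y_train, batch_size=5, epochs=1000)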

12.11 MaixDuino also reads digits

The MaixDuino board, the smallest system considered here, is also capable of reading handwritten digits. The big advantage of the MaixDuino is that it comes with a camera in its standard setup. Together with the readily available display, a compact number-reading system can be created. For this purpose, a fully trained model (MNIST.kfpkg) can be loaded onto the MaixDuino using the kflash tool:


Figure 12.11: Loading the MNIST model using Kflash.

The Internet link to the MNIST model can again be found in the download package. Here, only the mnist.kfpkg file contained in the ZIP file is required (Figure 12.11). After the model has been loaded, the associated Python program (MNIST_2_KPU.py) can be started in Thonny:

import sensor, lcd, image
import KPU as kpu

lcd.init()
sensor.reset()
sensor.set_pixformat(sensor.RGB565)
sensor.set_framesize(sensor.QVGA)
sensor.set_windowing((224, 224))
sensor.set_hmirror(0)

task = kpu.load(0x200000)
# task = kpu.load("/sd/MNIST.kmodel")
print(task)
sensor.run(1)

while True:
    img = sensor.snapshot()
    lcd.display(img, oft=(0, 0))
    img1 = img.to_grayscale(1)
    img2 = img1.resize(28, 28)
    a = img2.invert()
    a = img2.strech_char(1)
    lcd.display(img2, oft=(260, 30))
    a = img2.pix_to_ai()
    fmap = kpu.forward(task, img2)
    plist = fmap[:]
    pmax = max(plist)
    max_index = plist.index(pmax)
    img = image.Image(size=(30, 60))
    img.draw_string(0, 0, "%d" % max_index, scale=5)
    lcd.display(img, oft=(260, 100))
    lcd.draw_string(240, 180, "p = %.2f" % pmax, lcd.WHITE, lcd.BLACK)

Alternatively, you can copy the model (MNIST.kmodel in the ZIP file) directly to the SD card of the MaixDuino. In that case, the model can be accessed via:

task = kpu.load("/sd/MNIST.kmodel")

The corresponding line must be activated by deleting the comment character (#), while the line above must be commented out accordingly. After starting the program, the live image of the camera module is displayed. In addition, a preprocessed control image appears in the upper left corner of the screen. The recognized digit is displayed below the control picture, together with the probability (p) of correct recognition of the respective digit.

Figure 12.12: MaixDuino reads a digit.


Chapter 13 • How Machines Learn to See: Object Recognition

The development of autonomous cars, trains and airplanes is considered to be one of the most important projects of the coming years, both technically and economically. Various driverless rail vehicles are already in use at airports around the world, and autonomous cars have been tested in everyday traffic for several years. For the smooth and safe operation of self-driving cars or trucks, the reliable detection of people, lanes, other vehicles and traffic signs has the highest priority. But not only for this reason has the recognition and identification of objects become one of the most important research areas in artificial intelligence and machine learning: object recognition and localization is also of fundamental importance in robotics and industrial automation [12].

This chapter first explains the step-by-step setup of TensorFlow Lite on the Raspberry Pi. This system allows, among other things, the implementation of object recognition, both on static images and in a video stream. Subsequently, object recognition models are executed and tested using the Raspberry Pi with the PiCam connected. The TensorFlow system can also be trained to recognize clothes. The methods already known from the previous chapters can be reused for this task. This bridges the gap between simple pattern recognition (geometric figures, digits, etc.) and the classification of real objects. Finally, the compact MaixDuino system is used to classify 20 everyday objects such as bicycles, cars, chairs or sofas.

Together with the voice output discussed in the following chapter, an object recognition system even allows the construction of a "speaking eye". This device is able to acoustically describe optically detected objects. For example, if a car is detected in the camera's field of view, this information is announced acoustically via "I see a car." Such systems can, for example, make life much easier for visually impaired people.

13.1 TensorFlow for Raspberry Pi

TensorFlow was used in various applications in earlier chapters; in particular, KERAS uses this system. A light version of TensorFlow has also been available for some time. It is particularly suitable for "embedded systems" such as the Raspberry Pi. The computing power of a Raspberry Pi 4 with 8 GB RAM is sufficient to carry out ML-based object recognition almost in real time. TFLite (TensorFlow Lite) basically consists of a series of tools that enable the efficient application of machine-learning methods on small and mobile devices [6]. TFLite models run much faster on the Raspberry Pi 4 than classic TensorFlow versions. The installation of TensorFlow Lite on the Raspberry Pi is also much easier than the corresponding procedure for general TensorFlow packages in Linux. The following steps are recommended or required for the successful use of TensorFlow Lite:


• update the Raspberry Pi OS;
• create a virtual environment and load all required data into it;
• perform the actual installation, including all associated branches;
• set up and train the desired model;
• run the newly created detection system.

Updating the Pi OS is not always necessary. With older systems, however, you should use the commands:

sudo apt-get update
sudo apt-get dist-upgrade

in order to obtain a current operational base. Depending on the initial software version of the Raspberry Pi, the update process can be completed in a few seconds, or it can take a long time (up to an hour or more).

The required database can be loaded from a GitHub repository. This provides all the necessary software components for TensorFlow Lite. The installation is carried out using a suitable script. Loading takes place via a clone instruction, which is specified under "Clone TensorFlow Lite" in the LINKS.txt file in the download package. This downloads all the necessary files to a folder called "TensorFlow-Lite-Object-Detection..." on the Raspberry Pi. This folder should be renamed to "TFlite":

mv TensorFlow-Lite-Object-Detection-on-Android-and-Raspberry-Pi TFlite

After changing to this new directory via:

cd TFlite

all further commands will be executed in the /home/pi/TFlite directory.

13.2 Virtual environments in action

The basic concept of virtual environments was already presented in Section 10.10. To create a specific virtual environment called "TFlite-env", the appropriate package is required:

sudo pip3 install virtualenv

Subsequently, after switching into the "TFlite" folder created above, the actual environment, called "TFlite-env", is created using:

python3 -m venv TFlite-env


This will also automatically create a new folder named TFlite-env within the TFlite directory. All libraries relevant for this environment are to be installed in this new folder. Eventually, the virtual environment is activated using:

source TFlite-env/bin/activate

This instruction has to be executed in the /home/pi/TFlite directory whenever the environment is to be reactivated. The prefix "(TFlite-env)" in front of the path name in the command prompt shows that the TFlite environment is actually active:

Figure 13.1: The occurrence of (TFlite-env) indicates the virtual environment.

The TensorFlow and OpenCV libraries are now installed in this new virtual environment. OpenCV is only required for taking pictures with the PiCam and displaying them in separate windows. All required instructions can be copied directly from the file install_TFlite_openCV.txt in the download package and transferred to the terminal window of the Raspberry Pi. The installation comprises a data volume of approx. 0.5 GB, which can cause long download times. Finally, the complete package installation can be checked using the pip freeze command.

Figure 13.2: The installation of the libraries is completed.

As Figure 13.2 shows, only the newly installed libraries "tflite" and "openCV" are actually visible in the virtual environment. NumPy only appears here because it was installed by default via the tflite installation procedure.


13.3 Using a Universal TFlite Model

The preliminary work is now completed, and the actual recognition model can be built. In principle, it would already be possible at this point to train your own model. However, this involves considerable effort, as the examples in Chapter 16 show. For this reason, a pre-trained TFLite example model provided by Google is used here. Two files are required for this model:

1. the model itself, as a *.tflite file;
2. the table of object names ("label map"), as a labelmap.txt file.

The network was trained with the MSCOCO data set (MicroSoft Common Objects in Context) and transferred to TensorFlow Lite. The model is able to recognize 80 different objects, such as:

• people and animals;
• plants and fruits;
• office and computer accessories;
• vehicles;
• furniture, such as tables and chairs;
• dishes.

The object names are completely contained in the labelmap.txt file, so this file can serve as a list of all detectable objects. If the objects are to be named, for example, with their French or German-language equivalents, just use the French or German expressions in the label map. The link to download the model, including the label map, can be found in LINKS.txt (mobilenet model). The instruction:

unzip coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.zip -d Sample_TFlite_model

automatically creates the destination folder named "Sample_TFLite_model" and saves the unpacked model there. After connecting the PiCam (see Section 12.4), the setup is ready for use. But first, the network should be tested in detail. The corresponding Python code can be found in the download package as TFlite_detection_PiCam_1V1.py. The program must be copied into the directory /home/pi/TFlite.
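As a quick plausibility check, the label map can be read and listed with a few lines of Python. A minimal sketch, assuming the folder created by the unzip step above:

# List all object classes the model can detect:
with open("Sample_TFlite_model/labelmap.txt") as f:
    labels = [line.strip() for line in f]
print(len(labels), "classes, e.g.:", labels[:5])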


After switching to this directory (cd TFlite), the virtual environment is activated via:

source TFlite-env/bin/activate

The object recognition itself is started via:

python3 TFlite_detection_PiCam_1V1.py

The program requires considerable computing power and extensive memory space. Therefore, the use of a Raspberry Pi 4 with 8 GB RAM is recommended. Smaller or older Pi versions will probably have problems processing the code. In addition, all unnecessary applications on the Raspberry Pi should be closed to free up as much memory and computing power as possible. Starting the program takes a few seconds, even on the Pi 4, as extensive initializations are required. Finally, a window with the live image of the connected camera is displayed. As soon as an object has been detected, it is marked by a light green bounding box. In addition, the name of the object appears, including the probability (in percent) with which it was classified (Figure 13.3).

Figure 13.3: Object detection using TFlite.

The image shows that the recognition of technical objects works quite well. Different types of fruit are also recognized without any problems:


Figure 13.4: The algorithm also recognizes fruit.

The fact that the tangerines are labeled as oranges is mainly due to the fact that this type of fruit was not trained separately and is therefore not included in the model and the label map. Depending on the amount of information contained in the images, up to 20 images per second can be analyzed. This means that real-time acquisition is already within reach. To further increase the frame rate, AI accelerator sticks such as the Movidius or Coral stick could be used on the USB port of the Raspberry Pi. For simple video-surveillance applications, however, the processing speed achievable with the RPi alone is often sufficient.

More extensive tests show that the classification is not only based on relative shapes, sizes or colors. For example, yellow apples are recognized as such, and not as bananas. Model cars are also correctly identified as vehicles, just like their larger counterparts. The fact that the system does not always yield impeccable results is demonstrated by the two obviously incorrect classifications shown in Figure 13.5. These erroneous decisions occur when objects have quite similar features and one of the actual objects is not available as a separate category. In this case, the network often puts the objects into the category that seems most likely.

Figure 13.5: Nothing and no one is perfect!


13.4 Ideal for sloths: clothes-sorting

Some people have a tendency not to dispose of items, specifically clothing, even if the piece in question is beyond repair. This can lead to the so-called "sloth syndrome", i.e., compulsive hoarding, when the excessive accumulation of more or less worthless objects in one's home becomes a serious problem. The project presented in this section will not be able to eliminate these difficulties completely. However, a first step in avoiding absolute chaos could be sorting useless clothing, and the following system can certainly do a good job here. For this purpose, a neural network model is trained that allows trousers, shirts or even bags to be classified. In a way, the model closes the gap from the simple digit recognition of Chapter 12 to the identification of objects in image data. The so-called Fashion-MNIST data set poses a much more challenging problem than the classical MNIST data set of handwritten digits. In particular, the comparison of the two data sets and the associated algorithms offers deeper insights into how ML systems work.

The classification again requires KERAS, NumPy and MatPlotLib (Section 12.5). The implementation can be carried out in a Jupyter notebook on a PC or on a Raspberry Pi. First, the proper installation of KERAS should be checked:

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

print(tf.__version__)

The fashion classifier runs from version 2.2 onwards. Once the installation has been verified, the data set can be loaded:

fashion_mnist = tf.keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

The data should be loaded in less than a minute. Four arrays are then available:

• train_images and train_labels as the training set;
• test_images and test_labels as the test data.

Similar to the handwriting data set, the images consist of 28 × 28 pixel NumPy arrays. The individual pixels contain gray-scale values in the range from 0 to 255. Whole numbers between 0 and 9 are assigned to the labels. The numbers correspond to the clothing class of the respective picture:


Number    Class
0         T-shirt / top
1         Trousers
2         Pullover
3         Dress
4         Coat / jacket
5         Sandal
6         Shirt
7         Sports shoes
8         Bag
9         Ankle boot

Since the class names are not included in the data set, they have to be stored in a separate array:

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

Of course, other languages, for example German, could be used here if necessary:

class_names = ['T-shirt/Top', 'Hose', 'Pullover', 'Kleid', 'Mantel/Jacke',
               'Sandale', 'Hemd', 'Sportschuh', 'Tasche', 'Stiefelette']

By using:

plt.figure()
plt.imshow(train_images[3], cmap=plt.cm.binary)
plt.colorbar()
plt.grid(False)
plt.show()

the individual images appear in gray-scale format, for example, a dress:


Figure 13.6: Example image of a "dress".

Before being fed into the neural network, the pixel values in the training and test data sets are scaled, as usual, to the range from 0 to 1:

train_images = train_images / 255.0
test_images = test_images / 255.0

To check the data format, you can look at some images from the training set. Since the class names are already stored in the class_names array, these can also be displayed for each image:

plt.figure(figsize=(12, 12))
for i in range(12):
    plt.subplot(4, 4, i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    plt.xlabel(class_names[train_labels[i]])
plt.show()

Figure 13.7 shows the result:


Figure 13.7: Various clothing pieces.

13.5 Construction and training of the model

The following model is used for sorting the clothes:

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10)
])

The first layer converts the format of the images from a two-dimensional array (28 × 28 pixels) to a one-dimensional array with 28 × 28 = 784 values. The rows of pixels in the images are simply lined up to form one long vector. This first layer does not have any learning parameters to be trained; it is only used for reformatting.

Two dense layers follow. The first one has 128 nodes or neurons. The second and final layer returns an array with a length of 10. Each output node contains a score that indicates which of the 10 classes the current image was sorted into.

Before the training, various parameters must be set:

• the loss function, which measures how accurately the model is already working during training;
• the optimizer;
• the metrics to monitor the training and testing steps.

The following values are used in the example:

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])


With the usual steps:

1. feed the training data (train_images and train_labels) into the model;
2. execute the actual training process;
3. make predictions based on test series (test_images);

the training of the model is carried out via the "fit" function with a certain number of epochs:

model.fit(train_images, train_labels, epochs=10)

The model achieves an accuracy of approx. 0.9, or 90%, after about 10 epochs. The performance of the model is then checked with the test data set:

test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print('\nTest accuracy:', test_acc)

The test accuracy is usually less than 90%. This difference between training and test accuracy again indicates overfitting. Once the model is trained, predictions can be made on selected images. The outputs of the model, in the form of a one-hot vector, can be converted into probabilities via a softmax function:

probability_model = tf.keras.Sequential([model, tf.keras.layers.Softmax()])
predictions = probability_model.predict(test_images)

For picture no. 10 (sample = 10), for example, the result is:

predictions[10]

array([7.4302930e-05, 2.0597602e-06, 4.5739021e-02, 3.7454229e-02,
       9.1672379e-01, 4.4173207e-09, 5.9046588e-06, 2.3479103e-09,
       1.3121125e-07, 7.5306295e-09], dtype=float32)

The ten values of the array represent the model's "confidence" that the image corresponds to one of the 10 different garments. The index of the highest confidence value (9.17e-1 = 0.917 = 91.7%) is obtained from:

np.argmax(predictions[sample])

and turns out to be 4, i.e., "coat or jacket". So, the model is quite sure that this picture shows a coat or jacket (class_names[4]). According to the test label, this classification is correct:


test_labels[10]

returns the value 4 (coat/jacket). The visual inspection of picture no. 10 also confirms that it is a coat or jacket:

Figure 13.8: Coat / jacket correctly recognized!

The inspection of further images shows that the prediction performance of the network is already quite good:

Figure 13.9: Classification of clothing.

The net mostly delivers correct predictions with probabilities of over 90%. Only the shirts also receive a probability of about 15% of being T-shirts, an error that could certainly also befall a human being at this image resolution. The example shows that KERAS is not only able to recognize abstract figures such as digits relatively easily: even everyday objects such as clothes can be successfully categorized with the help of a KERAS-based neural network. Camera images could again be evaluated directly here. In cooperation with a robotic arm, the project might even be extended to the extent that the arm automatically sorts laundry after the individual items of clothing have been recognized by the ML algorithm.


13.6 MaixDuino recognizes 20 objects

For the MaixDuino, a model is available that is capable of distinguishing 20 different objects. The 20class.kmodel offers a ready-to-use classifier that recognizes everyday objects, vehicles and animals. The 20 classes are, in alphabetical order:

airplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, dining table, dog, horse, motorbike, person, potted plant, sheep, sofa, train, monitor or TV.

The download link for the model can once again be found in the LINKS.txt file. The model is available both as a kmodel and as a 20class.kfpkg file, and can therefore be loaded onto the MaixDuino via the Flash tool or using the Thonny IDE. The associated program (20class.py) can then be used to identify these objects:

import sensor, image, lcd, time
import KPU as kpu

lcd.init(freq=15000000)
sensor.reset()
sensor.set_pixformat(sensor.RGB565)
sensor.set_framesize(sensor.QVGA)
sensor.set_vflip(0)
sensor.run(1)


clock = time.clock()
classes = ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus',
           'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse',
           'motorbike', 'person', 'pottedplant', 'sheep', 'sofa',
           'train', 'tvmonitor']

# task = kpu.load(0x500000)
task = kpu.load("/sd/20class.kmodel")
anchor = (1.889, 2.5245, 2.9465, 3.94056, 3.99987,
          5.3658, 5.155437, 6.92275, 6.718375, 9.01025)
a = kpu.init_yolo2(task, 0.5, 0.3, 5, anchor)

while(True):
    clock.tick()
    img = sensor.snapshot()
    code = kpu.run_yolo2(task, img)
    # print(clock.fps())
    if code:
        for i in code:
            a = img.draw_rectangle(i.rect())
            a = lcd.display(img)
            print(classes[i.classid()])
        for i in code:
            lcd.draw_string(i.x(), i.y(), classes[i.classid()], lcd.RED, lcd.WHITE)
            lcd.draw_string(i.x(), i.y()+12, '%1.3f' % i.value(), lcd.RED, lcd.WHITE)
    else:
        a = lcd.display(img)

a = kpu.deinit(task)

The libraries called sensor, image, and lcd ensure smooth operation of the camera ("image sensor") and the LCD screen. With "KPU", the Knowledge Processing Unit is integrated (Section 7.1). The camera is initialized with the default values. Using:

sensor.set_vflip(0)

the image orientation can be set. The standard orientation is selected with the parameter "0". Then, the object classes are defined:

classes = ['aeroplane', 'bicycle', 'bird', 'boat', ...]

When loading the model, you can choose between direct loading from memory at address 0x500000, using:

When loading the model, you can choose between direct loading from the memory at address 50000_hex, using task = kpu.load(0x500000)


or from a separate file:

task = kpu.load("/sd/20class.kmodel")

The anchor points:

anchor = (1.889, 2.5245, 2.9465, 3.94056, 3.99987,
          5.3658, 5.155437, 6.92275, 6.718375, 9.01025)

specify the parameters for the search function; among other things, they define the size of the search windows. The default values should be retained. Further details can be found in Section 16.5. After taking a picture:

img = sensor.snapshot()

an analysis is carried out using the loaded model:

code = kpu.run_yolo2(task, img)

If a known object is detected, its coordinates and other information are stored in the variable "code". Finally, based on the data in "code", a frame is drawn around the object, and the associated class as well as the recognition probability (0–1) are indicated:

lcd.draw_string(i.x(), i.y(), classes[i.classid()], lcd.RED, lcd.WHITE)
lcd.draw_string(i.x(), i.y()+12, '%1.3f' % i.value(), lcd.RED, lcd.WHITE)

The following image shows, for example, the recognition of a medicine bottle as a "bottle" and a model car as a "car".

Figure 13.10: Bottle and car recognized!


13.7 Recognizing, counting and sorting objects

With the detection and marking of a recognized object on a display, the AI task is basically solved. This is also where most PC-based projects come to an end. With a Raspberry Pi or the MaixDuino, however, you can go one essential step further. In addition to an HDMI or display output, these embedded systems offer a number of readily available I/O pins. Based on the ML results, actuators such as servos, robot arms or even entire robots can be controlled [12]. In this section, a very simple actuator in the form of an LED will be controlled. The LED is connected to port 13 of the MaixDuino via a 220-ohm series resistor. The following image shows the circuit:

Figure 13.11: Circuit diagram for the bottle detector.

The aim of this simple bottle detector is to light up the red LED whenever a bottle is detected. This is the basis of the well-known deposit machines in beverage markets and discounters. In addition to the barcode, these machines also evaluate the shape of the returned item in order to exclude fraud as far as possible. Instead of the LED, a relay can be used that enables the bottle to be accepted, or not. On the software side, the program from the last section can be extended accordingly (bottle_detector.py):

import sensor, image, lcd, time
import KPU as kpu
from Maix import GPIO
from fpioa_manager import fm
from board import board_info

fm.register(3, fm.fpioa.GPIO0)   # silk Arduino P13
led_r = GPIO(GPIO.GPIO0, GPIO.OUT)

for i in range(10):
    led_r.value(1)
    time.sleep(.1)
    led_r.value(0)
    time.sleep(.1)

lcd.init(freq=15000000)
sensor.reset()
sensor.set_pixformat(sensor.RGB565)
sensor.set_framesize(sensor.QVGA)
sensor.set_vflip(0)
sensor.run(1)
clock = time.clock()

classes = ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus',
           'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse',
           'motorbike', 'person', 'pottedplant', 'sheep', 'sofa',
           'train', 'tvmonitor']

# task = kpu.load(0x500000)
task = kpu.load("/sd/models/20class.kmodel")
anchor = (1.889, 2.5245, 2.9465, 3.94056, 3.99987,
          5.3658, 5.155437, 6.92275, 6.718375, 9.01025)
a = kpu.init_yolo2(task, 0.5, 0.3, 5, anchor)

while(True):
    clock.tick()
    img = sensor.snapshot()
    code = kpu.run_yolo2(task, img)
    # print(clock.fps())
    led_r.value(0)
    if code:
        for i in code:
            a = img.draw_rectangle(i.rect())
            a = lcd.display(img)
            print(classes[i.classid()])
            if (classes[i.classid()] == "bottle"):
                print("DETECTED")
                led_r.value(1)
        for i in code:
            lcd.draw_string(i.x(), i.y(), classes[i.classid()], lcd.RED, lcd.WHITE)
            lcd.draw_string(i.x(), i.y()+12, '%1.3f' % i.value(), lcd.RED, lcd.WHITE)
    else:
        a = lcd.display(img)

a = kpu.deinit(task)

The lines:

fm.register(3, fm.fpioa.GPIO0)   # silk Arduino P13
led_r = GPIO(GPIO.GPIO0, GPIO.OUT)

activate port 13 as an output. When the program starts, the connected LED first flashes 10 times in rapid succession as a functional check:

for i in range(10):
    led_r.value(1)
    time.sleep(.1)
    led_r.value(0)
    time.sleep(.1)

Then, the established program part for object recognition follows. Additionally, the lines:

if (classes[i.classid()] == "bottle"):
    print("DETECTED")
    led_r.value(1)

were inserted in the main loop. They ensure that the LED is activated as soon as a bottle is detected. In addition, the word DETECTED is displayed in the console. To make sure that the LED is not permanently lit, the line:

led_r.value(0)

was also added.
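The section title also promises counting, which is only a small extra step: the list code returned by kpu.run_yolo2() contains one entry per recognized object. A hedged sketch of an additional snippet for the main loop; the position and colors of the text output are arbitrary choices:

# Count the bottles in the current frame and show the total on the LCD:
n_bottles = 0
for i in code:
    if classes[i.classid()] == "bottle":
        n_bottles += 1
lcd.draw_string(10, 200, "bottles: %d" % n_bottles, lcd.RED, lcd.WHITE)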


Chapter 14 • Machines Learn to Listen and Speak

Besides the eyes, the ears are among the most important sensory organs for human beings. Acoustic information contributes significantly to the perception of the environment. In principle, audio technology is much easier to handle than image and video processing. Historically, radio technology was available long before the first television broadcasts. Electronic recording and playback of sounds via vinyl records and tapes were also developed and commercialized long before the first video recorders.

In computer technology, things were somewhat different. Right from the beginning, computers had some form of screen, so visual output was always in the foreground and sound signals usually played a subordinate role. Computers with speech output were a much bigger challenge, and properly working systems were not available until recently. For a long time, natural voice communication with computers was reserved for films and TV series such as Star Trek, War Games, or Knight Rider. It was not until the turn of the millennium that voice-capable devices slowly became available. The first classic telephone voice systems at public authorities, banks and insurance companies were eventually joined by voice assistants such as Siri and Alexa. In the meantime, devices such as Amazon's Echo are able to read Kindle books aloud in almost natural language. The latest generation of laptops and tablets explain their functions using synthetic speech when they are first used. At the same time, the formerly widespread dedicated speech chips have become obsolete, mainly because standard processors are now completely sufficient to build speaking systems [5].

Thus, the Raspberry Pi is also able to communicate via voice instructions. Both speech playback and the recognition of spoken information are possible. Although speech systems are not so clearly in the focus of current machine-learning development, they are certainly among the desirable functionalities of artificial intelligence. Particularly in robotics, communication in natural language plays an increasingly important role. This chapter will therefore take a closer look at the possibilities of speech output and recognition, specifically on the Raspberry Pi. In combination with other technologies, such as various sensors or the object recognition already discussed, very impressive and practically applicable projects can be implemented using speech communication.

14.1 Talk to me!

The first devices and systems with a rudimentary capability of generating human speech were constructed over 200 years ago. Modern speech synthesis can therefore look back on a long history. With simple components such as a bellows serving as lungs, tubes, and wooden resonators, scientists eager to experiment began to synthesize word elements. Amazingly, it was already possible at that time to produce sounds that had certain similarities with human speech.

A big step towards speech synthesis as we know it today was taken with the first digital speech synthesizers in the 1960s, when analog electronic components started to supersede the purely mechanical devices of the old days. However, at first it was hardly possible


to produce intelligible speech with these devices. The electronically generated waveforms had little to do with real language. Later on, digital text-to-speech synthesis (TTS) brought some progress. For the first time, speech elements were constructed from a given number of phonemes and word elements. Long after the first sound recordings, fully synthesized words and eventually entire sentences could finally be reproduced intelligibly over audio systems. The first "talking machines" were soon used in automated telephone information systems.

The recognition and decoding of human speech, on the other hand, is much more complex than its synthesis. Even when talking machines were already widespread, speech recognition was still in its infancy. Despite intensive efforts, the breakthrough was a long time coming. With elaborate mathematical procedures such as Fourier transformations or the decomposition of words into phonetic units, only minimal success could be achieved. In particular, the problem that different speakers and voices have very different acoustic signatures could not be solved for a long time. Again and again, acoustic queries were met with: "I didn't get you..." Complete speech systems thus remained fiction for a long time.

Similar to image processing, self-learning systems led to the breakthrough. With the help of neural networks, speech recognition software could finally be trained to capture human speech and convert it to text or instructions. After pre-filtering and digitization, the sounds are fed into appropriately trained networks. Once the training could be automated, it was no longer a problem to use hundreds of speakers, men and women, children as well as senior citizens, as "language teachers". This ultimately solved the problem of voice variation between different speakers and opened the door to various speech recognition applications. Thus, voice-assisted data entry, call forwarding and routing, automatic voice dialing, voice search, and voice control became ubiquitous. Banks and insurance companies replaced their call center staff with computerized voice systems. So-called "telephone banking" led to massive savings in the financial sector. The direct dictation of spoken words into the computer and machine translations replaced typical tasks of secretaries and interpreters.

Deep-learning methods make it possible to perform high-quality word and speech recognition. The software systems use so-called Natural Language Processing (NLP) to break down speech into easily interpretable elements. Pre-trained models provide precise word classifications and interpretations in real time. In this way, the spoken word is converted into text or even complex instructions. However, the mere recognition of sound signals is not sufficient for comprehensive speech recognition. Speech recognition software must, for example, also detect the difference between synonyms and homophones, or distinguish between proper names and regular words. For example, the word "newton" can denote the unit of mechanical force as well as one of the greatest physicists in history.


Speech recognition must therefore provide a process that is, to a certain extent, intelligent and capable of linking knowledge sources and linguistic information. This is where even the most modern systems reach their limits. However, the "Watson" AI system provided an impressive example of natural speech understanding in 2011. In a US TV quiz show, Watson competed with human opponents and easily defeated the previous human champions. The rules of the game stated that the contestant who first presses a switch and is able to answer correctly is the winner. The questions are formulated in such a way that not only general knowledge but also the recognition of complex correlations, mental associations, irony, and linguistic wit play an important role. The system proved that it understands natural language and can answer even difficult questions quickly and correctly.

Nevertheless, the task of recognizing spoken words remains significantly more difficult than generating speech electronically. The models used must capture phonetic units and then identify specific phonemes or sounds in a database. Finally, a language model is required that recognizes meaningful words and sentences and interprets them correctly. Only the combination of these procedures ultimately provides useful results.

Current language models are by no means unchanging or static. Just as a language itself changes, speech recognition must also be continuously adapted. By learning new terms and phonemes, speech recognition can eventually understand indistinct pronunciations or dialects, just like a toddler learning its native tongue. The quality of speech recognition software is thus determined in the long term by the quality of the machine-learning algorithms [5]. This is one of the reasons why it is hardly possible to implement high-quality speech recognition completely autonomously on small systems such as a Raspberry Pi or the MaixDuino. The following sections therefore use a web interface provided by Google for this purpose.

14.2 RPi Learns to talk

Since the Raspberry Pi does not have an audio input, it cannot be used for speech recognition without additional hardware. However, voice output is certainly within the realms of possibility. For example, headphones, active speakers or other audio output systems can be connected to the 3.5-mm jack socket of the Raspberry Pi. Alternatively, a small audio amplifier can be built [5, 14]. The audio output is activated via the raspi-config menu under:

Advanced Options → Audio

The best results are achieved with the Forced 3.5 mm ('headphone') Jack setting. However, it turns out that the audio system under Pi OS is quite complex. In addition, the sound output does not work properly with some operating system versions or on different board variants. In these cases, the optimal sound setting must be determined


experimentally. Basically, the parameter snd-usb-audio in /etc/modprobe.d/alsa-base.conf must be adapted from -2 to 0. When testing different settings, the sound output can be checked using the instruction

speaker-test -t wav -c 2

If successful, "front left" should be heard on the left channel and "front right" on the right channel. If this is the case, the sound output via the audio socket is set correctly. With this, you are well prepared for setting up a Raspberry-based speech system.

The well-known eSpeak text-to-speech system has been widely used, especially on Linux systems. It should therefore form a good basis for a port to RPi OS. However, the system had not been further developed for several years. Moreover, it required an extensive installation procedure and some special adaptations to ensure reasonably smooth operation. With eSpeak-NG (NG stands for Next Generation), on the other hand, a new software system is available that is largely compatible with the current Raspberry Pi OS. This version can be installed very easily; the corresponding installation instructions can again be taken directly from the LINKS.txt file in the download package. There is also a Python interface that can be loaded onto the Pi via pip3 install (see LINKS.txt).

eSpeak-NG provides several language modules. Besides English, German and French are also available, for example. With eSpeak_NG_tst.py from the download package, the speech quality of the system can be tested:

from time import sleep
from espeakng import ESpeakNG

phrase1_en = "Hello, how are you today?"
phrase1_de = "Guten morgen Doktor!"
phrase2_de = "Ist der Fluxkompensator bereit?"

print(phrase1_en)
esng = ESpeakNG(voice='en')
esng.pitch = 5
esng.speed = 120
esng.say(phrase1_en, sync=True)
sleep(1)

print(phrase1_de)
print(phrase2_de)
esng = ESpeakNG(voice='de')
esng.pitch = 10
esng.speed = 150
esng.say(phrase1_de, sync=True)
esng.say(phrase2_de, sync=True)

After starting the program, the sentences "Hello, how are you today?" in English and "Guten Morgen Doktor! Ist der Fluxkompensator bereit?" in German can be heard. Although the speech does not sound perfectly natural, the intelligibility is sufficient for many applications. For the output of simple information such as temperatures or measured values, the system is quite usable. For a comparison with your own system, the sound sample in the download package (eSpeak_NG_tst_1V1.mp3) can be used. If the sound of your system is distorted, you can try to optimize the volume-control settings. Usually, the best results are achieved when both the system settings of the RPi and the controls of the external sound system are set to medium values.

Alternative speech output systems are capable of achieving much better results. In particular, online applications usually achieve much higher speech quality. Since internet- or cloud-based text-to-speech services can draw on high computing power for short periods of time, they come closest to natural-sounding speech (a minimal example follows below). Speech intelligibility could also be further improved with additional hardware components, but corresponding boards are very expensive and can easily reach or even exceed the purchase price of a Raspberry. The following sections will therefore explore the possibilities of eSpeak-NG. In addition to all the programs presented, some sound sample files are also included in the download package. In this way, you can get an impression of the speech quality before setting up the system.
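As a quick illustration of the cloud-based approach, here is a minimal sketch using the third-party gTTS package, a Python interface to Google's online TTS service. It is not part of the book's download package and requires an internet connection; it is shown here only as one example of how such a service can be accessed:

# pip3 install gtts
from gtts import gTTS

# synthesize a short sentence and save it as an MP3 file
tts = gTTS("Hello, how are you today?", lang='en')
tts.save("hello.mp3")
# play the result with any audio player, e.g.: mpg123 hello.mp3

Compared to eSpeak-NG, the voice sounds considerably more natural, at the price of an obligatory internet connection.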

14.3 Talking instruments

eSpeak-NG is able to read out any short text available in electronic form. Typical examples are stock market news or weather reports. For longer texts, however, the speech quality still takes some getting used to, so you should switch to the available alternatives like eBook readers or similar applications. For shorter texts, on the other hand, the quality is quite acceptable. Applications that only need to provide relatively simple and compact information are therefore a good choice. Among other things, "talking" measuring devices fall into this category.


In the service sector, for example, multimeters are used that read out current measured values in natural language. In this way, a technician can direct his gaze to the circuit board or the control cabinet while the device reads out the measured voltage value. The acoustic output of time, current temperatures, or air pressure values can also offer practical advantages. For example, a talking climate station is able to inform its owner about the current measured values without having to look at a display. As a first simple application, the well-known DHT11 or DHT22 can be used as the active sensor element. Apart from a 10 kilo-ohm resistor (color code: brown-black-orange), no other electronic components are needed to connect this temperature and humidity sensor to the Raspberry Pi. Figure 14.1 shows the corresponding circuit.

Figure 14.1: Talking climate station using a DHT11 sensor.

A suitable library can be downloaded from the internet via a GitHub repository (see LINKS.txt). First switch to the newly created sub-directory:

cd Adafruit_Python_DHT

There, the installation can be started using

sudo apt-get install build-essential python-dev

The command

sudo python3 setup.py install

completes the installation. The setup can be tested with the following program (see DHT11_tst.py in the download package):


from time import sleep
import Adafruit_DHT

pin = '4'                      # connection for sensor data pin
sensor = Adafruit_DHT.DHT11

while True:
    humidity, temperature = Adafruit_DHT.read_retry(sensor, pin)
    if humidity is not None and temperature is not None:
        print('Temp = {0:0.0f} °C - Humidity = {1:0.0f} %'.format(temperature, humidity))
    else:
        print('Failed to get reading. Try again!')
    sleep(1)

If the sensor is connected correctly, the measured values for the current temperature and humidity are displayed in the Thonny console.

In the next step, the values are to be output acoustically with eSpeak-NG. However, the library can only process words and letters, so a conversion is required for the output of numbers. For this task, the num2words library is available. It converts numbers into number words (427 → "four hundred and twenty-seven"). The installation is done with a pip3 install instruction (see LINKS.txt). After starting the program eSpeak_NG_climate_station.py:

from time import sleep
from espeakng import ESpeakNG
from num2words import num2words
import Adafruit_DHT

pin = '4'                      # connection for sensor data pin
sensor = Adafruit_DHT.DHT11

esng = ESpeakNG(voice='en')
esng.pitch = 25
esng.speed = 180

while True:
    humidity, temperature = Adafruit_DHT.read_retry(sensor, pin)
    print('Temp = {0:0.0f} °C - Humidity = {1:0.0f} %'.format(temperature, humidity))
    esng.say("room temperature is", sync=True)
    count = num2words(temperature)
    esng.say(count, sync=True)
    esng.say(" degrees centigrade", sync=True)
    esng.say("humidity is", sync=True)
    count = num2words(humidity)
    esng.say(count, sync=True)
    esng.say(" %", sync=True)
    sleep(5)

the current values for temperature and humidity can be heard from connected headphones or an active loudspeaker at intervals of approx. 5 seconds. A demo audio example of the output is included in the file DHT11_1V0.mp3.

The continuous announcement of values is not really ideal in the long run. A pushbutton query which outputs the values only when needed could solve this problem (see the sketch below). However, the use of AI methods offers an even more elegant solution: using speech recognition, the system can be extended so that the climate station only reads out values when it is actively asked for them. Section 14.4 presents a ChatBot that makes use of this possibility.
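A minimal sketch of such a pushbutton trigger is shown below. The wiring is an assumption (a button between GPIO 17 and GND, using the internal pull-up), and announce() merely stands in for the esng.say(...) calls of the climate station above:

import RPi.GPIO as GPIO
from time import sleep

BUTTON = 17                        # assumed pin: button to GND on GPIO 17

GPIO.setmode(GPIO.BCM)
GPIO.setup(BUTTON, GPIO.IN, pull_up_down=GPIO.PUD_UP)

def announce():
    pass                           # insert the esng.say(...) output from above

while True:
    if GPIO.input(BUTTON) == 0:    # button pressed (active low)
        announce()
        sleep(0.5)                 # simple debounce
    sleep(0.05)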

14.4 Sorry, didn't get you ...

In their early days, voice recognition systems had a bad reputation. In telephone banking, many entries were acknowledged with the terse reply "Sorry, I didn't get you...". Since many customers were ultimately annoyed by this, some credit institutions even advertised with the slogan "With us, you still talk to real people!". Over the years, however, the systems improved quickly. In the meantime, simple orders, appointment requests and the like can be handled without problems via voice systems, not only at banks and public authorities but also in many other lines of business.

If you want to set up a speech recognition system with the RPi, you are initially faced with the problem that the board does not have an audio input. So, you have to resort to external solutions: either a USB sound card with an external microphone or a webcam with an integrated microphone can be used (see Figures 14.2 and 10.15).

Figure 14.2: USB sound card connected to the Raspberry Pi.


External USB sound cards usually also provide better sound quality than the direct output of the Raspberry Pi. However, there are considerable differences here, so you should check the quality to be expected before buying a specific system. It should also be noted that not all USB sound cards are compatible with the Raspberry Pi. If a webcam with an integrated microphone is used instead of the sound card, the camera is also available for other AI projects such as object and face recognition. Both solutions require no further configuration on the RPi. If the devices are compatible with the Pi OS currently in use, they are automatically detected and integrated into the system (Figure 14.3). The instruction

lsusb

returns a list of all connected USB devices. If the webcam or the sound card is plugged in, it appears in the list.

Figure 14.3: Sound cards or webcams detected as USB devices.

Two new "sound devices" were detected in Figure 14.3:

1. an external USB sound card (Logilink);
2. a USB webcam (UOA101).

If the devices are not listed, a reboot usually solves the problem. With the so-called alsamixer you can check the correct function of the audio components. The corresponding console instruction

alsamixer

starts the graphic interface for the sound settings. Here, the recording and playback volume can be set. The desired sound device is selected via the [F6] key. The settings can be changed using the arrow keys. The recording volume can also be checked with the help of the alsamixer and adjusted if necessary; for this purpose, the up/down arrow keys are used as well.


Figure 14.4: Sound level adjustment using the alsamixer.

Using the instruction

arecord -l

you can once again create a list of the existing sound components. USB sound cards or webcams should also appear here. The command

arecord -D plughw:1,0 test.wav

starts sound recording. The recording is saved in the file test.wav. Playback via headphones or active speakers is done using the command

aplay test.wav

If the recorded sound is audible, the microphone and sound output are working properly. Otherwise, the settings and volume adjustment should be checked again. With the sound card or the camera microphone, a sound input is now available that can also serve as the basis for a speech recognition system.

As shown in the last sections, well-intelligible speech output can be implemented without any problems using the local resources of a Raspberry Pi. However, for efficient speech recognition, even the resources of a Pi 4 with 8 GB are still insufficient. At this point, online services cannot be avoided. One possibility is the use of the "SpeechRecognition" module, which accesses a web API provided by Google. It can be installed via a series of apt-get instructions. Via the file SpeechRecognition.txt in the download package, the commands can be transferred directly into the terminal using copy-and-paste. The complete installation takes about one hour.

For the speech-recognition application, the user's voice commands are first captured by the microphone and then digitized by the sound card. In the Google API, the phonemes are converted into machine-readable text. In this way, a corresponding program can be used as part of an interactive voice response system. With suitable Python code, the Raspberry


can finally react to given instructions via voice command. Figure 14.5 illustrates the basic structure of the ChatBot created in this way:

Figure 14.5: Speech recognition on the Raspberry Pi.

The corresponding code (ChatBot_DHT11_1V2.py) is again included in the download package. In the following, the essential parts of the program are discussed in more detail. First, some keywords are defined:

greetings=['hello', 'hi', 'hey']
questions=['how are you', 'how are you doing']
responses=['okay', 'i am fine']
database={
    'hello Robert':'hello, dr Falken, how can i help you',
    'name':'Robert',
    'what is your name':'my name is Robert',
    'what can you do for me':'i can do many things..',
    'I love you':'i love you too'
}
exitWord=['quit','exit','bye','goodbye']

Then some predefined answers are included. To prevent the dialogues from becoming too monotonous, a certain flexibility has been integrated: the question "How are you?" can be answered with "okay" or "i am fine". The audio signal is received from the microphone using

audio = r.listen(source)


with m as the source. Next, the statement

data = r.recognize_google(audio)

is used to search for intelligible speech in the audio signal. If recognizable words or whole sentences are found, they are converted into text. This information is subsequently available in the variable data for further processing. The variable can be searched for keywords. Individual subroutines such as

elif 'temperature' in data:
    print("You said: %s" % reply)
    humidity, temperature = DHT.read_retry(sensorType, humiturePin)
    print("The room temperature is: ", temperature, end="")
    print(" °C")
    number = num2words(temperature)
    esng.say("the room temperature is", sync=True)
    esng.say(number, sync=True)
    esng.say(phrase3, sync=True)

or

elif 'CPU temperature' in data:
    print("You said: %s" % reply)
    tempCPU = getCPUtemperature()
    print("CPU temperature: ", tempCPU, end="")
    print(" °C")
    number = num2words(tempCPU)
    esng.say("the CPU temperature is", sync=True)
    esng.say(number, sync=True)
    esng.say(phrase3, sync=True)

react to different instructions accordingly. The first routine reads out the room temperature measured by the connected DHT11 sensor; in the second case, the ChatBot announces the RPi's current CPU temperature.
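For orientation, here is a minimal, self-contained sketch of the recognizer setup that the excerpts above assume. It uses the standard SpeechRecognition API; microphone selection and language are left at their defaults and are not values from the book:

import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone() as source:            # USB sound card or webcam mic
    r.adjust_for_ambient_noise(source)
    print("Listening ...")
    audio = r.listen(source)

try:
    data = r.recognize_google(audio)       # online recognition via Google
    print("You said: %s" % data)
except sr.UnknownValueError:
    print("Sorry, didn't get you ...")
except sr.RequestError as e:
    print("API unreachable: %s" % e)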

14.5 RPi as a ChatBot

As already mentioned, the Raspberry Pi differs from a PC or laptop in providing readily programmable I/O pins. This makes the board ideal for home automation applications; in this respect, the RPi is clearly superior to a classic computer. The library for controlling the hardware pins is included using the instruction

pip install RPi.GPIO

Together with the speech recognition and output modules, this library provides a powerful basis for controlling external hardware components. From a simple LED to small motors


or servos, to a relay-controlled air-conditioning system, there are almost no limits to the possible applications. Simple keyword queries such as

elif 'light on' in data:
    print("turning light on")
    esng.say("turning light on", sync=True)
    GPIO.output(LEDPin, True)

can be used to switch and control lighting, fans or cooling units via the I/O port specified. The program in the download package can evaluate the following instructions, among others:

Query               Action
light on / off      switches pin 23 (or the LED connected there) on or off
fan on / off        controls a fan connected to pin 24
time                reads out the current time
room temperature    reads out the room temperature via the DHT11 sensor
humidity            reads out the air humidity via the DHT11 sensor
CPU temperature     reads out the CPU temperature
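The keyword handler above presumes that the pins have been set up earlier in the program. A minimal sketch of that setup (pin numbers 23 and 24 follow the table above; the variable names are assumptions, not taken from the book's listing):

import RPi.GPIO as GPIO

LEDPin = 23                    # LED / light output
fanPin = 24                    # fan output

GPIO.setmode(GPIO.BCM)
GPIO.setup(LEDPin, GPIO.OUT)
GPIO.setup(fanPin, GPIO.OUT)

GPIO.output(LEDPin, True)      # e.g., reaction to "light on"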

The program also contains some answers that could bring special joy, not only to children. For example, the system also responds to the query "what does the cat say?". In this context, it is also interesting how precisely the speech recognition distinguishes between "dog" and "duck". Of course, the system can easily be extended with further instructions. The download also contains a link to a video that demonstrates the basic application of a voice-controlled home automation system. The following figure shows how the DHT11 sensor, an LED, or a small motor can be connected to the RPi.


Figure 14.6: Circuit for a Home Automation ChatBot.

The entire system can be easily built on a breadboard. The resistors have the following values:

orange-orange-orange:  33 kilo-ohm
brown-black-orange:    10 kilo-ohm
red-red-brown:         220 ohm

The complete system looks like the version in Figure 14.7.


Figure 14.7: Setup proposal for a chatbot.

Further information on the components used here can be found in Chapter 18.

14.6 From ELIZA to ChatterBots

Starting in the early days of AI research, attempts were made to transfer the mental abilities of humans to machines in order to solve certain problems. The goal was to make the comprehensive in-depth knowledge of highly qualified "experts" as universally accessible as possible. For example, the diagnosis of infectious diseases was to be automated in order to relieve highly trained medical doctors from routine tasks. However, it quickly became apparent that the knowledge representation of even a narrowly defined field becomes extremely complex. Rule-based diagnoses quickly required decision trees that were no longer manageable. For example, simple flu-like symptoms such as cough, hoarseness and aching limbs can indicate a multitude of different diseases. Experienced doctors can quickly rule out unlikely diagnoses, whereas an expert system has to ask dozens of questions to do so. The maintenance of these systems also became a problem: the systems were not adaptive! Innovative approaches and better computer systems therefore led to the development of machine learning in this area, among others.


Simple "expert systems" nevertheless have their justification. They can still be used successfully in narrowly defined areas. As shown in the last chapter, a few instructions are sufficient for a home automation system to adjust lighting or temperature conditions in a living room. Therefore, chatbots are often used in special dialogue systems. Corresponding applications can be found in the most diverse areas such as -

customer service, order taking or support; entertainment, finance, games; health care; human resources and marketing; news, social, sports and travel agencies.

The progressive improvement of technology eventually formed the basis for the voice recognition systems Google Assistant, Siri, Cortana, or Alexa. Simple systems search directly for keywords in the input. More advanced methods perform extensive analyses of the text spoken by the user before searching for a suitable, possibly complex answer or action. This already allows quite reliable speech recognition to be built. Nevertheless, it can be seen, for example when working with the Raspberry Pi ChatBot, that the reliability is still significantly worse than in direct human-to-human communication.

The forefather of all chatbots was ELIZA, an early computer program for natural language processing. Of course, at that time, in the mid-1960s, communication still took place via keyboards and screens. ELIZA was supposed to imitate the typical responses of a psychotherapist. The conversation was kept going with the help of "pattern matching" and substitution methods. Simple random elements were also included. One of the most interesting results of this experiment was that many people revealed their most intimate thoughts and problems unfiltered, without the chatbot having even the slightest idea of the conversant's problems. A typical "conversation" with ELIZA looked something like this:

ELIZA >>> Hello my name is ELIZA
Person: Can I ask you something?
ELIZA >>> Did you come because you have a question?
Person: Yes
ELIZA >>> Are you all right?
Person: Yes, thank you for asking
ELIZA >>> That's nice!
Person: What are you doing today?
ELIZA >>> We are supposed to talk about you, not about me.
Person: I have problems with my mother
ELIZA >>> Why do you have problems with your mother?
...
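The underlying mechanism is easy to reproduce. The following toy sketch is not ELIZA's original rule set; the patterns and answers are invented for illustration, but they show the pattern-matching-plus-substitution idea in a few lines of Python:

import random
import re

rules = [
    (r'i have problems with (.*)', ["Why do you have problems with {0}?"]),
    (r'can i ask you (.*)',        ["Did you come because you have a question?"]),
    (r'what are you (.*)',         ["We are supposed to talk about you, not about me."]),
    (r'yes(.*)',                   ["Are you all right?", "That's nice!"]),
]

def eliza_reply(text):
    text = text.lower().strip()
    for pattern, answers in rules:
        m = re.match(pattern, text)
        if m:
            # substitute the captured fragment into a randomly chosen answer
            return random.choice(answers).format(*m.groups())
    return "Please tell me more."

print(eliza_reply("I have problems with my mother"))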


Practically no information about human thinking or emotions was used, yet the program often delivered amazingly human-like answers. Since those early days, chatbots have often been perceived as creepy or Big-Brother-like. Nevertheless, households around the world are increasingly using digital voice assistants such as Alexa or Google Home. Interestingly, according to various surveys, about 25% of users would not miss the conversation with a real human being if an intelligent ChatBot system answered their questions in an acceptable manner, and the trend is rising! ChatBots will undoubtedly appear in more and more applications in the near future. Most experts even assume that in a few years it will be practically impossible for many people to distinguish whether they are talking to a human being or a ChatBot. To conclude this chapter, two applications are presented that show that quite interesting projects can be implemented with speech output alone.

14.7 The Talking Eye

In addition to the applications in home automation, speech output in combination with the object recognition methods from Chapter 13 offers highly interesting practical applications. If the object detection routines are already installed, it is also possible to announce information about detected objects via speech output [6]. To do this, simply connect an active speaker to the 3.5 mm audio socket of the Raspberry Pi. Figure 14.8 shows a circuit diagram for the overall setup, which already includes the optional ultrasound extension used in the next section.

Figure 14.8: Setup for object detection with voice output.

On the software side, the object recognition program just has to be extended by the speech output. For this purpose, the eSpeak module already known from the previous sections is used again. By inserting the lines


os.system("espeak 'I see a'") speak = str(object_name) os.popen('espeak "' + speak + '" --stdout | aplay 2> /dev/null').read()

after the object detection code, the individual objects are identified one after the other via the voice output. The entire program is available in the download package as TFLite_detection_PiCam_speaker_1V0.py. Of course, the program must again be started in the virtual environment created for object detection purposes.

The resulting "talking eye" can be an important everyday aid for the blind or visually impaired, for example. Such systems are also increasingly being used on a commercial level to support people with corresponding disabilities. Of course, highly integrated versions are usually employed here, so that the devices are much more compact than the design presented in this book. For example, it is already possible to integrate the entire system into a pair of glasses which hardly differs from an ordinary model. Nevertheless, the system contains a camera, the object recognition module, and an audio output in the earpiece of the glasses.

However, simple object recognition is by no means the end of the story. For example, miniature computers that can be worn like classic glasses are available, providing an integrated optical display in addition to the acoustic output. The display is mounted on the frame of the glasses at the edge of the field of vision. This setup can be used to show general information and additional data on the objects detected by the camera. One of the first commercially available devices in this field was the Google Glass system. This is without doubt a highly innovative application of AI processes. However, the device also very clearly shows the dangers of the new technology. Of particular importance is the threat to the privacy of the persons observed. The glasses are definitely capable of inconspicuously spying on the wearer's surroundings and wirelessly transmitting recordings of all sorts to a server. You can only imagine the possibilities that such devices offer to those in power in a totalitarian surveillance state...

14.8 An AI Bat

Even with simple means, you can go one step further and equip the system with an ultrasonic range finder. This makes it, in a certain sense, even superior to its human counterpart. Of course, humans are also able to estimate distances based on their experience and stereoscopic vision with two eyes. However, an ultrasound module can detect distances of up to a few meters with centimeter accuracy (see Chapter 13 and the list of references).

With regard to the ultrasonic range finder, it should be noted that it must be designed for operation at 3.3 V. Alternatively, a conventional transducer for 5 V can be used. In this case, however, an additional 3.3 V / 5 V level converter is needed in the trigger/echo signal lines, as the I/O pins of the Raspberry Pi permit 3.3 V only. Figure 14.8 can be used for the setup, as the connection of the ultrasonic sensor is already shown there. Further


information on the ultrasound module itself and its connection to the Raspberry can be found in Section 18.7. The program TFLite_detection_webcam_distance_speaker_1V0.py contains the additional instructions for evaluating the ultrasonic rangefinder. After initializing the pins:

TRIG = 18   # trigger pin of HC-SR04 ultrasound module
ECHO = 24   # echo pin of HC-SR04 ultrasound module
GPIO.setmode(GPIO.BCM)   # or GPIO.setmode(GPIO.BOARD)
GPIO.setup(TRIG, GPIO.OUT)
GPIO.setup(ECHO, GPIO.IN)

the distance of an object (in centimeters) can be determined using the routine

def distance():
    GPIO.output(TRIG, 1)
    time.sleep(0.00001)
    GPIO.output(TRIG, 0)
    while GPIO.input(ECHO) == 0:
        start_time = time.time()
    while GPIO.input(ECHO) == 1:
        stop_time = time.time()
    return (stop_time - start_time) * 340 / 2 * 100   # calculate distance from time

Here, the speed of sound in air (340 m/s) is used:

return (stop_time - start_time) * 340 / 2 * 100   # calculate distance from time

The factor 2 results from the fact that the sound has to travel to the object and back, so only half the round-trip time counts. For example, an echo delay of about 5.9 ms corresponds to a distance of roughly one meter (0.0059 s × 340 m/s / 2 ≈ 1 m). Once an object has been detected and recognized, its distance can be announced acoustically via eSpeak.

It should be noted that the object with the largest reflective surface always dominates the distance information. Thus, although small objects in the foreground are correctly recognized by the AI system, the ultrasonic sensor provides the distance of the background object. More sharply focused ultrasound sources could achieve a certain improvement here. For fundamental physical reasons, however, a detailed ultrasonic identification of individual objects is hardly possible. For this reason, laser scanners are predominantly used in autonomous vehicles. These provide an actual three-dimensional image of the surroundings, and the distance information no longer has to be obtained separately, as it is supplied directly by the laser system.


For simpler applications, such as in the field of visual aids, a device equipped with low-cost ultrasonic sensors is nevertheless quite practicable. Applications that combine the advantages of AI-supported image recognition and ultrasonic distance measurement are also being used more and more frequently in automation technology and robotics.


Chapter 15 • Facial Recognition and Identification

Many smartphones, tablets and the like now offer the option of using automatic facial recognition rather than a password to unlock the device. In more and more companies, employees can sit down at their workstation and be identified via webcams for access to the internal network. This procedure provides complete access to all the resources they need: there is no longer any requirement to enter a password or use fingerprint scanners or card readers for company ID cards. Comparable functions are now also directly integrated in various computer operating systems. The ability of robots and computers to recognize people or human faces and then interact with the person in question holds a special fascination for many people [12].

In this chapter, a face recognition system will be built using a Raspberry Pi and the PiCam or a USB webcam. It is important to distinguish between two different concepts:

• Face detection;
• Face identification (FaceID).

In the first case, human faces are just detected in the field of vision of a camera. The faces can be automatically tagged or counted. In addition to faces, the images may also include landscapes, buildings, and other parts of the human being, such as legs, shoulders and arms. The limitations of the first recognition systems, which could only recognize faces in front of uniform backgrounds, are a thing of the past. The first face recognition systems were used in electronic autofocus cameras. Here, just the recognition of a human face within the field of view of the camera lens is required; in a second step, the camera can automatically focus on the face. It is not necessary to identify the person. Other applications include determining the number of people in a certain area. In addition to security-related tasks, marketing perspectives may also come into focus here.

Facial identification, on the other hand, is able to determine the identity of a specific person. Originally, access control to sensitive areas was one of the most important applications of this technology. In the meantime, however, the method is also used for everyday tasks, such as unlocking laptops, mobile phones or tablets. This is much more demanding than simply recognizing human faces or people. Moreover, this procedure bears considerable risks for individual privacy. When a modern smartphone is unlocked via FaceID by simply looking into the front camera, artificial intelligence is already at work. If you let Google or Apple sort your holiday photos, the results are based on machine-learning algorithms yet again. Everyone must decide for themselves whether this is a good idea in every respect, as there is no guarantee whatsoever that the biometric data collected in the process will really be processed securely [7].


Despite all the concerns about privacy and data protection, facial recognition has become one of the most important applications of artificial intelligence. Due to its high economic importance and its various applications, it has been studied intensively in recent decades. Facial identification has therefore achieved a high level of reliability. In addition, it is possible to determine exactly the probability that a certain face matches a person in a database.

This opens up further possibilities. One of them is so-called augmented reality (AR). The term is used for computer-assisted extensions of reality by virtual elements. The technology adds information to human perception, including, for example, sensitive data of a certain person. An important feature of AR is real-time data processing: all information is displayed immediately, and the user is able to access the content via touchscreen or gesture control. With the success of smartphones, as well as the constant improvement in their performance and camera capabilities, interest in AR continues to grow. Large corporations have been working for years on the mass-market production of devices for mobile augmented reality. The development of special AR devices, such as data glasses (smart glasses), continues at full speed. In the future, there will undoubtedly be more and more powerful smartphones and smart glasses equipped with augmented-reality applications. The associated software captures camera images, reads sensors or GPS data, and supplements them with information from extensive databases. The visual augmentation is then embedded in the real environment. A typical application of augmented-reality technology is shown as an example in Figure 15.1: for each visible person, various data, such as age, profession and nationality, are displayed.

Figure 15.1: Personal identification and augmented reality.

15.1 The right to your own image

With the development of image recognition techniques capable of analyzing faces, questions about the safety of this technology quickly arose. Ultimately, you can only assess the risks with sound personal knowledge and comprehensive expertise.


Since a pure facial recognition system does not compare image data with databases, no personal biometric data is stored. Without data storage, users in principle have less to worry about in terms of data breaches. Facial recognition systems for customer counting, etc., are therefore less critical in terms of data security. Nevertheless, the current legal situation should always be carefully checked before using such devices. Some systems are able to provide general information, such as age or gender, without specifically identifying the person. An example of these techniques is presented in Section 15.12. Retailers are thus able to determine which age groups the majority of their customers belong to. Accordingly, they can optimize product selection or marketing strategies. Facial recognition is therefore not completely unacceptable in all cases. In areas with high security requirements, such as military facilities or certain official buildings, the technology can be very useful. If specific personal identification is not required, pure facial recognition might be quite uncritical in terms of data protection.

Things are different with facial identification. Here, under certain circumstances, the personal rights of the people recorded are deeply interfered with. The mere recognition of a person at a certain location is in principle already a violation of personal rights. If this information is collected in extensive databases, the "transparent person" quickly becomes a reality. The creation of movement profiles or the extraction of typical habits and behaviors is no longer a technical problem. What possibilities this will open up for interested groups of people can hardly be estimated today...

Note: in order to draw attention to this problem, the pictures of persons used in this book were alienated using various procedures. Tests of methods, programs, and routines must of course be carried out on original images without alienation.

15.2 Machines recognize people and faces

After the first approaches to machine face recognition, it took over 30 years until the first really useful results were achieved with an algorithm called Eigenfaces. Since then, however, the topic has been gaining more and more attention, and one can assume a bright future for this field of research. There is no doubt that security applications will continue to play a central role. Both facial recognition and identification can be used here.

Many facial recognition algorithms start by searching for human eyes. These form a so-called valley region and are thus one of the easiest features to recognize. Once the eyes are detected, the algorithm typically tries to detect other facial areas such as the eyebrows, mouth, nose, and iris. Once the initial assumption that a facial region has been recognized is confirmed, further tests can be applied. These then check whether a human face has actually been detected or whether it is a random distribution of features that only resembles a human face. Face identification is thus ultimately a biometric procedure that goes far beyond capturing a human face in the field of view of a camera.
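To get a feeling for classical face detection, the following minimal sketch uses OpenCV's bundled Haar cascade classifier, one of the traditional pre-deep-learning detectors (not the Eigenfaces method discussed here); the image file name is a placeholder:

import cv2

# load the face cascade shipped with opencv-python
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

img = cv2.imread('group.jpg')              # placeholder test image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)

for (x, y, w, h) in faces:                 # draw a frame around each face
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 255), 2)

print("Faces found:", len(faces))
cv2.imwrite('group_faces.jpg', img)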


Airport security systems are an interesting example of person identification. Facial identification is used to automate passenger screening. In this way, people wanted by the police or potential suspects can be identified before they commit a crime, such as hijacking an aircraft. Facial recognition, on the other hand, can be used to record passenger numbers. This allows travelers to be directed in the best possible way. Larger accumulations of people or long queues are automatically and quickly detected in order to be able to open additional counters or check-in facilities, for example.

In commercial use, facial recognition can also be used to speed up the identification of people. For example, systems are conceivable that recognize customers as soon as they enter a bank or insurance office. An employee can already greet the person in question by name and prepare their data before they actually get to the employee's counter or office. Active billboards could adapt their content to the people passing by: after analyzing the persons, advertising spots would adapt to gender, age, or even personal style. Here, however, the first conflicts with data protection laws may well arise. In many countries, private companies are not allowed to photograph or film people in public places.

With the use of 3-D cameras, the technology can be improved significantly once again. Thanks to the ability to capture three-dimensional images of a face, these systems achieve significantly improved accuracy. Simple camera systems can already be fooled with photographs. This can easily be verified with the PiCam: a simple setup is practically unable to distinguish between photographs and the persons themselves (Section 15.8).

Facial recognition will undoubtedly become even more common in the future. The amount of data generated in recent years allows for new ways to analyze the information captured. Machine learning methods will be used to find more ways to make sense of this information. On the other hand, one should never forget the dangers of these technologies. It is therefore advisable that as many people as possible get to grips with the methods in a practical way. Keeping this in mind, it may be possible to prevent certain people from abusing the new powers of AI.

In principle, the classic methods of machine learning and object detection can also be used for face recognition [6]. Ultimately, a human face is nothing but an object with special properties. Instead of the number images used in the chapter on machine reading of handwritten digits, a large number of faces could be used as a training basis. The internet with its applications such as YouTube or Facebook (!) provides an almost inexhaustible data source. This comprehensive information is also used intensively by the well-known internet corporations, as many subscribers have often unknowingly given their consent for the use of their personal data. The advances in facial recognition are thus based in no small part on the boom in social media. The huge amount of data accumulated there contains a myriad of facial images, often even along with a large amount of more or less personal and sensitive information.


To understand how an algorithm can detect faces, one can first ask how people actually recognize a face. On most (frontal) images of human faces, there are two eyes, a nose, lips, forehead, chin, ears and hair. Even though these basic components are practically always present, faces still differ from each other. So, the problem is to separate the common features from the individual differences. An additional problem arises from the fact that the face of one and the same person changes constantly with emotions such as laughter, anger, joy or annoyance. In addition, age, cosmetics or haircut often have a significant influence. Furthermore, even just changing the viewing angle produces a different facial image. However, a closer look reveals that there are some features in every face that are largely independent of age, emotion and orientation.

Various research groups therefore began using unsupervised learning methods to separate and identify different types of faces. Others followed the approach of extensively training models based on the individual features of the face. Different approaches were used, such as:

- Principal Component Analysis (PCA);
- Linear Discriminant Analysis;
- Independent Component Analysis;
- Gabor filters.

Gabor filters, commonly used in image processing, can capture important visual features. They are of particular importance because they can localize essential features such as eyes, nose or mouth in an image. Consequently, they are used for both face recognition and identification [7].

The classical method of comparing face images pixel by pixel, on the other hand, is hardly effective. For example, background and hair pixels would provide little information. Moreover, for a direct comparison, all faces in all images would have to be perfectly aligned in order to obtain usable results at all. To solve this problem, the PCA algorithm creates a set of principal components called eigenfaces.
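A short sketch of a small Gabor filter bank with OpenCV may help to make this concrete. The parameter values and the image file name are purely illustrative, not taken from the book:

import cv2
import numpy as np

img = cv2.imread('face.jpg', cv2.IMREAD_GRAYSCALE)   # placeholder image

responses = []
for theta in np.arange(0, np.pi, np.pi / 4):         # four orientations
    # arguments: ksize, sigma, theta, lambda, gamma, psi
    kernel = cv2.getGaborKernel((21, 21), 4.0, theta, 10.0, 0.5, 0)
    responses.append(cv2.filter2D(img, cv2.CV_32F, kernel))

features = np.stack(responses)    # orientation-selective feature maps
print(features.shape)

Regions such as eyes, nose, or mouth produce characteristic responses in these feature maps, which later stages can use for localization and comparison.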


Figure 15.2: Fundamental and unalterable features of a face.

From a mathematical point of view, eigenfaces are a graphical representation of so-called eigenvectors (characteristic vectors) of a data matrix. With a sufficiently large data set, any human face could in principle be represented by a combination of eigenfaces, just as in mathematics any vector can be constructed from a combination of basis vectors. Other analytical methods, such as linear component analysis, are also based on the eigenface algorithm.

With Gabor filters, the most important features in the face are captured first. A subsequent eigenface algorithm then uses these as a basis for comparison. The focus is on values such as the relative distance between the eyes or the distance between the ears and the tip of the nose. These values are very specific to a face; therefore, they are also very specific to a particular person and can hardly be changed. The position of the corners of the mouth or the hairline plays a less important role, as these values are quite variable.

Several open-source libraries in which one or more of these methods have been implemented are currently available, for both face recognition and identification. Figure 15.3 illustrates the steps from face recognition to face identification. For identification, a recognition method is required first. Subsequently, a database can be built using the corresponding image material. By training a special neural network, the essential facial features can then be extracted. These finally serve as the basis for identifying individual persons or faces.
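The eigenface idea can be tried out in a few lines with scikit-learn's PCA. The following is only a minimal sketch: the data matrix X here is random placeholder data standing in for flattened, aligned grayscale face images, and the image size and component count are arbitrary choices:

import numpy as np
from sklearn.decomposition import PCA

# placeholder data: 100 flattened, aligned grayscale images of 64x64 pixels
rng = np.random.default_rng(42)
X = rng.random((100, 64 * 64))

pca = PCA(n_components=16, whiten=True)
weights = pca.fit_transform(X)                    # 16 weights per face
eigenfaces = pca.components_.reshape((16, 64, 64))

# a new face can be compared via its weight vector, e.g. nearest neighbor
new_face = rng.random((1, 64 * 64))
w = pca.transform(new_face)
closest = np.argmin(np.linalg.norm(weights - w, axis=1))
print("Most similar stored face:", closest)

Each row of eigenfaces is one "eigenface"; any input face is approximated as a weighted combination of them, and these weights form a compact signature for comparison.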


Figure 15.3: Steps from face recognition to identification.

15.3 MaixDuino as a Door Viewer

People or institutions with increased security requirements often want an automatic door viewer or "door spy". This should forward a message or trigger an alarm as soon as a person is within a certain area. Various applications are also conceivable outside of security-relevant areas. For example, in smaller retail shops, the staff could be notified as soon as a person enters the salesroom. These examples are classic applications of facial recognition methods: a face only has to be detected; identification is not necessary.

For facial recognition on the MaixDuino, a fully trained network model called facedetect.kmodel is available. In addition, the Maix can once again make use of its advantage as an embedded system. It is not only possible to draw a frame around a person or a face on a display, but also to activate an I/O pin after a face has been detected. This can then be used to switch on an LED, an audible alarm, or similar. The corresponding download link is included in LINKS.txt ("MaixDuino Face detection"). In the download package accompanying this book, the Python program can be found as face_detection_simple.py:


import sensor, image, lcd
import KPU as kpu

lcd.init()
sensor.reset()
sensor.set_pixformat(sensor.RGB565)
sensor.set_framesize(sensor.QVGA)
sensor.run(1)
frameColor = (0,255,255)
# task = kpu.load(0x300000)   # kmodel in flash at address 0x300000
task = kpu.load("/sd/models/facedetect.kmodel")
anchor = (1.889,2.5245,2.9465,3.94056,3.99987,5.3658,5.155437,6.92275,6.718375,9.01025)
a = kpu.init_yolo2(task, 0.5, 0.3, 5, anchor)
while(True):
    img = sensor.snapshot()
    code = kpu.run_yolo2(task, img)
    if code:
        for i in code:
            a = img.draw_rectangle(i.rect(), color=frameColor, scale=10)
    a = lcd.display(img)
a = kpu.deinit(task)

After importing the libraries, the display and camera are initialized in the program. Then the model is loaded. Here again, it is possible to switch between loading directly from flash memory at address 0x300000 using

task = kpu.load(0x300000)   # kmodel in flash at address 0x300000

or loading from a separate file:

task = kpu.load("/sd/models/facedetect.kmodel")

After setting the anchor points, the model is initialized. In the main loop, the video image is captured:

img = sensor.snapshot()

and evaluated via the loaded model:

code = kpu.run_yolo2(task, img)

As soon as the variable code contains relevant data, a frame is drawn around the detected face:


a=img.draw_rectangle(i.rect(),color=frameColor,scale=10)

The content of the variable code is explained in more detail in the following section. After starting the program, each face captured by the MaixDuino camera is surrounded by a frame in the color "frameColor". The line width of the frame can be adjusted via the variable scale.

Figure 15.4: MaixDuino detected a face.

Similar to object recognition, the detection function can also be used to trigger further actions.

15.4 How many guests were at the party?

One application of the detection functions is the automatic counting of persons. If, for example, three persons were detected, the variable code in the program face_detection_simple.py from the last section shows the following content:

[{"x":26, "y":98, "w":47, "h":79, "value":0.905853, "classid":0, "index":0, "objnum":3},
 {"x":150, "y":104, "w":38, "h":63, "value":0.938012, "classid":0, "index":1, "objnum":3},
 {"x":251, "y":109, "w":47, "h":63, "value":0.859508, "classid":0, "index":2, "objnum":3}]

The array elements x, y, w and h contain the coordinates (x, y) and the width and height (w, h) of the faces. value indicates the probability that a human face has actually been detected. Since only one class is detected here ("face"), the classid is always zero. Also interesting is objnum: this variable indicates how many faces were found in total. With index, a number is assigned to each hit. Using

code[0].objnum()

the number of faces found can be determined. The program face_counter.py thus continuously provides the current number of faces in the field of vision of the MaixDuino cam:


Figure 15.5: Face counting.

If only numeric values are printed (see face_counter_numeric.py), the plotter function of the Thonny IDE can be used via:

View → Plotter

With this, it is possible to graphically record the number of faces captured over a certain period of time.

Figure 15.6: Time-dependent face counting using the plotter function.
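face_counter_numeric.py itself is part of the download package. A hedged reconstruction of its main loop, assuming the same kpu setup as in face_detection_simple.py above, might look like this:

import sensor, lcd
import KPU as kpu

lcd.init()
sensor.reset()
sensor.set_pixformat(sensor.RGB565)
sensor.set_framesize(sensor.QVGA)
sensor.run(1)
task = kpu.load("/sd/models/facedetect.kmodel")
anchor = (1.889,2.5245,2.9465,3.94056,3.99987,5.3658,5.155437,6.92275,6.718375,9.01025)
kpu.init_yolo2(task, 0.5, 0.3, 5, anchor)

while(True):
    img = sensor.snapshot()
    code = kpu.run_yolo2(task, img)
    n = code[0].objnum() if code else 0
    print(n)   # purely numeric output, usable with Thonny's plotter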


The result is a very powerful tool that can be used extensively in security technology or marketing. At airports, for example, passenger numbers can be recorded automatically. Shop owners and retailers can automatically determine the number of customers during their opening hours, and so on. Many other immoral or even illegal applications should not even be mentioned here.

15.5 Person-detection alarm

You don't always have to use the display just to show frames around faces. It can be much more interesting to also use the I/O pins of the MaixDuino. The program face_alarm.py:

import sensor, image, lcd
import KPU as kpu
from Maix import GPIO
from fpioa_manager import fm
from board import board_info

lcd.init()
sensor.reset()
sensor.set_pixformat(sensor.RGB565)
sensor.set_framesize(sensor.QVGA)
sensor.run(1)
frameColor = (0,255,255)
fm.register(3, fm.fpioa.GPIO0)
led_r = GPIO(GPIO.GPIO0, GPIO.OUT)
# task = kpu.load(0x300000)   # kmodel in flash at address 0x300000
task = kpu.load("/sd/models/facedetect.kmodel")
anchor = (1.889,2.5245,2.9465,3.94056,3.99987,5.3658,5.155437,6.92275,6.718375,9.01025)
a = kpu.init_yolo2(task, 0.5, 0.3, 5, anchor)
while(True):
    img = sensor.snapshot()
    code = kpu.run_yolo2(task, img)
    img = image.Image()
    img.draw_string(30, 10, "No of persons detected:", scale=2)
    if code:
        numberOfPersons = code[0].objnum()
        print(numberOfPersons)
        img.draw_string(140, 120, str(numberOfPersons), scale=5)
        led_r.value(1)
    else:
        print(0)
        img.draw_string(120, 120, "none", scale=5)
        led_r.value(0)
    lcd.display(img)
a = kpu.deinit(task)

switches an LED connected to port 13 as an alarm signal as soon as at least one face has been detected. The hardware setup for this can be seen in Figure 15.7. The display output looks like this:

Figure 15.7: Two people in the detector zone!

In this way, no PC is required and the MaixDuino can be used in stand-alone mode.

15.6 Social minefields? — face identification

As already explained in the introduction to this chapter, face identification must be clearly distinguished from pure face recognition. In many parts of the world, face identification in particular is increasingly viewed critically. For example, the use of face recognition technology by police or authorities has already been banned in some cities. Politicians consider the danger of using facial recognition technology to be greater than the potential benefits in the fight against crime. The "uninterrupted surveillance by camera systems threatens the possibility of living free from observation by police, authorities or government," states a city council resolution of a Californian city. Yet the technology is poorly regulated in many countries. In addition to potential misuse of the systems by dictatorial regimes, unacceptable invasions of privacy could therefore also occur in democratic states.

The biometric data collected via facial identification is unique to each person. In addition to faces, however, the eyes alone (retina scan), the course of veins under the skin, human voices, and the way of walking or typing on a keyboard can also be used for personal identification. Authorities, governments and even companies all over the world use recording devices to collect biometric data. This means that, in principle, any person, once recorded, could be tracked in all their movements. Whether a person accesses office equipment, checks in at an airport, or gets their train tickets checked: facial identification is playing a central role in more and more areas of life. Specifically in the Far East, biometric applications are experiencing a real boom.


In the major Asian cities, the new technologies are already changing everyday life to a considerable extent. For example, employees no longer have to swipe chip cards through a reader when entering their office building. A glance at a dedicated camera is sufficient to identify the employee. In railway stations and airports, automatic recognition systems check passenger tickets against their ID cards. In some Asian cities, entire subway systems are already monitored using facial recognition techniques.

The algorithms behind facial identification are trained with huge data sets. Often, proprietary deep learning platforms are used to train large and multi-layered neural networks. However, it does not remain a static system; rather, the parameters are constantly adjusted until all desired information is reliably delivered. Automatic person identification should therefore always be viewed with a certain degree of skepticism. For this reason, too, it is of utmost importance to know the potentials and possibilities in this field as precisely as possible. In the following sections, the technique of face identification will be examined in more detail. In addition to the purely technical circumstances, one should also think about the possible areas of application. Because ultimately, only people who have dealt with the new technology, at least to a certain extent, will be able to assess its benefits and dangers.

15.7 Big Brother RPi: face identification in practice

The OpenCV software library, which was already introduced in Section 10.5, is also suitable for applications in the field of face recognition and identification. Among other things, the library contains ML-based algorithms to search for faces in an image. Since faces have a complex structure and are subject to considerable individual variation, there is no simple test that could find a face in any image via an "if" decision. Instead, there is a multitude of patterns and functions that are coordinated with each other. The algorithms divide the problem of identifying a face into several sub-tasks, each of which is easy for a machine algorithm to solve. These associated algorithms are also called "classifiers".

For the most efficient image recognition, OpenCV uses so-called cascades. Like a series of waterfalls, the problem is divided into several stages. For each image block, a rough but quick test is performed. If one of these tests leads to a positive result, a slightly more detailed test is performed, and so on. The algorithm can perform 30 to 50 of these stages or cascades in succession. Only when all stages have been passed with positive results is the recognition of a face confirmed. A significant advantage of this procedure is that the majority of the image blocks already yield negative results in the first stages. This means that the algorithm does not waste time testing all the functions for each block in detail. This can speed up the process considerably. Face recognition procedures can thus be carried out almost in real time even on less powerful systems such as the RPi.

Since face recognition is one of the most important applications of machine learning, OpenCV comes with a whole series of integrated cascades with which not only faces but also eyes, hands or legs, etc., can be recognized. Moreover, it is even possible to interpret the mood of a person by analyzing the shape of the mouth, for example.


These principles are also the basis of the well-known procedures in digital cameras, which automatically trigger the shutter as soon as a person's smile is detected in the image. Due to the high efficiency of the OpenCV algorithms, quite reliable face recognition can be performed even with the relatively modest computing power of a Raspberry Pi. The following software packages are required for this project:

• OpenCV for basic processing of images and videos in real time. The package also includes some machine learning algorithms.
• Face-recognition for the actual face recognition, the creation of reference frames around faces, etc.
• Imutils, including a number of handy functions to speed up OpenCV algorithms on the Raspberry Pi.

After the libraries are installed, the RPi can be trained using a series of face images. The images are automatically compiled into a training data set. Afterwards, the Raspberry Pi can not only detect the known people as faces in a live video stream, but also identify them. However, you should keep in mind that video recordings in public areas are usually illegal.

Installing and setting up the packages and libraries necessary for face recognition requires advanced knowledge and experience in using the Raspberry Pi. Even experienced users should plan at least two hours for this task. The exact duration depends, among other things, on the download speed of the available internet connection. Since the library versions in this currently very active area change rapidly, a certain amount of experimentation is necessary here, as it is quite possible that a newer program version is not compatible with the one presented here. As usual, the first step before starting a major new project should be:

sudo apt-get update && sudo apt-get upgrade

Then the actual installation can be started. The individual instructions for this can be copied from the file "Install_FaceRecognition.txt" directly into the terminal window of the Raspberry Pi. Any prompts or questions ("Would you like to continue? (y/n)") must be acknowledged by pressing y and the Enter key. Figure 15.8 shows the successful installation of version 4.5.1 of OpenCV on the Raspberry Pi.
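Whether the installation worked can also be checked directly from Python; the printed version string should match the one shown in Figure 15.8:

import cv2
print(cv2.__version__)      # e.g., 4.5.1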


Figure 15.8: Successful installation of opencv-python.

Now you can install the face recognition software:

sudo pip3 install face-recognition

and imutils:

sudo pip3 install imutils

It may take another hour or so to download these packages. After they have been successfully installed, practical applications can be explored. The first step, in the following section, is to try to localize faces in the field of vision of a camera.

15.8 Smile, please ;-)

The downloaded package contains both a trainer and a detector for face identification. Here, the focus is on the detector first. OpenCV already contains many pre-trained classifiers, e.g., for

- faces;
- eyes;
- a smile.

The download link can be found in the LINKS.txt file (XML files). The classifier files are stored in a folder, e.g., under face_detection_recognition/HaarCascades. Then, with just a few lines, a very effective face detection system can be built. The Python program for this (FaceDetector_1V0.py) is again included in the download package and appears to be very compact:

import cv2

cascade = "HaarCascades/haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade)

blue = (255,0,0)
framewidth = 3

cap = cv2.VideoCapture(0)
cap.set(3,320)  # set Width
cap.set(4,240)  # set Height

while True:
    ret, img = cap.read()
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(30,30))
    for (x,y,w,h) in faces:
        cv2.rectangle(img, (x,y), (x+w,y+h), blue, framewidth)
    cv2.imshow("Press 'q' quit", img)
    key = cv2.waitKey(1) & 0xFF
    if key == ord("q"):
        print("bye...")
        break

cap.release()
cv2.destroyAllWindows()

After loading the OpenCV library, the cascade to be used is loaded from the above-mentioned directory:

cascade = "HaarCascades/haarcascade_frontalface_default.xml"

then the associated classifier is defined as "detector":

detector = cv2.CascadeClassifier(cascade)

The video stream of the active camera is evaluated using:

cap = cv2.VideoCapture(0)
cap.set(3,320)  # set Width
cap.set(4,240)  # set Height

The resolution of the video image is also set in these lines. The parameters "3" and "4" of the cap.set function are used to define the width and height of the video stream in pixel units. It should be noted that high resolutions slow down the image processing considerably. However, with the low resolution of 320 × 240 pixels (about 0.08 megapixels) used here, almost real-time operation is possible. Using

ret, img = cap.read()

the continuous video stream of the camera is broken down into individual images (frames).
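The meaning of the two return values is explained below. As a minimal sketch, a defensive version of the loop could check the return code before processing each frame:

while True:
    ret, img = cap.read()
    if not ret:        # no frame delivered, e.g., the end of a video file
        break
    # ... process img as in FaceDetector_1V0.py ...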


The read() function reads directly from the specified video source, in this example the webcam. It returns two values:

- a return code, which shows whether a frame was actually read. If a video file is used, the end of the video can be recognized this way. When reading from the webcam, this does not apply, since the video stream runs practically endlessly;
- the actual video frame, i.e., one frame per loop pass.

In the main loop, the video images are first converted into a grayscale ("black and white") version, as the Haar cascades work more efficiently this way. Then the detector is applied to the B/W video stream just generated:

faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
    minSize=(30,30), flags=cv2.CASCADE_SCALE_IMAGE)

The detector uses three other parameters in addition to the video stream itself:

1. scaleFactor: for faster detection, the value can be increased up to 1.4, but then there is an increasing risk that faces will be overlooked.
2. minNeighbors: values of 3–6 are optimal for this application; higher values lead to fewer detections, but with better detection accuracy.
3. minSize: the minimum detected face size. Smaller objects are ignored. Normally a value of 30 × 30 pixels leads to good results.

Once a face is detected, four parameters are returned:

- x: x-coordinate of the upper-left corner of the face frame
- y: y-coordinate of the upper-left corner of the face frame
- w: width of the detected face in the video stream, in pixels
- h: height of the detected face in the video stream, in pixels

Using the function

cv2.rectangle(img, (x,y), (x+w,y+h), blue, framewidth)

a frame with the corresponding coordinates in the desired color (here blue) and the thickness "framewidth" is drawn around each recognized face (Figure 15.9).


Figure 15.9: Definition of the detector frame.

Using the for loop

for (x,y,w,h) in faces:

all faces returned in "faces" are processed. Therefore, even the detection of several faces in one image is no problem for the detector.

Figure 15.10: Multiple faces are also captured without any problems.

The length of the return vector "faces" allows you to determine how many faces were detected in the image. This can be done using:

print("number of faces detected:", len(faces))

With this, the applications already presented in Section 15.4, such as determining the number of guests at a party or the number of customers in a salesroom, etc., can be implemented.


As your own experiments will quickly show, photos of faces are also recognized without any problems by this setup. This also highlights a major problem of the method: in security applications, it would be very easy to deceive the system by using photos. This is where the superiority of 3-D measurement methods becomes apparent. With the help of two or more cameras, it is easily possible to distinguish a simple two-dimensional photo from a real three-dimensional person.

Figure 15.11: Deception using photos.

Another way to prevent users from using someone else's photo is to use so-called live tests. Here, not only a static image is used but also a live video stream, in which the persons in question have to speak or move their heads. Now that faces have been successfully detected, you can also try to gain further information and details. For example, the eyes can be localized in a face. This can be necessary for iris recognition, for example. The following program (Eye_detector_1V0.py) puts "glasses on a face":

import cv2

faceCascade = "HaarCascades/haarcascade_frontalface_default.xml"
eyesCascade = "HaarCascades/haarcascade_eye.xml"

cap = cv2.VideoCapture(0)
cap.set(3,640)  # set Width
cap.set(4,480)  # set Height

detectorFace = cv2.CascadeClassifier(faceCascade)
detectorEyes = cv2.CascadeClassifier(eyesCascade)

while True:
    ret, img = cap.read()
    # img = cv2.flip(img, -1)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = detectorFace.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
        minSize=(30,30), flags=cv2.CASCADE_SCALE_IMAGE)
    for (x,y,w,h) in faces:
        print("face detected")
        print(x,y,w,h)
        cv2.rectangle(img, (x,y), (x+w,y+h), (255,0,0), 2)
        roi_gray = gray[y:y+h, x:x+w]
        roi_color = img[y:y+h, x:x+w]
        eyes = detectorEyes.detectMultiScale(roi_gray, scaleFactor=1.5,
            minNeighbors=10, minSize=(5,5))
        for (ex, ey, ew, eh) in eyes:
            print("eyes detected")
            print(ex,ey,ew,eh)
            cv2.rectangle(roi_color, (ex,ey), (ex+ew, ey+eh), (0,255,0), 2)
    cv2.imshow('"q" to quit', img)
    key = cv2.waitKey(1) & 0xFF
    if key == ord("q"):
        print("bye...")
        break

cap.release()
cv2.destroyAllWindows()

The result is shown in Figure 15.12.


Figure 15.12: Eye detection.

Capturing a smile goes even one step further. This function is already implemented in many digital cameras. It is thus possible to activate the camera's self-timer and ensure that a picture is only taken when all the people in front of the lens are showing smiles and friendly faces. What is interesting here is that not only the mouth is localized; rather, its specific shape must also be detected, since not every mouth automatically means a smile. This already shows that ML methods can definitely be capable of determining the emotional state of a person. Some experiments in this area can be carried out using the program Smile_detect_1V0.py. Although this is still a very rudimentary implementation, it shows that the program is quite capable of distinguishing between serious and friendly faces.


Figure 15.13: Smile Please!

Now that face recognition and counting, as well as the recording of certain facial characteristics, are easily possible, the next step can be the identification of persons. The following sections show, however, that this procedure is much more complex than simple face recognition.

15.9 Photo Training

In order to extend face recognition to face identification, two more steps are required:

1. taking test images and creating a test data set;
2. training the network with this new data set.

First of all, it is necessary to transfer the Python programs

- CollectTestPICs_1V0.py
- TrainFaces_1V0.py
- FaceRecognizer_1V0.py

from the download package to the Raspberry Pi. This can be done either by downloading them to a PC and transferring them via USB stick, or by downloading them directly to the Pi. To take the test images, a folder must first be created with the name of the respective test subject, like


/home/pi/face_detection_recognition/dataset/Sandy

In Thonny, the program CollectTestPICs_1V0.py is opened. The name (e.g., Sandy) is also entered there within the inverted commas:

Figure 15.14: Names in "TakeTestPics.py".

Of course, the name in the data set folder and in the file must match exactly. After starting the program in Thonny, a window showing the video stream from the camera opens up. This can take several seconds, even on a Raspberry Pi 4. Now the camera must be aligned with the test person's face. By pressing the space bar, a series of photos can be taken. About ten photos are sufficient for a first test. Ideally, the head should be slightly turned, raised or lowered to allow for different perspective angles. People who wear glasses should take a few pictures with and without glasses, so that the face can later be recognized in both cases. The program can be stopped by pressing the "q" key (for quit). Afterwards, you can check the photos in the file manager:

Figure 15.15: Test images in the file manager, and a sample image.


If desired, the procedure can be repeated for other persons. After the training data set has been compiled, the next step is to train the model. For this purpose, it is only necessary to execute the program TrainFaces_1V0.py. It takes about 3 to 4 seconds for the Pi to analyze a single photo of the data set. For a data set with 20 pictures (e.g., two people with ten pictures each), about one to two minutes are required. The training data obtained in this way are finally saved in the file encodings.pickle. The HOG method (Histogram of Oriented Gradients) is used here as the recognition method.
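As a rough sketch of what such a training step does internally (using the face_recognition API; the actual TrainFaces_1V0.py may differ in details, and the folder names here follow the example above):

import os
import pickle
import cv2
import face_recognition

dataset = "dataset"          # one sub-folder per person, e.g., dataset/Sandy
knownEncodings, knownNames = [], []

for name in os.listdir(dataset):
    for filename in os.listdir(os.path.join(dataset, name)):
        image = cv2.imread(os.path.join(dataset, name, filename))
        rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        # locate faces using the HOG method, then compute a 128-d encoding each
        boxes = face_recognition.face_locations(rgb, model="hog")
        for encoding in face_recognition.face_encodings(rgb, boxes):
            knownEncodings.append(encoding)
            knownNames.append(name)

with open("encodings.pickle", "wb") as f:
    pickle.dump({"encodings": knownEncodings, "names": knownNames}, f)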

15.10 "Know thyself!" … and others "Know (recognize) thyself" was once written above the Oracle of Delphi — a reminder to mankind to not fall into arrogance. Perhaps this is also a maxim that would suit the developers of ML systems well ... From a purely technical point of view, however, self-knowledge is not a big problem. Now that the preliminary work has been completed, the actual face recognition with FaceRecognizer_1V0.py can be started. After a few seconds, the webcam view opens again. If there is a face in the webcam's field of vision, it is automatically analyzed. If the face was recognized, it is marked with a rectangle. If the model was trained correctly, the name of the person should also appear. If other faces have been trained, they should be recognized as well. If an unknown face is detected, it is marked with "unknown" correspondingly. The program can be terminated again by hitting "q".

Figure 15.16: Person identification using RPi and a PiCam.


15.11 A Biometric scanner as a door opener

Now that individual faces can be successfully identified, the procedure may be used for various applications. The I/O pins of the Raspberry Pi allow electronic or electro-mechanical devices to be controlled directly or via relays or transistors. In this way, a face identification system can be extended to a biometric scanner. Together with a conventional electromagnetic door opener, for example, a lock can now be opened automatically as soon as a known person has been detected by the PiCam. To do this, the FaceRecognizer_1V0.py program only needs to be extended by a few lines. Besides initializing the GPIO pins:

import RPi.GPIO as GPIO

GPIO.setmode(GPIO.BCM)
GPIO.setup(23, GPIO.OUT)

only the activation of the port is required. As soon as a known person is recognized by the system:

if "" in names:     # the name of a trained person goes between the quotes, e.g., "Sandy"
    print(names)
    GPIO.output(23, GPIO.HIGH)
else:
    GPIO.output(23, GPIO.LOW)

the corresponding pin is switched to High potential. The complete program FaceKey_1V0.py is included in the download package with this book. In the following hardware setup, the LED lights up when a person known to the system appears in the PiCam's field of view:


Figure 15.17: Switching an LED when a face is recognized.

If, for example, a door opener is to be operated with this setup, the LED only has to be replaced by a relay that is able to control an electro-mechanical opener. However, in practical applications you should always bear in mind that this simple facial identification by no means meets any security standards. For securing the door of your house or flat, you should in any case resort to professional systems.

15.12 Recognizing gender and age

Estimating a person's age often leads to problems. For females, an overestimated age is unlikely to go down very well. Many people find it difficult to quantify the exact age of other persons. Yet this can be of considerable importance. Just think of applications in marketing: here, it is quite important to have information about the age distribution of your own customer base. Recognizing the gender of a person, on the other hand, is relatively unproblematic for most people (in most cases...). But here, too, automatic recognition could result in many interesting applications. After faces have been successfully recognized and even identified in the last sections, the Python programs already discussed will now be extended in such a way that they also enable estimates regarding the gender and age of a person. Ready-trained "Caffe" models can be used for this purpose. You will find the link to these in the download package. A compressed file age_gender_model.tar.gz containing the four files:


age_net.caffemodel
deploy_age.prototxt
deploy_gender.prototxt
gender_net.caffemodel

is available in the LINKS.txt file. These files must be copied into a sub-folder called age_gender_models of the project folder. With Caffe (Convolutional Architecture for Fast Feature Embedding), a deep-learning framework is available that is particularly characterized by the high classification speed of its models. This is reflected in this application by the fast gender recognition and age estimation and the associated high frame rate. A detailed description of Caffe is beyond the scope of this book; interested readers are referred to the relevant literature (see LINKS.txt). The associated Python program AgeGenderDetector_1V0.py looks like this:

from picamera.array import PiRGBArray
from picamera import PiCamera
import imutils
import time
import cv2

cascade = "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade)

camera = PiCamera()
camera.resolution = (320,240)
rawCapture = PiRGBArray(camera, size=(320,240))

MODEL_MEAN_VALUES = (78.4263377603, 87.7689143744, 114.895847746)
age_list = ['(0-3)','(4-6)','(7-12)','(13-20)','(21-32)','(33-43)','(44-53)','(54-100)']
gender_list = ['Male', 'Female']
green = (0,255,0)

time.sleep(0.1)

def initialize_caffe_model():
    print('Loading models...')
    age_net = cv2.dnn.readNetFromCaffe("age_gender_models/deploy_age.prototxt",
        "age_gender_models/age_net.caffemodel")
    gender_net = cv2.dnn.readNetFromCaffe("age_gender_models/deploy_gender.prototxt",
        "age_gender_models/gender_net.caffemodel")
    return (age_net, gender_net)

def capture_loop(age_net, gender_net):
    for frame in camera.capture_continuous(rawCapture, format="bgr", use_video_port=True):
        image = frame.array
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(30,30))
        print("No of faces found ", str(len(faces)))
        for (x,y,w,h) in faces:
            cv2.rectangle(image, (x,y), (x+w,y+h), (255,255,0), 2)
            face_img = image[y:y+h, x:x+w].copy()
            blob = cv2.dnn.blobFromImage(face_img, 1, (227, 227), MODEL_MEAN_VALUES, swapRB=False)
            # Predict gender
            gender_net.setInput(blob)
            gender_preds = gender_net.forward()
            gender = gender_list[gender_preds[0].argmax()]
            # Predict age
            age_net.setInput(blob)
            age_preds = age_net.forward()
            age = age_list[age_preds[0].argmax()]
            overlay_text = "%s, %s" % (gender, age)
            # cv2.putText(image, overlay_text, (x,y), cv2.FONT_HERSHEY_SIMPLEX, 1, green, 1, cv2.LINE_AA)
            cv2.putText(image, overlay_text, (x,y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, green, 1)
        cv2.imshow("Press 'q' quit", image)
        key = cv2.waitKey(1) & 0xFF
        rawCapture.truncate(0)
        if key == ord("q"):
            print("bye...")
            break

age_net, gender_net = initialize_caffe_model()
capture_loop(age_net, gender_net)
cv2.destroyAllWindows()

Figure 15.18 shows a typical outcome of the system.

Figure 15.18: Good estimation. Actual age: 38 years.


Chapter 16 • Train Your Own Models

So far, mainly readily available models have been used for the tasks at hand. This is certainly also the typical approach if you want to implement machine learning projects on the Raspberry Pi or the MaixDuino. The Internet offers an almost inexhaustible variety of pre-trained neural networks and models, covering a large number of applications. In the last chapters, models for

- digit recognition;
- object detection;
- face recognition;
- face identification

were used. The models often even have a certain learning ability. For example, it was not only possible to recognize predefined persons with the face identification code. Rather, after appropriate training, it was also possible to use new, self-defined faces as "keys".

Things are somewhat different if you want to train neural networks completely on your own. This usually involves considerable effort. Creating the recognition program in Python is usually the least of the problems. The biggest task is collecting extensive "learning material" for the neural network. In the case of the ready-made models available on the Internet, people often revert to abundant data material from general websites or even from social networks. As already mentioned, users have often granted the data collectors extensive rights of use without really being aware of it. If, on the other hand, you want to train your own special model, you have to collect the data yourself. In this chapter, two examples will be used to demonstrate how this can be done.

16.1 Creation of a model for the MaixDuino

The following steps are required to create your own model for the MaixDuino:

- create a Jupyter notebook as a training base;
- export a tflite model;
- convert the tflite model into a kmodel readable by the MaixDuino;
- load the kmodel onto the MaixDuino;
- create a Python program to evaluate the kmodel.

These steps will be explained using the MNIST data set as an example. Although this consists of prefabricated image data, it could easily be replaced by your own images, similar to the fashion data in Section 13.4. Training with self-created images will be discussed in the next section. In order to create a custom model for the MaixDuino, first the notebook MNIST_to_tflite.ipynb must be loaded into Jupyter on a PC.


This notebook essentially contains the cells already known from Chapter 12. However, a cell has been added at the end which allows the model to be saved in tflite format:

# convert model to tflite
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.experimental_new_converter = True
tflite_model = converter.convert()
open('MNIST_01.tflite', 'wb').write(tflite_model)

After the model has been processed, a file named MNIST_01.tflite is present in the current directory. This must now be converted into the kmodel format that can be read by the MaixDuino. A program called ncc.exe is required for this task; the download link can be found in the file LINKS.txt. The just-created MNIST_01.tflite file and the ncc.exe program should be copied into a new directory. In addition, some training images must be loaded into a sub-directory named images. These images are needed as calibration data for the conversion. In a Windows console (cmd.exe), first change to the directory created above and then execute the following conversion command:

ncc compile MNIST_01.tflite MNIST_01.kmodel -i tflite --dataset images

The conversion is completed after a few seconds (Figure 16.1).

Figure 16.1: Conversion from tflite to kmodel.


The kmodel file created in this way can now be copied to the MaixDuino in the usual way. Then only the appropriate Python program (MNIST_2_KPU.py) is needed. There, the standard model must be replaced by the new MNIST_01.kmodel:

...
# task = kpu.load("/sd/MNIST.kmodel")
task = kpu.load("/sd/MNIST_01.kmodel")
kpu.set_outputs(task, 0, 1, 10, 1)
...

Additionally, the output format of the model must be specified:

kpu.set_outputs(task, 0, 1, 10, 1)

because the new kmodel (version 4) also allows multi-output models. The statement kpu.set_outputs(task, output_idx, w, h, ch) has five arguments:

- task: the current model;
- output_idx: the index of the output, as kmodel V4 supports multiple outputs;
- w, h, ch: width, height and output channel of the output vector.

Since the MNIST model is coded as one-hot output, this results in a 1 × 10 vector with a single output channel:

kpu.set_outputs(task, 0, 1, 10, 1)

Since no multi-output is used here, output_idx = 0 is to be set. This means that the same results can now be read out as with the previous version V3. Note that a MaixPy version ≥ 0.6.0 is required to use V4 models. Now the new model can be tested. As soon as a digit is visible in the camera's field of view, it should be recognized and evaluated. Of course, the recognition quality of the system depends on the quality of the training, the data set in use and the parameter adjustment. A direct comparison with the professionally created model from Section 12.11 shows that the home-made network usually performs worse. However, by optimizing the training parameters or increasing the number of epochs, etc., the new model can still be improved. Ultimately, it should be possible to achieve the quality of the original network in this way.
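For testing, the evaluation of the one-hot output can be sketched as follows (based on the standard MaixPy KPU calls; the camera initialization and the image pre-processing required for MNIST are omitted here, so the exact MNIST_2_KPU.py may differ):

import sensor, image, lcd
import KPU as kpu

task = kpu.load("/sd/MNIST_01.kmodel")
kpu.set_outputs(task, 0, 1, 10, 1)    # output 0: a 1 x 10 one-hot vector

img = sensor.snapshot()               # assumes the sensor was initialized as usual
fmap = kpu.forward(task, img)         # run the network on the current frame
plist = fmap[:]                       # the ten class outputs as a list
print(plist.index(max(plist)))        # index of the strongest output = recognized digit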

16.2 Electronic parts recognition with the MaixDuino

In a final application, we go one step further. Not only training and conversion are carried out by the user; new, self-created images will also be used. This is, to a certain extent, the supreme discipline when working with ML applications. You gain maximum freedom, and self-created networks can be used for practically all desired image data. This includes an ample number of applications:


- recognition of animals and animal species;
- sorting of goods or objects such as tools or toys;
- waste separation, deposit bottle recognition, etc.;
- recognition of vehicles and vehicle types;
- recognition and sorting of electronic components;
- categorization and sorting of fruits and vegetables;
- ...

However, you must be aware that this project is a demanding task. Moreover, you should approach it with a certain willingness to experiment. The current process is not only quite complex but also subject to rapid changes due to the fast developments in this field. Consequently, it cannot always be guaranteed that the versions of the various programs and tools given here will work without problems in all combinations. In particular, the conversion of tflite models into readable Maix models does not appear to be fully developed yet. Ultimately, the only hope here is that the corresponding processes will be simplified and standardized in the near future. In particular, direct export options from Python to different model variants would be desirable. The following graphic shows, using the example of a classification of electronic components, an overview of the individual steps required for the construction and training of a network with its own (image) data:

Figure 16.2: Training with your own data.


If you want to run a training round with your own pictures, you have to take them yourself, of course. This is certainly one of the most time-consuming tasks of such a project. Typically, several hundred pictures are required for effective training. However, for a first success test under simplified conditions, fewer than one hundred images may also be sufficient. The main focus of the application is on the recognition and classification of electronic components. The goal of the project is to enable the MaixDuino to categorize resistors and transistors as individual electronic components — a task that already overburdens the average purchasing agent in big electronics companies! First of all, a few dozen images of resistors or transistors are required. These can either be taken using a camera or collected from the Internet. In the second case, the relevant legal situation must be observed, specifically if the data and results are to be published. To make the task a little easier, the components should be photographed against a simple and uniform background. In this way, you can achieve reasonable classification results even with a relatively small number of images.

Figure 16.3: Resistors and transistors.

The photos must be copied into separate directories (transistors/resistors). The following image shows the entire directory structure of the project:

Figure 16.4: Directory structure for the component classifier.


The training program (ElectronicsClassifier_1V0.ipynb) is based on this structure and is designed in such a way that it can easily be adapted to other data sets. The division of the data into a training and a test set is done via the parameter val_ratio (default: 0.2). This ratio is usually appropriate if the number of images is sufficient. However, there is no fixed rule. With other ratios (e.g., 90:10), greater fluctuations often occur during validation.

The network itself is based on the MobileNet developed by Google. Since only 2 megabytes of KPU RAM are available in the MaixDuino, the number of parameters of the network was reduced accordingly. If there are still problems with the memory size, for example with other converter versions, a further reduction by the user is required. The complete model is trained in two steps. In order to train the classifier separately, some convolution layers are "frozen" after the model definition, i.e., they are not included in this training run. Subsequently, the training is continued for all parameters with a lower learning rate, as sketched below. After the training is completed, the model is saved in both h5 and tflite format.

The learning curves show a continuous increase. However, there are also significant fluctuations. This is due to the relatively small validation data set. With increasing data size, the curves should approach a smooth curve similar to Figure 12.9.
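Before looking at the learning curves, here is a minimal sketch of the two-stage training just described (model, train_ds, val_ds, the learning rates and the number of frozen layers are illustrative assumptions, not the exact notebook content):

from tensorflow import keras

# Stage 1: freeze the convolutional base and train only the classifier head
for layer in model.layers[:-4]:            # assumption: the last 4 layers form the head
    layer.trainable = False
model.compile(optimizer=keras.optimizers.Adam(1e-3),
              loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_ds, validation_data=val_ds, epochs=10)

# Stage 2: unfreeze all layers and fine-tune with a lower learning rate
for layer in model.layers:
    layer.trainable = True
model.compile(optimizer=keras.optimizers.Adam(1e-4),
              loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_ds, validation_data=val_ds, epochs=10)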

Figure 16.5: Learning curves for the electronics classifier.

16.3 Performance of the trained network

Accuracy is typically around 70–80% for a training data set of fewer than 100 images. Figure 16.6 shows the corresponding confusion matrix.


Figure 16.6: Confusion matrix for resistor and transistor recognition.

The matrix shows that 34 transistors were wrongly classified as resistors, while 81 resistors were recognized correctly. These results can certainly still be improved. Larger training data sets in particular should significantly improve the recognition accuracy. Nevertheless, the precision achieved here is sufficient to test the system in practical applications.

16.4 Field test

Due to the low resources of the MaixDuino, having only 2 MB of RAM, it is impossible to run a complete TensorFlow system on the K210. To execute the model, the TensorFlow operations must first be translated to a special instruction set for the KPU of the Maix. In addition, the conversion also optimizes the model in order to reduce the required computing power and memory. The ncc tool is used again for the conversion. After the ElectronicsClassifier_1V0.ipynb notebook has been completely processed, the file ec.tflite can be found in the current directory. This file must be converted into the kmodel format. After copying the training images into a sub-folder named images_ElectronicsClassifier, the corresponding conversion command is executed in a Windows console:

ncc compile ec.tflite MNIST_ec.kmodel -i tflite --dataset images_ElectronicsClassifier

After copying the kmodel file to the MaixDuino, the appropriate Python program (ElectronicsClassifier.py) can be started. Due to the very limited memory resources, the compiler version

maixpy_v0.6.2_41_g02d12688e_minimum_with_kmodel_v4_support.bin


should be installed via the kflash tool. This version is optimized for minimum memory requirements, which leaves the largest possible share of the scarce memory space for the network model. In the example presented here, the memory capacity of the Maix board was completely exhausted. With other versions of the board (e.g., Sipeed MAIX Bit Suit) or with newer tools, errors may therefore occur due to a lack of memory. In this case, the original model in ElectronicsClassifier_1V0.ipynb must be cut down. After successfully starting the program, a live image of the camera is displayed. The names of the detected components are displayed in the upper left corner (Figure 16.7).

Figure 16.7: MaixDuino recognized a transistor.

As already stated, the quality of the classifier depends mainly on the number of training pictures. If fewer than 100 images are used per category, the results are usually not very dependable. Experiments show that using up to about 500 pictures per class improves the recognition accuracy significantly. Of course, other factors also have a significant influence. For example, the quality of the pictures themselves is very important. Lighting conditions and the respective background also play a significant role. The training or the structure of the network itself can also be improved. This offers a wide field of experimentation. The following parameters are of particular interest:

- number of training epochs;
- structure of the neural networks used;
- division of the data into training and validation sets.

However, it must always be kept in mind that the MaixDuino only offers a relatively small amount of memory. When expanding the structure of the neural network, the limits are quickly reached.


However, once the system is running more or less problem-free, other training images (for animal recognition or vehicle classification, etc.) can be used. It is also possible to use more than two categories, although this also multiplies the number of training images required.

16.5 Outlook: Multi-object detectors

The image recognition and classification models used so far in this chapter utilized the whole image as input. The output was a list of probabilities for each class to be recognized. The best results were achieved when the respective object took up a large part of the image. The exact location of the object in the field of view was not important. However, many applications, such as object or face tracking, require not only the recognition of objects in the image but also their exact location. The detection of multiple objects in an image likewise requires determining the respective locations of the objects in the camera image. For these tasks, more complex object recognition models are required. So-called YOLO (You Only Look Once) architectures can be used here.

The method first adds a coordinate grid to the image data to be examined. Then the presence or absence of the objects in question is determined in each grid cell with the already known methods. In addition, so-called "anchors" are used. The anchor points are adapted to the expected object size and facilitate a pre-selection of possible image features. Anchors have also already been used in the pre-trained models (see Section 13.6, among others). If individual features are detected, a set of predictions is calculated for each grid cell. Predictions with low probability are discarded. In this way, the remaining cells contain the searched objects with a high probability.

The basic idea behind You-Only-Look-Once is thus to divide the overall image into small regions or grid areas and apply a neural network search to each grid cell. Three parameters play an important role here:

1. S: the number of grid cells into which the image is divided;
2. B: the number of bounding boxes calculated for each grid cell;
3. C: the number of different classes that can be predicted for each grid cell.

Figure 16.8 illustrates the YOLO principle.


Figure 16.8: YOLO principle.

In this case, the image is divided into 4 × 8 grid cells (S = 32). Each cell is assigned a certain probability that it contains an object of interest. The cells with high probabilities are marked with frames. From this it becomes clear where objects could be located in the image. However, it is not yet determined what the object really is. Therefore, each cell is subsequently assigned a class probability. The individual results are then combined into object frames, and the probability is determined that an object frame contains a certain object, such as a car, a dog, or a person. If this probability exceeds a certain threshold, the object is assigned to the corresponding category (a schematic sketch of this decision logic follows at the end of this section).

The projects presented in Chapter 13 already made partial use of this YOLO technology. However, the training of YOLO systems is once again significantly more complex than the procedure presented in the last section. In particular, not only large numbers of images or photos are required; rather, the searched-for object must be marked and categorized manually on each of the pictures. This requires considerable effort, even with tools specifically developed for the purpose. Using the respective category and the determined coordinates, the YOLO system can finally be trained. Classifying and determining coordinates on hundreds of images can take days or even weeks. It thus becomes clear that this effort is only justified in very special cases, for example if certain animal species or specific types of vehicles are to be detected automatically. In general, therefore, you should first try to find a pre-trained model for a specific project. Many different versions can be found on the Internet. Only if this approach is not successful might it make sense to train your own YOLO system.
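To make the grid-and-threshold logic concrete, here is a schematic, framework-free sketch of the per-cell decision step (S, B, C, the threshold and the randomly filled arrays are purely illustrative):

import numpy as np

S, B, C = 32, 2, 20          # grid cells, boxes per cell, classes (illustrative)
THRESHOLD = 0.5

# illustrative network output: B boxes (x, y, w, h, objectness) per cell,
# plus C class scores per cell
boxes = np.random.rand(S, B, 5)
class_probs = np.random.rand(S, C)

detections = []
for cell in range(S):
    for b in range(B):
        objectness = boxes[cell, b, 4]
        # class-specific confidence = objectness * class probability
        scores = objectness * class_probs[cell]
        best = int(np.argmax(scores))
        if scores[best] > THRESHOLD:          # discard low-probability predictions
            detections.append((cell, b, best, float(scores[best])))

# 'detections' now lists the cells that most probably contain an object,
# together with the box index, the best class and its confidence
print(detections)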


Chapter 17 • Dreams of the Future: from KPU to Neuromorphic Chips

The term Artificial Intelligence is often criticized because no one really knows what exactly it means. The most diverse approaches can be found in the definitions of 'intelligence'. These range from the ability to think logically, to cognitive tasks, to composing symphonies. In addition, there are characteristics such as (self-)awareness, the ability to plan the future, or the ability to solve problems. For the marketing departments of many companies, however, AI is a gold mine. Products are quickly described as AI-capable as soon as, for example, a voice assistant can react "intelligently" to customer requests. The extent to which the term intelligence can actually be applied to software architectures or technical and electronic systems, especially chips and chip structures, will be looked at a little more closely in this chapter.

Even as components of AI systems, most computers still work with a Von Neumann architecture and sequential data processing. PCs and the Raspberry Pi are typical examples. The basis of these machines are two components: the CPU, which processes the data, and the main memory (RAM), which stores the data. First, the CPU loads its instructions from the memory and then fetches the data to be processed. Once an instruction has been processed, the result is written back to memory and the next cycle begins. A shared memory contains both the program instructions and the data. This computer architecture is designed to process numbers and run deterministic programs. However, other tasks such as image recognition cannot be solved efficiently with this structure. Even simple tasks, such as distinguishing between dogs and cats, which every small child can do easily, are challenging for conventional computer systems.

In human brains, unlike in a computer, the storage and processing of information take place in a single inseparable unit. Both functions are closely related and tightly connected. Thus, there is no central unit for data processing. Rather, this takes place in approx. 86 billion individual neurons. Based on these findings, extremely powerful graphics processors were used from the turn of the millennium onward, which relied on massive parallel processing of data. These subsequently proved to be excellently suited for using neural networks for pattern and speech recognition. These network structures learned the rules, which were later applied in the active phase, from an abundance of examples [12].

Based on these advances in software technology, new types of structures were also used in hardware development. The goal was to develop integrated circuits, processors and chips that corresponded as closely as possible to human brain structures. Massively parallel graphics cards were increasingly replaced by innovative and adaptive chips. In this way, the neuronal structures widely used in biology were transferred almost 1:1 to ultra-modern processor architectures. After a corresponding learning phase, these chips cope with previously unsolvable tasks. Thus, the ever-growing family of "neuromorphic chips" was born.


The Kendryte K210 used in the MaixDuino is marketed as an "AI-on-the-edge" chip. Kendryte (from Chinese "Kan-zhi") roughly means exploration of intelligence. Nevertheless, the chip essentially offers just a dual-core 64-bit RISC-V processor with all the peripherals common in controller technology. The main scope of application of the K210 is the Internet of Things (IoT, see also Section 7.1). The chip offers AI solutions to add some intelligence to the IoT and thus upgrade it to AIoT (Artificial Intelligence of Things). According to the data sheet, the KPU is a neural network processor that supports special operations such as mathematical convolution with a computing power of up to 0.8 TOPS. One of the central applications is thus the fast and efficient recognition of objects, persons or faces in real time. This is based on the application of CNNs, i.e., Convolutional Neural Network structures (see also Chapter 12). The convolutional network operations serve as filters for the recognition of visual shapes such as edges, lines or circles. The features detected with the CNN are then used as inputs for deeper layers of a general neural network. This makes machine vision, and specifically object and face recognition, the chip's particular strength.

The MaixDuino board can thus be compared to an artificial retina. In addition to the actual detection of optical signals, the retina also provides an initial pre-filtering of sensory impressions in the eye. Just as with the retina, a CNN used for machine vision in the MaixDuino reduces the huge amount of raw image information. In the K210, the "KPU" peripheral is used to offload CPU-intensive operations to dedicated hardware (see Section 7.1). This leads to a significant acceleration of visual signal processing, similar to the application of a video processor. Since image processing involves massively parallel processes, special graphics processors in the form of GPUs can accelerate the processes immensely.

However, the K210 remains essentially a vector processor. It is optimized for special mathematical operations such as convolutions or weighted sums and achieves an outstanding processing speed in these areas. However, it is still far from being a true neuromorphic chip. Even though decades of research have opened up new ways to process information faster and categorize objects more efficiently than ever before, chips like the K210 certainly cannot yet be considered "brains in miniature". Although current chips have become increasingly powerful in neural network applications and more affordable in terms of cost, they do not represent a fundamental advance in processor technology. This development was mainly inspired by GPU-based graphics cards. As a result, vector processing capability has developed extremely rapidly and in some ways even overtaken classical CPU development. For scientific, medical and many other fields, this resulted in enormous advantages. Nevertheless, the K210 is still a more or less classic processor; it is ideally suited for "AI" applications in the IoT field only through its connection to a controller peripheral.


However, based on the success of the GPUs, several large chip manufacturers began to develop and produce new types of chips without the classic Von Neumann architecture. Instead of the conventional structure, neuromorphic structures were now used. Figure 17.1 shows the basic structure of these chips:

Figure 17.1: Structure of a neuromorphic chip.

Instead of a CPU and a central memory, the chips contain a large number of combined and independent CPU/memory units that can communicate via a tight network. The structure thus approximates the human brain, which also consists of a dense neuronal network. The chips often contain thousands of computing cores and thus a multitude of individually programmable "neurons". The individual cores are connected via a complex network of configurable connections. The "electronic neurons" transmit, receive and collect signals in the form of so-called spikes, whereby a new spike is generated whenever the summed input signals reach the activation threshold set in the training phase [12]. On the hardware level, the chips thus behave very similarly to how the neural networks presented in Chapter 11 worked in software.

By transferring the back-propagation algorithm to the hardware level, the first self-learning chips were created. With the help of extensive data sets from the Internet, the internal neuromorphic connections could now be trained and optimized, so that increasingly better results were achieved in many applications. The hardware systems could thus cope with new tasks without special programming and, in a certain sense, "learn" new relationships independently. Especially in robotics and in the field of autonomous driving, neuromorphic computing has an outstanding potential for innovation. Audiovisual information, i.e., images and speech, but also sensor data can be captured and processed extremely effectively. Since neuromorphic structures can be implemented directly on conventional silicon chips, no costly new technological developments are necessary. This is another reason why development in this area is progressing at high speed [2, 12].
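The spiking behavior described above can be illustrated with a minimal integrate-and-fire neuron model (a toy sketch in plain Python, not the behavior of any specific chip):

# Toy integrate-and-fire neuron: sums weighted input spikes and fires
# a spike of its own whenever the activation threshold is reached.
def lif_neuron(input_spikes, weights, threshold=1.0, leak=0.9):
    potential = 0.0
    output = []
    for spikes in input_spikes:              # one time step per entry
        potential = leak * potential + sum(w * s for w, s in zip(weights, spikes))
        if potential >= threshold:           # threshold reached: fire...
            output.append(1)
            potential = 0.0                  # ...and reset the potential
        else:
            output.append(0)
    return output

# Example: three inputs observed over five time steps
print(lif_neuron([[1,0,0],[0,1,0],[1,1,0],[0,0,1],[0,0,0]], [0.4, 0.5, 0.3]))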


Modern lawn mowers are capable of independently covering the lawn area assigned to them. Obstacles such as bushes or trees are automatically detected and subsequently avoided. In this way, the machines can practically create a "picture" of their surroundings on their own and set up their actions accordingly. The same applies to cleaning robots. These are also able to learn the structure of their domestic environment. Problems such as getting stuck on furniture or other furnishings can thus be avoided. Robot systems based on neuromorphic chips will increasingly interact with their environment in a more human-like way.

In medical technology, the technology has already taken over various routine tasks. For example, automated image processing systems relieve doctors of the burden of analyzing X-rays or computer tomographic images. In the detection of certain types of cancer, the systems already outperform experienced specialists in some areas.

In future, tablets and smartphones will adapt automatically to the habits of their users. The devices will thus become genuine personal assistants. The setup and personalization that is often still tedious today can then be completely eliminated. The recognition of faces or fingerprints to unlock devices is already a typical task for neuromorphic chips. Neuromorphic structures will increasingly blur the boundaries between technical and biological systems. So far, these applications are only to be found in a few niches. However, the successes in medicine, in smart home applications or in facial and fingerprint recognition leave no doubt about the direction of future developments.

Classical processors and microcontrollers will undoubtedly reach their technical limits at some point. Moore's Law, i.e., the continuous doubling of the performance of electronic systems within approx. 1½ years, cannot apply forever. Moreover, ever smaller chip structures and higher clock frequencies do not automatically lead to "smarter" systems. Other physical limits, such as power dissipation or the laws of quantum mechanics, are also increasingly limiting the performance of modern chips. This limits the possibilities of classical systems with regard to sensory data processing, image and speech recognition. Conventional technologies require supercomputers or entire server farms for these projects. As a result, voice-controlled assistants such as Alexa or Siri are dependent on an online connection with powerful computers in a cloud set-up.

Neuromorphic chips, on the other hand, do not process audiovisual data sequentially, but highly in parallel. Similar to a biological brain, "sensory impressions" can thus be evaluated quickly and efficiently. Neurons in the brain are also capable of reacting flexibly to changing sounds, images or other sensory impressions because the connection structures in the brain are flexible. Comparable to classical "learning", this facilitates adaptation to new situations. Systems with brain-like structures are therefore much more suitable for corresponding tasks than conventional computers, and neuromorphic units can adapt very efficiently to changes and variations, for example in speech and image recognition.

To further increase efficiency, the latest developments are also trying to use the third spatial dimension. Similar to the human brain, this would create a compact system with highly efficient three-dimensional networking. Such a geometry can be scaled to almost any size.
Due to the dual function of artificial neurons as memory and CPU, neuromorphic chips achieve high computing-power densities at low electrical power requirements. Neuromorphic chips consume only a few watts and get by with a fraction of the energy required by a conventional computer.

So far, the performance of neuromorphic systems is still far from that of a human brain. At best, current neuromorphic structures are comparable to a mouse brain in some special areas. However, when it comes to processing motoric and sensory data as well as "learning", current chips are already significantly faster than Von Neumann computers. Obviously, it is not practical to emulate the brain with special software on conventional processors. The software-based recognition of dogs or cats in videos has clearly demonstrated this. Such procedures are not suitable for developing machines with greater "intelligence". The task of conventional computers is to process numerical data; these systems were never optimized for processes requiring intelligence. Thus, it is becoming increasingly clear that chip technology must change fundamentally if machines are to behave more intelligently.

The basic ideas behind neuromorphic chip technology have been known for decades. Corresponding analog circuits were developed as early as the 1990s and were already able to simulate the electrical activity of neurons and synapses in the brain. Development activities aimed at integrating complex analog circuits into compact chips resulted in highly specialized ICs for noise suppression. Using this technology, so-called cochlear implants, i.e., systems that convert sound directly into signals for the human auditory nerve, have restored a certain degree of hearing to many people. Such technologies are also increasingly being used in smartphones and other audiovisual devices. However, the application of this "analog AI technology" has likewise remained limited to a few specialized areas.

Up to now, digital processors and units such as CPUs, GPUs, gate arrays, or highly specialized digital ICs have predominantly been used for training and deploying AI algorithms. True neuromorphic hardware structures, however, are expected to speed up AI applications 1,000- to 10,000-fold once analog computing elements are used. The signals will then no longer be in digital form as pure zero and one values, but will change continuously. In biological brains, signal processing and transmission also take place via quasi-analog signals. In this way, significantly more information can be transmitted per data channel, and the required power is reduced considerably.



Figure 17.2: Future chips with neuronal brain functionality?

The developers of the innovative neuromorphic chips have already celebrated great triumphs in many fields. In particular, they are already being used commercially in mobile devices such as smartphones and tablets for voice, face, and fingerprint recognition. It will certainly take some time before analog neuromorphic chips are available for universal use in mass markets. However, the market potential is so huge that development is certainly proceeding at breakneck speed. The KPU unit of the K210 therefore represents only a very first step in the direction of specialized AI chips. Nevertheless, its performance in terms of object or face recognition is already quite remarkable, and only hints at what truly "intelligent" or neuromorphic chip structures will be capable of in the future. Whether these developments will become a curse or a blessing for humanity is (still!) in our hands...


Chapter 18 • Electronic Components

Machine learning and neural networks are mainly software-based topics. Many introductions deal exclusively with PCs running Windows or Linux. On such machines it is difficult to include hardware components such as light-emitting diodes (LEDs), transistors, or sensors, as modern computers do not provide suitable interfaces for these devices. In this book, however, the Raspberry Pi and the MaixDuino are used: hardware systems that provide various interfacing options via the GPIO pins (General Purpose Input/Output) readily available on the boards. Thus, it is possible to connect electronic components with little effort. In this way, the software-based ML applications can also be used in the real world, and new, interesting projects emerge in the fields of IoT and physical computing. The biometric scanner from Section 15.11 and the bottle detector from Section 13.7 are just two examples.

So, if you want to take your ML projects beyond the screen, you need a few additional components. With these, both the RPi and the MaixDuino can be expanded into complete stand-alone systems. In the simplest case, an external LED with a series resistor can be used to set up a person-recognition system, for example (a minimal wiring test is sketched at the end of this introduction). However, as the complexity of the projects increases, so does the number of components required. This section therefore describes the most important basic components, such as:

- jumper wires
- resistors
- light-emitting diodes (LEDs)
- transistors
- sensors

The following sections go into more detail about the most important components and their applications. These chapters cannot possibly provide a complete introduction to electronics. If necessary, further sources can be found in the Bibliography.
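As a first functional test, a single LED (with its series resistor) connected to a free GPIO pin can be toggled from Python. The following minimal sketch assumes a Raspberry Pi with the gpiozero library; the choice of GPIO 17 is an arbitrary example and must match your wiring:

from gpiozero import LED
from time import sleep

led = LED(17)          # LED + series resistor between GPIO 17 and ground

for _ in range(10):    # blink ten times as a quick wiring test
    led.on()
    sleep(0.5)
    led.off()
    sleep(0.5)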

18.1 Breadboards

Various methods are available for electronic circuit construction. One variant is the creation of a printed circuit board onto which the components are soldered. This allows reliable and durable circuits and devices to be created. In addition, special "perfboards" or "Veroboards" are widely used for hobby applications. These have a regular conductor-track structure, which allows the construction of more complex circuits with components and connecting wires. However, this method is not well suited for experimental purposes, as it is always necessary to re-solder parts in case of modifications or changes.


In recent years, so-called "breadboards" have become popular for experimental applications. These are solderless boards that allow fairly complex circuits to be put together without soldering, which makes them ideal for teaching and for experimenting with the Raspberry Pi or the MaixDuino [13,15]. Most breadboards are divided into two parts:

1) the main connector panel
2) two or more supply-voltage ("power") rails

Figure 18.1: Some breadboards.

The main panel consists of several rows of metal springs. Each row has five holes for receiving component wires; wires inserted into one of these five holes are electrically connected to each other. The lines in Figure 18.1 indicate this. The bus rails run over half or the full length of the board. If you want to use split rails across the entire breadboard, you have to connect them using wire jumpers. The bus rails are usually used for the power supply, but they can also serve other purposes.

Breadboards come in a wide variety of shapes, colors, designs, and sizes. Many variants have mechanical connectors at the edges so that several boards can be plugged together. Breadboards are widely used in the electronics industry for circuit development and, with proper use, achieve a long lifetime. Note, however, that thin wires bend easily when plugged in. The fragile component wires must therefore always be inserted exactly perpendicularly. With new boards, inserting the wires requires some force; tweezers can be useful here.


18.2 Wires and jumpers

Since thin individual wires often do not provide good electrical contact, so-called "jumper wires" can be used to make electrical connections on a breadboard. These are flexible stranded wires with contact pins at both ends. This method allows electrical connections to be made quickly and securely. In addition, the contact pins bend less easily than bare wire ends. In this way, even breadboards with stiff contact springs can be wired without problems.

Figure 18.2: Jumper wires.

As an alternative to the "jumpers", so-called Dupont cables can be used. These can also be used to connect components such as the DHT-11 sensor to a breadboard.

Figure 18.3: Dupont cable.

18.3 Resistors

Virtually no electronic circuit can do without resistors. These most basic components of electronics have no polarity, i.e., it does not matter which way around they are inserted into a circuit. Resistor values are coded by color rings. Resistors with higher tolerances use four color rings: three rings indicate the resistance value, the fourth shows the tolerance (gold = 5%, silver = 10%). Metal-film resistors have five color rings; here the fifth, usually brown, ring indicates the tolerance. The tolerance ring can be recognized by the fact that it is slightly wider than the others.

Figure 18.4: Resistors.

Resistors with values of 150 or 220 ohms are often used as series resistors for LEDs. Values of 1 kilo-ohm, 10 kilo-ohms, and 100 kilo-ohms are also frequently required. Additional values can be generated via parallel or series connections (e.g., 500 ohms by connecting 2 × 1 kilo-ohm in parallel, or 20 kilo-ohms by connecting 2 × 10 kilo-ohms in series, etc.). For starters, approx. 10 pieces each of the following values are recommended:

Value         Color rings (5% tolerance)   Color rings (1% tolerance)
220 ohm       Red-Red-Brown                Red-Red-Black-Black
1 kilo-ohm    Brown-Black-Red              Brown-Black-Black-Brown
10 kilo-ohm   Brown-Black-Orange           Brown-Black-Black-Red
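The series and parallel rules mentioned above are easy to check in plain Python. This is just an illustrative helper, not code from the book's projects:

def series(*rs):
    # resistances in series simply add up
    return sum(rs)

def parallel(*rs):
    # reciprocal of the sum of the reciprocals
    return 1 / sum(1 / r for r in rs)

print(parallel(1000, 1000))    # 500.0 ohms from two 1 kilo-ohm resistors
print(series(10000, 10000))    # 20000 ohms from two 10 kilo-ohm resistors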

Assortment boxes with the most important resistor values are often offered in electronics and online shops. In general, it is much cheaper to purchase such an assortment than to order the resistors individually.

18.4 Light-emitting diodes (LEDs)

Light-emitting diodes (LEDs) are often used to indicate port states (high/low). They only light up if their polarity is correct. If, contrary to expectations, an LED does not light up, first check the polarity. The cathode of an LED is marked by a flat spot on the plastic housing. This terminal must be connected to the more negative voltage potential (–); the anode must be connected to the more positive potential (+). Often, the cathode can also be recognized by the shorter connecting wire.


When connecting standard LEDs to a voltage source, a series resistor is required. Since the signal levels of both the MaixDuino and the Raspberry Pi are 3.3 V, 220-ohm or 150-ohm resistors are best suited. Depending on the color, an LED drops between 1.6 V and 2.5 V, leaving approx. 0.8 V to 1.6 V for the series resistor. The LED currents are therefore around 4 to 7 mA. Thus, modern LEDs show sufficient brightness and both the LED and the microcontroller operate in a safe range.
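This current estimate can be reproduced with a few lines of Python. The forward voltage of 2.0 V below is merely an assumed typical value for a red LED:

V_SUPPLY = 3.3   # V, GPIO high level on the RPi and MaixDuino
V_LED = 2.0      # V, assumed forward voltage of a red LED (1.6 V to 2.5 V by color)
R = 220          # ohms, series resistor

i_led = (V_SUPPLY - V_LED) / R                 # Ohm's law applied to the series resistor
print(f"LED current: {i_led * 1000:.1f} mA")   # approx. 5.9 mA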

Figure 18.5: Light-emitting diodes (LEDs).

Even simpler is the use of so-called 5 V LEDs. With this type, the series resistor is already integrated into the housing, so a separate series resistor is not required. Modern 5 V LEDs are also sufficiently bright when operated at 3.3 V.

18.5 Transistors

For the control of higher power levels, a transistor is still the method of choice. The following figure shows a classic small-signal transistor. Its connections are marked:

- collector (C)
- base (B)
- emitter (E)

These three connections can also be found in the circuit diagram.

Figure 18.6: Transistor.


The figure shows the different representations of a transistor. At the top left is a three-dimensional view; this is the standard representation of a transistor in an assembly diagram. The letter N indicates that it is a so-called npn transistor, a designation that refers to the internal structure of the device [13]. Below it is a top view, which is advantageous if the 3-D representation takes up too much space. In this representation, the individual connections (B, C, E) are also identified in the symbol. To the right is the circuit-diagram symbol. Finally, on the far right is a photo of a BC548 transistor. The type marking can be found on the flattened side of the transistor. This flat side also indicates the direction of installation; in the 3-D image it is marked "N", and it can also be seen in the top view.

Individual transistors are sensitive components that can be destroyed by even small overloads. From the outside, it is impossible to tell whether a transistor is still usable or defective. In case of doubt, a transistor that does not work as expected must be replaced.

If a small base current flows into a functioning transistor, a much larger current can flow through the main path (collector-emitter). The stronger the base current, the stronger the collector-emitter current. In this way, a very small current can be used to control a large one. This makes transistors ideally suited for controlling larger loads with small currents. Using a transistor, a small current drawn from a Raspberry Pi or MaixDuino pin can easily control the fan motor of the ChatBot in Chapter 14.5.
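In software, such a transistor stage behaves like any other digital output. The following sketch assumes a Raspberry Pi with the gpiozero library and a transistor whose base is driven from GPIO 21 through a base resistor; both the pin and the resistor value are arbitrary examples, not fixed by the book:

from gpiozero import DigitalOutputDevice
from time import sleep

# GPIO 21 drives the transistor base via a series resistor (e.g., 1 kilo-ohm);
# the fan sits in the collector circuit with its own supply voltage
fan = DigitalOutputDevice(21)

fan.on()     # a few milliamps of base current switch the larger load current
sleep(5)     # run the fan for five seconds
fan.off()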

18.6 Sensors

The DHT-11/22 is a relatively inexpensive temperature and humidity sensor. In both types, the air humidity is recorded via a capacitive measuring method, while a thermistor is used to measure the air temperature. The transducer provides an internally generated digital signal on its data pin, so there is no need to use an analog input. However, the module only provides a new value every two seconds. Since room temperature and air humidity usually do not change very quickly, this measuring speed should be perfectly sufficient in most cases. The following figure shows the module and its pin assignment.

Figure 18.7: DHT11/22 sensor.


The DHT11 requires a so-called pull-up resistor of approx. 5 kilo-ohms, connected from the data output to the positive supply voltage. This value is not critical, and the sensor usually also works with a 10 kilo-ohm pull-up. If in doubt, two 10 kilo-ohm resistors can be connected in parallel. In some cases, the external pull-up can be omitted entirely, as internal pull-ups in the controller take over this function.
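Reading the sensor from a Raspberry Pi then only takes a few lines. This sketch assumes the adafruit-circuitpython-dht package and a DHT11 data pin on GPIO 4; both are example choices, not fixed by the book:

import time
import board
import adafruit_dht

dht = adafruit_dht.DHT11(board.D4)   # data pin on GPIO 4 (example wiring)

while True:
    try:
        print(f"{dht.temperature} °C, {dht.humidity} % RH")
    except RuntimeError as err:
        # checksum or timing errors are common with DHT sensors; just retry
        print(err.args[0])
    time.sleep(2.0)                  # the DHT11 delivers a new value every 2 s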

18.7 Ultrasound range finder

Ultrasonic transducers are used in many applications, such as parking-assistance systems for cars, in robotics, or in ultrasonic remote controls. The use of ready-made modules is usually the most cost-effective solution. A popular version is the US-020 module, which is available from many electronics companies. In addition to the transmitter and receiver, the module also contains all the necessary evaluation electronics. A compatible component is also available under the designation HC-SR04.

In addition to ground (GND) and the supply voltage, the module has the two connections "Trig" for the trigger input and "Echo". Trig is used to start an ultrasonic pulse train; the Echo pin provides information on the received ultrasound signal. The following four connections are required to connect the module to the Raspberry Pi:

US-020    Raspberry Pi
GND       Ground
Vcc       3.3 V
Trig      18
Echo      20

Figure 18.8: Ultrasound module.

When purchasing the module, make sure that it also works with 3.3 V (input voltage: 3.3–5 V), which obviates the need for a level converter.
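A distance measurement then boils down to timing the Echo pulse. The following sketch is a minimal example; it assumes that the pin numbers from the table above are BCM GPIO numbers and that the RPi.GPIO library is installed:

import time
import RPi.GPIO as GPIO

TRIG = 18            # trigger pin (BCM numbering assumed)
ECHO = 20            # echo pin

GPIO.setmode(GPIO.BCM)
GPIO.setup(TRIG, GPIO.OUT)
GPIO.setup(ECHO, GPIO.IN)

def distance_cm():
    # a pulse of approx. 10 microseconds on Trig starts one measurement
    GPIO.output(TRIG, True)
    time.sleep(0.00001)
    GPIO.output(TRIG, False)
    start = stop = time.time()
    while GPIO.input(ECHO) == 0:     # wait for the Echo pulse to begin...
        start = time.time()
    while GPIO.input(ECHO) == 1:     # ...and measure how long it stays high
        stop = time.time()
    # distance = round-trip time x speed of sound (approx. 343 m/s), halved
    return (stop - start) * 34300 / 2

try:
    while True:
        print(f"{distance_cm():.1f} cm")
        time.sleep(0.5)
finally:
    GPIO.cleanup()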


Chapter 19 • Troubleshooting

The following list shows the most common causes of errors in breadboard assemblies. If a circuit is checked carefully against this list in the event of an error, there is a high probability that it will work properly in the end.

1. Hardware
- Are all polarized components inserted in the correct orientation? Check LEDs and transistors in particular!
- Make sure that bare connecting wires of components do not touch each other unintentionally.
- Are all resistors correctly selected? In poor lighting conditions, the color rings (specifically red and orange) can easily be mixed up.
- "Reverse engineering": if a circuit refuses to work at all, it can be very useful to draw your own circuit diagram from the actual setup. This often makes the assignment of the individual components clearer.
- Is the USB or programming cable connected correctly?

2. Software: errors can also occur during programming or upload. The following general actions can be helpful here:

- reload the MicroPython system;
- press the Boot or Enable buttons on the MaixDuino;
- restart Thonny;
- disconnect and reconnect the USB link.

More specific and detailed information on troubleshooting can be found in the individual chapters.


Chapter 20 • Buyers Guide

The two embedded boards mainly used in this book, the Raspberry Pi and the MaixDuino, are readily available on the market:

• The MaixDuino can be ordered from the Elektor Store as a complete kit with display and camera.
• Raspberry Pi boards and other electronic components, parts, and modules are available from the major electronics companies:
  - Elektor Store: www.elektor.com
  - Amazon
  - eBay
  - Aliexpress
  - and others

These are all good sources for component assortment boxes, product bundles with various interesting components, and modules.


Chapter 21 • References; Bibliography

More in-depth information and many applications on the topic of Python and MicroPython can be found in the book:

[1] MicroPython for Microcontrollers, G. Spanner, Elektor 2020

Further suggestions and details on individual chapters can be found in various editions of ELV Journal (available in German only):

[2] Machine Learning und Neuronale Netze, ELV 3/21
[3] Start in die KI-Praxis, ELV 4/21
[4] Maschinelles Handschriftenlesen, ELV 5/21
[5] Spracherkennung und Sprachsynthese, ELV 6/21
[6] Objekterkennung, ELV 1/22
[7] Gesichtserkennung, ELV 2/22

An introduction to MicroPython is provided by the following articles in Elektor Magazine:

[8] MicroPython for the ESP32 & Co. – Part 1, G. Spanner, Elektor April 2021
[9] MicroPython for the ESP32 & Co. – Part 2, G. Spanner, Elektor June 2021

The following articles in Elektor Magazine introduce the MaixDuino in general:

[10] W. Trojan, AI for Beginners (1), Elektor 04/2020
[11] W. Trojan, AI for Beginners (2), Elektor 06/2020

The book:

[12] Robotics and Artificial Intelligence, G. Spanner, Elektor 2019

offers many applications and practical examples of machine-learning techniques and artificial intelligence in robotics.

Supplementary information on the subject of electronics in general can be found in the following (German-only) e-books. This series is continuously updated and expanded:

[13] E-Book Elektronik! – Transistors: www.amazon.de/dp/B00OXNCB02
[14] E-Book Elektronik! – Audio: www.amazon.de/dp/B013NSPPY6
[15] E-Book Elektronik! – Measurement and Instrumentation: www.amazon.de/dp/B0753GXHVP


Index

20class.kmodel  166

A
accuracy  147
Acoustic information  172
Adam  127
AI accelerators  35
Alan Turing  16
Anaconda  41, 58
and  86
Android  30
Arrays  85
Artificial Intelligence  12
Artificial Intelligence of Things  35
Automatic person identification  204

B
Batch size  150
bool  83
boot.py  46
Boot/RST button  50
bottle detector  169

C
Caffe  218
camera  167
comments  74
complex  83
Confusion matrix  130
const  83
convNets  142
Convolution  142
convolutional networks  142
Cortana  16

D
del()  83
DHT11  177
DHT22  177
display  76
door spy  198

E
electronic components  224
ELIZA  16
Environments  59
epochs  150
ESpeak-NG  176
eyes  206

F
face recognition  192
Face-recognition  205
Facial identification  192
Facial recognition  195
fault-tolerant  18
FileZilla  30
float  83

G
Gabor filters  196
GitHub  55
Go  17
GPIO  80

H
h5 file  147
handwriting  133
heatsink  31
human faces  192
human speech  173

I
Image Sensor  167
indentation  78
Independent Component Analysis  196
integer  83
interactive  57
ipynb file  62

J
Jupyter  59
Jupyter notebook  60

K
Kendryte K210  34
kernel  66
kflash GUI  55
kflash tool  56
kfpkg file  56
Knowledge Processing Unit  35
KPU  35

L
libs  67
Linear Discrimination Analysis  196
logistic  127
loss  146
loss function  163

M
Machine learning  14
Machine Learning  12
main.py  46
MaixDuino  23, 34
MaixPy IDE  53
Markdown  64
MicroPython  72
missing libraries  68

N
Natural Language Processing  173
neuromorphic chips  230
not  86

O
or  86

P
PCA  196
PiCam  195
prediction accuracy  146
Principal Component Analyses  196
processor temperature  32
Python  19, 72
Python interpreter  54

R
Raspberry Pi  23
Rectified Linear Unit  119
Reinforcement learning  15
relu  127
ReLu  119
remote desktop  28
robots  192

S
semi-supervised learning  15
Siri  16
sleep  79
sloth syndrome  160
smartphones  30
smile  206
solver  127
soundwaves  15
speaking eye  189
speech synthesizers  172
Spyder  41, 59, 68
Stop / Restart  51
super intelligence  13
Supervised learning  15

T
TensorFlow  68, 154
test data  126
text-to-speech synthesis  173
TFLite  157
Thonny  41
Thonny IDE  44
training  14, 126
TTS  173

U
ultrasound module  189
undervoltage symbol  40
USB-C  27, 39
USB port  38

V
variable manager  70
Variables  82
verbose  127
virtual environments  155

X
XOR logic  17

Y
YOLO  228
You Only Look Once  228



Most people are increasingly confronted with applications of Artificial Intelligence (AI). Music or video recommendations, navigation systems, shopping suggestions, etc. are based on methods that can be attributed to this field.

The term Artificial Intelligence was coined in 1956 at an international conference known as the Dartmouth Summer Research Project. One basic approach was to model the functioning of the human brain and to construct advanced computer systems based on this. Soon it should be clear how the human mind works; transferring it to a machine was considered only a small step. This notion proved to be a bit too optimistic. Nevertheless, the progress of modern AI, or rather its subspecialty called Machine Learning (ML), can no longer be denied.

In this book, several different systems are used to get to know the methods of machine learning in more detail. In addition to the PC, both the Raspberry Pi and the MaixDuino demonstrate their capabilities in the individual projects. Besides applications such as object and facial recognition, practical systems such as bottle detectors, person counters, or a "talking eye" are also created.

The latter is capable of acoustically describing objects or faces that are detected automatically. For example, if a vehicle is in the field of view of the connected camera, the information "I see a car!" is output via electronically generated speech. Such devices are highly interesting examples of how blind or severely visually impaired people, for example, can also benefit from AI systems.

Dr. Günter Spanner has been working in the field of electronics development and physical technology for various large corporations for more than 20 years. In addition to his work as a lecturer, he has published successful technical articles and books on electronics, semiconductor technology, and microcontrollers, and has created courses and learning packages on these topics.

Elektor International Media b.v.
www.elektor.com

Machine Learning with Python
For PC, Raspberry Pi, and Maixduino
Günter Spanner