Deep Learning Essentials: Your hands-on guide to the fundamentals of deep learning and neural network modeling (English Edition) 9781785880360

English, 284 pages, 2018

Table of contents :
Cover
Title Page
Copyright and Credits
Packt Upsell
Contributors
Table of Contents
Preface
Chapter 1: Why Deep Learning?
What is AI and deep learning?
The history and rise of deep learning
Why deep learning?
Advantages over traditional shallow methods
Impact of deep learning
The motivation of deep architecture
The neural viewpoint
The representation viewpoint
Distributed feature representation
Hierarchical feature representation
Applications
Lucrative applications
Success stories
Deep learning for business
Future potential and challenges
Summary
Chapter 2: Getting Yourself Ready for Deep Learning
Basics of linear algebra
Data representation
Data operations
Matrix properties
Deep learning with GPU
Deep learning hardware guide
CPU cores
RAM size
Hard drive
Cooling systems
Deep learning software frameworks
TensorFlow – a deep learning library
Caffe
MXNet
Torch
Theano
Microsoft Cognitive Toolkit
Keras
Framework comparison
Setting up deep learning on AWS
Setup from scratch
Setup using Docker
Summary
Chapter 3: Getting Started with Neural Networks
Multilayer perceptrons
The input layer
The output layer
Hidden layers
Activation functions
Sigmoid or logistic function
Tanh or hyperbolic tangent function
ReLU
Leaky ReLU and maxout
Softmax
Choosing the right activation function
How a network learns
Weight initialization
Forward propagation
Backpropagation
Calculating errors
Backpropagation
Updating the network
Automatic differentiation
Vanishing and exploding gradients
Optimization algorithms
Regularization
Deep learning models
Convolutional Neural Networks
Convolution
Pooling/subsampling
Fully connected layer
Overall
Restricted Boltzmann Machines
Energy function
Encoding and decoding
Contrastive divergence (CD-k)
Stacked/continuous RBM
RBM versus Boltzmann Machines
Recurrent neural networks (RNN/LSTM)
Cells in RNN and unrolling
Backpropagation through time
Vanishing gradient and LSTM
Cells and gates in LSTM
Step 1 – The forget gate
Step 2 – Updating memory/cell state
Step 3 – The output gate
Practical examples
TensorFlow setup and key concepts
Handwritten digits recognition
Summary
Chapter 4: Deep Learning in Computer Vision
Origins of CNNs
Convolutional Neural Networks 
Data transformations
Input preprocessing
Data augmentation
Network layers
Convolution layer
Pooling or subsampling layer
Fully connected or dense layer
Network initialization
Regularization
Loss functions
Model visualization
Handwritten digit classification example
Fine-tuning CNNs
Popular CNN architectures
AlexNet
Visual Geometry Group
GoogLeNet
ResNet
Summary
Chapter 5: NLP - Vector Representation
Traditional NLP
Bag of words
Weighting the terms tf-idf
Deep learning NLP
Motivation and distributed representation
Word embeddings
Idea of word embeddings
Advantages of distributed representation
Problems of distributed representation
Commonly used pre-trained word embeddings
Word2Vec
Basic idea of Word2Vec
The word windows
Generating training data
Negative sampling
Hierarchical softmax
Other hyperparameters
Skip-Gram model
The input layer
The hidden layer
The output layer
The loss function
Continuous Bag-of-Words model
Training a Word2Vec using TensorFlow
Using existing pre-trained Word2Vec embeddings
Word2Vec from Google News
Using the pre-trained Word2Vec embeddings
Understanding GloVe
FastText
Applications
Example use cases
Fine-tuning
Summary
Chapter 6: Advanced Natural Language Processing
Deep learning for text
Limitations of neural networks
Recurrent neural networks 
RNN architectures
Basic RNN model
Training RNN is tough
Long short-term memory network
LSTM implementation with TensorFlow
Applications
Language modeling
Sequence tagging
Machine translation
Seq2Seq inference
Chatbots
Summary
Chapter 7: Multimodality
What is multimodality learning?
Challenges of multimodality learning
Representation
Translation
Alignment
Fusion
Co-learning
Image captioning
Show and tell
Encoder
Decoder
Training
Testing/inference
Beam Search
Other types of approaches
Datasets
Evaluation
BLEU
ROUGE
METEOR
CIDEr
SPICE
Rank position
Attention models
Attention in NLP
Attention in computer vision
The difference between hard attention and soft attention
Visual question answering
Multi-source based self-driving
Summary
Chapter 8: Deep Reinforcement Learning
What is reinforcement learning (RL)? 
Problem setup
Value learning-based algorithms
Policy search-based algorithms
Actor-critic-based algorithms
Deep reinforcement learning
Deep Q-network (DQN)
Experience replay
Target network
Reward clipping
Double-DQN
Prioritized experience replay
Dueling DQN
Implementing reinforcement learning
Simple reinforcement learning example
Reinforcement learning with Q-learning example
Summary
Chapter 9: Deep Learning Hacks
Massaging your data
Data cleaning
Data augmentation
Data normalization
Tricks in training
Weight initialization
All-zero
Random initialization
ReLU initialization
Xavier initialization
Optimization
Learning rate
Mini-batch
Clip gradients
Choosing the loss function
Multi-class classification
Multi-class multi-label classification
Regression
Others
Preventing overfitting
Batch normalization
Dropout
Early stopping
Fine-tuning
Fine-tuning
When to use fine-tuning
When not to use fine-tuning
Tricks and techniques
Model compression
Summary
Chapter 10: Deep Learning Trends
Recent models for deep learning
Generative Adversarial Networks 
Capsule networks 
Novel applications
Genomics
Predictive medicine
Clinical imaging
Lip reading
Visual reasoning
Code synthesis
Summary
Other Books You May Enjoy
Index


Deep Learning Essentials

Your hands-on guide to the fundamentals of deep learning and neural network modeling

Wei Di Anurag Bhardwaj Jianing Wei

BIRMINGHAM - MUMBAI

Deep Learning Essentials

Copyright © 2018 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Commissioning Editor: Veena Pagare
Acquisition Editor: Aman Singh
Content Development Editor: Snehal Kolte
Technical Editor: Sayli Nikalje
Copy Editor: Safis Editing
Project Coordinator: Manthan Patel
Proofreader: Safis Editing
Indexer: Francy Puthiry
Graphics: Tania Dutta
Production Coordinator: Arvindkumar Gupta

First published: January 2018
Production reference: 1250118

Published by Packt Publishing Ltd.
Livery Place, 35 Livery Street
Birmingham B3 2PB, UK.

ISBN 978-1-78588-036-0

www.packtpub.com

mapt.io

Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?
- Spend less time learning and more time coding with practical eBooks and videos from over 4,000 industry professionals
- Improve your learning with Skill Plans built especially for you
- Get a free eBook or video every month
- Mapt is fully searchable
- Copy and paste, print, and bookmark content

PacktPub.com

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at service@packtpub.com for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Contributors

About the authors

Wei Di is a data scientist with many years of experience in machine learning and artificial intelligence. She is passionate about creating smart and scalable intelligent solutions that can impact millions of individuals and empower successful businesses. Currently, she works as a staff data scientist at LinkedIn. She was previously associated with the eBay Human Language Technology team and eBay Research Labs. Prior to that, she was with Ancestry.com, working on large-scale data mining and machine learning in the area of record linkage. She received her PhD from Purdue University in 2011.

Anurag Bhardwaj currently leads the data science efforts at Wiser Solutions, where he focuses on structuring a large-scale e-commerce inventory. He is particularly interested in using machine learning to solve problems of product categorization and product matching, as well as various related problems in e-commerce. Previously, he worked on image understanding at eBay Research Labs. Anurag received his PhD and master's from the State University of New York at Buffalo and holds a BTech in computer engineering from the National Institute of Technology, Kurukshetra, India.

Jianing Wei is a senior software engineer at Google Research. He works in the area of computer vision and computational imaging. Prior to joining Google in 2013, he worked at Sony US Research Center for 4 years in the field of 3D computer vision and image processing. Jianing obtained his PhD in electrical and computer engineering from Purdue University in 2010.

About the reviewer

Amita Kapoor is an Associate Professor in the Department of Electronics, SRCASW, University of Delhi. She did both her master's and PhD in electronics. During her PhD, she was awarded the prestigious DAAD fellowship to pursue part of her research work at the Karlsruhe Institute of Technology, Germany. She won the Best Presentation Award at the 2008 International Conference on Photonics for her paper. She is a member of professional bodies including the Optical Society of America, the International Neural Network Society, the Indian Society for Buddhist Studies, and the IEEE.

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.


Preface

Deep learning is the most disruptive trend in the tech world, having jumped out of research laboratories right into production environments. It is the science and art of working through hidden layers of data to get deep insights. Deep learning is currently one of the best providers of solutions to problems in image recognition, speech recognition, object recognition, and natural language processing (NLP).

We'll start off by brushing up on machine learning and quickly get into the fundamentals of deep learning and its implementation. Moving on, we'll teach you about the different types of neural networks and their applications in the real world. With the help of insightful examples, you'll learn to recognize patterns using a deep neural network and get to grips with other important concepts and deep learning fields.

Using the reinforcement learning technique with deep learning, you'll build AI that can outperform any human and also work with the LSTM network. During the course of this book, you will come across a wide range of different frameworks and libraries, such as TensorFlow, Python, Nvidia, and others. By the end of the book, you'll be able to deploy a production-ready deep learning framework for your own applications.

Who this book is for

If you are an aspiring data scientist, deep learning enthusiast, or AI researcher looking to build the power of deep learning into your projects, then this book is the perfect resource for you to start addressing AI challenges. To get the most out of this book, you must have intermediate Python skills and be familiar with machine learning concepts well in advance.

What this book covers

Chapter 1, Why Deep Learning?, provides an overview of deep learning. We begin with the history of deep learning, its rise, and its recent advances in certain fields. We will also describe some of its challenges, as well as its future potential.

Chapter 2, Getting Yourself Ready for Deep Learning, is a starting point to set oneself up for experimenting with and applying deep learning techniques in the real world. We will answer the key questions as to what skills and concepts are needed to get started with deep learning. We will cover some basic concepts of linear algebra, the hardware requirements for deep learning implementation, as well as some of its popular software frameworks. We will also take a look at setting up a deep learning system from scratch on a cloud-based GPU instance.

Chapter 3, Getting Started with Neural Networks, focuses on the basics of neural networks, including input/output layers, hidden layers, and how networks learn through forward propagation and backpropagation. We will start with the standard multilayer perceptron networks and their building blocks, and illustrate how they learn step by step. We will also introduce a few popular standard models, such as Convolutional Neural Networks (CNNs), Restricted Boltzmann Machines (RBM), and recurrent neural networks (RNNs), as well as a variation of them called Long Short-Term Memory (LSTM).

Chapter 4, Deep Learning in Computer Vision, explains CNNs in more detail. We will go over the core concepts essential to the working of CNNs and how they can be used to solve real-world computer vision problems. We will also look at some of the popular CNN architectures and implement a basic CNN using TensorFlow.

Chapter 5, NLP - Vector Representation, covers the basics of NLP for deep learning. This chapter will describe the popular word embedding techniques used for feature representation in NLP. It will also cover popular models such as Word2Vec, GloVe, and FastText. This chapter also includes an example of embedding training using TensorFlow.

Chapter 6, Advanced Natural Language Processing, takes a more model-centric approach to text processing. We will go over some of the core models, such as RNNs and LSTM networks. We will implement a sample LSTM using TensorFlow and describe the foundational architecture behind commonly used text processing applications of LSTM.

Chapter 7, Multimodality, introduces some fundamental progress in the multimodality of using deep learning. This chapter also shares some novel, advanced multimodal applications of deep learning.

Chapter 8, Deep Reinforcement Learning, covers the basics of reinforcement learning. It illustrates how deep learning can be applied to improve reinforcement learning in general. This chapter goes through the basic implementation of a deep reinforcement learning model using TensorFlow and discusses some of its popular applications.

Chapter 9, Deep Learning Hacks, empowers readers by providing many practical tips that can be applied when using deep learning, such as the best practices for network weight initialization, learning rate tuning, how to prevent overfitting, and how to prepare your data for better learning when facing data challenges.

Chapter 10, Deep Learning Trends, summarizes some of the upcoming ideas in deep learning. It looks at some of the upcoming trends in newly developed algorithms, as well as some of the new applications of deep learning.

To get the most out of this book

There are a couple of things you can do to make the most out of this book. Firstly, it is recommended to at least have some basic knowledge of Python programming and machine learning.

Secondly, before proceeding to Chapter 3, Getting Started with Neural Networks and the chapters after it, be sure to follow the setup instructions in Chapter 2, Getting Yourself Ready for Deep Learning. You will also be able to set up your own environment as long as you can practice the given examples.

Thirdly, familiarize yourself with TensorFlow and read its documentation. The TensorFlow documentation (https://www.tensorflow.org/api_docs) is a great source of information and also contains a lot of great examples and important explanations. You can also look around online, as there are various open source and deep-learning-related resources.

Fourthly, make sure you explore on your own. Try different settings or configurations for simple problems that don't require much computational time; this can help you to quickly get some ideas of how the model works and how to tune parameters.

Lastly, dive deeper into each type of model. This book explains the gist of various deep learning models in plain words while avoiding too much math; the goal is to help you understand the mechanisms of neural networks under the hood. While there are currently many different tools publicly available that provide high-level APIs, a good understanding of deep learning will greatly help you to debug and improve model performance.

Download the example code files

You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:
1. Log in or register at www.packtpub.com.
2. Select the SUPPORT tab.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
- WinRAR/7-Zip for Windows
- Zipeg/iZip/UnRarX for Mac
- 7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Deep-Learning-Essentials. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://www.packtpub.com/sites/default/files/downloads/DeepLearningEssentials_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "In addition, alpha is the learning rate, vb is the bias of the visible layer, hb is the bias of the hidden layer, and W is the weight matrix. The sampling function sample_prob is the Gibbs-Sampling function and it decides which node to turn on."

A block of code is set as follows:

import mxnet as mx
tensor_cpu = mx.nd.zeros((100,), ctx=mx.cpu())
tensor_gpu = mx.nd.zeros((100,), ctx=mx.gpu(0))

Any command-line input or output is written in a fixed-width font.

Bold: Indicates a new term, an important word, or words that you see onscreen.

Warnings or important notes appear like this.

Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: Email feedback@packtpub.com and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at questions@packtpub.com.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packtpub.com with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packtpub.com.

Why Deep Learning?

This chapter will give an overview of deep learning, the history of deep learning, its rise, and its recent advances in certain fields. Also, we will talk about its challenges, as well as its future potential.

We will answer a few key questions often raised by a practical user of deep learning who may not possess a machine learning background. These questions include:

- What is artificial intelligence (AI) and deep learning?
- What is the history of deep learning or AI?
- What are the major breakthroughs of deep learning?
- What is the main reason for its recent rise?
- What's the motivation of deep architecture?
- Why should we resort to deep learning, and why can't the existing machine learning algorithms solve the problem at hand?
- In which fields can it be applied?
- Successful stories of deep learning
- What's the potential future of deep learning, and what are its current challenges?

What is AI and deep learning?

The dream of creating certain forms of intelligence that mimic ourselves has long existed. While most of these forms appear in science fiction, over recent decades we have gradually been making progress in actually building intelligent machines that can perform certain tasks just like a human. This is the area called artificial intelligence. The beginning of AI can perhaps be traced back to Pamela McCorduck's book, Machines Who Think, where she described AI as an ancient wish to forge the gods.

Deep learning is a branch of AI, with the aim specified as moving machine learning closer to one of its original goals: AI. The path it pursues is an attempt to mimic the activity in layers of neurons in the neocortex, which is the wrinkly 80% of the brain where thinking occurs. In a human brain, there are around 100 billion neurons and 100~1000 trillion synapses.

It learns hierarchical structures and levels of representation and abstraction to make sense of the patterns of data coming from various source types, such as images, videos, sound, and text.

Higher-level abstractions are defined from lower-level abstractions. It is called deep because it has more than one state of nonlinear feature transformation. One of the biggest advantages of deep learning is its ability to automatically learn feature representations at multiple levels of abstraction. This allows a system to learn complex functions mapped from the input space to the output space without many dependencies on human-crafted features. Also, it provides the potential for pre-training, which is learning the representation on one dataset, then applying the learned representations to other domains. This may have some limitations, such as being able to acquire good enough quality data for learning. Also, deep learning performs well when learning from a large amount of unsupervised data in a greedy fashion.

The following figure shows a simplified deep learning model, a Convolutional Neural Network (CNN):

The deep learning model, that is, the learned deep neural network, often consists of multiple layers. Together they work hierarchically to build an improved feature space. The first layer learns first-order features, such as color and edges. The second layer learns higher-order features, such as corners. The third layer learns about small patches or more complex shapes. Layers often learn in an unsupervised mode and discover general features of the input space. Then the final-layer features are fed into a supervised layer to complete the task, such as classification or regression.

Between layers, nodes are connected through weighted edges. Each node, which can be seen as a simulated neocortex unit, is associated with an activation function, where its inputs come from the lower-layer nodes. Building such a large, multi-layer array of neuron-like information flow is, however, a decades-old idea. From its creation until now, it has experienced both breakthroughs and setbacks.

With the newest improvements in mathematical formulas, increasingly powerful computers, and large-scale datasets, spring has finally arrived. Deep learning has become a pillar of today's tech world and is applied in a wide range of fields. In the next section, we will trace its history and the ups and downs of its development journey.

The history and rise of deep learning

The earliest neural network was developed in the 1940s, not long after the dawn of AI research. In 1943, a seminal paper, A Logical Calculus of the Ideas Immanent in Nervous Activity, was published, which proposed the first mathematical model of a neural network. The unit of this model is a simple formalized neuron, often referred to as a McCulloch-Pitts neuron. It is a mathematical function conceived as a model of biological neurons in a neural network. These are the elementary units of an artificial neural network. An illustration of such an artificial neuron is shown in the following figure. Such an idea looks very promising indeed, as it attempts to simulate how a human brain works, but in a greatly simplified way:

These early models consist of only a very small set of virtual neurons, and random numbers called weights are used to connect them. These weights determine how each simulated neuron transfers information between them, that is, how each neuron responds, with a value between 0 and 1. With this mathematical representation, the neural output can feature an edge or a shape from an image, or a particular energy level at one frequency in a phoneme. The previous figure illustrates a mathematically formulated artificial neuron, where the input corresponds to the dendrites, an activation function controls whether the neuron fires, and the output corresponds to the axon. However, early neural networks could only simulate a very limited number of neurons at once, so not many patterns can be recognized by using such a simple architecture. These models languished through the 1970s.

The concept of backpropagation, the use of errors in training deep learning models, was first proposed in the 1960s. This was followed by models with polynomial activation functions. Using a slow and manual process, the best statistically chosen features from each layer were then forwarded to the next layer. Unfortunately, the first AI winter then kicked in, which lasted about 10 years. At this early stage, although the idea of mimicking the human brain sounded very fancy, the actual capabilities of AI programs were very limited. Even the most impressive one could only deal with some toy problems. Not to mention that there was very limited computing power and only small-sized datasets available. The hard winter occurred mainly because the expectations were raised so high that, when the results failed to materialize, AI received its main funding cuts:

Slowly, backpropagation evolved significantly in the 1970s but was not applied to neural networks until 1985. In the mid-1980s, Hinton and others helped spark a revival of interest in neural networks with so-called deep models that made better use of many layers of neurons, that is, with more than two hidden layers. An illustration of a multi-layer perceptron neural network is shown in the following figure. By then, Hinton and his co-authors (https://www.iro.umontreal.ca/~vincentp/ift3395/lectures/backprop_old.pdf) had demonstrated that backpropagation in a neural network could result in interesting distributed representations. In 1989, Yann LeCun (http://yann.lecun.com/exdb/publis/pdf/lecun-89e.pdf) demonstrated the first practical use of backpropagation at Bell Labs. He brought backpropagation to convolutional neural networks to understand handwritten digits, and his idea eventually evolved into a system that reads the numbers of handwritten checks.

This was also the time of the 2nd AI winter (1985-1990). In 1984, two leading AI researchers, Roger Schank and Marvin Minsky, warned the business community that the enthusiasm for AI had spiraled out of control. Although multi-layer networks could be applied to certain tasks, their speed was very slow and results were often not satisfying. Therefore, when other simpler but more effective methods, such as support vector machines, were invented, government agencies and venture capitalists dropped their support for neural networks. Just three years later, the billion dollar AI industry fell apart.

However, it wasn't really the failure of AI but more the end of the hype, which is common in many emerging technologies. Despite the ups and downs in funding and interest, some researchers continued their beliefs. Unfortunately, they didn't really look into the actual reason for why the learning of multi-layer networks was so difficult and why the performance was not amazing. In 2000, the vanishing gradient problem was discovered, which finally drew people's attention to the really key question: Why don't multi-layer networks learn? The reason is that for certain activation functions, the input is condensed, meaning large areas of input are mapped over an extremely small region. With large changes or errors computed from the last layer, only a small amount will be reflected back to the front/lower layers. This means little or no learning signal reaches these layers and the learned features at these layers are weak.

Note that many lower layers are fundamental to the problem as they carry the most basic representative information. This gets worse because the optimal configuration of the upper layers may also depend on the configuration of the lower layers, which means the optimization of the upper layers is based on a sub-optimal configuration of the lower layers. All of this means it is difficult to learn anything well despite having a deep architecture.

Two approaches were proposed to solve this problem: layer-by-layer pre-training and the Long Short-Term Memory (LSTM) model. LSTM for recurrent neural networks was first proposed by Sepp Hochreiter and Juergen Schmidhuber in 1997.

In the last decade, many researchers made fundamental breakthroughs, and there was a surge of interest in deep learning, not only from the academic side but also from industry. In 2006, Professor Hinton at Toronto University in Canada and others developed a more efficient way to teach individual layers of neurons, called A fast learning algorithm for deep belief nets (https://www.cs.toronto.edu/~hinton/absps/fastnc.pdf). This sparked the second revival of the neural network. In his paper, he introduced Deep Belief Networks (DBNs), with a learning algorithm that greedily trains one layer at a time by exploiting an unsupervised learning algorithm for each layer, a Restricted Boltzmann Machine (RBM). The following figure shows the concept of layer-by-layer training for this deep belief network:

The proposed DBN was tested using the MNIST database, the standard database for comparing the precision and accuracy of image recognition methods. This database includes 70,000 28 x 28 pixel hand-written character images of numbers from 0 to 9 (60,000 are for training and 10,000 are for testing). The goal is to correctly answer which number from 0 to 9 is written in the test image. Although the paper did not attract much attention at the time, results from the DBN had considerably higher precision than a conventionally trained neural network:

Fast-forward to 2012 and the entire AI research world was shocked by one method. At the world competition of image recognition, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a team called SuperVision (http://image-net.org/challenges/LSVRC/2012/supervision.pdf) achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry. ImageNet has around 1.2 million high-resolution images belonging to 1000 different classes. There are 10 million images provided as learning data, and 150,000 images are used for testing. The authors, Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton from Toronto University, built a deep convolutional network with 60 million parameters, 650,000 neurons, and 630 million connections, consisting of seven hidden layers, some of which were followed by max-pooling layers, and fully-connected layers with a final 1000-way softmax. To increase the training data, the authors randomly sampled 224 x 224 patches from available images. To speed up the training, they used non-saturating neurons and a very efficient GPU implementation of the convolution operation. They also used dropout to reduce overfitting in the fully connected layers, which proved to be very effective.

Since then, deep learning has taken off, and today we see many successful applications, not only in image classification, but also in regression, dimensionality reduction, texture modeling, action recognition, motion modeling, object segmentation, information retrieval, robotics, natural language processing, speech recognition, biomedical fields, music generation, art, collaborative filtering, and so on:

It's interesting that when we look back, it seems that most of the theoretical breakthroughs had already been made by the 1980s-1990s, so what else changed in the past decade? A not-too-controversial theory, one that Andrew Ng has voiced, attributes the change to the scale of computation and data.

Indeed, faster processing, with GPUs processing pictures, increased computational speed by 1000 times over a 10-year span.

Almost at the same time, the big data era arrived. Millions, billions, or even trillions of bytes of data are collected every day. Industry leaders are also making an effort in deep learning to leverage the massive amounts of data they have collected. For example, Baidu has 50,000 hours of training data for speech recognition and is expected to train on about another 100,000 hours of data. For facial recognition, 200 million images were trained. The involvement of large companies greatly boosted the potential of deep learning and AI overall by providing data at a scale that could hardly have been imagined in the past.

With enough training data and faster computational speed, neural networks can now extend to deep architectures, which had never been realized before. On the one hand, the occurrence of new theoretical approaches, massive data, and fast computation have boosted progress in deep learning. On the other hand, the creation of new tools, platforms, and applications boosted the development and use of ever more powerful GPUs and the collection of big data. This loop continues and deep learning has become a revolution built on top of the following pillars:

- Massive, high-quality, labeled datasets in various formats, such as images, videos, text, speech, audio, and so on.
- Powerful GPU units and networks that are capable of doing fast floating-point calculations in parallel or in distributed ways.
- Creation of new, deep architectures: AlexNet, ZF Net (Zeiler and Fergus), GoogLeNet, Network in Network, VGG for very deep networks, ResNets, inception modules, Highway networks, MXNet, Region-Based CNNs (R-CNN), and Generative Adversarial Networks.
- Open source software platforms, such as TensorFlow, Theano, and MXNet, provide easy-to-use, low-level or high-level APIs for developers or academics so they are able to quickly implement and iterate on their ideas and applications.
- Approaches to improve the vanishing gradient problem, such as using non-saturating activation functions like ReLU rather than tanh and the logistic functions.

- Approaches to help avoid overfitting:
  - New regularizers, such as Dropout which keeps the network sparse, maxout, and batch normalization.
  - Data-augmentation that allows training larger and larger networks without (or with less) overfitting.
- Robust optimizers: modifications of the SGD procedure including momentum, RMSprop, and ADAM have helped squeeze out every last percentage of your loss function.

Why deep learning?

So far we have discussed what deep learning is and the history of deep learning. But why is it so popular now? In this section, we talk about the advantages of deep learning over traditional shallow methods and its significant impact in a couple of technical fields.

Advantages over traditional shallow methods

Traditional approaches are often considered shallow machine learning and often require the developer to have some prior knowledge regarding the specific features of input that might be helpful, or how to design effective features. Also, shallow learning often uses only one hidden layer, for example, a single-layer feed-forward network. In contrast, deep learning is known as representation learning, which has been shown to perform better at extracting non-local and global relationships or structures in the data. One can supply fairly raw formats of data into the learning system, for example, raw images and text, rather than extracted features on top of images (for example, SIFT by David Lowe and HOG by Dalal and their co-authors), or TF-IDF vectors for text. Because of the depth of the architecture, the learned representations form a hierarchical structure with knowledge learned at various levels. This parameterized, multi-level, computational graph provides a high degree of representation power. The emphasis on shallow and deep algorithms is significantly different: shallow algorithms depend heavily on human-crafted feature engineering and selection, while deep learning puts its emphasis on defining the most useful computational graph topology (architecture) and the ways of optimizing parameters/hyperparameters efficiently and correctly for good generalization ability of the learned representations:

Deep learning algorithms are shown to be better at extracting non-local and global relationships and patterns in the data, compared to relatively shallow learning architectures. Other useful characteristics of the representations learned by deep learning include:

- It tries to exploit most of the abundant huge volume of the dataset, even when the data is unlabeled.
- It has the advantage of continuing to improve as more training data is added.
- Automatic data representation extraction, from unsupervised or supervised data, distributed and hierarchical, usually works best when the input space is locally structured, spatially or temporally, for example, images, language, and speech.
- Representation extraction from unsupervised data enables its broad application to different data types, such as image, textural, audio, and so on.
- Relatively simple linear models can work effectively with the knowledge obtained from the more complex and more abstract data representations. This means with the advanced features extracted, the following learning model can be relatively simple, which may help reduce computational complexity, for example, in the case of linear modeling.
- Relational and semantic knowledge can be obtained at higher levels of abstraction and representation of the raw data (source: https://journalofbigdata.springeropen.com/).
- Deep architectures can be representationally efficient. This sounds contradictory, but it is a great benefit because of the power of distributed representation that comes with deep learning.
- The learning capacity of deep learning algorithms scales with the size of the data, that is, performance increases as the data increases, whereas, for shallow or traditional learning algorithms, the performance reaches a plateau after a certain amount of data is provided, as shown in the following figure:

Impact of deep learning

To show you some of the impact of deep learning, let's take a look at two specific areas: image recognition and speech recognition.

The following figure shows the top-five error rate trend of the ILSVRC contest winners over the past several years. Traditional image recognition approaches employed hand-crafted computer vision features and classifiers trained on a number of instances of each object class, for example, SIFT + Fisher vectors. In 2012, deep learning entered this competition. Alex Krizhevsky and Professor Hinton from Toronto University stunned the field with around a 10% drop in the error rate achieved by their deep convolutional neural network (AlexNet). Since then, the leaderboard has been occupied by this type of method and its variations. By 2015, the error rate had dropped below that of human testers:

The following figure depicts the progress of speech recognition. From 2000-2009, there was very little progress. Since 2009, the involvement of deep learning, large datasets, and fast computing has significantly boosted development. In 2016, a major breakthrough was made by a team of researchers and engineers at Microsoft Research AI (MSR AI). They reported a speech recognition system that made the same or fewer errors than professional transcriptionists, with a word error rate (WER) of 5.9%. In other words, the technology could recognize words in a conversation as well as a person does:

A natural question to ask is, what are the advantages of deep learning over traditional approaches? Topology defines functionality. But why do we need an expensive deep architecture? Is this really necessary? What are we trying to achieve here? It turns out that there are both theoretical and empirical pros for the usage of a deep representation. In the next section, let's dive into more details about the motivation of deep architecture.

The motivation of deep architecture

The depth of the architecture refers to the number of levels of composition of non-linear operations in the function learned. These operations include weighted sums, products, a single neuron, kernels, and so on. Most current learning algorithms correspond to shallow architectures that have only 1, 2, or 3 levels. The following table shows some examples of both shallow and deep algorithms:

Levels         | Example                                                        | Group
1-layer        | Logistic regression, Maximum Entropy Classifier, Perceptron, Linear SVM | Linear classifier
2-layers       | Multi-layer Perceptron, SVMs with kernels, Decision trees      | Universal approximator
3 or more layers | Deep learning, Boosted decision trees                        | Compact universal approximator

There are mainly two viewpoints for understanding the motivation of deep architecture in learning algorithms: the neural viewpoint and the feature representation viewpoint. We will talk about each of them. Both of them may come from different angles, but together they can help us better understand the mechanisms and advantages deep learning provides.

The neural viewpoint

From a neural viewpoint, an architecture for learning is biologically inspired. The human brain has a deep architecture, in which the cortex seems to have a generic learning approach. A given input is perceived at multiple levels of abstraction. Each level corresponds to a different area of the cortex. We process information in hierarchical ways, with multi-level transformation and representation. Therefore, we learn simple concepts first, then compose them together to form more abstract ones. This structure of understanding can be seen clearly in the human visual system. As shown in the following figure of the human visual system, the ventral visual cortex comprises a set of areas that process images in increasingly more abstract ways, from edges, corners and contours, shapes, and object parts to objects, allowing us to learn, recognize, and categorize three-dimensional objects from arbitrary two-dimensional views:

The representation viewpoint

For most traditional machine learning algorithms, their performance depends heavily on the representation of the data they are given. Therefore, domain prior knowledge, feature engineering, and feature selection are critical to the performance of the output. But hand-crafted features lack the flexibility of applying to different scenarios or application areas. Also, they are not data-driven and cannot adapt to new data or information. In the past, it has been noticed that a lot of AI tasks could be solved by using a simple machine learning algorithm, on the condition that the right set of features for the task is extracted or designed. For example, an estimate of the size of a speaker's vocal tract is considered a useful feature, as it's a strong clue as to whether the speaker is a man, woman, or child. Unfortunately, for many tasks, and for various input formats, for example, image, video, audio, and text, it is very difficult to know what kind of features should be extracted, let alone their generalization ability for other tasks that are beyond the current application.

Manually designing features for a complex task requires a great deal of domain understanding, time, and effort. Sometimes, it can take decades for a community of researchers to make progress in this area. If one looks back at the area of computer vision, for over a decade researchers were stuck because of the limitations of hand-designed feature extraction approaches (SIFT, HOG, and so on). A lot of work back then involved trying to design complicated handcrafted features, and the progress was very slow, especially for large-scale complicated problems, such as recognizing 1000 objects from images. This is a strong motivation for designing flexible and automated feature representation approaches.

One solution to this problem, as validated very successfully, is to use techniques such as machine learning to discover the representation itself. Such representation can be learned from labeled data (supervised) or simply raw data (unsupervised). This approach is known as representation learning. Learned representations often result in much better performance as compared to what can be obtained with hand-designed representations. This also allows AI systems to rapidly adapt to new areas without much human intervention. While hand-crafting and designing features may take great effort from a whole community of researchers, with a representation learning algorithm we can discover a good set of features for a simple task in minutes, or for a complex task in hours to months.

This is where deep learning comes to the rescue. Deep learning can be thought of as representation learning, whereby feature extraction happens automatically when the deep architecture is trying to process the data, learning, and understanding the mapping between the input and the output. This brings significant improvements in accuracy and flexibility, since human-designed features or feature extraction lacks accuracy and generalization ability.

In addition to this automated feature learning, the learned representations are both distributed and hierarchical. Such successful training of intermediate representations helps feature learning and knowledge transfer across tasks.

The following figure shows its relation compared to other types of machine learning algorithms. In the next section, we will explain why these two characteristics (distributed and hierarchical) are important:

Distributed feature representation

A distributed representation is dense, where each of the learned concepts is represented by multiple neurons simultaneously, and each neuron represents more than one concept. In other words, input data is represented on multiple, interdependent layers, each describing the data at a different level of scale or abstraction. Therefore, the representation is distributed across various layers and multiple neurons. In this way, two types of information are captured by the network topology. On the one hand, each neuron must represent something, so this becomes a local representation. On the other hand, so-called distribution means a concept is distributed over many units, and there exists a many-to-many relationship between concepts and neurons. Such connections capture the interaction and mutual relationships between concepts.

Such representations allow exponential gains over purely local ones with the same number of free parameters. In other words, they can generalize non-locally to unseen regions. They hence offer the potential for better generalization because learning theory shows that the number of examples needed (to achieve the desired degree of generalization performance) to tune O(B) effective degrees of freedom is O(B). This is referred to as the power of distributed representation as compared to local ones (http://www.iro.umontreal.ca/~pift6266/H10/notes/mlintro.html).

An easy way to understand this is through an example. Suppose we need to represent words; one can use the traditional one-hot encoding (of length N), which is commonly used in NLP. Then at most, we can represent N words. The localist models are very inefficient whenever the data has componential structure:

A distributed representation of a set of shapes would look like the following figure.

If we wanted to represent a new shape with a sparse representation, such as one-hot encoding, we would have to increase the dimensionality. But what's nice about a distributed representation is that we may be able to represent a new shape with the existing dimensionality, as shown in the example figure using the distributed representation of three words.

Therefore, non-mutually exclusive features/attributes create a combinatorially large set of distinguishable configurations, and the number of distinguishable regions grows almost exponentially with the number of parameters.

One more concept we need to clarify is the difference between distributed and distributional. Distributed means the representation is a vector of continuous activation levels over a number of elements, for example, a dense word embedding, as opposed to one-hot encoding vectors. On the other hand, distributional means the representation is derived from contexts of use. For example, Word2Vec is distributional, but so are count-based word vectors, as we use the contexts of the word to model its meaning.
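A small sketch can make the contrast concrete. The vectors below are made-up, illustrative values, not learned embeddings:

import numpy as np

# One-hot (localist): each word owns one dimension,
# so N dimensions can represent only N words.
one_hot_cat = np.array([1, 0, 0])
one_hot_dog = np.array([0, 1, 0])
print(np.dot(one_hot_cat, one_hot_dog))   # 0: no notion of similarity

# Distributed: every word is a dense pattern over shared dimensions,
# so a fixed dimensionality can encode many more distinguishable concepts.
dense_cat = np.array([0.7, 0.2, 0.1])
dense_dog = np.array([0.6, 0.3, 0.2])
print(np.dot(dense_cat, dense_dog))       # similarity emerges from shared features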

Hierarchical feature representation

The learned features capture both local and inter-relationships for the data as a whole. It is not only that the learned features are distributed, the representations also come hierarchically structured. The previous figure, comparing deep and shallow architectures, shows that a shallow architecture has a flatter structure with one layer at most, whereas a deep architecture has multiple layers, with lower layers composited to serve as input to the higher layers. The following figure uses a more concrete example to show what information has been learned through the layers of the hierarchy:

As shown in the image, the lower layers focus on edges or colors, while higher layers often focus more on patches, curves, and shapes. Such a representation effectively captures part-and-whole relationships at various levels of granularity and naturally addresses multi-task problems, for example, edge detection or part recognition. The lower layers often represent the basic and fundamental information that can be used for many distinct tasks in a wide variety of domains. For example, Deep Belief Networks have been successfully used to learn high-level structures in a wide variety of domains, including handwritten digits and human motion capture data. The hierarchical structure of representation mimics the human understanding of concepts, that is, learning simple concepts first and then successfully building up more complex concepts by composing the simpler ones together. It is also easier to monitor what is being learned and to guide the machine to better subspaces. If one treats each neuron as a feature detector, then deep architectures can be seen as consisting of feature detector units arranged in layers. Lower layers detect simple features and feed into higher layers, which in turn detect more complex features. If the feature is detected, the responsible unit or units generate large activations, which can be picked up by the later classifier stages as a good indicator that the class is present:

The above figure illustrates that each feature can be thought of as a detector, which tries to detect a particular feature (blob, edges, nose, or eye) on the input image.

Applications

Now we have a general understanding of deep learning and its technical advantages over traditional methods. But how do we benefit from it in reality? In this section, we will introduce how deep learning makes a tremendous impact in some practical applications across a variety of fields.

Lucrative applications

In the past few years, the number of researchers and engineers applying deep learning has grown at an exponential rate. Deep learning breaks new ground in almost every technical area by using novel neural network architectures and advanced machine learning frameworks.

With significant hardware and algorithmic developments, deep learning has revolutionized the industry and has been highly successful in tackling many real-world AI and data mining problems.

We have seen an explosion in new and lucrative applications using deep learning frameworks in areas as diverse as image recognition, image search, object detection, computer vision, optical character recognition, video parsing, face recognition, pose estimation, speech recognition, spam detection, text to speech or image captioning, translation, natural language processing, chatbots, targeted online advertising, click-through optimization, robotics, energy optimization, medicine, art, music, physics, autonomous car driving, data mining of biological data, bioinformatics (protein sequence prediction, phylogenetic inference, multiple sequence alignment), big data analytics, semantic indexing, sentiment analysis, web search/information retrieval, games (Atari (http://karpathy.github.io/2016/05/31/rl/) and AlphaGo (https://deepmind.com/research/alphago/)), and beyond.

Success stories

In this section, we will enumerate a few major application areas and their success stories.

In the area of computer vision, image recognition/object recognition refers to the task of using an image or a patch of an image as input and predicting what the image or patch contains. For example, an image can be labeled dog, cat, house, bicycle, and so on. In the past, researchers were stuck at how to design good features to tackle challenging problems such as scale-invariance, orientation invariance, and so on. Some of the well-known feature descriptors are Haar-like features, Histogram of Oriented Gradient (HOG), Scale-Invariant Feature Transform (SIFT), and Speeded-Up Robust Feature (SURF). While human-designed features are good at certain tasks, such as HOG for human detection, they are far from ideal.

Until 2012, deep learning stunned the field with its resounding success at the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). In that competition, a convolutional neural network (often called AlexNet, see the following figure), developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, won 1st place with an astounding 85% accuracy, 11% better than the algorithm that won second place! In 2013, all winning entries were based on deep learning, and by 2015 multiple CNN-based algorithms had surpassed the human recognition rate of 95%. Details can be found in the related publications:

In other areas of computer vision, deep learning also shows surprising and interesting power in mimicking human intelligence. For example, deep learning can not only identify various elements in a picture accurately (and locate them), it can also understand interesting areas like humans do and describe what's happening in a picture. For more details, one can refer to the work published by Andrej Karpathy and Fei-Fei Li at http://cs.stanford.edu/people/karpathy/deepimagesent/. They trained a deep learning network to identify dozens of interesting areas and objects, and to describe the subject matter of a scene in plain English grammar. This involves training the machine to understand the image information as well as its conversion to natural language.

As a further progression, Justin Johnson, Andrej Karpathy, and Fei-Fei Li published a new work in 2016 called DenseCap. Their proposed fully Convolutional Localization Network (FCLN) architecture can localize and describe salient regions in images in natural language. Some examples are shown in the following figure:

Recently, attention-based neural encoder-decoder frameworks have been widely adopted for image captioning, where novel adaptive attention models have been incorporated and better performance has been achieved. Details can be found in their work on adaptive attention.

Early in 2017, Ryan Dahl and others from the Google Brain team proposed a deep learning network called Pixel Recursive Super Resolution to take very low-resolution images of faces and enhance their resolution significantly. It can predict what each face most likely looks like. For example, in the following figure, in the left-hand column you can see the original 8 x 8 photos, while the prediction results in the middle column compare favorably with the ground truth (in the very right column):

In the area of semantic indexing, given the advantages of automated feature representation by deep learning, data in various formats can now be represented in a more efficient and useful manner. This provides a powerful source for knowledge discovery and comprehension in addition to keyword search. Microsoft Audio Video Indexing Service (MAVIS) is an example that uses deep learning (ANN)-based speech recognition to enable searching of audio and video files with speech.

In the area of natural language processing (NLP), word/character representation learning (such as Word2Vec) and machine translation are great practical examples. In fact, in the past two or three years, deep learning has almost replaced traditional machine translation.

Machine translation is automated translation, which typically refers to statistical inference-based systems that deliver more fluent-sounding but less consistent translations for speech or text between various languages. In the past, popular methods have been statistical techniques that learn the translation rules from a large corpus, as a replacement for a language expert. While cases like this overcome the bottleneck of data acquisition, many challenges exist. For example, hand-crafted features may not be ideal as they cannot cover all possible linguistic variations. It is difficult to use global features, and the translation module heavily relies on pre-processing steps including word alignment, word segmentation, tokenization, rule extraction, syntactic parsing, and so on. The recent development of deep learning provides solutions to these challenges. A machine translator that translates through one large neural network is often called Neural Machine Translation (NMT). Essentially, it's a sequence-to-sequence learning problem, where the goal of the neural network is to learn a parameterized function that maps from the input sequence/source sentence to the output sequence/target sentence. The mapping function often contains two stages: encoding and decoding. The encoder maps the source sequence to one or more vectors to produce hidden state representations. The decoder predicts a target symbol using the source sequence vector representation and the previously predicted symbols.

As illustrated by the following figure, this vase-like shape produces a good representation/embedding at the middle hidden layer:

However, NMT systems are known to be computationally expensive both in training and in translation inference. Also, most NMT systems have difficulty with rare words. Some recent improvements include the attention mechanism, subword-level modeling and character-level translation, and improvements to the loss function (Chung and others). In 2016, Google launched their own NMT system to work on a notoriously difficult language pair, Chinese to English, and tried to overcome these disadvantages.

Google's NMT system (GNMT) conducts about 18 million translations per day from Chinese to English. The production deployment is built on top of the publicly available machine learning toolkit TensorFlow (https://www.tensorflow.org/) and Google's Tensor Processing Units (TPUs), which provide sufficient computational power to deploy these powerful GNMT models while meeting the stringent latency requirements. The model itself is a deep LSTM model with encoder and decoder layers using attention and residual connections. On the WMT'14 English-to-French and English-to-German benchmarks, GNMT achieves competitive results. Using a human side-by-side evaluation on a set of isolated simple sentences, it reduces translation errors by an average of 60% compared to Google's phrase-based production system. For more details, one can refer to their blog (https://research.googleblog.com/2016/09/a-neural-network-for-machine.html) or paper. The following figure shows the improvement in translation quality made by the deep learning approach. One can see that for the language pair French -> English, it is almost as good as a human translator:

In 2016, Google released WaveNet (https://deepmind.com/blog/wavenet-generative-model-raw-audio/) and Baidu released Deep Speech; both are deep learning networks that generate voice automatically. The systems learn to mimic human voices by themselves and improve over time, and it is getting harder and harder for an audience to differentiate them from a real human speaking. Why is this important? Although Siri (https://www.wikiwand.com/en/Siri) and Alexa (https://www.wikiwand.com/en/Amazon_Alexa) can talk well, in the past, text2voice systems were mostly manually trained, which was not a completely autonomous way to create new voices.

While there is still some gap before computers can speak like humans, we are definitely a step closer to realizing automated voice generation. In addition, deep learning has shown impressive abilities in music composition and in generating sounds from silent videos, for example, the work by Owens and their co-authors.

Deep learning has been widely and extensively used in self-driving cars, from perception to localization to path planning. In perception, deep learning is often used to detect cars and pedestrians, for example using the Single Shot MultiBox Detector or YOLO Real-Time Object Detection. People can also use deep learning to understand the scene the car is seeing, for example, SegNet, which segments the scene into pieces with semantic meaning (sky, building, pole, road, fence, vehicle, bike, pedestrian, and so on). In localization, deep learning can be used to perform odometry, for example, VINet, which estimates the exact location of the car and its pose (yaw, pitch, roll). In path planning, where the problem is often formulated as optimization, deep learning, specifically reinforcement learning, can also be applied, for example, the work by Shalev-Shwartz and co-authors. In addition to its applications in separate stages of the self-driving pipeline, deep learning has also been used to perform end-to-end learning, mapping raw pixels from the camera directly to steering commands.

Deep learning for business

To leverage the power of deep learning in business, the first question would be how to choose the problems to solve. In an interview with Andrew Ng, he talked about his opinion; his rule of thumb, paraphrased, is that anything a typical person can do with less than one second of thought can probably now or soon be automated with AI.

If we look around, we can easily find that companies, large or small, have already applied deep learning to production with impressive performance and results. Think about Google, Microsoft, Facebook, Apple, Amazon, IBM, and Baidu. It turns out we are using deep learning based applications and services every day.

Nowadays, Google can caption your uploaded images with multiple tags and descriptions. Its translation system is almost as good as a human translator. Its image search engine can return related images by either image-based queries or language-based semantic queries. Project Sunroof (https://www.google.com/get/sunroof) has been helping homeowners explore whether they should go solar, offering solar estimates for over 43 million houses across 42 states.

Apple is working hard to invest in machine learning and computer vision technologies, including the Core ML framework, Siri, and ARKit (its augmented reality platform) on iOS, and their autonomous solutions including self-driving car applications.

Facebook can now automatically tag your friends. Researchers from Microsoft have won the ImageNet competition with better performance than a human annotator and improved their speech recognition system, which has now surpassed humans.

Industry leading companies have also open sourced their large-scale deep learning platforms or tools in some way. For example, TensorFlow from Google, MXNet from Amazon, PaddlePaddle from Baidu, and Torch from Facebook. Just recently, Facebook and Microsoft introduced a new open ecosystem for interchangeable AI frameworks. All these toolkits provide useful abstractions for neural networks: routines for n-dimensional arrays (Tensors), simple use of different linear algebra backends (CPU/GPU), and automatic differentiation.

With so many resources and good business models available, it can be foreseen that the progression from theoretical development to practical industry realization will be shortened over time.

Future potential and challenges

Despite the exciting past and promising prospects, challenges still exist. As we open this Pandora's box of AI, one of the key questions is, where are we going? What can it do? This question has been addressed by many people from various backgrounds. In one of his interviews, Andrew Ng posed his point of view that while today's AI is making rapid progress, such momentum will slow down until AI reaches a human level of performance, mainly for three reasons: the feasibility of the things a human can do, the massive size of data, and distinctive human insight. Still, it sounds very impressive, and might be a bit scary, that one day AI will surpass humans and perhaps replace humans in many areas:

There are basically two main streams of opinion on AI, the positive one and the negative one. Elon Musk, the creator of PayPal, SpaceX, and Tesla, has repeatedly voiced the cautious, negative view.

But right now, most AI technology can only be applied in limited domains and requires a lot of engineering work. In the area of deep learning, there are perhaps more challenges than successfully-solved problems. Until now, most of the progress made in deep learning has come from exploring various architectures, but we still lack a fundamental understanding of why and how deep learning achieves such success. Additionally, there are limited studies on why and how to choose structural features and how to efficiently tune hyper-parameters. Most of the current approaches are still based on validation or cross-validation, which is far from being theoretically grounded and is more on the side of experimental and ad hoc practice. From a data source perspective, how to deal with fast-moving and streamed data, high-dimensional data, structured data in the form of sequences (time series, audio and video signals, DNA, and so on), trees (XML documents, parse trees, RNA, and so on), and graphs (chemical compounds, social networks, parts of an image, and so on) is still in development, especially when concerning computational efficiency.

Additionally, there is a need for multi-task unified modeling. As the Google DeepMind research scientist Raia Hadsell summed it up, there is no single neural network in the world today that can be trained to identify objects in images, play Space Invaders, and listen to music all at once.

Until now, many trained models have specialized in just one or a few areas, such as recognizing faces, cars, or human actions, or understanding speech, which is far from true AI. A truly intelligent module would not only be able to process multi-source input, but also make decisions for various tasks or sequences of tasks. The question of how to best apply the knowledge learned from one domain to other domains and adapt quickly remains unanswered.

While many optimization approaches have been proposed, such as Gradient Descent or Stochastic Gradient Descent, Adagrad, AdaDelta, or Adam (Adaptive Moment Estimation), some known weaknesses, such as getting trapped in local minima, lower performance, and high computational cost, remain. New research in this direction would yield fundamental impacts on deep learning performance and efficiency. It would be interesting to see whether global optimization techniques can be used to assist deep learning regarding the optimization of its parameters for a defined objective.

Last but not least, there are perhaps more opportunities than challenges in applying deep learning, or even developing new types of deep learning approaches, for fields that have so far not yet been deeply impacted. From finance to e-commerce, social networks to bioinformatics, we have seen tremendous growth in the application of deep learning. Powered by deep learning, we are seeing applications, startups, and services which are changing our lives at a much faster pace.

Summary

In this chapter, we introduced the high-level concepts of deep learning and AI in general. We talked about the history of deep learning, its ups and downs, and its recent rise. From there, we dived deeper to discuss the differences between shallow algorithms and deep algorithms. We specifically discussed the two viewpoints for understanding deep learning: the neural viewpoint and the feature representation learning viewpoint. We then gave several successful applications across various fields. In the end, we talked about the challenges that deep learning still faces and its future potential.

In the next chapter, we will help you set up the development environment and get your hands dirty.

Getting Yourself Ready for Deep Learning

Due to the recent achievements of Artificial Neural Networks (ANNs) in different applications of artificial intelligence (AI), such as computer vision, natural language processing (NLP), and speech recognition, deep learning has emerged as one of the fundamental tools for many real-world implementations. This chapter focuses on how to set yourself up for experimenting with and applying deep learning techniques in the real world.

We will answer the key questions as to what is needed to get started with deep learning. We will specifically answer the following questions:

- What skills are needed to get started with deep learning?
- What are the core concepts from linear algebra that are required for deep learning?
- What hardware requirements exist for a practical deep learning system?
- What software frameworks exist today that make it easy for developers to get started with their deep learning projects?
- How do you set up a deep learning system on a cloud-based graphics processing unit (GPU) instance such as AWS?

Basics of linear algebra

One of the most fundamental skills required to get started with deep learning is a good foundational understanding of linear algebra. Though linear algebra itself is a vast subject, and covering it in full is beyond the scope of this book, we will go through some important aspects of linear algebra in this chapter. Hopefully, this will give you a sufficient understanding of some of the core concepts and how they interplay with deep learning methodologies.

Data representation

In this section, we will look at the core data structures and representations most commonly used in deep learning. This is not meant to be comprehensive at all but only serves to highlight some of the prominent representations useful for understanding deep learning concepts:

Vectors: One of the most fundamental representations in linear algebra is a vector. A vector can be defined as an array of objects, or more specifically an array of numbers that preserves its ordering. Each number can be accessed in a vector by its index location. For example, consider a vector x containing the seven days of the week encoded from 1 to 7, where 1 represents Sunday and 7 represents Saturday. Using this notation, a particular day of the week, say Wednesday, can be directly accessed from the vector as x[4]:

$x = [1, 2, 3, 4, 5, 6, 7]$

Matrices: These are a two-dimensional representation of numbers, or basically a bunch of vectors. Each matrix, $A$, is composed of a certain number of rows, $m$, and a specified number of columns, $n$. Each of the $m$ rows, $1 \le i \le m$, is a vector of $n$ numbers, and each of the $n$ columns, $1 \le j \le n$, is also a vector of $m$ numbers. Matrices are particularly useful when working with images. Though real-world images are three-dimensional in nature, most computer vision problems are focused on a two-dimensional representation of images. As such, a matrix is a natural representation of such images.
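A minimal sketch of both representations, using NumPy (the arrays below are illustrative):

import numpy as np

# A vector holding the seven days of the week, encoded 1-7.
days = np.array([1, 2, 3, 4, 5, 6, 7])
print(days[3])             # Wednesday (NumPy is 0-based, so index 3 is the 4th element)

# A grayscale image is naturally a 2-D matrix of pixel intensities.
image = np.zeros((28, 28))  # 28 x 28, the MNIST image size mentioned earlier
print(image.shape)          # (28, 28)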

Identity matrices: An identity matrix is defined as a matrix which, when multiplied with a vector, does not change the vector. Typically, an identity matrix has all elements 0 except on its main diagonal, which is all 1s:

$I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

Data operations

In this section, we will look at some of the most common operations performed on matrices.

Matrix transpose: Matrix transpose is a matrix transformation that mirrors the matrix along its main diagonal. Mathematically it is defined as follows:

$(A^T)_{i,j} = A_{j,i}$

Matrix multiplication: Matrix multiplication is one of the most fundamental operations that can be applied to any two matrices. A matrix, $A$, of shape $m \times n$ can be multiplied by another matrix, $B$, of shape $n \times p$ if and only if the number of columns of $A$ equals the number of rows of $B$. The resultant matrix, $C = AB$, is of the shape $m \times p$. The multiplication operation is defined as follows:

$C_{i,j} = \sum_k A_{i,k} B_{k,j}$

Matrix multiplication generally has very useful properties. For example, matrix multiplication is distributive:

$A(B + C) = AB + AC$

Matrix multiplication is associative:

$A(BC) = (AB)C$

Matrix multiplication is, however, not commutative in general:

$AB \ne BA$

The dot product between two vectors, on the other hand, is commutative:

$x^T y = y^T x$
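The following minimal sketch verifies these properties numerically with NumPy (the matrices are illustrative):

import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])
C = A @ B                    # matrix product, shape (2, 2)

print(A.T)                                        # transpose mirrors A along its diagonal
print(np.allclose(A @ (B + C), A @ B + A @ C))    # True: distributive
print(np.allclose(A @ (B @ C), (A @ B) @ C))      # True: associative
print(np.allclose(A @ B, B @ A))                  # False: not commutative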

 

Matrix properties

In this section, we will look at some of the properties of matrices which are very useful for deep learning applications.

Norm: Norm is an important property of a vector or a matrix that measures the size of the vector or matrix. Geometrically it can also be interpreted as the distance of a point, $x$, from the origin. An $L^p$ norm is defined as follows:

$\|x\|_p = \left( \sum_i |x_i|^p \right)^{1/p}$

Though a norm can be computed for various orders of $p$, the most popularly known norms are the L1 and L2 norms. The L1 norm is usually considered a good choice for sparse models:

$\|x\|_1 = \sum_i |x_i|$

Another norm popular in the deep learning community is the max norm, also referred to as $L^\infty$. This is simply equivalent to the value of the largest element in the vector:

$\|x\|_\infty = \max_i |x_i|$

So far, all the previous norms apply to vectors. When we want to compute the size of a matrix, we use the Frobenius norm, defined as follows:

$\|A\|_F = \sqrt{\sum_{i,j} A_{i,j}^2}$

Norms are also used to compute the dot product of two vectors directly:

$x^T y = \|x\|_2 \, \|y\|_2 \cos\theta$

Trace: Trace is an operator that is defined as the sum of all the diagonal elements of a matrix:

$\mathrm{Tr}(A) = \sum_i A_{i,i}$

The trace operator is quite useful in computing the Frobenius norm of a matrix, as follows:

$\|A\|_F = \sqrt{\mathrm{Tr}(A A^T)}$

Another interesting property of the trace operator is that it is invariant to matrix transpose. Hence, it is often used to manipulate matrix expressions to yield meaningful identities:

$\mathrm{Tr}(A) = \mathrm{Tr}(A^T)$

Determinant: A determinant of a matrix is defined as a scalar value which is simply the product of all the eigenvalues of the matrix. Determinants are generally very useful in the analysis and solution of systems of linear equations. For instance, according to Cramer's rule, a system of linear equations has a unique solution if, and only if, the determinant of the matrix composed of the system of linear equations is non-zero.
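All of these properties are a one-liner in NumPy; a small sketch with illustrative values:

import numpy as np

x = np.array([3.0, -4.0])
A = np.array([[1.0, 2.0], [3.0, 4.0]])

print(np.linalg.norm(x, 1))         # L1 norm: 7.0
print(np.linalg.norm(x, 2))         # L2 norm: 5.0
print(np.linalg.norm(x, np.inf))    # max norm: 4.0
print(np.linalg.norm(A, 'fro'))     # Frobenius norm

print(np.trace(A))                           # 5.0
print(np.sqrt(np.trace(A @ A.T)))            # equals the Frobenius norm
print(np.linalg.det(A))                      # determinant: -2.0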

Deep learning with GPU

As the name suggests, deep learning involves learning deeper representations of data, which requires large amounts of computational power. Such massive computational power is usually not possible with modern-day CPUs. GPUs, on the other hand, lend themselves very nicely to this task. GPUs were originally designed for rendering graphics in real time. The design of a typical GPU allows for a disproportionately higher number of arithmetic logic units (ALUs), which allows them to crunch a large number of calculations in real time.

GPUs used for general purpose computing have a high data parallel architecture, which means they can process a large number of data points in parallel, leading to higher computational throughput. Each GPU is composed of thousands of cores. Each of such cores consists of a number of functional units which contain cache and ALU among other modules. Each of these functional units executes exactly the same instruction set, thereby allowing for massive data parallelism in GPUs. In the next section, we compare and contrast the design of a GPU with a CPU.

The following table illustrates the design differences between a CPU and a GPU. As shown, GPUs are designed to execute a large number of threads optimized to execute identical control logic. Hence, each of the GPU cores is rather simple in design. CPUs, on the other hand, are designed to operate with fewer cores but are more general purpose. Their basic core design can handle highly complex control logic, which is usually not possible in GPUs. Hence CPUs can be thought of as commodity processing units as opposed to GPUs, which are specialized units:

GPU                                       | CPU
Large number of simpler cores             | Fewer number of complex cores
Higher level of multi-threading optimization | Single-thread optimization
Good for specialized computing            | Good for general purpose computing

In terms of relative performance, GPUs have much lower latency than CPUs for performing highly data parallel tasks. This is especially true if the GPU has enough device memory to load all the required data needed for peak computation. However, for a head-to-head comparison of core counts, CPUs have much lower latency, as each CPU core is much more complex and has advanced state control mechanisms as opposed to a GPU core.

As such, the design of the algorithm can determine the benefit of using a GPU versus a CPU. Erik Smistad and their co-authors outline five different factors determining the suitability of an algorithm towards a GPU: data parallelism, thread count, branch divergence, memory usage, and synchronization. The table by Dutta-Roy illustrates the importance of these factors in the suitability of an algorithm for a GPU. As shown in the following table, any algorithm which features strongly under the High column is better suited to a GPU than others:

Deep learning hardware guide

There are a few other important aspects one should carefully understand while developing their deep learning application environment. In this section, we will outline some of the most important aspects of a practical deep learning setup besides GPU computing.

CPU cores

Most deep learning applications avoid multi-core CPU usage unless they are used with a parallelization framework like Message-Passing Interface (MPI), MapReduce, or Spark. For example, CaffeOnSpark (https://github.com/yahoo/CaffeOnSpark) by the team at Yahoo! uses Spark with Caffe for parallelizing network training across multiple GPUs and CPUs. In most normal single-box settings, one CPU core is enough for deep learning application development.

CPU cache size

CPU cache size is an important CPU component that is used for high-speed computations. A CPU cache is often organized as a hierarchy of cache layers, from L1 to L4, with L1 and L2 being the smaller and faster cache layers as opposed to the larger and slower layers L3 and L4. In an ideal setting, every piece of data needed by the application resides in caches and hence no read is required from RAM, thereby making the overall operation faster.

However, this is hardly the case for most deep learning applications. For example, for a typical ImageNet experiment with a batch size of 128, we need more than 85 MB of CPU cache to store all information for one mini-batch [13]. Since such datasets are not small enough to be cache-only, a RAM read cannot be avoided. Hence, modern-day CPU cache sizes have little to no impact on the performance of deep learning applications.

RAM size

As we saw previously in this section, most deep learning applications read directly from RAM instead of CPU caches. Hence, it is often advisable to keep the CPU RAM almost as large, if not larger, than GPU RAM.

The size of the GPU RAM depends on the size of your deep learning model. For example, ImageNet-based deep learning models have a large number of parameters taking 4 GB to 5 GB of space, hence a GPU with at least 6 GB of RAM would be an ideal fit for such applications. Paired with a CPU with at least 8 GB or preferably more CPU RAM, this will allow application developers to focus on key aspects of their application rather than debugging RAM performance issues.

Hard drive

A typical deep learning application requires 100s of GB of data. Since this data cannot be housed in any RAM, an ongoing data pipeline is constructed. A deep learning application loads mini-batch data from GPU RAM, which in turn keeps reading data from CPU RAM, which loads data directly from the hard drive. Since GPUs have a larger number of cores and each of these cores works on mini-batches of their own data, they constantly need to read large volumes of data from disk to allow for high data parallelism.

For example, in AlexNet's Convolutional Neural Network (CNN)-based model, roughly 300 MB of data needs to be read every second. This can often cripple the overall application performance. Hence, a solid state drive (SSD) is often the right choice for most deep learning application developers.

Cooling systems

Modern-day GPUs are energy efficient and have in-built mechanisms to prevent them from overheating. For instance, when a GPU increases its speed and power consumption, its temperature rises as well. Typically at around 80°C, its in-built temperature control kicks in, which reduces its speed, thereby slowing the GPU down. The real bottleneck in this process is the poor design of pre-programmed schedules for fan speeds.

In a typical deep learning application, an 80°C temperature is reached within the first few seconds of the application, thereby lowering the GPU performance from the start and providing poor GPU throughput. To complicate matters, most of the existing fan scheduling options are not available in Linux, where most current-day deep learning applications run.

A number of options exist today to alleviate this problem. First, a Basic Input/Output System (BIOS) upgrade with a modified fan schedule can provide the optimal balance between overheating and performance. Another option is to use an external cooling system, such as a water cooling system. However, this option is mostly applicable to GPU farms where multiple GPU servers are running. External cooling systems can be a bit expensive, so cost also becomes an important factor in choosing the right cooling system for your application.

Deep learning software frameworks

Every good deep learning application needs to have several components to be able to function correctly. These include:

- A model layer which allows a developer to design his or her own model with more flexibility
- A GPU layer that makes it seamless for application developers to choose between GPU/CPU for their application
- A parallelization layer that can allow the developer to scale his or her application to run on multiple devices or instances

As you can imagine, implementing these modules is not easy. Often a developer needs to spend more time on debugging implementation issues rather than legitimate model issues. Thankfully, a number of software frameworks exist in the industry today which make deep learning application development practically first class in its programming language.

These frameworks vary in architecture, design, and features, but almost all of them provide immense value to developers by providing them an easy and fast way to implement their applications. In this section, we will take a look at some popular deep learning software frameworks and how they compare with each other.

TensorFlow - a deep learning library

TensorFlow is an open source software library for numerical computation using data flow graphs. Designed and developed by Google, TensorFlow represents a complete data computation as a flow graph. Each node in this graph represents a mathematical operator. An edge connecting two nodes represents the multi-dimensional data array that flows between the two nodes.

One of the primary advantages of TensorFlow is that it supports CPU and GPU as well as mobile devices, thereby making it almost seamless for developers to write code against any device architecture. TensorFlow also has a very big community of developers, resulting in huge momentum behind the development of the framework.
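As a minimal sketch of this graph model (written in the TensorFlow 1.x style that matches the era of this book):

import tensorflow as tf

# Build a tiny data flow graph: two constant nodes and an add operator.
a = tf.constant(3.0, name='a')
b = tf.constant(4.0, name='b')
c = tf.add(a, b, name='c')    # an edge carries the tensors from a and b into add

with tf.Session() as sess:    # the graph only executes inside a session
    print(sess.run(c))        # 7.0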

Caffe

Caffe was designed and developed at the Berkeley Artificial Intelligence Research (BAIR) Lab. It was designed with expression, speed, and modularity in mind. It has an expressive architecture which allows a very easy and seamless way to configure a model and optimize parameters without compiling any additional code. This configuration also allows an easy switch from CPU to GPU mode and vice-versa with a single flag change.

Caffe also boasts good benchmark numbers when it comes to performance. For instance, on a single NVIDIA K40 GPU, Caffe can process over 60 million images per day. Caffe also has a strong community, ranging from academic researchers to industrial research labs using Caffe across a heterogeneous application stack.

MXNet

MXNet is a multi-language machine learning library. It offers two modes of computation:

Imperative mode: This mode exposes an interface much like a regular NumPy-like API. For example, to construct a tensor of zeros on both the CPU and the GPU using MXNet, you use the following code block:

import mxnet as mx
tensor_cpu = mx.nd.zeros((100,), ctx=mx.cpu())
tensor_gpu = mx.nd.zeros((100,), ctx=mx.gpu(0))

In the example above, MXNet specifies the location of where to hold the tensor, either on the CPU or on the GPU device at location 0. One important distinction with MXNet is that all computations happen lazily instead of instantaneously. This allows MXNet to achieve incredible utilization of the device, unlike any other framework.

Symbolic mode: This mode exposes a computation graph much like TensorFlow. Though the imperative API is quite useful, one of its drawbacks is its rigidity. All computations need to be known beforehand along with pre-defined data structures. The symbolic API aims to remove this limitation by allowing MXNet to work with symbolic variables instead of fixed data. The symbols can then be compiled to executable functions, as follows:

import mxnet as mx
x = mx.sym.Variable("X")   # represents a symbol
y = mx.sym.Variable("Y")
z = (x + y)

Torch

Torch is a Lua-based deep learning framework developed by Ronan Collobert, Clement Farabet, and Koray Kavukcuoglu. It was initially used by the CILVR Lab at New York University. Torch is powered by C/C++ libraries under its hood and also uses Compute Unified Device Architecture (CUDA) for its GPU interactions. It aims to be the fastest deep learning framework while also providing a simple C-like interface for rapid application development.

Theano

Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Some of the key features of Theano are its tight integration with NumPy, making it almost natural to a large number of Python developers. It also provides a very intuitive interface to using either a GPU or a CPU. It has efficient symbolic differentiation, allowing it to provide derivatives for functions with one or many inputs. It is also numerically stable and has dynamic code generation, leading to faster expression evaluation. Theano is a good framework choice if you have advanced machine learning expertise and are looking for a low-level API for fine-grained control of your deep learning application.

Microsoft Cognitive Toolkit

Microsoft Cognitive Toolkit is also known as CNTK; it is the latest entrant to the growing set of deep learning frameworks. CNTK has two major functional capabilities:

- Support for multiple features, such as:
  - CPU/GPU for training and prediction
  - Both Windows and Linux operating systems
  - Efficient recurrent network training through batching techniques
  - Data parallelization using one-bit quantized singular value decomposition (SVD)
- Efficient modularization that separates:
  - Computation networks
  - The execution engine
  - Learning algorithms
  - Model configuration

Keras

Keras is a deep learning framework that is probably the most different from every other framework described previously. Most of the described frameworks are low-level modules that directly interact with the GPU using CUDA.

Keras, on the other hand, could be understood as a meta-framework that interacts with other frameworks such as Theano or TensorFlow to handle its GPU interactions or other system-level access management. As such, it is highly flexible and very user-friendly, allowing developers to choose from a variety of underlying model implementations. The Keras community support is rapidly gaining momentum, and, as of September 2017, the TensorFlow team plans to ship Keras as a subset of the TensorFlow project.
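A minimal sketch of the Keras user experience described above, assuming the TensorFlow (or Theano) backend is installed; the layer sizes are illustrative:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))  # hidden layer
model.add(Dense(1, activation='sigmoid'))               # output layer
model.compile(optimizer='sgd', loss='binary_crossentropy')
model.summary()   # note that Keras hides the backend and GPU details entirely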

Framework comparison

Though a number of deep learning software frameworks exist, it is hard to understand their feature parity. The following table outlines each framework's feature parity (ratings reconstructed as closely as the source allows):

Framework  | Languages               | Community support | Modeling flexibility | Easy configuration | Speed     | GPU parallelization | Tutorials
TensorFlow | Python                  | Excellent         | Excellent            | Excellent          | Strong    | Strong              | Excellent
Caffe      | C++                     | Strong            | Excellent            | Strong             | Excellent | Strong              | Strong
MXNet      | R, Python, Julia, Scala | Excellent         | Strong               | Excellent          | Excellent | Excellent           | Good
Torch      | Lua, Python             | Strong            | Excellent            | Good               | Excellent | Strong              | Good
Theano     | Python, C++             | Strong            | Strong               | Good               | Excellent | Good                | Good
CNTK       | C++                     | Excellent         | Strong               | Strong             | Excellent | Strong              | Strong
Keras      | Python                  | Excellent         | Excellent            | Excellent          | Strong    | Strong              | Excellent

Recently, Shaohuai Shi and their co-authors (https://arxiv.org/abs/1608.07249) also presented a comprehensive benchmarking evaluation of a few popular deep learning frameworks: Caffe, CNTK, TensorFlow, and Torch. They first benchmark the performance of these frameworks with three popular types of neural networks: the fully connected neural network (FCN), CNN, and recurrent neural network (RNN). They also benchmark the performance of these frameworks using multiple GPUs as well as CPUs.

In their paper, they outline the comparative performance of each framework. Their experimental results show that all frameworks can utilize GPUs very efficiently and show performance gains over CPUs. However, there is still no clear winner among all of them, which suggests there are still improvements to be made across all of these frameworks.

Setting up deep learning on AWS

In this section, we will show you two different ways of setting up a deep learning system from scratch using Amazon Web Services (AWS).

Setup from scratch

In this section, we will illustrate how to set up a deep learning environment on an AWS EC2 GPU instance of type g2.2xlarge running Ubuntu Server 16.04 LTS. For this example, we will use a pre-baked Amazon Machine Image (AMI) which already has a number of software packages installed, making it easier to set up an end-to-end deep learning system. We will use a publicly available AMI image, ami-b03ffedf, which has the following pre-installed packages:

- CUDA 8.0
- Anaconda 4.2.0 with Python 3.0
- Keras / Theano

1. The first step in setting up the system is to set up an AWS account and spin up a new EC2 GPU instance using the AWS web console (http://console.aws.amazon.com/), as shown in the following figure:

2. We pick a g2.2xlarge instance type from the next page, as shown in the following figure:

3. After adding storage as shown in the following figure, we now launch a cluster and assign an EC2 key pair that can allow us to ssh and log in to the box using the provided key pair file:

4. Once the EC2 box is launched, the next step is to install the relevant software packages. To ensure proper GPU utilization, it is important to ensure graphics drivers are installed first. We will upgrade and install the NVIDIA drivers first.
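One typical sequence on Ubuntu 16.04 uses the community graphics-drivers PPA; the driver version shown below is illustrative of the era, not prescriptive:

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install -y nvidia-375    # illustrative driver version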

While NVIDIA drivers ensure that the host GPU can now be utilized by any deep learning application, they do not provide an easy interface for application developers to program the device.

Various different software libraries exist today that help with this task reliably. Open Computing Language (OpenCL) and CUDA are the most commonly used in industry. In this book, we use CUDA as the application programming interface for accessing NVIDIA graphics drivers. To install the CUDA driver, we first SSH into the EC2 instance, download CUDA 8.0 to our $HOME folder, and install it from there.
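An illustrative sketch of that install; the exact installer filename depends on the CUDA 8.0 package you download from NVIDIA's developer site, so the wget target below is a placeholder:

wget <CUDA 8.0 .deb installer URL from developer.nvidia.com>   # placeholder
sudo dpkg -i cuda-repo-ubuntu1604-8-0-local*.deb
sudo apt-get update
sudo apt-get install -y cuda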

Once the installation is finished, you can run the following command to validate the installation.
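NVIDIA's system management interface should list the GPU if everything went well:

nvidia-smi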

Now your EC2 box is fully configured to be used for deep learning development. However, for someone who is not very familiar with deep learning implementation details, building a deep learning system from scratch can be a daunting task.

To ease this development, a number of advanced deep learning software frameworks exist, such as Keras and Theano. Both of these frameworks are based on a Python development environment, hence we first install a Python distribution on the box, such as Anaconda.
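An illustrative download-and-install sequence; the installer version shown is an assumption matching the AMI described earlier:

wget https://repo.continuum.io/archive/Anaconda3-4.2.0-Linux-x86_64.sh
bash Anaconda3-4.2.0-Linux-x86_64.sh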

Finaly,KerasndTheanosritldugPython

Onceth learnigdvopmt.

spackgemnr

QJQinstalocmpedufy,theboxisnwfulyprad

Setup using Docker

The previous section described getting set up from scratch, which can be tricky sometimes given the continuous changes of software packages and the changing links on the web. One way to avoid dependency on links is to use a container technology like Docker.

In this chapter, we will use the official NVIDIA-Docker image that comes pre-packaged with all the necessary packages and deep learning frameworks to get you quickly started with deep learning development:

1. We first install the Docker Community Edition.
2. We then install NVIDIA-Docker and its plugin.
3. To validate that the installation is done correctly, we run a check (for example, nvidia-docker run --rm nvidia/cuda nvidia-smi).
4. Once it is set up correctly, we can use the official TensorFlow or Theano Docker image.
5. We can run a simple Python program to check if TensorFlow works properly.
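For instance, a minimal check along these lines, assuming TensorFlow 1.x (where tf.Session is available) inside the container, could be:

import tensorflow as tf

hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))

If the installation is healthy, the string is printed without any CUDA or driver errors.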

You should see TensorFlow's output on the screen, similar to the following figure:

Summary

In this chapter, we have summarized the key concepts required for getting a deep learning system up and running in the real world. We described core concepts of linear algebra that are important for understanding the fundamentals of deep learning. We provided a hardware guide outlining the best choices for setting up a GPU-based deep learning machine. We outlined a list of popular deep learning software frameworks and compared them on feature-level parity as well as performance benchmarks. Finally, we demonstrated how to set up a cloud-based deep learning application on AWS.

In the next chapter, we will introduce neural networks as a kick-starter module for understanding deep learning.

Getting Started with Neural Networks

In this chapter, we will focus on the basics of neural networks, including input/output layers, hidden layers, and how the networks learn through forward and backpropagation. We will start with the standard multilayer perceptron network, talk about its building blocks, and illustrate how it learns step-by-step. We will also introduce a few popular standard models, such as Convolutional Neural Networks (CNN), Restricted Boltzmann Machines (RBM), and the recurrent neural network (RNN), as well as its variation, Long Short-Term Memory (LSTM). We will outline the key, critical components for the successful application of these models, and explain some important concepts to help you gain a better understanding of why these networks work so well. In addition to the theoretical introduction, we will also show example code snippets using TensorFlow on how exactly to construct layers and activation functions, and how to connect them. In the end, we will demonstrate an end-to-end example of MNIST classification using TensorFlow. With the setup you learned about in Chapter 2, it is time we jump into some real examples to get our hands dirty.

The outline of this chapter is as follows:

- Multilayer perceptrons:
  - The input layer
  - The output layer
  - Hidden layers
  - Activation functions
- How a network learns
- Deep learning models:
  - Convolutional Neural Networks
  - Restricted Boltzmann Machines
  - RNN/LSTM
- MNIST hands-on classification example

Multilayer perceptrons

The multilayer perceptron is one of the simplest networks. Essentially, it is defined as having one input layer, one output layer, and a few hidden layers (more than one). Each layer has multiple neurons and the adjacent layers are fully connected. Each neuron can be thought of as a cell in these networks. It determines the flow and transformation of the incoming signals. Signals from the previous layer are pushed forward to the neurons of the next layer through the connected weights. For each artificial neuron, it calculates a weighted sum of all incoming inputs by multiplying the signals with the connected weights and adding a bias. The weighted sum will then go through an activation function to decide whether it should be fired or not, which results in the output signals/activations for the next level.

For example, a fully-connected, feed-forward neural network is depicted in the following diagram. As you may notice, there is an intercept node on each layer (x0 and a0). The non-linearity of the network is mainly contributed by the shape of the activation function.

The architecture of this fully-connected, feed-forward neural network essentially looks like the following:

The input layer

The input layer is the start of the workflow. For text data, this can be words or characters. For an image, this can be raw pixel values from the different channels. Also, with varying dimensions of inputs, it forms different structures, such as a one-dimensional, vector-like structure.

The output layer

The output layer is basically the output value of the network and is determined by the problem setting. In unsupervised learning, such as encoding or decoding, the output can be the same as the input. For a classification problem, the output layer can have n neurons for an n-way classification and utilize a softmax function to output the probability of being each class. Overall, the output layer's shape and configuration is determined by your modeling target, and it would change accordingly based on your problem setting.

Hidden layers

Hidden layers are the layers between the input and output layers. Neurons on hidden layers can take various forms, such as a max-pooling layer, a convolutional layer, and so on, all performing different mathematical functionalities/calculations. If you think of the whole network as a pipe of mathematical transformations, the hidden layers are the transformations that are then composed together to achieve your final result. We will introduce more variations of the hidden layers in networks such as CNN and RNN in later sections of this chapter.

Activation functions

The activation function in each artificial neuron decides whether the incoming signals have reached the threshold such that the neuron should output signals for the next level. It is crucial to set up the right activation function because of the gradient vanishing issue, which we will talk about later.

Another important feature of an activation function is that it should be differentiable. The network learns from the errors that are calculated at the output layer. A differentiable activation function is needed to perform backpropagation optimization, where the errors are propagated backwards in the network to compute gradients of the error (loss) with respect to the weights, and the weights are then optimized accordingly, using gradient descent or any other proper optimization technique to reduce the error.

The following table lists a few common activation functions. We will dive into them a bit deeper, talk about the differences between them, and explain how to choose the right activation function:

Sigmoid or logistic function

A sigmoid function has a distinctive S shape, and it is a differentiable real function for any real input value. Its range is between 0 and 1. It is an activation function of the following form:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

Its first derivative, which is used during backpropagation of the training step, has the following form:

$$\sigma'(x) = \sigma(x)\,(1 - \sigma(x))$$

The implementation is as follows:

def sigmoid(x):
    return tf.div(tf.constant(1.0),
                  tf.add(tf.constant(1.0), tf.exp(tf.negative(x))))

The derivative of the sigmoid function is as follows:

def sigmoidprime(x):
    return tf.multiply(sigmoid(x),
                       tf.subtract(tf.constant(1.0), sigmoid(x)))

However, a sigmoid function can easily saturate and kill the gradient. It is also known that its output is not zero-centered. Therefore, in practical use, it is not recommended to use a sigmoid as the activation function, as it suffers from the vanishing gradient problem; ReLU has become more popular instead.

Tanh or hyperbolic tangent function

The mathematical formula of tanh is as follows:

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

Its output is centered at zero with a range of -1 to 1. Therefore, optimization is easier, and thus in practice it is preferred over the sigmoid function. However, it still suffers from the vanishing gradient problem.

ReLU

The Rectified Linear Unit (ReLU) has become quite popular in recent years. Its mathematical formula is as follows:

$$f(x) = \max(0, x)$$

Compared to sigmoid and tanh, its computation is much simpler and more efficient. It was proved to improve convergence by several times (for example, a factor of 6 in Krizhevsky and co-authors' ImageNet work, 2012), possibly due to its non-saturating form. Also, unlike tanh or sigmoid functions, which involve expensive exponential operations, ReLU can be achieved by simply thresholding activations at zero. Therefore, it has become very popular over the last couple of years. Almost all deep learning models use ReLU nowadays. Another important advantage of ReLU is that it avoids or rectifies the vanishing gradient problem.

Its limitation resides in the fact that its direct output is not in the probability space. It cannot be used in the output layer, but only in the hidden layers. Therefore, for classification problems, one needs to use the softmax function on the last layer to compute the probabilities for classes. For a regression problem, one should simply use a linear function. Another problem with ReLU is the dead neuron problem. For example, if large gradients flow through a ReLU, the weights may be updated in such a way that the neuron will never be active on any future data points. To fix this problem of dying neurons, another modification was introduced called Leaky ReLU.

Leaky ReLU and maxout

A Leaky ReLU has a small slope on the negative side, such as 0.01. The slope can also be made into a parameter of each neuron, such as in PReLU neurons (P stands for parametric). The problem with these variations of the activation function is that the effectiveness of such modifications in solving the dead neuron problem remains inconsistent.

Maxout is another attempt to solve the dead neuron problem in ReLU. It takes the form:

$$\max(w_1^{\top} x + b_1,\; w_2^{\top} x + b_2)$$

From this form, we can see that both ReLU and Leaky ReLU are just special cases of it; that is, for ReLU, it's $\max(0, w^{\top} x + b)$. Although maxout benefits from the linearity and non-saturation of ReLU without its dying units, it doubles the number of parameters for every single neuron.
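As a quick illustration of the difference between the two activations, here is a small sketch, assuming TensorFlow 1.4+ where tf.nn.leaky_relu is available:

import tensorflow as tf

x = tf.constant([-2.0, -0.5, 0.0, 1.5])
relu_out = tf.nn.relu(x)                      # [0.0, 0.0, 0.0, 1.5]
leaky_out = tf.nn.leaky_relu(x, alpha=0.01)   # [-0.02, -0.005, 0.0, 1.5]

with tf.Session() as sess:
    print(sess.run([relu_out, leaky_out]))

Note how the leaky variant keeps a small, non-zero response on the negative side instead of zeroing it out.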

Softmax

When using ReLU as the activation function for a classification problem, a softmax function is used on the last layer. It helps to generate probability-like outputs in the range (0, 1) for each target class:

$$\sigma(z)_j = \frac{e^{z_j}}{\sum_{k} e^{z_k}}$$
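As a small numeric sketch of this normalization (the logit values here are made up):

import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])
probs = tf.nn.softmax(logits)

with tf.Session() as sess:
    print(sess.run(probs))  # approximately [[0.659, 0.242, 0.099]]

The three outputs are positive and sum to one, so they can be read as class probabilities.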

Choosing the right activation function

In most cases, we should always consider ReLU first. But keep in mind that ReLU should only be applied to hidden layers. If your model suffers from dead neurons, then think about adjusting your learning rate, or try Leaky ReLU or maxout.

It is not recommended to use either sigmoid or tanh, as they suffer from the vanishing gradient problem and converge very slowly. Take sigmoid for example. Its derivative is no greater than 0.25 everywhere, making the terms during backpropagation shrink even faster, while for ReLU, the derivative is one at every point above zero, thus creating a more stable network.

Now that you have gained basic knowledge of the important components in neural networks, let's move on to understanding how the networks learn.

How a network learns

Suppose we have a two-layer network. Let $W_1$ and $W_2$ represent the connection weights of the two layers, that is, the weight matrices between the layers, with $b_1$ and $b_2$ as the biases. We will also use sigmoid as the activation function.

Weight initialization

After the configuration of the network, training starts with initializing the weights' values. A proper weight initialization is important, as all the training is about adjusting the coefficients so that they capture the patterns in the data well enough to approximate the target values. In most cases, weights are initialized randomly. In some finely-tuned settings, weights are initialized using a pre-trained model.

Forward propagation

Forward propagation is basically calculating the input data multiplied by the network's weights plus the bias, and then going through the activation function to the next layer:

$$z_1 = a_0 W_1 + b_1,\quad a_1 = \sigma(z_1),\quad z_2 = a_1 W_2 + b_2,\quad a_2 = \sigma(z_2)$$

An example code block using TensorFlow can be written as follows:

# the dimension variables (example sizes; the digits were lost in extraction)
dim_in = 2
dim_middle = 5
dim_out = 1

# declare network variables
a_0 = tf.placeholder(tf.float32, [None, dim_in])
y = tf.placeholder(tf.float32, [None, dim_out])

w_1 = tf.Variable(tf.random_normal([dim_in, dim_middle]))
b_1 = tf.Variable(tf.random_normal([1, dim_middle]))
w_2 = tf.Variable(tf.random_normal([dim_middle, dim_out]))
b_2 = tf.Variable(tf.random_normal([1, dim_out]))

# build the network structure
z_1 = tf.add(tf.matmul(a_0, w_1), b_1)
a_1 = sigmoid(z_1)
z_2 = tf.add(tf.matmul(a_1, w_2), b_2)
a_2 = sigmoid(z_2)

Backpropagation

All the network weights start out randomly initialized, so the first outputs will be far from the given true values. Backpropagation adjusts the weights/parameters to reflect the relationship between the errors and the network so that the outputs better match the given data. The gradients tell us how the loss changes with respect to each weight and bias.

Calculating errors

The first thing backpropagation needs is a measure of the difference between your network's output and the target value. The placeholder y provides the target for the accuracy of the network's output, so we compute the following error:

$$error = a_2 - y$$

This is written in code as follows:

# define error, which is the difference between the activation function
# output from the last layer and the label
error = tf.subtract(a_2, y)

Backpropagation

With the errors computed, backpropagation propagates them backwards to adjust the weights in the direction that shrinks the error. First, we need to compute the deltas for the weights and biases. Note that $\sigma'(z_2)$ is used to update $w_2$ and $b_2$, and $\sigma'(z_1)$ is used to update $w_1$ and $b_1$:

This is written in TensorFlow code as follows:

d_z_2 = tf.multiply(error, sigmoidprime(z_2))
d_b_2 = d_z_2
d_w_2 = tf.matmul(tf.transpose(a_1), d_z_2)

d_a_1 = tf.matmul(d_z_2, tf.transpose(w_2))
d_z_1 = tf.multiply(d_a_1, sigmoidprime(z_1))
d_b_1 = d_z_1
d_w_1 = tf.matmul(tf.transpose(a_0), d_z_1)

Updating the network

Now that the deltas have been computed, it's time to update the network's parameters. In most cases, we use a type of gradient descent. Let $\eta$ represent the learning rate; the parameter update formula is:

$$\theta \leftarrow \theta - \eta \, d\theta$$

This is written in TensorFlow code as follows:

eta = tf.constant(0.01)  # learning rate (example value)
step = [
    tf.assign(w_1, tf.subtract(w_1, tf.multiply(eta, d_w_1))),
    tf.assign(b_1, tf.subtract(b_1,
              tf.multiply(eta, tf.reduce_mean(d_b_1, axis=[0])))),
    tf.assign(w_2, tf.subtract(w_2, tf.multiply(eta, d_w_2))),
    tf.assign(b_2, tf.subtract(b_2,
              tf.multiply(eta, tf.reduce_mean(d_b_2, axis=[0]))))
]

Automatic differentiation

TensorFlow provides a very convenient API that can help us to directly derive the deltas and update the network parameters:

import numpy as np

# Define the cost as the square of the errors
cost = tf.square(error)

# The Gradient Descent Optimizer will do the heavy lifting
learning_rate = 0.01  # example value
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

# Define the function we want to approximate
def linear_fun(x):
    y = x[:, 0] * 2 + x[:, 1] * 4 + 1  # assumed coefficients; digits lost
    return y.reshape(y.shape[0], 1)

# Other variables during learning
train_batch_size = 100  # example value
test_batch_size = 50    # example value

# Normal TensorFlow - initialize values, create a session and run the model
res = tf.reduce_mean(cost)  # an evaluation op (assumed)
sess = tf.Session()
sess.run(tf.initialize_all_variables())

for i in range(1000):
    x_value = np.random.rand(train_batch_size, 2)
    y_value = linear_fun(x_value)
    sess.run(optimizer, feed_dict={a_0: x_value, y: y_value})
    if i % 100 == 0:
        test_x = np.random.rand(test_batch_size, 2)
        res_val = sess.run(res, feed_dict={a_0: test_x,
                                           y: linear_fun(test_x)})
        print(res_val)

In addition to this basic usage, let's now talk about a few important concepts you might encounter in practice.

Vanishing and exploding gradients

These are very important issues in deep architectures. The deeper the architecture, the more likely it suffers from them. During backpropagation, weights are adjusted in proportion to the gradient value, so we may run into two distinct scenarios:

- If the gradients get too small, this is called the vanishing gradient problem. It makes the learning process very slow or even stops updates entirely. For example, using sigmoid as the activation function, where its derivative is always smaller than 0.25, after a few layers of backpropagation, the lower layers will hardly receive any useful signals from the errors, and the network is not updated properly.
- If the gradients get too large, causing the values to blow up, this is called the exploding gradient problem. This often happens when activations and gradients are unbounded, and one common remedy is to bound the updates during the learning steps.
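A common mitigation for exploding gradients is to clip them before applying the update. A minimal sketch in TensorFlow 1.x, assuming the cost tensor from the earlier example is still in scope:

optimizer = tf.train.GradientDescentOptimizer(0.01)
grads_and_vars = optimizer.compute_gradients(cost)
# clip each gradient into [-1, 1] before the update step
clipped = [(tf.clip_by_value(g, -1.0, 1.0), v)
           for g, v in grads_and_vars if g is not None]
train_op = optimizer.apply_gradients(clipped)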

Optimization algorithms

Optimization is the key to how a network learns. Learning is basically an optimization process. It refers to the process that minimizes the error or cost, that is, finds the locus of least error. It then adjusts the network coefficients step by step. A very basic optimization approach is the gradient descent we used in the previous section. However, there are multiple variations that do a similar job with different efficiency; TensorFlow provides multiple options for you to choose as the optimizer, for example, GradientDescentOptimizer, AdagradOptimizer, MomentumOptimizer, AdamOptimizer, FtrlOptimizer, and RMSPropOptimizer. For the API and how to use them, please see this page:

https://www.tensorflow.org/versions/master/api_docs/python/tf/train

These optimizers should be sufficient for most deep learning techniques. If you aren't sure which one to use, use GradientDescentOptimizer as a starting point.
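Since these optimizers expose the same minimize() interface, swapping one for another is a one-line change. For example, assuming the cost tensor defined earlier and an illustrative learning rate:

train_op = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)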

Regularization

Like with all other machine learning approaches, overfitting is something that should always be controlled and minimized, especially given that the network has so many parameters to learn. One of the methods used to deal with overfitting is called regularization, which prevents the weights of the network from growing too large. Typical regularization is done by augmenting the loss function with a penalty term on the weights, such as L1 or L2 regularization.

Take L2 regularization as an example. It is achieved by augmenting the loss function with the squared magnitude of all the weights in the network. What it does is heavily penalize peaky weight vectors and prefer diffuse weight vectors. That is, we encourage the network to use all of its inputs a little rather than some of its inputs a lot, by diffusing the weight values more evenly. Overly large weights mean the network depends too much on a few heavily weighted inputs, which can make it difficult to generalize to new data. During gradient descent, L2 regularization essentially causes every weight to decay towards zero, and this is called weight decay.

Another common type of regularization is L1 regularization, which often leads the weight vectors to become sparse. It helps with understanding which features are more useful for prediction by shrinking the others to zero. It may help the network be more resistant to noise in the inputs, but empirically L2 regularization performs better.

Max-norm is another way of capping the absolute upper bound on the magnitude of the incoming weight vector for every neuron. That is, during the gradient descent steps, we normalize the vector back whenever $\lVert w \rVert_2 \geq c$ for a chosen constant $c$. This is called projected gradient descent. This sometimes stabilizes the training of the network since the weights cannot grow too big (they are always bounded) even if the learning rate is too large.

Dropout is a very different kind of method for preventing overfitting, and it is often used together with the techniques mentioned previously. During training, dropout is achieved by only keeping a portion of the neurons active and setting the others to zero. A pre-set hyperparameter p is used to generate a random sampling of which neurons should be set to zero (dropped out); p = 0.5 is often used in practice. Intuitively, dropout makes the network differ in each iteration of updates, so the network is effectively updated as a sampled, thinned architecture each time. Overall, it prevents overfitting by approximately combining an exponential number of different thinned network architectures efficiently. For more details, one can refer to Hinton's dropout paper (Improving neural networks by preventing co-adaptation of feature detectors).
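The following is a minimal sketch of both ideas in TensorFlow 1.x; the regularization strength and keep probability are assumed values, not ones prescribed by this chapter:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
w = tf.Variable(tf.random_normal([784, 10]))
b = tf.Variable(tf.zeros([10]))
logits = tf.matmul(x, w) + b

# data loss plus an L2 penalty on the weights (weight decay)
lambda_ = 0.01  # assumed regularization strength
data_loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))
loss = data_loss + lambda_ * tf.nn.l2_loss(w)

# dropout on an activation; keep_prob is the fraction of units kept
keep_prob = tf.placeholder(tf.float32)  # for example, 0.5 during training
h_drop = tf.nn.dropout(tf.nn.relu(logits), keep_prob)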

Deep learning models

In this section, we will dive into the three popular deep learning models that are used widely in the industry: CNNs, Restricted Boltzmann Machines (RBM), and the recurrent neural network (RNN).

Convolutional Neural Networks

Convolutional Neural Networks are biologically-inspired variants of the multilayer perceptron and have proven to be very effective in areas such as image recognition and classification. ConvNets have been successfully applied to identifying faces, objects, and traffic signs, as well as powering vision in self-driving cars. CNNs exploit spatially-local correlation by enforcing a local connectivity pattern between neurons of adjacent layers. In other words, the inputs of hidden units in layer m are from a subset of units in layer m-1, units that have spatially contiguous receptive fields.

LeNet was one of the very first convolutional neural networks, developed by Yann LeCun in 1988. It was mainly used for character recognition tasks such as reading zip codes, digits, and so on. In 2012, Alex Krizhevsky and Hinton won the ImageNet competition with an astounding improvement, dropping the classification error from 26% to 15% using a CNN, which started the new era of deep learning for computer vision.

There are four fundamental building components/operations in a CNN:

- Convolution layer (CONV)
- Activation layer (non-linearity, for example, ReLU)
- Pooling or sub-sampling layer (POOL)
- Fully Connected layer (FC, using softmax)

The most common form of ConvNet architecture stacks a few pairs of CONV-ReLU layers, each pair followed by a POOL layer. This pattern repeats until the image has been merged spatially to a small size. Then, at the last layer, it transitions into a fully-connected layer, which often utilizes softmax to output probabilities, especially if it's a multi-way classification problem, as shown in the following figure:

Convolution

Convolution uses a few concepts, such as the convolution kernel, stride, and padding. For a two-dimensional image, convolution happens as follows. Suppose you have a weight matrix and an image (shown as values at each pixel), as shown in the following figure.

The weight matrix (often called a kernel or filter) is applied to the image by convolving: the kernel moves over the image across its width and height. If the weight matrix moves 1 pixel at a time, it is called a stride of 1. At each placement, the numbers (pixel values) from the original image are multiplied element-wise with the weight matrix entries that are currently aligned with them.

The sum of all the products is divided by the kernel's normalizer. The result is placed in the new image at the position corresponding to the kernel's location. The kernel is then shifted to the next pixel and the process is repeated until the whole image has been covered.

As we can see from the following figure, a stride of 2 would result in a smaller output image:

So with a larger stride number, the image shrinks very quickly. To keep the original size of the image, we can add 0s (rows and columns) to the border of the image. This is called padding. The larger the stride, the larger the padding we have to use to obtain an output of the same size:

Pooling/subsampling

The pooling layer serves to progressively reduce the size of the representation and the number of parameters and computations in the network. For color images, pooling is done independently on each channel. The most common form of pooling layer is applied with a filter of size 2 x 2. There are also other types of pooling units, such as average pooling or L2-norm pooling. You may find some early work using average pooling. As max pooling typically shows better performance in practice, average pooling has recently fallen out of favor. It should be noted that there are only two commonly seen variations of max pooling found in practice: 3 x 3 with stride = 2 (also called overlapping pooling), and even more commonly, 2 x 2 pooling with stride = 2. Pooling sizes with larger receptive fields are too destructive.

The following figure illustrates the max pooling concept:

Fully connected layer

Neurons in a fully connected layer have full connections to all activations in the previous layer, which is different to CONV layers. In CONV layers, neurons are connected only to a local region of the input, and many of the neurons in a CONV volume share parameters. Fully connected layers often use softmax as the output activation function when dealing with a classification problem.

Overall

A complete flow of training such a network can be described as follows:

- For each input image, we first pass it through a convolutional layer. The convolved result is fed into an activation function (that is, CONV + ReLU).
- The obtained activation map is then shrunk using a max pooling layer, that is, POOL. The pooling greatly reduces the size of the feature maps and thereby the number of parameters.
- CONV (+ReLU) and POOL layers will be repeated a few times before being connected to the fully connected layers. This increases the depth of the network, which increases its capability of modeling complex data. Also, filters at different levels learn a hierarchical representation of the data at different levels of abstraction. Please refer to Chapter 4 for more details about representation learning.
- The output layer is also a fully connected layer, but with softmax, which helps compute the probability-like outputs.
- The output is then compared with the ground truth labels, which are used to compute the loss. Usually, a mean squared error loss function is used and the optimization goal is to minimize it.
- Errors are then backpropagated to update the coefficients iteratively.

To take a deep-dive into ConvNet applications for computer vision, please refer to Chapter 4 for details.

Restricted Boltzmann Machines

An RBM is a neural network with only two layers: the visible layer and the hidden layer. Each visible node/neuron is connected to each hidden node. The restriction means that there is no intra-layer communication, that is, there are no connections between visible-visible nodes or between hidden-hidden nodes. The RBM was one of the earliest models introduced in the area of deep learning, and it has been applied in many applications, such as dimensionality reduction, classification, feature learning, and anomaly detection.

The following figure shows the basic structure of an RBM:

It is relatively easy to express an RBM in a mathematical form with just a few parameters:

- The weight matrix $W$ ($n_v \times n_h$) describes the connection weights between the visible and the hidden nodes. Each entry $w_{ij}$ is the weight of the connection between visible node $i$ and hidden node $j$.
- Two bias vectors, one for the visible layer and one for the hidden layer ($a$ and $b$). Each element $a_i$ corresponds to the bias value of the $i$th visible node. Similarly, vector $b$ corresponds to the bias values of the hidden layer, with each element corresponding to one hidden node.

Compared to commonly used neural networks, there are some noticeable differences:

- An RBM is a generative, stochastic neural network. The parameters are adjusted to learn a probability distribution over the set of inputs.
- An RBM is an energy-based model. The energy function produces a scalar value which basically corresponds to how likely a configuration is under the model.
- It encodes the outputs in a binary fashion, not as probabilities.
- Neural networks usually perform weight updates by gradient descent with backpropagation, but RBMs use contrastive divergence (CD). We will talk about CD in more detail in the following section.

Energy function

An RBM is an energy-based model. The energy function produces a scalar value indicating how likely the model is to see a given input configuration. From Geoffrey Hinton's tutorial (A Practical Guide to Training Restricted Boltzmann Machines), the energy function is as follows:

$$E(v, h) = -\sum_{i} a_i v_i - \sum_{j} b_j h_j - \sum_{i,j} v_i w_{ij} h_j$$

The calculation is simple. Basically, the first two terms sum the products between units and their corresponding biases (visible or hidden). The third term is the energy contributed by the connections between the visible and the hidden nodes. During model training, the energy is to be minimized, that is, the model parameters ($W$, $a$, and $b$) are updated towards configurations of lower energy.

Encoding and decoding

The training of an RBM can be thought of as two passes, the forward encoding pass (construction) and the backward decoding pass (reconstruction). In an unsupervised setting, where we'd like to train the model based only on the distribution of the input data, the forward and backward passes run as follows.

In the forward pass, the raw input values of the data (for example, pixel values of an image) are represented by the visible nodes. They are then multiplied by the weights and added to the hidden bias values (note, the visible bias values are not used in the forward pass). The resulting value is passed through the activation function to obtain the final output. If there is a following layer stacked on top, this activation result will be used as its input and moved forward:

In our simple RBM case, which has only one hidden and one visible layer, the activation values of the hidden layer become the inputs of the backward pass. They are multiplied by the weight matrix, pushed back through the edges of the weights, and propagated back to the visible nodes. At each visible node, all the incoming values are summed and added to the visible bias value (note, the hidden bias values are not used in the backward pass):

Since the weights of the RBM are randomized at the beginning, in the first few rounds the difference of the reconstruction, which is computed between the reconstructed values and the original input values, can be large. So usually, it takes a few iterations of optimization until an error minimum is reached. The forward and backward passes help the model learn the joint probability distribution of the inputs and the activation results (the outputs of the hidden layer). This is why an RBM is considered a generative learning model.

The question now is how the errors are propagated back in this network. First, errors are computed using KL-divergence. To learn more about KL-divergence, readers can refer to the book Information Theory, Inference, and Learning Algorithms by David MacKay (http://www.inference.org.uk/itprnn/book.pdf, link last checked Jan 2018). Basically, it compares two distributions by taking the integral of the log difference between them. Minimizing the KL-divergence means pushing the learned model distribution (in the form of the activation values produced by the hidden layer) towards the input data distribution. In many deep learning algorithms, gradient descent is used, such as stochastic gradient descent. However, an RBM uses a method of approximate maximum-likelihood learning, called contrastive divergence.

Contrastive divergence (CD-k)

Contrastive divergence can be thought of as an approximate maximum-likelihood learning algorithm. It computes the divergence/difference between the positive phase (the energy of the first encoding) and the negative phase (the energy of the last encoding). It is equivalent to minimizing the KL-divergence between the model distribution and the (empirical) data distribution. The variable k is the number of times you run contrastive divergence. In practice, k = 1 seems to work surprisingly well.

Basically, the gradient is estimated using the difference between the two phases: the positive phase associated gradient and the negative phase associated gradient. The positive and negative terms do not reflect the sign of each term, but rather their effect on the probability distribution being learned by the model. The first term increases the probability of the training data (by reducing the corresponding free energy), while the second term decreases the probability of samples generated by the model. A pseudo-code snippet in TensorFlow can be written as follows:

# Define the Gibbs Sampling function
def sample_prob(probs):
    return tf.nn.relu(tf.sign(probs - tf.random_uniform(tf.shape(probs))))

hidden_probs_0 = sample_prob(tf.nn.sigmoid(tf.matmul(X, W) + hidden_bias))
visible_probs = sample_prob(tf.nn.sigmoid(
    tf.matmul(hidden_probs_0, tf.transpose(W)) + visible_bias))
hidden_probs_1 = tf.nn.sigmoid(tf.matmul(visible_probs, W) + hidden_bias)

# positive associated gradients: increase the probability of training data
w_positive_grad = tf.matmul(tf.transpose(X), hidden_probs_0)
# negative associated gradients: decrease the probability of
# samples generated by the model
w_negative_grad = tf.matmul(tf.transpose(visible_probs), hidden_probs_1)

W = W + alpha * (w_positive_grad - w_negative_grad)
vb = vb + alpha * tf.reduce_mean(X - visible_probs, 0)
hb = hb + alpha * tf.reduce_mean(hidden_probs_0 - hidden_probs_1, 0)

In the code snippet above, X is the input data. For example, MNIST images have 784 pixels, so the input X is a vector of 784 entries and, accordingly, the visible layer has 784 nodes. Also note that in an RBM the input data is encoded as binary. For MNIST data, one can use one-hot encoding to transform the input pixel values. In addition, alpha is the learning rate, vb is the bias of the visible layer, hb is the bias of the hidden layer, and W is the weight matrix. The sampling function sample_prob is the Gibbs sampling function and it decides which nodes to turn on.

Stacked/continuous RBM

A deep-belief network (DBN) is simply several RBMs stacked on top of one another. The output of the previous RBM becomes the input of the following RBM. In 2006, Hinton proposed a fast, greedy algorithm in his paper A fast learning algorithm for deep belief nets that can learn deep, directed belief networks one layer at a time. DBNs learn a hierarchical representation of the input data and aim to reconstruct the data, and are therefore very useful, especially in an unsupervised setting.

For continuous inputs, one can refer to another model called the continuous restricted Boltzmann machine, which utilizes a different type of contrastive divergence sampling. Such a model can deal with inputs like image pixels or word-count vectors that are normalized to decimals between zero and one.

RBM versus Boltzmann Machines

Boltzmann Machines (BMs) can be thought of as a particular form of log-linear Markov random field, for which the energy function is linear in its free parameters. To increase the representation capacity of the model so it can handle complicated distributions, one can consider increasing the number of variables that are never observed, that is, hidden variables, or in this case, the hidden neurons. RBMs are built on top of BMs, in which the restriction is applied so that there are no visible-visible and no hidden-hidden connections.

Recurrent neural networks (RNN/LSTM)

In a convolutional neural network or a typical feedforward network, information flows through a series of mathematical operations performed on the nodes, without any feedback loops or any consideration of the order of the signals. Therefore, they are not capable of handling inputs that come in as a sequence. However, in practice, we have lots of sequential and temporally correlated data, which includes text, genomes, handwriting, the spoken word, and numerical time-series data emanating from sensors, stock markets, and government agencies. It is not only the order that matters; the next value in the sequence often greatly depends on the past context (long or short). For example, to predict the next word in a sentence, information is required, not just from the words nearby, but sometimes from the words at the beginning of the sentence. This helps to set up the subject and the context.

The recurrent neural network (RNN) is a form of artificial neural network designed specifically for these kinds of data. It takes into account both the current input example it sees and also what the network has perceived previously in the sequence (which can be of arbitrary length), and it iteratively loops over the inputs, meaning any configuration/state of the network is impacted, not only by the current input, but also by its recent past.

Cells in RNN and unrolling

All recurrent neural networks can be thought of as a chain of repeating modules/cells along the dimension of time. This repeating module/cell can be as simple as a single tanh layer. One way to understand this chain is to unroll, or unravel, the architecture into a multilayer network, treating each time step as a layer. We can see that the depth of an RNN is essentially determined by the length of the time steps, or of the sequence. The first element of the sequence, such as the first word of a sentence, is equivalent to the first layer. The following figure shows the unrolled scheme:

Backpropagation through time

In feedforward networks, backpropagation (BP) starts with calculating the error at the final output layer and moving backwards, layer by layer. At each step, it calculates the partial derivatives of the error with respect to the weights. Then, through an optimization approach (for example, gradient descent), these derivatives are used to adjust the weights up or down, in whatever direction decreases the error.

Similarly, in recurrent networks, after unrolling the network along the time dimension, BP can be thought of as an extension in the time domain; this is called backpropagation through time, or BPTT. The calculation is very similar, only that the series of layers is replaced by a series of similar time steps.

Vanishing gradient and LSTM

Similar to deep architectures, the deeper the network grows along time, the more severe the vanishing gradient problem becomes. What happens in learning is the shaping of the network's weights by the data. Given that the network's weights are initialized randomly, with barely moving weights we learn very little from the data. This so-called vanishing gradient problem affects RNNs as well.

Each of the time steps in an RNN can be thought of as a layer. Then, during backpropagation, errors travel from one time step to the previous one, so the network can be thought of as being as deep as the number of time steps. In many practical problems, such as word sentences, paragraphs, or other time-series data, the sequences fed into an RNN can be very long. The reason an RNN should be good at sequence-related problems is that it is expected to retain important information from previous time steps, so that context information from earlier inputs can impact the current output. If the sequence is very long, the gradient computed during training/BPTT either vanishes (as a result of multiplying many values between 0 and 1, for example, the derivatives of sigmoid-like activations) or explodes, which makes learning extremely slow or unstable.

We can then create a session and run the graph to evaluate the nodes:

sess = tf.Session()
print(sess.run(node1), sess.run(node2))

The output of the preceding code prints the values held by the two nodes.

Handwritten digits recognition

The challenge in handwritten digit recognition is to recognize the digit present in an image of handwriting. It is useful in many scenarios, for example, recognizing zip codes on envelopes. In this example, we will use the MNIST dataset to develop and evaluate our neural network model for handwritten digit classification.

MNIST is a computer vision dataset hosted at http://yann.lecun.com/exdb/mnist/. It consists of grayscale images of handwritten digits along with the correct label for each. Each image is 28 pixels by 28 pixels. Sample images are shown as follows:

The MNIST data is split into three parts: 55,000 images of training data, 10,000 images of test data, and 5,000 images of validation data. Each image is accompanied by its label, which represents a digit. The goal is to classify the images into digits, in other words, to associate each image with one of the ten classes.

We can represent each image as a 1 x 784 vector of floating-point numbers between 0 and 1. The number 784 is the number of pixels in the 28 x 28 image. We obtain the 1 x 784 vector by flattening the 2D image into a 1D vector. We can represent the label as a 1 x 10 vector of binary values, with one and only one element being 1 and the rest being 0. We are going to build a deep learning model using TensorFlow to predict the 1 x 10 label vector given the 1 x 784 data vector.

We first import the data:

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

We then define some basic building blocks for our CNN:

Weights:

def weight_variable(shape):
    # initialize weights with a small noise for symmetry breaking
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

Bias:

def bias_variable(shape):
    # initialize the bias to be slightly positive to avoid dead neurons
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

Convolution:

def conv2d(x, W):
    # the first dimension in x is the batch size
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

Max pooling:

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')

Now we build the neural network model using the predefined building blocks. Our model consists of two convolution layers, each followed by a pooling layer, and fully connected layers at the end. The graph can be illustrated as follows:

The following code implements the two convolution layers:

x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])  # ground truth label

# First convolution layer
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
# the first dimension of x_image is the batch size
x_image = tf.reshape(x, [-1, 28, 28, 1])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

# Second convolution layer
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

# Fully connected layer
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

We can also reduce overfitting using dropout:

keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

We now build the last layer, the readout layer:

W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
# Readout layer
y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

Now we define the cost function and the training parameters:

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

Next, we define the evaluation:

correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

Lastly, we can finally run the graph in a session:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(20000):
        batch = mnist.train.next_batch(50)
        if i % 100 == 0:
            train_accuracy = accuracy.eval(feed_dict={
                x: batch[0], y_: batch[1], keep_prob: 1.0})
            print('step %d, training accuracy %g' % (i, train_accuracy))
        train_step.run(feed_dict={x: batch[0], y_: batch[1],
                                  keep_prob: 0.5})
    print('test accuracy %g' % accuracy.eval(feed_dict={
        x: mnist.test.images,
        y_: mnist.test.labels,
        keep_prob: 1.0}))

At the end, we achieve 99.2% accuracy on the test dataset of MNIST using this simple CNN.

Summary

In this chapter, we started with the basics of multilayer perceptron networks. From there, we talked about the basic structures, such as the input/output layers as well as various types of activation functions. We have also given detailed steps on how a network learns, with a focus on backpropagation and a few other important components. With these fundamentals in mind, we introduced three popular types of networks: CNNs, Restricted Boltzmann Machines, and recurrent neural networks (with their variation, LSTM). For each particular network type, we gave detailed explanations of the key building blocks in each architecture. At the end, we gave hands-on examples illustrating the use of TensorFlow for an end-to-end application. In the next chapter, we will talk about applications of these networks in computer vision, including popular network architectures, best practices, and real-world examples.

Deep Learning in Computer Vision

In the previous chapter, we covered the basics of how neural networks work and how they can be applied to solving various artificial intelligence (AI) tasks. As outlined in that chapter, one of the most popular deep learning techniques used by the computer vision community is the convolutional neural network, also known as CNN. This chapter covers CNNs in more detail. We will go over the core concepts that are key to the workings of CNNs, and how they can be used to solve real-world computer vision problems. We will specifically answer the following questions:

- How did CNNs originate and what is their historical significance?
- What core concepts form the basis for understanding CNNs?
- What are some of the popular CNN architectures in use today?
- How do you implement the basic functionality of a CNN using TensorFlow?
- How do you fine-tune a pre-trained CNN and use it in your application?

Origins of CNNs

Walter Pitts and Warren McCulloch are often credited with the first computer model, in 1943, which was inspired by the neural network-based structure of the human brain. They proposed a technique that inspired the notion of logic-based design and provided a formalism under which future refinements led to the invention of Finite Automata. The McCulloch-Pitts network was a directed graph where each node was a neuron and the edges were marked as either excitatory (1) or inhibitory (0), and it was used to replicate human thought processes.

One of the challenges with this design was the learning of weights, as it would be defined later. Henry J. Kelley provided the first version of a continuous backpropagation model in 1960, followed by an improvement by Arthur Bryson. The chain rule was developed by Stuart Dreyfus as an improvement on the original backpropagation model. Though the idea of backpropagation existed early on, its inefficiency delayed its practical adoption for a long time.

The earliest working deep learning networks came from Ivakhnenko and Lapa in 1965. They used models with polynomial activation functions, which were further statistically analyzed. From each layer, they selected the statistically best features and forwarded them to the next layer, which was often a slow, manual process. In their 1971 paper, they also described a deep neural network system called alpha, with eight layers trained by the group method of data handling algorithm. However, none of these systems was particularly designed for machine vision tasks.

The earliest inspiration for that came from the work of Hubel and Wiesel [5, 6] in the 1950s and 1960s. They showed that the visual cortexes in the brains of cats and monkeys contain neurons that individually respond to small regions of the visual field, also known as the receptive field. One of the key observations made by them was that neighboring cells have similar and overlapping receptive fields, and that such receptive fields are tiled over the whole visual cortex.

They also discovered that cells in the visual cortex are composed of simple cells and complex cells. Simple cells respond to straight edges that have particular orientations within their receptive fields. Complex cells, on the other hand, are formed due to the projection of various simple cells. Though they respond to the same orientation of edges as their corresponding simple cells, they integrate the responses of all underlying simple cells over a wider receptive field. This makes complex cells translation-invariant to the exact location of the edges in the receptive field. This is one of the core architectural principles behind the design and the huge success of CNNs in practice today.

The first real-world system inspired by the work of Hubel and Wiesel was the neocognitron, developed by Kunihiko Fukushima. The neocognitron is often referred to as the first CNN implementation in the real world. The neocognitron's primary focus was to learn handwritten digits from zero to nine. In this particular design, the neocognitron consisted of nine layers which contained two different groups of cells: S-layers of simple cells, S-cells, and C-layers of complex cells, C-cells. Each layer is further divided into a varying number of planes; however, each plane within a layer has the same number of units, also referred to as processing elements. Each layer has a different number of planes of simple and complex cells. For example, US1 has 12 planes of 19 x 19 simple cells.

The correct classification of the input digit is made by the cell with the strongest response in the rightmost C-cell layer. These early models provided a strong foundation for the mathematical modeling of the visual cortex. However, learning the parameters of such models was still a very challenging task. It was not until 1985 when Rumelhart, Williams, and Hinton applied backpropagation-based techniques to learning neural networks. They showed that using backpropagation, neural networks can learn interesting distributed representations. The success of backpropagation continued when Yann LeCun demonstrated the practical application of backpropagation for handwritten digit recognition from bank checks. These advances in modeling were still limited by the compute resources available at the time. However, not long after, the development of modern-day graphics processing units (GPUs) in the early 2000s increased compute speed by 1000 times. This paved the way for the real-world application of CNN models on large-scale image datasets within reasonable timeframes.

Convolutional Neural Networks

We learned from the previous chapters how a conventional neural network works, in which every neuron has weights and biases and is connected to all the nodes of the previous layer. This network organization works for small-scale data, but it has scaling challenges:

Imagine you are trying to classify larger images with such a network. The input images are 32 x 32 x 3, meaning they have three color channels, red, green, and blue (RGB), and each channel image is 32 pixels wide and 32 pixels high. If you were to input this image to a fully connected network, each neuron in the first layer would have 32 x 32 x 3 = 3,072 edges or weights, as shown in the figure. To learn good weights for this classification task, you would need lots of data and compute power. This challenge grows exponentially if you increase the size of the image from 32 x 32 to, for example, 200 x 200. Not only does this pose compute challenges, this parameter explosion will inevitably lead to overfitting, which is a very common machine learning pitfall.

CNNs are specifically designed to overcome this problem. CNNs work well if the input to the network has a typical grid-like structure, as found in images. Unlike regular networks, CNNs organize the input data in a three-dimensional, tensor-like structure representing width, height, and depth. To prevent a parameter explosion, each volume in one layer is connected to the previous layer's volume only through a small spatial window, so the connectivity is local, with each neuron responsible for a small localized region of the input (figure), which greatly reduces the dimensionality of the parameter space. Finally, the output layer is able to reduce the input image to a single vector of class scores.

One of the most important operations, after which the CNN is named, is the convolution operation. A convolution can be described as the blending of one signal with another to produce a third signal. To describe it with an example, let's say you have a lake with a calm water surface. You decide to pick up a rock and throw it into the lake. When this rock strikes the surface of the water, it creates ripples spreading across the lake, a new form of the lake water's surface. In terms of convolution, you can explain this resulting effect as the result of convolving the rock with the lake's water surface. The process of convolution measures the shape of the overlap of two signals as one glides over the other. One of its primary applications is in signal processing and image filtering.

One such example involves the blurring of an image. Sometimes, when an image is captured with a noisy device, you might want to add a blurring effect, also known as the averaging filter. Convolution is the most commonly-used tool to achieve this effect. As shown in the figure, the matrix on the left (input data), when convolved with a matrix (also known as a kernel) on the right, generates an output, also known as a feature map:

Mathematically, a convolution can be defined as follows:

$$s(t) = (x * w)(t) = \sum_{a} x(a)\, w(t - a)$$

Here, $x$ is the input data, $w$ is the kernel, and $s$ is the feature map.

Data transformations

Often, in any real-world implementation of a CNN, data processing and transformation is a key step to achieving good accuracy. In this section, we will cover some basic but important data transformation steps that are practically useful.

Input preprocessing

Let's assume the training data, X, has N images and each has D flattened-out pixels. The following three preprocessing steps are usually employed on X:

Mean subtraction: In this step, we compute a single mean image across all the training data and subtract it from each image before feeding it to the network. This step has the effect of centering the data around the origin along each of the feature dimensions. To implement this in Python:

import numpy as np
mean_X = np.mean(X, axis=0)
centered_X = X - mean_X

Normalization: The mean subtraction step is often followed by a normalization step, which has the effect of scaling each feature dimension to a similar range. This is done by dividing each feature column by its standard deviation. The figure illustrates the effect of centering and normalization on the input data. It can be implemented in Python as follows:

std_X = np.std(centered_X, axis=0)
normalized_X = centered_X / std_X

PCA whitening: One other important transformation, useful for deep networks in general, is whitening using Principal Component Analysis (PCA). Although it is seldom used with CNNs, it is an important preprocessing step worth describing. Whitening can be understood as the process where the data is de-correlated by computing the covariance matrix, and the dimensionality of the signal can be reduced as desired. The figure shows the geometric interpretation of this step. It can also be implemented in Python as follows:

# Compute the covariance matrix from the centered data
cov_matrix = np.dot(centered_X.T, centered_X) / centered_X.shape[0]
# Perform Singular Value Decomposition
U, S, V = np.linalg.svd(cov_matrix)
# Compute the whitened data without dimensionality reduction
decorr_X = np.dot(centered_X, U)  # decorrelate the data
# Compute the whitened data (the small constant avoids division by zero)
whitened_X = decorr_X / np.sqrt(S + 1e-5)

Data augmentation

One of the most common tricks for improving recognition accuracy is to augment the training data in multiple ways:

- Translation and rotation invariance: For the network to learn invariance to translation and rotation, it is often recommended to augment the dataset with the same images at different perspectives and orientations. For instance, you can take the input images and flip them horizontally. Along with horizontal flips, you can also translate them by a few pixels, among other possible transformations (a small sketch follows this list).
- Scale invariance: One of the limitations of CNNs is their ineffectiveness in recognizing objects at different scales. To address this shortcoming, it is often a good idea to augment the previous transformations with scale-specific transformations. These are random crops of sub-sampled versions of the image. You can also take these random crops and up-sample them to the original width and height of the input image.
- Color perturbation: One of the more interesting data transformations is perturbing the color values of the images directly.
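As referenced in the list above, a tiny NumPy sketch of flip-and-shift augmentation (the image here is random data, just for illustration):

import numpy as np

image = np.random.rand(32, 32, 3)     # a dummy RGB image
flipped = image[:, ::-1, :]           # horizontal flip
shifted = np.roll(image, 2, axis=1)   # translate by two pixels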

Network layers

As introduced in the previous chapter, a typical CNN architecture consists of a series of layers, each of which transforms an input volume into an output volume. Each of these layers may belong to one of a handful of layer classes, and each class of layers plays a particular role in the network. The figure shows an example CNN network stack, which is composed of an input layer, a convolutional layer, a pooling layer, and a fully connected layer (FC). A typical ConvNet can have various arrangements of the form [INPUT -> CONV -> POOL -> FC]. In this section, we will describe each of these layer types in more detail, covering both their function and their significance:

Convolution layer

One of the core building blocks of CNNs, the convolution layer is responsible for applying a particular convolution filter to the input volume. The filter is applied to each sub-region of the image, where each sub-region is defined by the local receptive field. Each filter application produces a single scalar value for that location, and combining these values across all locations of the input forms a feature map. For example, if you use 12 filters to convolve a 32 x 32 image at every single pixel location, you will produce 12 output feature maps, each of size 32 x 32. In this case, each of the feature maps is computed with respect to one particular filter. The figure illustrates this computation in more detail.

One important question one may ask is, how does one choose the particular filters to convolve the image with? To answer this question: each filter is, in fact, a learnable parameter of the CNN model and is learned during the training process. As such, the design of the filters becomes extremely crucial to the performance of the network. A typical filter may be five pixels high and five pixels wide. However, a combination of such filters stacked over the depth of the volume provides a powerful, potent feature detector. The training process for such filters is very similar to traditional networks. First, you decide the number and size of the filters to be applied at every layer of the network. During the start of the training process, the starting values of these filters are chosen randomly. During the forward pass of the backpropagation algorithm, each filter is convolved at every possible location of the input image to generate the feature maps. These feature maps then act as inputs to the next layer, enabling the extraction of higher-level features as the network goes deeper.

One important issue that comes up with applying a filter at every pixel location is the excessive growth in the number of connections and parameters. For example, if the input volume has a size of [32 x 32 x 3], and the filter size is 5 x 5, then each neuron in the convolution layer will be connected to a [5 x 5 x 3] region of the input volume, generating a total of 5 * 5 * 3 = 75 weights (and +1 bias parameter). To reduce the size of the output volume, we sometimes use a parameter referred to as the stride length. The stride length is the gap between subsequent filter applications, and a larger stride reduces the size of the output volume significantly.

One of the post-processing layers often used with a convolution layer is the Rectified Linear Unit (ReLU). ReLU computes the function $f(x) = \max(0, x)$. One of the advantages of ReLU is that it greatly accelerates the convergence of optimization techniques such as stochastic gradient descent (SGD):

Pooling or subsampling layer

A pooling or subsampling layer often immediately follows a convolution layer in a CNN. Its role is to downsample the output of a convolution layer along both the spatial dimensions of height and width. For example, a 2 x 2 pooling operation over 12 feature maps of size 32 x 32 will produce an output volume of size [16 x 16 x 12] (see the figure).

The primary function of the pooling layer is to reduce the number of parameters to be computed by the network. This also has the added benefit of reducing overfitting, thereby improving the overall performance and accuracy of the model/network. There are multiple techniques for pooling. Some of the most common pooling techniques are:

- Max pooling: In this case, a feature map patch (2 x 2 in the previous example) is replaced by a single value, which is the maximum of the four values in the patch.
- Average pooling: In this case, a feature map patch (2 x 2 in the preceding example) is replaced by a single value, which is the average of the four values in the patch.

In general, a pooling layer works as follows:

- It accepts an input volume of size $W_1 \times H_1 \times D_1$
- It requires two parameters: their spatial extent $F$ and the stride $S$
- It produces a volume of size $W_2 \times H_2 \times D_2$, where:

$$W_2 = \frac{W_1 - F}{S} + 1,\qquad H_2 = \frac{H_1 - F}{S} + 1,\qquad D_2 = D_1$$

For example, pooling a [32 x 32 x 12] input with $F = 2$ and $S = 2$ gives $(32 - 2)/2 + 1 = 16$, that is, a [16 x 16 x 12] output.

Fully connected or dense layer

One of the final layers of a CNN is often the fully connected layer, which is also known as a dense layer. Neurons in this layer are fully connected to all activations in the previous layer. The output of this layer is the class score, where the number of neurons in this layer usually equals the number of classes to be predicted.

Using the combination of the layers described previously, a CNN converts an input image into the final class scores. Each layer works in a different way and has different parameter requirements. The parameters in these layers are learned through a gradient-based algorithm in a backpropagation fashion.

Network initialization Oneofthmsinglyrva,yetcruial,aspectofCNNtraingsewok intalzo.EveryCNNlayerhsctinpmowgdv traingse.ThemostpulargihnwSGD.InputsoSGD includeatsofwgh,alosfuncti,andlbetrig.SGDwiluseth intalweghsocmpuvbrdj weightorducls.Thisadjutewghlnobfxr theprviouscnlgad.Ascanbesfromthi proces,thecoifnalwgrkzpysu qualityndspeofcvrgwk.Hence,anumberofstgihv benaplidtorshu.Someoftharslw: Random initialization:Inthiscem,alweightsrndomy valueinty.Onegodpractiwhnmsu samplecofrznduitvGausian distrbuon.Theidabhnromztspl hadverysimlontcwgu,evrynuowil computexalyhsvndkgri iteraon.Thismeanvryuowlftdhk won'tbedivrsnoughlapfm.Toensur diverstynawok,randomweightsu.Thiswouldenrghta asignedymtrcl,leadingtovrsbyhwk.Onetrick withrandomlzugsekvc.Ifyoudefinwghts randomly,thedisrbuonfpmwlavcg varince.Onercomndatislzhwgubf inputsohelayr. Sparse initialization:UsedbySutskevral., neuroadmlyctiKneurosithpvlay. Weightsforeacnudmlyb.Atypical numberof is10-15.Thecorintubhdsapl numberofctisahpvly. Itisoftenagdlzhb0inthsparcule.Incaseof ReLU,youmightwancsel0.01toensurm gradientspofw.

imagnefthwork

inthscem,wechosa

[ 102 ]

Batch normalization: Invented by Ioffe and Szegedy, batch normalization aims to be a more robust solution to the problem of network weight initialization. The central idea behind this scheme is to force the whole network to normalize its activations to a unit Gaussian distribution at the beginning of the training phase. It takes two learnable parameters, $\gamma$ and $\beta$, and generates the batch-normalized version of the input $x$ as:

$$\hat{x} = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x]}},\qquad y = \gamma\,\hat{x} + \beta$$

Regularization

One of the challenges in training CNNs is overfitting. Overfitting can be described as a phenomenon where the CNN, or in general any learning algorithm, performs very well in optimizing the training error, but is not able to generalize well on unseen test data. The most common trick used in the community to solve this issue is regularization, which is simply adding a penalty to the loss function being optimized. There are various ways of regularizing the network. Some of the techniques are explained as follows:

L2 regularization: One of the most popular forms of regularization, an L2 regularizer implements a squared penalty on the weights, meaning the higher the weights, the higher the penalty. This ensures that once the network is trained, the values of the optimal weights are smaller. Intuitively, this means the network is discouraged from having peaky weights and is encouraged to use all of its inputs in a diffuse manner. Having a few very large weights would make the network depend too heavily on a few neurons, eventually making it biased. The figure illustrates this concept very well; after L2 regularization, you see more weights concentrated in the range -0.2 to +0.2, as opposed to -0.5 to +0.5 before regularization.

L1 regularization: One of the issues with L2 regularization is that even though the resulting weights are small, they are mostly non-zero. This means the network keeps taking small contributions from most of the inputs, which becomes a problem when some inputs are noisy. You would rather eliminate noisy inputs completely, that is, place a 0 weight on such inputs. This paves the way for L1 regularization. In this case, you add a first-order penalty on the weights instead of a second-order penalty such as L2. The resulting weight distribution can be seen in the figure. You can see only a few weights are non-empty, suggesting that the network has learned sparse weights, which are more robust to noisy inputs. You can also combine both the L1 and L2 regularization schemes, which is also referred to as elastic net regularization.

Max-norm constrained regularization: In this scheme, you constrain the maximum possible norm of the incoming weight vector to a pre-specified value $c$, such that $\lVert w \rVert_2 < c$. This ensures that network weights stay bounded during learning and cannot blow up even when the learning rate is high.

Dropout regularization: One of the recent advances in regularization approaches is dropout. The idea here is to use a hyperparameter, $p$, which defines the probability with which activations from the previous layer are kept for the next layer. The figure shows an example. With a dropout value of 0.5 and four neurons, you randomly select two (0.5 * 4) neurons whose activations will be forwarded to the next layer. Since you are dropping activations during training, you need to scale the activations appropriately so that the testing phase remains unchanged. To do this, you also use an inverted dropout, which scales the activations by a factor of $1/p$:

You can add a dropout layer in TensorFlow using the following code:

dropout_rate = 0.5
fc1 = tf.layers.dropout(fc1, rate=dropout_rate, training=is_training)

Loss functions

So far, we have seen that a CNN is trained using a gradient-based algorithm that minimizes a loss function given the training data. There are multiple ways to define this loss function. In this section, we will look at some of the loss functions commonly used for CNN training:

Cross-entropy loss: This is one of the most popularly used loss functions for CNNs. It is based on the notion of cross-entropy, which is a measure of distance between a true distribution $p$ and an estimated distribution $q$, and can be defined as:

$$H(p, q) = -\sum_{x} p(x) \log q(x)$$

Using this measure, the cross-entropy loss for an example with true class $y_i$ and class scores $f$ can be defined as follows:

$$L_i = -\log\left(\frac{e^{f_{y_i}}}{\sum_j e^{f_j}}\right)$$

Hinge loss: Hinge loss can be simply defined as follows:

$$L_i = \sum_{j \neq y_i} \max(0,\; s_j - s_{y_i} + \Delta)$$

Let us understand the hinge loss with a working example. Let's assume we have three classes, and for a given data point, the CNN outputs three scores for the classes in the following order: [10, -5, 5]. Let us also assume that the correct class for this data point is the first class, whose score is 10, and the margin $\Delta$ is 10. In this case, the hinge loss would be computed as follows:

$$L = \max(0,\, -5 - 10 + 10) + \max(0,\, 5 - 10 + 10) = 0 + 5 = 5$$

As shown in the preceding calculation, the total loss value comes to 5, since the third class's score, 5, is not separated from the correct class's score, 10, by the full margin in this case.
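The same arithmetic can be checked with a few lines of NumPy:

import numpy as np

scores = np.array([10.0, -5.0, 5.0])  # CNN outputs from the example
correct = 0                            # index of the correct class
delta = 10.0                           # margin
others = np.delete(scores, correct)
loss = np.sum(np.maximum(0, others - scores[correct] + delta))
print(loss)  # 5.0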

Model visualization

One important aspect of a CNN is that, once it is trained, it learns a set of feature maps, or filters, which act as feature extractors on natural images. As such, it would be great to visualize these filters and understand what the network has learned through its training. Fortunately, this is a growing field with lots of techniques that make it possible to visualize the filters trained by a CNN. There are two primary ways of visualizing the network:

Layer activation: This is the most common form of network visualization, where one visualizes the activations of the corresponding layer during the forward pass of the network. This visualization is important for multiple reasons:

- It allows you to see how each learned filter responds to each input image. You can use this information to get a qualitative understanding of what each filter has learned to respond to.
- You can easily debug the network by observing whether all the filters are learning useful features, or whether some are simply blank images suggesting drastic issues during network training.

The following figure shows layer activations in more detail:

Filter visualization: Another common use of visualization is to visualize the actual filter values themselves. Remember that CNN filters can also be understood as feature detectors, which, when visualized, demonstrate what kind of image feature each filter can extract. For example, the preceding figure illustrates that CNN filters can be trained to extract edges of different orientations as well as color combinations. Noisy filter values can also be diagnosed using this technique, providing feedback on a poorly converged network training:

Handwritten digit classification example

In this section, we will show how to implement a CNN to recognize the 10 classes of handwritten digits using TensorFlow. We will use the MNIST dataset for this challenge, which consists of 60,000 training examples and 10,000 test examples of handwritten digits, where each image is a 28 x 28-pixel monochrome image.

Let us assume that the model takes its input in a features variable and the labels are stored in a labels variable. We begin with importing the necessary modules and loading the features variable from the pre-loaded dataset:

import numpy as np
import tensorflow as tf

mnist = tf.contrib.learn.datasets.load_dataset("mnist")
features = mnist.train.images  # Returns np.array

# Input Layer
INPUT = tf.reshape(features, [-1, 28, 28, 1])

We will use a network architecture with two convolution layers, two pooling layers, and two fully-connected layers in the following configuration: [INPUT -> CONV1 -> POOL1 -> CONV2 -> POOL2 -> FC1 -> FC2]. We use 32 filters, each of size 5 x 5, for CONV1, and 2 x 2 filters with a stride of 2 for POOL1. It is implemented in TensorFlow as follows:

CONV1 = tf.layers.conv2d(
    inputs=INPUT,
    filters=32,
    kernel_size=[5, 5],
    padding="same",
    activation=tf.nn.relu)
POOL1 = tf.layers.max_pooling2d(inputs=CONV1, pool_size=[2, 2], strides=2)

We use 64 filters of size 5 x 5 for CONV2 and 2 x 2 POOL2 filters, again with a stride of 2. We also connect these layers to the previous layers, as follows:

CONV2 = tf.layers.conv2d(
    inputs=POOL1,
    filters=64,
    kernel_size=[5, 5],
    padding="same",
    activation=tf.nn.relu)
POOL2 = tf.layers.max_pooling2d(inputs=CONV2, pool_size=[2, 2], strides=2)

The output of POOL2 is a two-dimensional matrix per feature map, which needs to be flattened before we can connect it to a fully-connected layer. Once flattened, we connect it to a fully-connected layer with 1,024 neurons:

POOL2_FLATTENED = tf.reshape(POOL2, [-1, 7 * 7 * 64])
FC1 = tf.layers.dense(inputs=POOL2_FLATTENED, units=1024,
                      activation=tf.nn.relu)

To improve the network's generalization, we need to add a regularization scheme. We use a dropout layer with a rate of 0.4 and connect it to the last fully connected layer. Finally, this last layer is configured with 10 neurons, one neuron for each digit class:

DROPOUT = tf.layers.dropout(
    inputs=FC1, rate=0.4, training=mode == tf.estimator.ModeKeys.TRAIN)
FC2 = tf.layers.dense(inputs=DROPOUT, units=10)

Now that the network is fully configured, we need to define a loss function that can be optimized during training. As described previously, we choose the cross-entropy loss, as follows:

# Calculate Loss (for both TRAIN and EVAL modes)
onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=10)
loss = tf.losses.softmax_cross_entropy(
    onehot_labels=onehot_labels, logits=FC2)

We now set up the learning parameters and the optimizer for the model, and begin the training (the learning rate, batch size, and step count here are standard values for this setup):

# Configure the Training Op (for TRAIN mode)
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
train_op = optimizer.minimize(
    loss=loss, global_step=tf.train.get_global_step())
# In a full program, the layers and ops above live inside a model_fn
# that returns this EstimatorSpec to a tf.estimator.Estimator
spec = tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": features},
    y=labels,
    batch_size=100,
    num_epochs=None,
    shuffle=True)
mnist_classifier.train(input_fn=train_input_fn, steps=20000)

Fine-tuning CNNs

Though CNNs can be readily trained given the availability of huge compute power, training a high-quality CNN takes lots of iterations and patience. It is not always easy to optimize a huge number of parameters, often in the range of millions, while training a CNN from scratch. Moreover, a CNN is especially suited to problems with large datasets. Often, you are faced with a problem where only a small dataset is available, and training a CNN on such datasets may lead to overfitting. Fine-tuning a CNN is one technique that aims to address this pitfall. Fine-tuning a CNN implies that you never train the CNN from scratch. Instead, you start from a previously trained CNN model and finely adapt and change its weights to better suit your application context. This strategy has multiple advantages:

- It exploits the large number of pre-trained models readily available for adaptation
- It reduces the compute time, since the network has already learned stable filters, and it converges quickly to the final weights
- It also works with smaller training datasets, avoiding overfitting

There are multiple ways of fine-tuning with CNNs. They are listed as follows:

CNN feature extractor: Often, you are faced with an image classification task with a specific number of classes, say 20. Given this task, one obvious question is, how do you take advantage of existing pre-trained CNN models that have a higher number of classes, such as 1,000, and use them for fine-tuning? The CNN feature extractor is a technique that answers this question. In this technique, we take a pre-trained CNN model, such as AlexNet, which has 1,000 classes, and remove the last fully connected layer, treating the rest of the network as a fixed feature extractor. We then perform a forward pass for each input image and take the activation values of the last remaining layer, whether that is one of the convolutional layers, such as Conv5, or even the penultimate fully connected layer, such as fc6. For example, if you choose fc6, the total number of activation values is 4,096, which now acts as a 4,096-dimensional feature vector. This feature vector can then be used with any existing machine learning classifier, such as an SVM, to train a simple 20-class classification model.

CNN adaptation: Sometimes, you would like to take advantage of the fully connected layers in the network and also adapt them for your own task. In such scenarios, you can replace the last fully connected layer of the pre-trained network with your own version of a fully connected layer, configured with the number of output classes in your dataset, for example, 20 classes as in the preceding example. Once this network is configured, you copy over the weights from the pre-trained network and use them to initialize the new network. Finally, the network is trained using backpropagation so that, starting from the pre-trained weights, it adapts to your particular dataset. This method has the obvious advantage that you don't learn the full network from scratch, and with little modification, you are able to reuse pre-trained models, which adapt to your dataset rather quickly. This strategy also works with small amounts of training data, since the pre-trained network has usually been trained with a large volume of data.

CNN retraining: This strategy is useful when you need to train the whole CNN from scratch. Usually, full training of a CNN may span many days, which is not very practical. To avoid this multi-day training, it is often recommended to initialize the weights of your network with a pre-trained model and simply continue training from this working start point. This ensures every step of training adds value on top of what has already been learned, without wasting precious compute hours re-learning what lots of pre-trained models have already captured. Usually, it is advisable to use a smaller learning rate when retraining a full network in this fashion.

Popular CNN architectures

Designing a perfect CNN architecture involves a large amount of experimentation and compute power. Hence, it is often non-trivial to achieve the most optimal CNN architecture design. Fortunately, a number of CNN architectures exist today that serve as a good starting point for many developers and researchers, avoiding the need to design a CNN network from scratch. In this section, we will go over some of the popular CNN architectures known today.

AlexNet

One of the earliest works popularizing the use of CNNs in large-scale image classification, AlexNet was proposed by Alex Krizhevsky and their co-authors in 2012. It was submitted as an entry to the ImageNet challenge in 2012 and significantly outperformed its runner-up, with a 16% top-5 error rate. AlexNet consists of eight layers in the following order: [INPUT -> CONV1 -> POOL1 -> CONV2 -> POOL2 -> CONV3 -> CONV4 -> CONV5 -> POOL3 -> FC6 -> FC7 -> FC8]. CONV1 is a convolution layer with 96 filters of size 11 x 11. CONV2 has 256 filters of size 5 x 5, CONV3 and CONV4 have 384 filters of size 3 x 3, followed by CONV5 with 256 filters of size 3 x 3. All pooling layers, POOL1, POOL2, and POOL3, have 3 x 3 pooling filters. Both FC6 and FC7 have 4,096 neurons, with the last layer being FC8 with 1,000 neurons, which equals the 1,000 output classes in the benchmark dataset. It is a very popular choice of network and often the first CNN architecture applied to large-scale image recognition tasks today.

Visual Geometry Group

This architecture from Simonyan and their co-authors [17] was the runner-up in the ImageNet challenge in 2014. It is designed around the core idea of simpler networks with a lot more depth. Though these networks provide an excellent level of accuracy, they have an extremely large number of parameters (~140M) and use a lot more memory than AlexNet. The Visual Geometry Group (VGG) network has smaller filters than AlexNet, where each filter is of size 3 x 3 but with a lower stride of one, which effectively captures a similar receptive field as a 7 x 7 filter with a stride of four. It typically has 16-19 layers, depending on the particular VGG configuration. The figure illustrates the architecture:

GoogLeNet

While VGG was the runner-up in the ImageNet 2014 challenge, GoogLeNet, also known as Inception, was the winning submission. It has 22 layers in total, with no fully connected layer at all. One of its primary contributions is bringing down the number of parameters to 5M, from 60M in AlexNet. Though the number of parameters is reduced, it is computationally more expensive than its predecessor networks. The figure illustrates the architecture:

ResNet

ResNet is currently the state-of-the-art architecture for large-scale image recognition. One of the most significant observations from the previous architectures was that deeper networks achieve better performance. However, with increasing depth of the network, the problem of vanishing gradients is also amplified, since each layer successively computes its gradient with respect to the gradient of the layer above. The larger the number of layers, the smaller the gradients become, eventually vanishing to 0. To avoid this problem, ResNet introduces a shortcut (identity) connection: where a plain stack of layers computes $F(x)$, a residual block computes $F(x) + x$, where $x$ is the original input to the block. This alleviates the effect of the gradient getting successively smaller, since the identity path carries the signal through. The advantage of this strategy is that you can now create deeper networks with as many as 150 layers, which was not possible before.

The following figure illustrates the network configuration in more detail:

Summary

In this chapter, we described the concepts of the convolutional neural network in more detail. We provided a summarized history of CNNs and how they originated. We covered the basics of CNNs, ranging from network architecture and layers to loss functions and regularization techniques. We also outlined practical uses of these concepts and illustrated how to implement them for an image recognition task using TensorFlow. We also outlined how pre-trained models can be used to speed up application development. Finally, we illustrated popular CNN architectures, which often serve as an initial choice for developers as well as systems for machine vision tasks. In the next chapter, we will look at how deep learning techniques are useful in understanding natural language text.

NLP - Vector Representation

Natural language processing (NLP) is one of the most important areas of machine learning applications in the era of artificial intelligence (AI). Understanding the complex structures of language is a challenging problem. The applications of NLP are almost everywhere, as we communicate mostly through language and store human knowledge mostly in language. This includes web search, advertisement, emails, customer service, machine translation, and so on. Some of the hot research areas include language modeling (speech recognition, machine translation), word-sense learning and disambiguation, reasoning over knowledge bases, acoustic modeling, part-of-speech tagging, named-entity recognition, sentiment analysis, chatbots, question/answering, and others. Each of these tasks requires a deep understanding of the task or the application, as well as efficient and fast machine learning models.

Similar to computer vision, extracting features out of text data is of fundamental importance. Recently, deep learning approaches have made significant progress in the representation learning of text data. This chapter will describe word embeddings for NLP. We will talk about three state-of-the-art embedding models: Word2Vec, GloVe, and FastText. We will show an example of how to train Word2Vec in TensorFlow, and how to do visualization. We will also talk about the differences between Word2Vec, GloVe, and FastText, and how to use them in applications such as text classification.

Traditional NLP

Extracting useful information from text-based data is a challenging task. For a basic application, such as document classification, the common way of feature extraction is called bag of words (BoW), in which the frequency of each word is used as a feature for training a classifier. We will briefly talk about BoW in the following section, as well as the tf-idf approach, which is intended to reflect how important a word is to a document in a corpus.

Bag of words

BoW is mainly used for categorizing documents. It is also used in computer vision. The idea is to represent the document as a bag of words, disregarding grammar and even the order of the word sequence. After the processing of the text, often called the corpus, a vocabulary is generated, and the BoW representation of each document is built on top of it.

Take the following two text samples:

Sentence 1: "The quick brown fox jumps over the lazy dog"
Sentence 2: "never jump over the lazy dog quickly"

The corpus (text samples) then forms a dictionary with the column as the word ID:

{
    'brown': 0,
    'dog': 1,
    'fox': 2,
    'jump': 3,
    'jumps': 4,
    'lazy': 5,
    'never': 6,
    'over': 7,
    'quick': 8,
    'quickly': 9,
    'the': 10
}

The size of the vocabulary (V = 11) is the number of unique words in the corpus. Sentences will then be represented as length-11 vectors, where each entry corresponds to one word in the vocabulary. The value of each entry is determined by the number of times the corresponding word occurs in the sentence (document).

In this case, the two sentences will be encoded as 11-element vectors, like so:

Sentence 1: [1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 2]
Sentence 2: [0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1]
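A tiny pure-Python sketch of this vectorization (token IDs follow the alphabetical dictionary above):

sentences = [
    "The quick brown fox jumps over the lazy dog",
    "never jump over the lazy dog quickly",
]
vocab = sorted({w.lower() for s in sentences for w in s.split()})
word_id = {w: i for i, w in enumerate(vocab)}
for s in sentences:
    vec = [0] * len(vocab)
    for w in s.lower().split():
        vec[word_id[w]] += 1
    print(vec)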

Each element of the vector represents the count of the corresponding word in the text sentence. Therefore, in the first sentence, there is 1 count for brown (at position 0 in the vector), 1 count for dog (at position 1 of the array vector), 1 count for fox, and so on. Similarly, for the second sentence, there is no occurrence of brown, so we get 0 for position 0 of the array vector, 1 count for dog, 0 counts for fox, and so forth.

Weighting the terms tf-idf

In most languages, some words tend to appear more often than others, yet may not carry much differentiating information regarding the actual meaning of a document. Examples are words such as is, the, and a, which are all very common in English. If we only consider their raw frequency as in the previous section, we might not be able to effectively differentiate between documents or retrieve similar documents that match the core content.

One approach to tackle this problem is called term frequency and inverse document frequency (tf-idf). As its name says, it takes into account two terms: term frequency (tf) and inverse document frequency (idf).

,thesimplcourawnfd;thais,

henumbroftisa longerdcumts,acomnwyistkehrfqudvbx frequncyoatmsihd :

oc ursind me t

Inthisequon, The comnraeilduts.Onecomnwaytdrihsk logftheinvrspdcuma :

term frequency and inverse document term frequency(tf)and

.Howevr,toprevnabiswd

isthenumbrofc(rawcount)inthedocum. measurhowcinftdpv;thais,whetrmis

Bymultipyngbohvaes,tf-idfscalute:

Atf-idfsotenurmavlxg.
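The formulas above translate directly into a short Python sketch (the corpus is the two example sentences from the previous section):

import math
from collections import Counter

docs = [
    "the quick brown fox jumps over the lazy dog".split(),
    "never jump over the lazy dog quickly".split(),
]

def tf_(term, doc):
    counts = Counter(doc)
    return counts[term] / max(counts.values())

def idf(term, docs):
    n_containing = sum(1 for d in docs if term in d)
    return math.log(len(docs) / n_containing)

def tfidf(term, doc, docs):
    return tf_(term, doc) * idf(term, docs)

print(tfidf("quick", docs[0], docs))  # non-zero: appears in one document
print(tfidf("the", docs[0], docs))    # zero: appears in every document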


Deep learning NLP

Deep learning brings multiple benefits in learning multiple levels of representation of natural language. In this section, we will cover the motivation for using deep learning for NLP, distributed representations for NLP, word embeddings and several methods for learning them, and their applications.

Motivation and distributed representation

Like in many other cases, the representation of the data, which is how the information is encoded and shown to machine learning algorithms, is often the most important and fundamental part of all learning or AI pipelines. The effectiveness and scalability of the representation largely determine how well the downstream machine learning model and application can perform.

As mentioned in the previous section, traditional NLP often uses one-hot encoding to represent a word in a fixed vocabulary and uses BoW to represent documents. Such an approach treats each word, for example, house, road, tree, as an atomic symbol. The one-hot encoding generates representations like [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0]. The length of the representation is the size of the vocabulary. With such a representation, one often ends up with huge, sparse vectors. For example, in a typical speech application, the vocabulary size can be from 20,000 to 500,000. However, it has an obvious problem, which is that the relationship between any pair of words is ignored, for example, motel [0 0 0 0 0 0 0 0 0 1 0 0 0 0 0] and hotel [0 0 0 0 0 0 0 1 0 0 0 0 0 0 0] have a dot product of 0. Also, the encodings are actually arbitrary, for example, in one setting, cat may be represented as Id453, meaning the 453rd entry of the long, sparse vector is 1. Such representations provide no useful information to the system regarding the interactions or similarities that may exist between individual symbols.

This makes the learning of the model difficult, because the model won't be able to leverage much of what it has learned about cat when it is processing data regarding dog. Therefore, discrete IDs separate the actual meaning of a word from its representation. Although some statistical information can be calculated at the document level, information at the atomic symbol level is extremely limited. This is where distributed vector representation, and deep learning in particular, comes to help.

Deep learning algorithms attempt to learn multiple levels of representation of increasing complexity/abstraction.

There are multiple benefits of using deep learning for NLP problems:

- As the features are often directly derived from the data, deep learning alleviates the incompleteness and domain-specificity of hand-crafted features. Hand-crafting features is often very time consuming and may need to be done again and again for each task or domain-specific problem. Features learned from one domain often show little generalization ability towards other domains or areas. On the contrary, deep learning learns information from the data and representations across multiple levels, in which the lower levels can be shared across many domains and the information can be leveraged directly or with slight fine-tuning.
- Atomic symbol representations do not capture any semantic interrelationship between words. With words treated independently, NLP systems can be incredibly fragile. The distributional representation approach captures similarity in a continuous space and provides the opportunity for NLP systems to do more complex reasoning and knowledge derivation.
- Learning can be done unsupervised. Given the current scale of data, there is a great need for unsupervised learning, as it is often not realistic to acquire labels in many practical cases.
- Deep learning learns multiple levels of representation. This is one of the most important advantages of deep learning, for which the learned information is constructed level-by-level through composition. The lower levels of representation can often be shared across tasks.
- Deep learning naturally handles the recursivity of human language. Human sentences are composed of words and phrases with a certain structure. Deep learning, especially recurrent models, is able to capture such sequence information in a much better sense.

Word embeddings

The very fundamental idea of distributional similarity-based representations is that a word can be represented by the means of its neighbors/context. As famously said by J. R. Firth 1957: 11, "You shall know a word by the company it keeps."

This is perhaps one of the most successful ideas of modern statistical NLP. The definition of neighbors can vary to take account of either local or larger contexts in order to get a more syntactic or semantic representation.

Idea of word embeddings

First of all, a word is represented as a dense vector. A word embedding can thus be thought of as a mapping function from words to an n-dimensional space; that is:

$$W : \text{words} \rightarrow \mathbb{R}^{n}$$

in which $W$ is a parameterized function mapping words to high-dimensional vectors (for example, vectors with 200 to 500 dimensions). You may also consider $W$ as a lookup table with a size of $V \times N$, where $V$ is the size of the vocabulary and $N$ is the size of the dimension, and each row corresponds to one word. For example, we might find (the numeric values here are purely illustrative):

W("dog") = (0.1, -0.5, 0.2, ...)
W("mat") = (0.0, 0.6, -0.1, ...)

Here, $W$ is often initialized with random vectors for each word; the network then learns and updates $W$ in order to perform some task.
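A minimal NumPy sketch of this lookup-table view of W (the dimension of five is arbitrary here):

import numpy as np

vocab = {'dog': 0, 'mat': 1}
W = np.random.randn(len(vocab), 5)  # a V x N lookup table, randomly initialized
print(W[vocab['dog']])              # the current embedding vector for "dog"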

For example, we can train the network for the task of judging whether a 5-gram (a sequence of five words) is valid. Let's say we have the sequence of words a dog barks at strangers, and take this as an input with a positive label (meaning valid). We then replace some of the words in this sequence with a random word, turning it into, say, a cat barks at strangers, and the label is changed to negative, as this 5-gram is almost certainly incorrect:

As shown in the preceding figure, we train the model by feeding the 5-gram through the lookup matrix W and getting a vector representing each word. The vectors are then combined through an aggregation function R, and the result is compared with the target value. A perfect prediction would result in the following:

R(W(a), W(dog), W(barks), W(at), W(strangers)) = 1
R(W(a), W(cat), W(barks), W(at), W(strangers)) = 0

The differences/errors between the target values and the predicted values are then used for updating the lookup matrix W and the parameters of R (the aggregation function, for example, sum).

The learned word embeddings have two notable properties.

First, the locations of words in the high-dimensional space are determined by their meanings, such that words with close meanings are clustered together:

Second, which is even more interesting, is that word vectors have linear relationships. The relationship between words can often be captured by the vector offset between a pair of words. For example, starting from the location of the word king, move the same distance and direction as from man to woman, and one will arrive near the word queen, that is:

$$vector(\text{king}) - vector(\text{man}) + vector(\text{woman}) \approx vector(\text{queen})$$

Researchers found that if trained on large amounts of data, the resulting vectors can reflect very subtle semantic relationships, such as a city and the country it belongs to. For example, France is to Paris as Germany is to Berlin. Another example is finding a word that is similar to small in the same sense as biggest is similar to big. One can simply compute vector X = vector(biggest) - vector(big) + vector(small). Many other kinds of semantic relationships can be captured, such as opposites and comparatives. Some nice examples can be found in Mikolov's publication, Efficient Estimation of Word Representations in Vector Space (https://arxiv.org/pdf/1301.3781.pdf), as shown in the following figure:
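These regularities are easy to probe with pre-trained vectors; here is a sketch assuming the gensim library and the publicly released Google News Word2Vec binary:

from gensim.models import KeyedVectors

model = KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin', binary=True)
print(model.most_similar(positive=['king', 'woman'],
                         negative=['man'], topn=1))
# expected to rank a word like 'queen' first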

Advantages of distributed representation

There are many advantages of using distributed word vectors for NLP problems. With the subtle semantic relationships being captured, there is great potential for improving many existing NLP applications, such as machine translation, information retrieval, and question answering systems. Some obvious advantages are:

- Capturing local co-occurrence statistics
- Produces state-of-the-art linear semantic relationships
- Efficient use of statistics
- Can train on (comparably) little data as well as gigantic data
- Fast, since only non-zero counts matter
- Good performance with small (100-300) dimensional vectors that are important for downstream tasks

Problems of distributed representation

Keep in mind that no approach can solve everything, and similarly, a distributed representation is not a silver bullet. To use it properly, we need to understand some of its known issues:

Similarity and relatedness are not the same: Despite the great evaluation results presented in many publications, there is no guarantee of success in practical applications. One reason is that the current standard evaluation is often based on the degree of correlation with a set of word pairs whose similarity is judged by humans. It is possible that the representations from a model correlate well with human evaluation, but do not boost performance for a given task. This is perhaps caused by the fact that most evaluation datasets don't distinguish between word similarity and relatedness. For example, male and man are similar, whereas computer and keyboard are related but dissimilar.

Word ambiguity: This problem occurs when a word has multiple meanings. For example, the word bank has the meaning of sloping land in addition to the meaning of a financial institution. In this way, there is a limit to representing a word as one vector without considering word ambiguity. Some approaches have been proposed to learn multiple representations for each word. For example, Trask and their co-authors proposed a method for modeling multiple embeddings for each word based on supervised disambiguation (https://arxiv.org/abs/1511.06388). One can refer to such papers when it's necessary for the task at hand.

Commonly used pre-trained word embeddings

The following table lists some commonly used pre-trained word embeddings:

| Name | Year | URL | Comments |
|---|---|---|---|
| Word2Vec | 2013 | https://code.google.com/archive/p/word2vec | A multilingual set of pre-trained vectors is available at https://github.com/Kyubyong/wordvectors |
| GloVe | 2014 | http://nlp.stanford.edu/projects/glove/ | Developed by Stanford; it is claimed to be better than Word2Vec. GloVe is essentially a count-based model that combines global matrix decomposition with local context window methods. |
| FastText | 2016 | https://github.com/icoxfog417/fastTextJapaneseTutorial | In FastText, the atomic unit is n-gram characters, and a word vector is represented by the aggregation of its n-gram characters. Learning is quite fast. |
| LexVec | 2016 | https://github.com/alexandres/lexvec | LexVec performs factorization of the positive pointwise mutual information (PPMI) matrix using window sampling and negative sampling (WSNS). Salle and others suggest that LexVec matches and often outperforms competing models in word similarity and analogy tasks. |
| Meta-Embeddings | 2016 | http://cistern.cis.lmu.de/meta-emb/ | By Yin and others, 2016. It combines different public embedding sets to generate better vectors (meta-embeddings). |

In the following sections, we will mainly talk about three popular ones: Word2Vec, GloVe, and FastText. In particular, we will dive deeper into Word2Vec for its core ideas, its two distinct models, the process of training, and how to leverage open source pre-trained Word2Vec representations.

Word2Vec

Word2Vec is a group of efficient models for learning word embeddings from raw text. It maps words to vectors. In the mapped vector space, the words that share a common context are located close to one another. In this section, we will discuss Word2Vec and its two specific models in detail. We will also describe how to train a Word2Vec model using TensorFlow.

Basic idea of Word2Vec

Word2Vec models only have three layers: the input layer, the projection layer, and the output layer. There are two models that come with it, namely the Continuous Bag of Words (CBOW) model and the Skip-Gram model. They are very similar but differ in how the input layer and output layer are constructed. The Skip-Gram model has each target word (for example, fox) as the input and predicts the context/surrounding words as the output (quick, brown, jumps, over). On the other hand, CBOW starts from the source context words (quick, brown, jumps, over), aggregates and transforms multiple word embeddings, and predicts the target word (fox). The following figure illustrates the differences:

Take the CBOW model as an example. Each word in the training set is represented as a one-hot encoded vector. $x$ is the one-hot encoding for a context word, and the target word is also represented by a one-hot encoding $y$. The hidden layer has $N$ nodes. The matrix $W$ ($V \times N$) represents the weight matrix (connections) between the input layer and the hidden layer, where each row represents the weights corresponding to one word in the vocabulary. This weight matrix is what we are interested in learning, because it contains the vector encodings of all of the words in our vocabulary (as its rows). $W'$ ($N \times V$) is the output matrix (connections) connecting the hidden layer with the output layer, also called the context word matrix. It is the matrix with which each output word vector is associated.

In the Skip-Gram model, the input is the target word vector and the outputs are the context word vectors, where each entry corresponds to one word in the vocabulary. For the same target word, multiple pairs, such as (fox, quick), (fox, brown), and (fox, jumps), are generated for training. The goal is that, through the transformation by the network, given an input one-hot encoded word (a length-$V$ vector), the output (also a length-$V$ vector) should have higher numbers at the entries corresponding to the one-hot encoded vectors of its context words.

The word windows

Remember from the discussion in previous sections that the meaning of a word can be inferred from its context, or more specifically, its nearby words. Therefore, we can use a window to determine how many surrounding words (before and after) we would like to learn together with the target word, as shown in the following figure:

In this case, the window size is 2. In order to learn the target word sits, nearby words up to two positions away from it, before and after, are included in the learning. Then we slide the window along the corpus to generate the training data.

Generating training data

In the Skip-Gram model, we generate pairs of target words and context words as follows. A positive training pair can be generated as shown in the following figure:

From this illustration, one can easily see how the network learns from the number of times each pairing (target, context) shows up. For example, the model may see many more pairs like (fox, brown) or (fox, jumps) than (fox, lazy). Therefore, at the testing stage, if you give the word fox as input, it will output a much higher probability for words like brown or jumps.

From the way the training data is generated, one can also see that the trained neural network will not know anything about the offset of each output context word relative to the target word. Take the example shown in the preceding figure: the model will not take into account that, in the sentence, quick is further away from fox than brown, and that quick is before the target word fox while jumps is after it. The information that is learned is only that all these context words are in the vicinity of the target word. Therefore, the predicted probability of a context word given an input target word simply represents the probability of it occurring in the vicinity.

Negative sampling

From the loss function, we can see that computing the softmax over the full vocabulary can be very expensive. The cross-entropy cost function requires the network to produce probabilities, which means the output score from each neuron needs to be normalized to generate the probability of each class (for example, each word in the vocabulary in the Skip-Gram model). This normalization requires computing the product of the hidden-layer output with every word vector in the context-word matrix. In order to deal with this issue, Word2Vec uses a technique called negative sampling (NEG), which is similar to noise-contrastive estimation (NCE).

The core idea of NCE is to convert a multinomial classification problem, such as predicting a word based on the context, where each word can be considered as a class, into a binary classification problem of real pairs versus noise pairs.

During training, the network is fed with the positive pair, which consists of the target word and a word in its context, and a few randomly generated noise pairs, which consist of the target word and randomly selected words from the vocabulary. The network is thus forced to distinguish the real pairs from the noise pairs, which ultimately leads to learning the distribution of context words.

Essentially, NEG keeps the softmax layer but with a different loss function. The goal is to force the embedding of a word to be similar to the embeddings of words in its context, and different from the embeddings of words that are far away from its context.

The sampling process selects the out-of-window context (noise) pairs using a specifically designed distribution, such as the unigram distribution, which can be formalized as:

$$P(w_i) = \frac{f(w_i)}{\sum_{j} f(w_j)}$$

In some implementations of Word2Vec, the frequency is raised to the power of 3/4, an empirical value:

$$P(w_i) = \frac{f(w_i)^{3/4}}{\sum_{j} f(w_j)^{3/4}}$$

In this equation, $f(w_i)$ is the word frequency. Selecting by this distribution essentially favors frequent words to be sampled more often. Changing the sampling strategy can have a significant impact on the learning results.

The following figure illustrates how the positive pair and the negative samples are generated: the negative context words are randomly selected from the vocabulary:
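A small NumPy sketch of drawing negatives from the smoothed unigram distribution (the counts are toy values):

import numpy as np

word_freqs = np.array([100.0, 10.0, 1.0])  # toy word counts
probs = word_freqs ** 0.75 / np.sum(word_freqs ** 0.75)
negatives = np.random.choice(len(word_freqs), size=5, p=probs)
print(probs, negatives)

Raising the counts to the power of 3/4 flattens the distribution a little, so rare words get sampled somewhat more often than their raw frequency alone would allow.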

Hierarchical softmax

Computing the softmax over the vocabulary is expensive because we have to compute the denominator to obtain the normalized probability. However, the denominator is the sum of the inner products between the hidden layer output, $h$, and the output embedding, $v'$, of every word in the vocabulary, $V$.

To solve this problem, many different approaches have been proposed. Some are softmax-based approaches, such as hierarchical softmax, differentiated softmax, and CNN softmax, and so on, while others are sampling-based approaches. Readers can refer to http://ruder.io/word-embeddings-softmax/index.html for a deep understanding of approximating the softmax function.

Softmax-based approaches are methods that keep the softmax layer intact but modify its architecture to improve its efficiency (for example, hierarchical softmax). Sampling-based approaches, however, completely remove the softmax layer and instead optimize a newly designed loss function that approximates the softmax, for example, by approximating the denominator with something cheap to compute, like NEG. A good explanation can be found in Yoav Goldberg and Omer Levy's paper, word2vec Explained: Deriving Mikolov et al.'s Negative-Sampling Word-Embedding Method, 2014 (https://arxiv.org/abs/1402.3722).

For hierarchical softmax, the main idea is to build a Huffman tree based on word frequencies, where each word is a leaf of this tree. The computation of the softmax value for a particular word is then transformed into computing the product of the probabilities of all the nodes on the path from the root to the leaf that represents the word. At each subtree split point, we calculate the probability of going either to the right branch or to the left branch; the probabilities of the right and left branch sum to one, which guarantees that the probabilities of all the leaf nodes sum to one. With a balanced tree, this reduces the computational complexity from $O(V)$ to $O(\log V)$, where $V$ is the vocabulary size (for $V$ = 10,000, that is roughly 13 steps instead of 10,000).

Other hyperparameters

On top of the novelty of the algorithms, such as the Skip-Gram model (with negative sampling), CBOW (with hierarchical softmax), NCE, and GloVe, compared to traditional count-based approaches, there are also many new hyperparameters and preprocessing steps that can be tuned to improve performance. For example: subsampling, removing rare words, using dynamic context windows, using context distribution smoothing, adding context vectors, and more. Each one of them, if used properly, would greatly help boost performance, especially in practical settings.

Skip-Gram model

We now focus on one important model in Word2Vec, the Skip-Gram model. As described at the beginning of this section, the Skip-Gram model predicts the context words given the input target word. The word embedding matrix is the weight matrix between the input layer and the hidden layer. Next, we explain the Skip-Gram model in detail.

Well, we can't feed a word as a text string directly to a neural network. Instead, we need something mathematical. Suppose we have a vocabulary of 10,000 unique words; using one-hot encoding, we can represent each word as a vector of length 10,000, with one entry at the position corresponding to the word itself being 1, and zeros at all of the other positions.

The input of the Skip-Gram model is a single word represented as a one-hot encoding with a length equal to the size of the vocabulary, $V$, and the output is determined by the generated word pairs.

The hidden layer, in this case, does not have any activation function. The connection between the input layer and the hidden layer can be thought of as a matrix multiplication by $W$ ($V \times N$), where $V$ is the vocabulary size and $N$ is the number of nodes in the hidden layer. $W$ has $V$ rows, that is, one for every word in the vocabulary, and $N$ columns, that is, one for every hidden neuron. The rows of $W$ will be the embedding vectors we are after. There is another auxiliary matrix, $W'$ ($N \times V$), which connects the hidden layer with the output layer; through training, the similarity of a word with its context words is maximized, while the similarity with non-context words (out-of-window words) is minimized.

The output layer is a softmax regression classifier. It takes a vector of arbitrary real-valued scores ($z$) and squashes it to a vector of values between zero and one that sum to one:

$$\sigma(z)_j = \frac{e^{z_j}}{\sum_k e^{z_k}}$$

The output from the hidden layer (the word vector for the input target word, $h$) is multiplied with the output matrix $W'$, where each column is associated with one output word (let's assume the word $w_j$ with column $v'_{w_j}$). The dot product produces a value which, after normalization by applying the softmax function, represents the probability of seeing the context word $w_j$ given the input target word $w_I$:

$$p(w_j \mid w_I) = \frac{\exp({v'_{w_j}}^{\top} h)}{\sum_{k=1}^{V} \exp({v'_{w_k}}^{\top} h)}$$

The loss function is then the negative log probability of the observed context word:

$$E = -\log p(w_O \mid w_I)$$

Remember that this loss is defined on a single training example; together with a regularization term, the total cost function is averaged over all the training examples. From an information theory point of view, this is essentially a cross-entropy loss function.

We can understand it in the following way. The cross-entropy is defined as:

$$H(p, q) = -\sum_{x} p(x) \log q(x)$$

With the actual distribution $p$ being a delta function, its entropy term is zero. Therefore:

$$H(p, q) = -\sum_{x} p(x) \log q(x) = -\log q(x_i)$$

In this equation, $p$ is the actual distribution and $q$ is the estimated distribution. In our case, $q$ is essentially the softmax output, which is $p(w_j \mid w_I)$, and the actual distribution $p$ is a one-hot vector where only the entry of the observed context word is one. The preceding equation therefore simplifies to:

$$E = -\log p(w_O \mid w_I)$$

So, minimizing the loss function is equivalent to minimizing the cross-entropy value.

From a probabilistic interpretation point of view, $p(w_O \mid w_I)$ can be interpreted as the (normalized) probability assigned to the correct class, and we are actually minimizing the negative log likelihood of the correct class, that is, performing maximum likelihood estimation (MLE).

Continuous Bag-of-Words model

For the Continuous Bag-of-Words (CBOW) model, the idea is even more straightforward, because the model predicts the target word given its surrounding context. The inputs are still basically the surrounding context words within a window of size m; the difference is that we aggregate them (adding their one-hot encodings) first, and then input them into the network. These words are then projected to the hidden layer using the weight matrix, and the output is the target word in the center of the window.
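A minimal sketch of this input aggregation, reusing the toy one_hot helper from before (summing is shown here; averaging the context encodings is an equally common choice):

    import numpy as np

    vocabulary = ['cat', 'dog', 'mat', 'on', 'sat', 'the']
    word_to_index = {w: i for i, w in enumerate(vocabulary)}

    def one_hot(word):
        vec = np.zeros(len(vocabulary))
        vec[word_to_index[word]] = 1.0
        return vec

    # Context words around the target 'sat' in 'the cat sat on the mat'
    context = ['the', 'cat', 'on', 'the']

    # CBOW aggregates the context encodings into a single input vector
    x = sum(one_hot(w) for w in context)
    print(x)  # [1. 0. 0. 1. 0. 2.]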

Training a Word2Vec using TensorFlow

In this section, we will explain step-by-step how to build and train a Skip-Gram model using TensorFlow. For the detailed tutorial and source code, please refer to https://www.tensorflow.org/tutorials/word2vec:

1. We can download the dataset from http://mattmahoney.net/dc/text8.zip.
2. We read in the content of the file as a list of words (a minimal sketch of these first two steps follows the code below).
3. We set up the TensorFlow graph. We create placeholders for the input words and the context words, which are represented as their integer indices in the vocabulary:

    train_inputs = tf.placeholder(tf.int32, shape=[batch_size])
    train_context = tf.placeholder(tf.int32, shape=[batch_size, 1])

Note that we train in batches, so batch_size refers to the size of the batch.
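As a minimal sketch of steps 1 and 2 (this assumes the standard text8 workflow; the read_data helper name is ours, not the book's):

    import urllib.request
    import zipfile

    # Step 1: download the corpus
    url = 'http://mattmahoney.net/dc/text8.zip'
    urllib.request.urlretrieve(url, 'text8.zip')

    # Step 2: read the content of the file as a list of words
    def read_data(filename):
        with zipfile.ZipFile(filename) as f:
            return f.read(f.namelist()[0]).decode('utf-8').split()

    words = read_data('text8.zip')
    print('Corpus size:', len(words))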

We also create a constant to hold the validation data, where valid_examples is an array of word indices selected from the vocabulary for validation:

    valid_dataset = tf.constant(valid_examples, dtype=tf.int32)

We perform validation by computing the similarity between the words in the validation set and the words in the vocabulary, and then finding the words in the vocabulary that are most similar to the validation words.

4. We set up the embedding matrix variable and look up the embeddings of the input words:

    embeddings = tf.Variable(
        tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
    embed = tf.nn.embedding_lookup(embeddings, train_inputs)


5. We create the weights and biases that connect the hidden layer and the output softmax layer. The weights variable is a matrix of size vocabulary_size × embedding_size, where vocabulary_size is the size of the output layer and embedding_size is the size of the hidden layer. The size of the biases variable is the size of the output layer:

    weights = tf.Variable(
        tf.truncated_normal([vocabulary_size, embedding_size],
                            stddev=1.0 / math.sqrt(embedding_size)))
    biases = tf.Variable(tf.zeros([vocabulary_size]))
    hidden_out = tf.matmul(embed, tf.transpose(weights)) + biases

Now, we apply the softmax to hidden_out and use a cross-entropy loss to optimize the weights, biases, and embeddings. In the following code, we also specify a gradient descent optimizer with a learning rate of 1.0:

    train_one_hot = tf.one_hot(train_context, vocabulary_size)
    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=hidden_out,
                                                labels=train_one_hot))
    optimizer = tf.train.GradientDescentOptimizer(1.0).minimize(cross_entropy)


For efficiency, we can change the loss function from cross-entropy to NCE loss, as proposed by Michael Gutmann and his co-authors in the paper Noise-contrastive estimation: A new estimation principle for unnormalized statistical models:

    nce_loss = tf.reduce_mean(
        tf.nn.nce_loss(weights=weights, biases=biases,
                       labels=train_context, inputs=embed,
                       num_sampled=num_sampled,
                       num_classes=vocabulary_size))
    optimizer = tf.train.GradientDescentOptimizer(1.0).minimize(nce_loss)

For validation, we compute the cosine similarity between the embeddings of the words in the validation set and the word embeddings of the whole vocabulary. Later, we will print out the words in the vocabulary whose embeddings are closest to those of the validation words. The cosine similarity between embeddings a and b is defined as:

    similarity(a, b) = (a · b) / (||a|| ||b||)


This translates into the following code:

    norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keep_dims=True))
    normalized_embeddings = embeddings / norm
    valid_embeddings = tf.nn.embedding_lookup(normalized_embeddings,
                                              valid_dataset)
    similarity = tf.matmul(valid_embeddings, normalized_embeddings,
                           transpose_b=True)

6. Now we are ready to run the TensorFlow graph:

    with tf.Session(graph=graph) as session:
        # We must initialize all variables before we use them
        init.run()
        print('Initialized')
        average_loss = 0
        for step in range(num_steps):
            # generate_batch generates input words and context words
            # (labels) in a batch from the data
            batch_inputs, batch_context = generate_batch(
                data, batch_size, num_skips, skip_window)
            feed_dict = {train_inputs: batch_inputs,
                         train_context: batch_context}
            # We perform one update step by evaluating the optimizer op and
            # include it in the list of returned values for session.run()
            _, loss_val = session.run([optimizer, cross_entropy],
                                      feed_dict=feed_dict)
            average_loss += loss_val
            if step % 2000 == 0:
                if step > 0:
                    average_loss /= 2000
                # The average loss is an estimate of the loss
                # over the last 2000 batches
                print('Average loss at step ', step, ': ', average_loss)
                average_loss = 0
        final_embeddings = normalized_embeddings.eval()


7. In addition, we want to print out the words that are most similar to our validation words; we do this by calling the similarity operation we defined earlier and sorting the results. Note that this is an expensive operation, so we do it only once every 10,000 steps:

    if step % 10000 == 0:
        sim = similarity.eval()
        for i in range(valid_size):
            # reverse_dictionary maps codes (integers) to words (strings)
            valid_word = reverse_dictionary[valid_examples[i]]
            top_k = 8  # number of nearest neighbors
            nearest = (-sim[i, :]).argsort()[1:top_k + 1]
            log_str = 'Nearest to %s:' % valid_word
            for k in range(top_k):
                close_word = reverse_dictionary[nearest[k]]
                log_str = '%s %s,' % (log_str, close_word)
            print(log_str)

It is interesting to see that, at first, the top nearest words found are lanthanides, dunant, jag, wheelbase, torso, bayesian, hoping, and serena, but after about 30,000 steps, the top nearest words become six, nine, zero, two, seven, eight, three, and five. We can use t-SNE, by van der Maaten and Hinton, from Visualizing Data using t-SNE (2008), http://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf, to visualize the embeddings of the words:

    from sklearn.manifold import TSNE
    import matplotlib.pyplot as plt

    tsne = TSNE(perplexity=30, n_components=2, init='pca',
                n_iter=5000, method='exact')
    plot_only = 500
    low_dim_embs = tsne.fit_transform(final_embeddings[:plot_only, :])
    # reverse_dictionary maps codes (integers) to words (strings)
    labels = [reverse_dictionary[i] for i in range(plot_only)]
    plt.figure(figsize=(18, 18))  # in inches
    for i, label in enumerate(labels):
        x, y = low_dim_embs[i, :]
        plt.scatter(x, y)
        plt.annotate(label, xy=(x, y), xytext=(5, 2),
                     textcoords='offset points',
                     ha='right', va='bottom')


In the following figure, we visualize the Word2Vec embeddings and find that words with similar meanings are close to each other:


Using existing pre-trained Word2Vec embeddings

In this section, we will be going through the following topics:

Word2Vec from Google News
Using the pre-trained Word2Vec embeddings

Word2Vec from Google News

The Word2Vec model trained by Google on the Google News dataset has a feature dimension of 300. The number of features is considered a hyperparameter that you can, and perhaps should, experiment with in your own applications to see which setting yields the best results.

In this pre-trained model, some stop words, such as a, and, and of, are excluded, but others, such as the, also, and should, are included. Some misspelled words are also included, for example, both mispelled and misspelled, of which the latter is the correct one.

You can find open source tools such as https://github.com/chrisjmccormick/inspect_word2vec to inspect the word embeddings in the pre-trained model.

Using the pre-trained Word2Vec embeddings

In this section, we will briefly explain how to use the pre-trained vectors. Before reading this section, download the Word2Vec pre-trained vectors from https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit and load them:

    from gensim.models import KeyedVectors

    # Load the pre-trained model
    model = KeyedVectors.load_word2vec_format(
        'GoogleNews-vectors-negative300.bin', binary=True)

Then, we find the top five words that are similar to woman and king, but dissimilar to man:

    model.wv.most_similar(positive=['woman', 'king'],
                          negative=['man'], topn=5)

We see the following:

    [(u'queen', 0.7118), (u'monarch', 0.6190), (u'princess', 0.5902),
     (u'crown_prince', 0.5499), (u'prince', 0.5377)]

This makes sense, since queen shares similar attributes with woman and king, but does not share attributes with man.

Understanding GloVe

GloVe is an unsupervised learning algorithm for obtaining vector representations (embeddings) for words. It has been shown that, with similar training hyperparameters, the embeddings generated using the two methods perform very similarly in downstream NLP tasks.

The difference is that Word2Vec is a predictive model, which learns the embeddings to improve its predictive ability, that is, to minimize the loss of predicting the target word given the context words. In Word2Vec, this has been formalized as a feed-forward neural network that uses an optimization approach, such as SGD, to update the network.

On the other hand, GloVe is essentially a count-based model, where a co-occurrence matrix is first built. Each entry in this matrix corresponds to the frequency with which we see a word (the rows) at the same time as its context word (the columns). This matrix is then factorized into a lower-dimensional (word × features) matrix, where each row now yields a vector representation for the corresponding word. This is where the dimensionality reduction flavor comes in, as the goal is to minimize the reconstruction loss and find the lower-dimensional representations that explain most of the variance in the high-dimensional data.

There are some benefits of using GloVe over Word2Vec, as it is easier to parallelize its implementation, which means the model can be trained on more data.

There are many resources online. For an implementation using TensorFlow, one can take a look at https://github.com/GradySimon/tensorflow-glove/blob/master/Getting%20Started.ipynb.
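A minimal sketch of this count-then-factorize idea (it uses log counts and a plain SVD as the factorizer purely for illustration; GloVe itself fits a weighted least-squares objective rather than an exact SVD):

    import numpy as np

    corpus = ['the cat sat on the mat', 'the dog sat on the rug']
    window = 2
    vocab = sorted({w for line in corpus for w in line.split()})
    index = {w: i for i, w in enumerate(vocab)}

    # Build the co-occurrence matrix within a symmetric window
    X = np.zeros((len(vocab), len(vocab)))
    for line in corpus:
        words = line.split()
        for i, w in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if i != j:
                    X[index[w], index[words[j]]] += 1

    # Factorize the (log) counts into low-dimensional word vectors
    U, S, _ = np.linalg.svd(np.log1p(X))
    dim = 2
    word_vectors = U[:, :dim] * S[:dim]
    print(word_vectors[index['cat']])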


FastText

FastText (https://fasttext.cc/) is a library for efficient learning of word representations and sentence classification. The main advantage of FastText embeddings over Word2Vec is that they take into account the internal structure of words while learning word representations, which can be very useful for morphologically rich languages, and also for words that occur rarely.

The main difference between Word2Vec and FastText is that, for Word2Vec, the atomic entity is each word, which is the smallest unit to train on. On the contrary, in FastText, the smallest unit is character-level n-grams, and each word is treated as being composed of its character n-grams. For example, with an n-gram of minimum size three and maximum size six, a word vector can be decomposed into the vectors of its character n-grams of lengths three to six, as sketched below.
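A minimal sketch of this decomposition (the example word matter and the < and > boundary markers follow FastText's convention of wrapping each word before extracting n-grams):

    def char_ngrams(word, n_min=3, n_max=6):
        # FastText wraps each word in boundary markers first
        token = '<' + word + '>'
        grams = []
        for n in range(n_min, n_max + 1):
            for i in range(len(token) - n + 1):
                grams.append(token[i:i + n])
        return grams

    print(char_ngrams('matter'))
    # ['<ma', 'mat', 'att', 'tte', 'ter', 'er>', '<mat', ...]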