Get to grips with the essentials of deep learning by leveraging the power of Python.
English | 284 pages | 2018
Table of contents :
Cover
Title Page
Copyright and Credits
Packt Upsell
Contributors
Table of Contents
Preface
Chapter 1: Why Deep Learning?
What is AI and deep learning?
The history and rise of deep learning
Why deep learning?
Advantages over traditional shallow methods
Impact of deep learning
The motivation of deep architecture
The neural viewpoint
The representation viewpoint
Distributed feature representation
Hierarchical feature representation
Applications
Lucrative applications
Success stories
Deep learning for business
Future potential and challenges
Summary
Chapter 2: Getting Yourself Ready for Deep Learning
Basics of linear algebra
Data representation
Data operations
Matrix properties
Deep learning with GPU
Deep learning hardware guide
CPU cores
RAM size
Hard drive
Cooling systems
Deep learning software frameworks
TensorFlow – a deep learning library
Caffe
MXNet
Torch
Theano
Microsoft Cognitive Toolkit
Keras
Framework comparison
Setting up deep learning on AWS
Setup from scratch
Setup using Docker
Summary
Chapter 3: Getting Started with Neural Networks
Multilayer perceptrons
The input layer
The output layer
Hidden layers
Activation functions
Sigmoid or logistic function
Tanh or hyperbolic tangent function
ReLU
Leaky ReLU and maxout
Softmax
Choosing the right activation function
How a network learns
Weight initialization
Forward propagation
Backpropagation
Calculating errors
Backpropagation
Updating the network
Automatic differentiation
Vanishing and exploding gradients
Optimization algorithms
Regularization
Deep learning models
Convolutional Neural Networks
Convolution
Pooling/subsampling
Fully connected layer
Overall
Restricted Boltzmann Machines
Energy function
Encoding and decoding
Contrastive divergence (CD-k)
Stacked/continuous RBM
RBM versus Boltzmann Machines
Recurrent neural networks (RNN/LSTM)
Cells in RNN and unrolling
Backpropagation through time
Vanishing gradient and LSTM
Cells and gates in LSTM
Step 1 – The forget gate
Step 2 – Updating memory/cell state
Step 3 – The output gate
Practical examples
TensorFlow setup and key concepts
Handwritten digits recognition
Summary
Chapter 4: Deep Learning in Computer Vision
Origins of CNNs
Convolutional Neural Networks
Data transformations
Input preprocessing
Data augmentation
Network layers
Convolution layer
Pooling or subsampling layer
Fully connected or dense layer
Network initialization
Regularization
Loss functions
Model visualization
Handwritten digit classification example
Fine-tuning CNNs
Popular CNN architectures
AlexNet
Visual Geometry Group
GoogLeNet
ResNet
Summary
Chapter 5: NLP - Vector Representation
Traditional NLP
Bag of words
Weighting the terms tf-idf
Deep learning NLP
Motivation and distributed representation
Word embeddings
Idea of word embeddings
Advantages of distributed representation
Problems of distributed representation
Commonly used pre-trained word embeddings
Word2Vec
Basic idea of Word2Vec
The word windows
Generating training data
Negative sampling
Hierarchical softmax
Other hyperparameters
Skip-Gram model
The input layer
The hidden layer
The output layer
The loss function
Continuous Bag-of-Words model
Training a Word2Vec using TensorFlow
Using existing pre-trained Word2Vec embeddings
Word2Vec from Google News
Using the pre-trained Word2Vec embeddings
Understanding GloVe
FastText
Applications
Example use cases
Fine-tuning
Summary
Chapter 6: Advanced Natural Language Processing
Deep learning for text
Limitations of neural networks
Recurrent neural networks
RNN architectures
Basic RNN model
Training RNN is tough
Long short-term memory network
LSTM implementation with TensorFlow
Applications
Language modeling
Sequence tagging
Machine translation
Seq2Seq inference
Chatbots
Summary
Chapter 7: Multimodality
What is multimodality learning?
Challenges of multimodality learning
Representation
Translation
Alignment
Fusion
Co-learning
Image captioning
Show and tell
Encoder
Decoder
Training
Testing/inference
Beam Search
Other types of approaches
Datasets
Evaluation
BLEU
ROUGE
METEOR
CIDEr
SPICE
Rank position
Attention models
Attention in NLP
Attention in computer vision
The difference between hard attention and soft attention
Visual question answering
Multi-source based self-driving
Summary
Chapter 8: Deep Reinforcement Learning
What is reinforcement learning (RL)?
Problem setup
Value learning-based algorithms
Policy search-based algorithms
Actor-critic-based algorithms
Deep reinforcement learning
Deep Q-network (DQN)
Experience replay
Target network
Reward clipping
Double-DQN
Prioritized experience replay
Dueling DQN
Implementing reinforcement learning
Simple reinforcement learning example
Reinforcement learning with Q-learning example
Summary
Chapter 9: Deep Learning Hacks
Massaging your data
Data cleaning
Data augmentation
Data normalization
Tricks in training
Weight initialization
All-zero
Random initialization
ReLU initialization
Xavier initialization
Optimization
Learning rate
Mini-batch
Clip gradients
Choosing the loss function
Multi-class classification
Multi-class multi-label classification
Regression
Others
Preventing overfitting
Batch normalization
Dropout
Early stopping
Fine-tuning
Fine-tuning
When to use fine-tuning
When not to use fine-tuning
Tricks and techniques
Model compression
Summary
Chapter 10: Deep Learning Trends
Recent models for deep learning
Generative Adversarial Networks
Capsule networks
Novel applications
Genomics
Predictive medicine
Clinical imaging
Lip reading
Visual reasoning
Code synthesis
Summary
Other Books You May Enjoy
Index
Deep Learning Essentials
Your hands-on guide to the fundamentals of deep learning and neural network modeling
Wei Di Anurag Bhardwaj Jianing Wei
BIRMINGHAM - MUMBAI
Deep Learning Essentials

Copyright © 2018 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Commissioning Editor: Veena Pagare
Acquisition Editor: Aman Singh
Content Development Editor: Snehal Kolte
Technical Editor: Sayli Nikalje
Copy Editor: Safis Editing
Project Coordinator: Manthan Patel
Proofreader: Safis Editing
Indexer: Francy Puthiry
Graphics: Tania Dutta
Production Coordinator: Arvindkumar Gupta

First published: January 2018
Production reference: 1250118

Published by Packt Publishing Ltd.
Livery Place, 35 Livery Street
Birmingham B3 2PB, UK.

ISBN 978-1-78588-036-0
www.packtpub.com
mapt.io
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Mapt is fully searchable
Copy and paste, print, and bookmark content

PacktPub.com
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at service@packtpub.com for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Contributors

About the authors

Wei Di is a data scientist with years of experience in machine learning and artificial intelligence. She is passionate about creating smart and scalable intelligent solutions that can impact millions of users and drive business value. Currently, she works as a data scientist at LinkedIn. She was previously associated with the eBay Human Language Technology team and eBay Research Labs. Prior to that, she was with Ancestry.com, working on large-scale data mining and machine learning tasks. She received her PhD from Purdue University in 2011.

Anurag Bhardwaj currently leads the data science efforts at Wiser Solutions, where he focuses on structuring a large-scale e-commerce inventory. He is particularly interested in using machine learning to solve problems of product category classification and product matching, as well as various related problems in e-commerce. Previously, he worked on image understanding at eBay Research Labs. Anurag received his PhD and master's from the State University of New York at Buffalo and has a BTech in computer engineering from the National Institute of Technology, Kurukshetra, India.

Jianing Wei is a senior software engineer at Google Research. He works in the area of computer vision and computational imaging. Prior to joining Google in 2013, he worked at Sony US Research Center for 4 years in the field of 3D computer vision and image processing. Jianing obtained his PhD in electrical and computer engineering from Purdue University in 2010.
About the reviewer

Amita Kapoor is an Associate Professor in the Department of Electronics, SRCASW, University of Delhi. She did both her master's and PhD in electronics. During her PhD, she was awarded the prestigious DAAD fellowship to pursue part of her research work at the Karlsruhe Institute of Technology, Germany. She won the Best Presentation Award at the 2008 International Conference on Photonics for her paper. She is a member of professional bodies including the Optical Society of America, the International Neural Network Society, the Indian Society for Buddhist Studies, and the IEEE.
Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Preface

Deep learning is the most disruptive trend in the tech world, having jumped out of research laboratories right into production environments. It is the science and art of working through hidden layers of data to get deep insights. Deep learning is currently one of the best providers of solutions regarding problems in image recognition, speech recognition, object recognition, and natural language processing (NLP).

We'll start off by brushing up on machine learning and quickly get into the fundamentals of deep learning and its implementation. Moving on, we'll teach you about the different types of neural networks and their applications in the real world. With the help of insightful examples, you'll learn to recognize patterns using a deep neural network and get to grips with other important concepts such as data manipulation and classification.

Using the reinforcement learning technique with deep learning, you'll build AI that can outperform any human and also work with the LSTM network. During the course of this book, you will come across a wide range of different frameworks and libraries, such as TensorFlow, Python, Nvidia, and others. By the end of the book, you'll be able to deploy a production-ready deep learning framework for your own applications.
Who this book is for

If you are an aspiring data scientist, deep learning enthusiast, or AI researcher looking to build the power of deep learning into your projects, then this book is the perfect resource for you to start addressing AI challenges.

To get the most out of this book, you must have intermediate Python skills and be familiar with machine learning concepts well in advance.
What this book covers

Chapter 1, Why Deep Learning?, provides an overview of deep learning. We begin with the history of deep learning, its rise, and its recent advances in certain fields. We will also describe some of its challenges, as well as its future potential.

Chapter 2, Getting Yourself Ready for Deep Learning, is a starting point to set oneself up for experimenting with and applying deep learning techniques in the real world. We will answer the key questions as to what skills and concepts are needed to get started with deep learning. We will cover some basic concepts of linear algebra, the hardware requirements for deep learning implementation, as well as some of its popular software frameworks. We will also take a look at setting up a deep learning system from scratch on a cloud-based GPU instance.

Chapter 3, Getting Started with Neural Networks, focuses on the basics of neural networks, including input/output layers, hidden layers, and how networks learn using forward and backpropagation. We will start with the standard multilayer perceptron networks and their building blocks, and illustrate how they learn step by step. We will also introduce a few popular standard models, such as Convolutional Neural Networks (CNNs), Restricted Boltzmann Machines (RBM), recurrent neural networks (RNNs), as well as a variation of them called Long Short-Term Memory (LSTM).

Chapter 4, Deep Learning in Computer Vision, explains CNNs in more detail. We will go over the core concepts that are essential to the workings of CNNs and how they can be used to solve real-world computer vision problems. We will also look at some of the popular CNN architectures and implement a basic CNN using TensorFlow.

Chapter 5, NLP - Vector Representation, covers the basics of NLP for deep learning. This chapter will describe the popular word embedding techniques used for feature representation in NLP. It will also cover popular models such as Word2Vec, GloVe, and FastText. This chapter also includes an example of embedding training using TensorFlow.

Chapter 6, Advanced Natural Language Processing, takes a more model-centric approach to text processing. We will go over some of the core models, such as RNNs and LSTM networks. We will implement a sample LSTM using TensorFlow and describe the foundational architecture behind commonly used text processing applications of LSTM.

Chapter 7, Multimodality, introduces some fundamental progress in the multimodality of using deep learning. This chapter also shares some novel, advanced multimodal applications of deep learning.

Chapter 8, Deep Reinforcement Learning, covers the basics of reinforcement learning. It illustrates how deep learning can be applied to improve reinforcement learning in general. This chapter goes through the basic implementation of a deep reinforcement learning model using TensorFlow and discusses some of its popular applications.

Chapter 9, Deep Learning Hacks, empowers readers by providing many practical tips that can be applied when using deep learning, such as the best practices for network weight initialization, learning rate selection, how to prevent overfitting, and how to prepare your data for better learning when facing data challenges.

Chapter 10, Deep Learning Trends, summarizes some of the upcoming ideas in deep learning. It looks at some of the upcoming trends in newly developed algorithms, as well as some of the new applications of deep learning.
To get the most out of this book

There are a couple of things you can do to make the most out of this book. Firstly, it is recommended to at least have some basic knowledge of Python programming and machine learning.

Secondly, before proceeding to Chapter 3, Getting Started with Neural Networks and the chapters that follow, be sure to follow the setup instructions in Chapter 2, Getting Yourself Ready for Deep Learning. You will then be able to set up your environment and practice with the given examples.

Thirdly, familiarize yourself with TensorFlow and read its documentation. The TensorFlow documentation (https://www.tensorflow.org/api_docs) is a great source of information and also contains a lot of great examples. You can also look around online, as there are various open source and deep-learning-related resources.

Fourthly, make sure you explore on your own. Try different settings or configurations for simple problems that don't require much computational time; this can help you quickly get some idea of how the model works and how to tune its parameters.

Lastly, dive deeper into each type of model. This book explains the gist of various deep learning models in plain words and avoids too much math; the goal is to help you understand the mechanisms of neural networks under the hood. While there are currently many different tools publicly available that provide high-level APIs, a good understanding of deep learning will greatly help you in debugging and improving model performance.
Download the example code files

You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:
1. Log in or register at www.packtpub.com.
2. Select the SUPPORT tab.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Deep-Learning-Essentials. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://www.packtpub.com/sites/default/files/downloads/DeepLearningEssentials_ColorImages.pdf.
Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "In addition, alpha is the learning rate, W is the weight matrix, vb is the bias of the visible layer, hb is the bias of the hidden layer, and sample_prob is the Gibbs-Sampling function that decides which nodes to turn on."

A block of code is set as follows:

import mxnet as mx
tensor_cpu = mx.nd.zeros(shape, ctx=mx.cpu())
tensor_gpu = mx.nd.zeros(shape, ctx=mx.gpu())

Any command-line input or output is written in a fixed-width font.

Bold: Indicates a new term, an important word, or words that you see onscreen.

Warnings or important notes appear like this.

Tips and tricks appear like this.
Get in touch

Feedback from our readers is always welcome.

General feedback: Email feedback@packtpub.com and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at questions@packtpub.com.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packtpub.com with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packtpub.com.
Why Deep Learning?

This chapter will give an overview of deep learning, the history of deep learning, the rise of deep learning, and its recent advances in certain fields. Also, we will talk about its challenges, as well as its future potential.

We will answer a few key questions often raised by a variety of practitioners and audiences who may not have a machine learning background. These questions include:

What is artificial intelligence (AI) and deep learning?
What's the history of deep learning or AI?
What are the major breakthroughs of deep learning?
What is the motivation of deep architecture?
Why should we resort to deep learning and why can't the existing machine learning algorithms solve the problem at hand?
In which fields can it be applied?
Successful stories of deep learning
What's the potential future of deep learning and what are its current challenges?
What is AI and deep learning?

The dream of creating certain forms of intelligence that mimic ourselves has long existed. While most of these forms appear in science fiction, over recent decades we have gradually been making progress in actually building intelligent machines that can perform certain tasks just like a human. This is an area called artificial intelligence. The beginning of AI can perhaps be traced back to Pamela McCorduck's book, where she described AI as an ancient wish to forge the gods.

Deep learning is a branch of AI, with the aim specified as moving machine learning closer to one of its original goals: AI. The path it pursues is an attempt to mimic the activity of the neocortex, the wrinkly 80% of the brain where thinking occurs. In a human brain, there are around 100 billion neurons and 100 ~ 1,000 trillion synapses.

It learns hierarchical structures and levels of representation and abstraction to understand the patterns of data that come from various source types, such as images, videos, sound, and text.

Higher level abstractions are defined from lower level abstractions. It is called deep because it has more than one state of nonlinear feature transformation. One of the biggest advantages of deep learning is its ability to automatically learn feature representation at multiple levels of abstraction. This allows a system to learn complex functions mapped from the input space to the output space without many human-crafted features. Also, it provides the potential for pre-training, which is learning the representation on a set of available datasets, then applying the learned representations to other domains. This may have some limitations, such as being able to acquire good enough quality data for learning. Also, deep learning performs well when learning from a large amount of unsupervised data in a greedy fashion.
The following figure shows an example of a deep learning model, a Convolutional Neural Network (CNN):
The deep learning model, that is, the learned deep neural network, often consists of multiple layers. Together they work hierarchically to build an improved feature space. The first layer learns the first-order features, such as color and edges. The second layer learns higher-order features, such as corners. The third layer learns about small patches or complex shapes. Layers often learn in an unsupervised mode and discover general features of the input space. Then the final layer features are fed into a supervised layer to complete the task, such as classification or regression.

Between layers, nodes are connected through weighted edges. Each node, which can be seen as a simulated neocortex, is associated with an activation function, where its inputs come from the lower layer nodes. Building such a deep, multi-layered structure of neuron-like information flow is, however, a decades-old idea. From its creation until now, it has experienced both ups and downs.

With the newest improvements in mathematical formulas, increasingly powerful computers, and large-scale datasets, finally, spring has arrived. Deep learning has become a pillar of today's tech world and is applied in a wide range of fields. In the next section, we will trace its history and rise, as well as its ups and downs on the journey.
The history and rise of deep learning

The earliest neural network was developed in the 1940s, not long after the dawn of AI research. In 1943, a seminal paper by McCulloch and Pitts was published, which proposed the first formalized mathematical model of a neural network. The unit of this model is a simple formalized neuron, often referred to as a McCulloch-Pitts neuron. It is a mathematical function conceived as a model of biological neurons in a neural network. Such units are the elementary building blocks of an artificial neural network. An illustration of an artificial neuron is shown in the following figure. Such an idea looks very promising indeed, as it attempts to simulate how a human brain works, but in a greatly simplified way:

These early models connect inputs to the neuron through so-called weights. The weights determine how much each input contributes to the neuron's output, that is, how each neuron responds, with a value between 0 and 1. With this mathematical representation, the neural output can model an edge or a shape from an image, or a particular energy level at a specific frequency in a phoneme. The previous figure illustrates how a neuron works mathematically: the inputs are combined with their weights, an activation function is applied to the weighted sum, and the output corresponds to the resulting activation. However, early neural networks could only simulate a very limited number of neurons at once, so not many patterns can be recognized using such a simple architecture. These models languished through the 1970s.
The concept of backpropagation, the use of errors in training deep learning models, was first proposed in the 1960s. This was followed by models with polynomial activation functions. Using a slow and manual process, the best statistically chosen features from each layer were then forwarded on to the next layer. Unfortunately, the first AI winter then kicked in, which lasted about 10 years. At this early stage, although the idea of mimicking the human brain sounded very fancy, the actual capabilities of AI programs were very limited. Even the most impressive one could only deal with toy problems. Not to mention that they had very limited computing power and only small-sized datasets available. The hard winter occurred mainly because expectations had been raised so high, and when the results failed to materialize AI research received criticism and its funding disappeared:

Slowly, backpropagation evolved significantly in the 1970s but was not applied to neural networks until 1985. In the mid-1980s, Hinton and others helped spark a revival of interest in neural networks with so-called deep models that made better use of many layers of neurons, that is, with more than two hidden layers. An illustration of such a multi-layer perceptron neural network is shown in the following figure. By then, Hinton and his co-authors (https://www.iro.umontreal.ca/~vincentp/…/lectures/backprop_old.pdf) had demonstrated that backpropagation in a neural network could result in interesting distributed representations. In 1989, Yann LeCun (http://yann.lecun.com/exdb/publis/pdf/lecun-89e.pdf) demonstrated the first practical use of backpropagation at Bell Labs. He brought backpropagation to convolutional neural networks to understand handwritten digits, and his idea eventually evolved into a system that reads the numbers on handwritten checks.
This was also the time of the 2nd AI winter (1985-1990). In 1984, two leading AI researchers, Roger Schank and Marvin Minsky, warned the business community that the enthusiasm for AI had spiraled out of control. Although multi-layer networks could learn complicated tasks, their speed was very slow and the results were often not promising. Therefore, when other simpler but more effective methods, such as support vector machines, were invented, government agencies and venture capitalists dropped their support for neural networks. Just three years later, the billion dollar AI industry fell apart.

However, it wasn't really the failure of AI but more the end of the hype, which is common in many emerging technologies. Despite the ups and downs in funding and interest, some researchers continued with their beliefs. Unfortunately, they didn't really look into the actual reasons for why the learning of multi-layer networks was so difficult and why the performance was so unsatisfying. In 2000, the vanishing gradient problem was discovered, which finally drew people's attention to the real key question: Why don't multi-layer networks learn? The reason is that for certain activation functions, the inputs are condensed, meaning large areas of the input are mapped over an extremely small output range. With large changes or errors computed at the last layer, only a small amount will be reflected back to the front/lower layers. This means little or no learning signal reaches these layers and the learned features at these layers are weak.
Note that many upper layers are fundamental to the problem as they carry the representative information learned from the lower layers. This gets worse because the optimal configuration of the upper layers may also depend on the configuration of the lower layers, which means the optimization of the upper layers is based on a sub-optimal configuration of the lower layers. All of this means it is difficult to train a deep network with many layers.

Two approaches were proposed to solve this problem: layer-by-layer pre-training and the Long Short-Term Memory (LSTM) model. LSTM for recurrent neural networks was first proposed by Sepp Hochreiter and Juergen Schmidhuber in 1997.

In the last decade, many researchers made fundamental conceptual breakthroughs, and there was a sudden burst of interest in deep learning, not only from the academic side but also from industry. In 2006, Professor Hinton at Toronto University in Canada and others developed a more efficient way to train individual layers of neurons, described in A fast learning algorithm for deep belief nets (https://www.cs.toronto.edu/~hinton/absps/fastnc.pdf). This sparked the second revival of the neural network. In his paper, he introduced Deep Belief Networks (DBNs), with a learning algorithm that greedily trains one layer at a time by exploiting an unsupervised learning algorithm for each layer, a Restricted Boltzmann Machine (RBM). The following figure shows the concept of layer-by-layer training for this deep belief network.
The proposed DBN was tested using the MNIST database, the standard database for comparing the precision and accuracy of image recognition methods. This database includes 70,000, 28 x 28 pixel, hand-written character images of numbers from 0 to 9 (60,000 are for training and 10,000 are for testing). The goal is to correctly answer which number from 0 to 9 is written in each test image. Although the paper did not attract much attention at the time, results from the DBN had considerably higher precision than those of conventional machine learning approaches:
Fast-forward to 2012 and the entire AI research world was shocked by one method. At the world competition of image recognition, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a team called SuperVision (http://image-net.org/challenges/LSVRC/…/supervision.pdf) achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry. The ImageNet dataset has around 1.2 million high-resolution images belonging to 1000 different classes. There are 10 million images provided as learning data, and 150,000 images used for testing. The authors, Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton from Toronto University, built a deep convolutional network with 60 million parameters, 650,000 neurons, and 630 million connections, consisting of five convolutional layers, some of which were followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To increase the training data, the authors randomly sampled 224 x 224 patches from the available images. To speed up the training, they used non-saturating neurons and a very efficient GPU implementation of the convolution operation. They also used dropout to reduce overfitting in the fully connected layers, which proved to be very effective.
Since then, deep learning has taken off, and today we see many successful applications, not only in image classification, but also in regression, dimensionality reduction, texture modeling, action recognition, motion modeling, object segmentation, information retrieval, robotics, natural language processing, speech recognition, biomedical fields, music generation, art, collaborative filtering, and so on:
It's interesting that when we look back, it seems that most of the theoretical breakthroughs had already been made by the 1980s-1990s, so what else has changed in the past decade? A not-too-controversial theory is captured by an analogy Andrew Ng once made: deep learning models are like rocket engines, and the massive amounts of data we can now feed them are the fuel.

Indeed, faster processing, with GPUs doing the computation, increased computational speed by roughly 1,000 times over a 10-year span.
Almost at the same time, the big data era arrived. Millions, billions, or even trillions of bytes of data are collected every day. Industry leaders are also making an effort in deep learning to leverage the massive amounts of data they have collected. For example, Baidu has 50,000 hours of training data for speech recognition and is expected to train on about another 100,000 hours of data. For facial recognition, 200 million images were trained. The involvement of large companies greatly boosted the potential of AI overall by providing data at a scale hardly imaginable in the past.

With enough training data and faster computation speed, neural networks can now extend to deep architectures, which had never been realized before. On the one hand, the occurrence of new theoretical approaches, massive data, and fast computation have boosted progress in deep learning. On the other hand, the creation of new tools, platforms, and applications has boosted academic development, the use of faster and more powerful GPUs, and the collection of big data. This loop continues and deep learning has become a revolution built on top of the following pillars:

Massive, high-quality, labeled datasets in various formats, such as images, videos, text, speech, audio, and so on.
Powerful GPU units and networks that are capable of doing fast floating-point calculations in parallel or in distributed ways.
Creation of new, deep architectures: AlexNet, Zeiler Fergus Net, GoogLeNet, Network in Network, VGG, ResNets, inception modules, Highway networks, MXNet, Region-Based CNNs (R-CNN), and Generative Adversarial Networks (2014).
Open source software platforms, such as TensorFlow, Theano, and MXNet, provide easy-to-use, low level or high-level APIs for developers or academics so they are able to quickly implement and iterate on their ideas and applications.
Approaches to overcoming the vanishing gradient problem, such as using non-saturating activation functions like ReLU rather than tanh and the logistic functions.
Approaches that help to avoid overfitting:
New regularizers, such as Dropout which keeps the network sparse, maxout, and batch normalization.
Data-augmentation that allows training larger and larger networks without (or with less) overfitting.
Robust optimizers—modifications of the SGD procedure including momentum, RMSprop, and ADAM have helped eke out every last percentage of your loss function.
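As a minimal illustration of one of the regularizers mentioned above, the following NumPy sketch applies inverted dropout to a layer's activations; the keep probability of 0.8 and the activation values are arbitrary choices for this example, not values from the book:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, keep_prob=0.8, train=True):
    """Inverted dropout: randomly zero units at train time and rescale
    the survivors so the expected activation stays unchanged."""
    if not train:
        return activations  # dropout is a no-op at inference time
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

layer = np.array([0.5, 1.2, -0.3, 2.0, 0.7])
print(dropout(layer))               # unlucky units zeroed, rest scaled by 1/0.8
print(dropout(layer, train=False))  # unchanged at inference
```

Because each unit survives only with probability `keep_prob`, no single neuron can be relied on exclusively, which pushes the network toward the redundant, distributed representations discussed later in this chapter.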
Why deep learning?

So far, we have discussed what deep learning is and the history of deep learning. But why is it so popular now? In this section, we talk about the advantages of deep learning over traditional shallow methods and its significant impact in a couple of fields.

Advantages over traditional shallow methods

Traditional approaches are often considered shallow machine learning and often require the developer to have some prior knowledge regarding the specific features of the input that might be helpful, or how to design effective features. Also, shallow learning often uses only one hidden layer, for example, a single layer feed-forward network. In contrast, deep learning is known as representation learning, which has been shown to perform better at extracting non-local and global relationships or structures in the data. One can supply fairly raw formats of data into the learning system, for example, raw images and text, rather than extracted features on top of an image (for example, SIFT by David Lowe and HOG by Dalal and their co-authors), or TF-IDF vectors for text. Because of the depth of the architecture, the learned representations form a hierarchical structure with knowledge learned at various levels. This parameterized, multi-level, computational graph provides a high degree of representation. The emphasis of shallow and deep algorithms is significantly different: shallow algorithms are more about feature engineering and selection, while deep learning puts its emphasis on defining the most useful computational graph topology (architecture) and the ways of optimizing parameters/hyperparameters efficiently and correctly for good generalization ability of the learned representations:
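To make the contrast concrete, here is a small sketch of the kind of hand-engineered text feature mentioned above: a raw TF-IDF computation over a toy corpus. The three documents are invented for this illustration, and this is one common TF-IDF variant (term frequency times log inverse document frequency), not the only formulation in use:

```python
import math
from collections import Counter

# Toy corpus; the documents are made up for this illustration.
docs = [
    "deep learning learns features",
    "shallow learning needs features engineering",
    "deep networks learn representations",
]
tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

# Document frequency: in how many documents each term appears.
df = Counter(term for doc in tokenized for term in set(doc))

def tfidf(doc_tokens):
    """Map a tokenized document to {term: tf * idf} weights."""
    tf = Counter(doc_tokens)
    return {
        term: (count / len(doc_tokens)) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }

weights = tfidf(tokenized[0])
# "learns" occurs in only one document, so it gets the largest idf, log(3);
# "deep" occurs in two of three documents, so its idf is the smaller log(3/2).
print(sorted(weights, key=weights.get, reverse=True))
```

Note that every design decision here (tokenization, the idf formula, normalization) is made by hand; a deep model would instead learn its text representation from the raw input during training.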
Deep learning algorithms are shown to perform better at extracting non-local and global relationships and patterns in the data, compared to relatively shallow learning architectures. Other useful characteristics of the abstract representations learnt by deep learning include:

It tries to explore most of the abundant huge volumes of data, even when the data is unlabeled.
The advantage of continuing to improve as the amount of data increases.
Automatic data representation extraction, from unsupervised or supervised data, distributed and hierarchical, usually best when the input data has spatial or temporal structure; for example, images, language, and speech.
Representation extraction from unsupervised data enables its broad application to different data types, such as image, textural, audio, and so on.
Relatively simple linear models can work effectively with the knowledge obtained from the more complex and more abstract data representations. This means with the advanced features extracted, the following learning model can be relatively simple, which may help reduce the computational cost, for example, in the case of linear modeling.
Relational and semantic knowledge can be obtained at higher levels of abstraction and representation of the raw data (source: https://journalofbigdata.springeropen.com/articles/s…).
Deep architectures can be representationally efficient. This sounds contradictory, but it is a great benefit because of the distributed representation power of deep learning.
The learning capacity of deep learning algorithms is proportional to the size of the data, that is, performance increases as the input data increases, whereas, for shallow or traditional learning algorithms, the performance reaches a plateau after a certain amount of data is provided, as shown in the following figure:
Impact of deep learning

To show you some of the impact of deep learning, let's take a look at two specific areas: image recognition and speech recognition.

The following figure shows the top-5 error rates of the ILSVRC contest winners over recent years. Traditional image recognition approaches employed hand-crafted computer vision classifiers trained on a number of instances of each object class, for example, SIFT + Fisher vectors. In 2012, deep learning entered this competition. Alex Krizhevsky and Professor Hinton from Toronto University stunned the field with around a 10% drop in the error rate achieved by their deep convolutional neural network (AlexNet). Since then, the leaderboard has been occupied by this type of method and its variations. By 2015, the error rate had dropped below that of human testers:
The following figure depicts the recent progress and a roadmap of the future of speech recognition. From 2000-2009, there was very little progress. Since 2009, the involvement of deep learning, large datasets, and fast computing has significantly boosted development. In 2016, a major breakthrough was made by a team of researchers and engineers in Microsoft Research AI (MSR AI). They reported a speech recognition system that made the same or fewer errors than professional transcriptionists, with a word error rate (WER) of 5.9%. In other words, the technology could recognize words in a conversation as well as a person does:
A natural question to ask is, what are the advantages of deep learning over traditional approaches? Topology defines function. But why do we need such an expensive deep architecture? Is this really necessary? What are we trying to achieve here? It turns out that there are both theoretical and empirical reasons in favor of multiple levels of representation. In the next section, let's dive into more details about the motivation of deep architecture.
The motivation of deep architecture

The depth of an architecture refers to the number of levels of composition of non-linear operations in the function learned. These operations include weighted sums, products, a single neuron, a kernel, and so on. Most current learning algorithms correspond to shallow architectures that have only 1, 2, or 3 levels. The following table shows some examples of both shallow and deep algorithms:

Levels            Example                                            Group
1-layer           Logistic regression,                               Linear classifier
                  Maximum Entropy Classifier,
                  Perceptron, Linear SVM
2-layers          Multi-layer Perceptron,                            Universal approximator
                  SVMs with kernels,
                  Decision trees
3 or more layers  Deep learning,                                     Compact universal approximator
                  Boosted decision trees
There are mainly two viewpoints for understanding the motivation behind deep architecture algorithms: the neural viewpoint and the representation viewpoint. We will talk about each of them. Both of them may come from different origins, but together they can help us to better understand the mechanisms and advantages deep learning has.
The neural viewpoint

From a neural viewpoint, an architecture for learning is biologically inspired. The human brain has a deep architecture, in which the cortex seems to have a generic learning approach. A given input is perceived at multiple levels of abstraction. Each level corresponds to a different area of the cortex. We process information in hierarchical ways, with multi-level transformation and representation. Therefore, we learn simple concepts first, then compose them together to form more complex ones. This structure of understanding can be seen clearly in the human visual system. As shown in the following figure, the ventral visual cortex comprises a set of areas that process images in increasingly abstract ways, from edges, corners and contours, to shapes and object parts, allowing us to learn, recognize, and categorize three-dimensional objects from arbitrary two-dimensional views:
The representation viewpoint Formostadinlcheg,theirpfomancdsvly therpsnaiofdygv.Therfo,domainprkwleg,featur engir,andfeturslciohpm.Buthandcrafteduslkhxibyopng . Also,theyarnod-drivenacotpwfms.Inthe past,ithasbenocdlfAItaskcouldbevyingmph learnigothmcdsfukx design.Forexampl,anestimofhzpkr useflatr,asit Unfortunaely,formanytsk,andforviusptm,forexampl,image,video, audio,andtex,itsverydfculoknwhabx,let alonethirgzbyfskdcup . Manualydesigftrocmpxkq understaig,time,andefort.Sometis,itcankedsformuy resachtomkpgin.Ifonelksbacthrfmpuvi,
svocaltr i n de satrongcluewhpkim,woman,orchild.
forveadcshbntuklim featurxcionphs(SIFT,HOG,andso).Alotfwrkbachenivd tryingodescmplahvubf,andthe progeswavyl,espcialyforg-scaleompitdrb,sucha recogniz1000objectsfrmiag.Thisatrongmvfdelxb automedfrpsnich . Onesolutinhprbmdavyfc,suchamine learnigtodscvhp.Suchrepsntaiomg fromepsntaiu(supervisd),orsimplyentaf(unsupervid). Thisaprochknwetlg.Learnedpstioful muchbetrpfoansdwi-design repsntaio.ThisalowAIsytemorapidlnw,withoumc humanitervo.Also,itmaykeorndfwhlcu hand-craftndesigu.Whilewtharpsnogm,wecan discoveragtfumplknxh months. Thiswherdplangcomtu.Deeplarnigcbthoufs repsntaiolg,wherasftuxcionpmlyd architeusyngopd,learnig,anduerstighmpbw theinpuado.Thisbrngfcatmpoveuydlx sincehumadgftr/featurxcionlksydgzb.
In addition to this automated feature learning, the learned representations are both distributed and hierarchical. Such useful learned representations help the subsequent tasks be accomplished more easily. The following figure illustrates these two important characteristics of deep learning algorithms. In the next section, we will explain why these two characteristics (distributed and hierarchical) are important:
Distributed feature representation

A distributed representation is one where a concept is represented by multiple neurons jointly, and each neuron represents more than one concept. In other words, input data is represented on multiple, interdependent layers, each describing the data at different levels of scale or abstraction. Therefore, the representation is distributed across virtually every neuron at multiple levels. In this way, two types of information are captured by the network topology. On the one hand, each neuron must represent something, so this becomes a local representation. On the other hand, so-called distribution means that a map of the graph is distributed, and there exist many-to-many relationships between neurons and concepts. Such connections capture the interaction and mutual relationships between neurons and concepts.

Such representations are exponentially more expressive than local ones with a similar number of parameters. In other words, they can generalize non-locally to unseen regions. They hence offer the potential for better generalization, because learning theory shows that the number of examples needed (to achieve the desired degree of generalization performance) to tune O(B) effective degrees of freedom is O(B). This is referred to as the power of distributed representation as compared to local ones (http://www.iro.umontreal.ca/~pift.../notes/mlintro.html).
An easy way to understand this is with an example. Suppose we need to represent three words; we could use the traditional one-hot encoding (of length N), which is commonly used in NLP. Then, at most, we can represent N words. Such localist models are very inefficient whenever the data has componential structure:
A distributed representation of the same inputs would look like the following:
If we want to represent a new concept with a local representation, such as one-hot encoding, we would have to increase the dimensionality. But what's nice about a distributed representation is that we may be able to represent a new concept with the existing dimensionality. An example using three words:

Therefore, non-mutually exclusive features/attributes create a combinatorially large set of distinguishable configurations, so the number of distinguishable regions grows almost exponentially with the number of parameters. One more concept we need to clarify is the difference between distributed and distributional. Distributed means the representation is spread as continuous activation values over multiple elements, for example, a dense word embedding, as opposed to a one-hot encoding vector. On the other hand, distributional means the representation is built from contexts of use. For example, Word2Vec is distributional, but so are count-based word vectors, as we use the contexts of the word to model its meaning.
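The contrast between a local (one-hot) and a distributed (dense) encoding can be sketched in a few lines of Python. The vocabulary and the two feature dimensions below are invented purely for illustration:

```python
# Local representation: one dimension per word, exactly one 1.
words = ["cat", "dog", "house"]

def one_hot(word):
    vec = [0] * len(words)
    vec[words.index(word)] = 1
    return vec

# Distributed representation: each word is described by shared features
# (hypothetical dimensions: "is_animal", "size"), so a fixed number of
# dimensions can separate far more than that many concepts.
embedding = {
    "cat":   [1.0, 0.2],
    "dog":   [1.0, 0.4],
    "house": [0.0, 0.9],
}

print(one_hot("dog"))    # [0, 1, 0]
print(embedding["dog"])  # [1.0, 0.4]
```

Adding a fourth word to the one-hot scheme requires a fourth dimension, while the dense scheme can place it in the existing two-dimensional space.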
Hierarchical feature representation

The learnt features capture both local and part-whole relationships for the data. It is not only the learnt features that are distributed; the representations also come hierarchically structured. The previous figure compares typical deep and shallow architectures, where one can see that a shallow architecture often has a flat structure with at most one or two layers of data transformation, whereas in a deep architecture lower layers' outputs, which represent lower-level features, serve as inputs to the higher layers, which compose them into more abstract features. The following figure uses a more concrete example to show what information can be learned through such a hierarchically structured topology.
As shown in the image, the lower layers focus on edges or colors, while higher layers often focus more on patches, curves, and shapes. Such a representation effectively captures part-and-whole relationships at various levels of granularity and naturally addresses multi-task problems, for example, edge detection or part recognition. The lower layers often represent the basic and fundamental information that can be used for many distinct tasks in a wide variety of domains. For example, Deep Belief networks have been successfully used to learn high-level structure in a wide variety of domains, including handwritten digits and human motion capture data. The hierarchical structure of representation mimics the human understanding of concepts, that is, learning simple concepts first and then successfully building up more complex concepts by composing the simpler ones. It is also easier to monitor what is being learnt and to guide the machine to better subspaces. If one treats each neuron as a feature detector, then deep architectures can be seen as consisting of feature detector units arranged in layers. Lower layers detect simple features and feed into higher layers, which in turn detect more complex features. If a feature is detected, the responsible unit or units generate large activations, which can be picked up by the later classifier stages as a good indicator that the class is present:
The above figure illustrates this concept on a face detection network, which detects the corresponding parts in the lower layers (blobs, edges, nose, or eyes) of the input image.
Applications

Now we have a general understanding of deep learning and its technical advantages over traditional methods. But how do we benefit from it in reality? In this section, we will introduce how deep learning makes a tremendous impact in some practical applications across a variety of fields.
Lucrative applications

In the past few years, the number of researchers and engineers applying deep learning has grown at an exponential rate. Deep learning breaks new records in many areas by using novel network architectures and advanced training methods. With significant hardware and algorithmic developments, deep learning has revolutionized the industry and broken records in many practical, real-world AI and data mining problems.

We have seen an explosion of new and lucrative applications using deep learning frameworks in a wide variety of areas, including image search, object detection, computer vision, optical character recognition, video parsing, face recognition, pose estimation, speech recognition, spam detection, text to speech or image generation, translation, natural language processing, chatbots, targeted online advertising, click-through optimization, robotics, energy optimization, medicine, art, music, physics, autonomous car driving, data mining of biological data, bioinformatics (protein sequence prediction, phylogenetic inferences, multiple sequence alignment), big data analytics, semantic indexing, sentiment analysis, web search/information retrieval, games (for example, Atari games, http://karpathy.github.io, and AlphaGo, https://deepmind.com/research/alphago/), and beyond.
Success stories

In this section, we will enumerate a few major application areas and their success stories. In the area of computer vision, image recognition/object recognition refers to the task of using an image or a patch of an image as input and predicting what the image or patch contains. For example, an image can be labeled dog, cat, house, bicycle, and so on. In the past, researchers were stuck at how to design good features to tackle challenging problems such as scale-invariance, orientation variance, and so on. Some of the well-known feature descriptors are Haar-like features, Histogram of Oriented Gradients (HOG), Scale-Invariant Feature Transform (SIFT), and Speeded-Up Robust Feature (SURF). While human-designed features are good at certain tasks, such as HOG for human detection, they are far from ideal.
Until 2012, when deep learning stunned the field with its results on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). In that competition, a convolutional neural network (often called AlexNet, see the following figure), developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, won 1st place with an astounding 85% accuracy, 11% better than the algorithm that won second place! In 2013, all winning entries were based on deep learning, and by 2015 multiple CNN-based algorithms had surpassed the human recognition rate of 95%. Details can be found in their paper:
In other areas of computer vision, deep learning also shows surprising and interesting power in mimicking human intelligence. For example, deep learning can not only identify the various elements in a picture accurately (and locate them), it can also understand interesting areas as humans do and describe in words/phrases what a picture contains. For more details, one can refer to the work published by Andrej Karpathy and Fei-Fei Li at http://cs.stanford.edu/people/karpathy/deepimagesent. They trained a deep learning network to identify dozens of interesting areas and objects and describe them in sentences with proper English grammar. This involves training on both the images and the human descriptions so that the model can combine information learned from both sources.

As a further progress, Justin Johnson, Andrej Karpathy and Fei-Fei Li published a new work in 2016 called DenseCap. Their proposed fully Convolutional Localization Network (FCLN) architecture can localize and describe salient regions in images in natural language. Some examples are shown in the following figure:
Recently, attention-based neural encoder-decoder frameworks have been widely explored for image captioning, where novel adaptive attention models have been incorporated and achieved state-of-the-art results. Details can be found in their work.

Early in 2017, Ryan Dahl and others from the Google Brain team proposed a deep learning network called Pixel Recursive Super Resolution to take very low-resolution images of faces and enhance their resolution significantly. It can predict what each face most likely looks like. For example, in the following figure, in the left-hand column, you can see the original 8x8 photos; the prediction results in the middle column can be compared against the ground truth (in the very right column):
In the area of semantic indexing, given the advantages of learned feature representations by deep learning, data in various formats can now be represented in a more efficient and useful manner. This provides a powerful source for knowledge discovery and comprehension. The Microsoft Audio Video Indexing Service (MAVIS) is an example that uses deep learning (ANN)-based speech recognition to enable searching of audio and video files with speech.

In the area of natural language processing (NLP), word/character representation learning (such as Word2Vec) and machine translation are great examples. In fact, in the past two or three years, deep learning has almost replaced traditional machine translation methods.
Machine translation is automated translation, which typically refers to software-based systems that deliver reasonably fluent-sounding but less consistent translations for speech or text between various languages. In the past, popular methods have been statistical techniques that learn the translation rules from a large corpus, as a replacement for a language expert. While such cases overcome the bottleneck of data acquisition, many challenges exist. For example, hand-crafted features may not be ideal as they cannot cover all possible linguistic variations; it is difficult to use global features; and the translation module heavily relies on pre-processing steps including word alignment, word segmentation, tokenization, rule-extraction, syntactic parsing, and so on. The recent developments in deep learning provide solutions to these challenges. A machine translator that translates through one large neural network is often called Neural Machine Translation (NMT). Essentially, it is a sequence-to-sequence learning problem, where the goal of the neural networks is to learn a parameterized function that maps from the input sequence/source language to the output sequence/target language. The mapping function often consists of two stages: encoding and decoding. The encoder maps the source sequence to one or more vectors to produce hidden state representations. The decoder then predicts the target sequence symbol-by-symbol using the source sequence vector representation and the previously predicted symbols. As illustrated by the following figure, this vase-like shape produces good representations/embeddings at the middle layer:
However, NMT systems are known to be computationally expensive both in training and in translation inference. Also, most NMT systems have difficulty with rare words. Some recent improvements include the attention mechanism, subword-level modeling, character-level translation, and improvements to the loss function (Chung and others). In 2016, Google launched their own NMT system to work on a notoriously difficult language pair, Chinese to English, and tried to overcome these challenges.

Google's NMT system (GNMT) conducts about 18 million translations per day from Chinese to English. The production deployment is built on top of the publicly available machine learning toolkit TensorFlow (https://www.tensorflow.org/) and Google's Tensor Processing Units (TPUs), which provide sufficient computational power to deploy these powerful GNMT models while meeting strict latency requirements. The model itself is a deep LSTM model with encoder and decoder layers using attention and residual connections. On the WMT'14 English-to-French and English-to-German benchmarks, GNMT achieves competitive results. Using a human side-by-side evaluation on a set of isolated simple sentences, it reduces translation errors by an average of 60% compared to Google's phrase-based production system. For more details, one can refer to their blog (https://research.googleblog.com/a-neural-network-for-machine.html) or paper. The following figure shows the improvements in translation quality made by the deep learning method. One can see that for the French -> English pair, it is almost as good as a human translator:
In 2016, Google released WaveNet (https://deepmind.com/blog/wavenet-generative-model-raw-audio/) and Baidu released its own speech synthesis system; both are deep learning networks that generate voice automatically. The systems learn to mimic human voices by themselves and improve over time, and it is getting harder for an audience to differentiate them from a real human speaking. Why is this important? Although Siri (https://www.wikiwand.com/en/Siri) and Alexa (https://www.wikiwand.com/en/Amazon_Alexa) can talk well, in the past, text2voice systems were mostly manually trained, which was not done in a completely autonomous way to create new voices.

While there is still some gap before computers can speak naturally like humans, we are definitely a step closer to realizing automated voice generation. In addition, deep learning has shown its impressive ability in music composition and generating sounds from videos, for example, the work by Owens and his co-authors.
Deep learning has been used extensively for self-driving cars, from perception to localization, to path planning. In perception, deep learning is often used to detect cars and pedestrians, for example using the Single Shot MultiBox Detector or YOLO Real-Time Object Detection. People can also use deep learning to understand the scene the car is in, for example, with SegNet, segmenting the scene into pieces with semantic meaning (sky, building, pole, road, fence, vehicle, bike, pedestrian, and so on). In localization, deep learning can be used to perform odometry, for example, VINet, which estimates the exact location of the car and its pose (yaw, pitch, roll). In path planning, where the goal is often formulated as an optimization problem, deep learning, specifically reinforcement learning, can also be applied, for example, in the work by Shalev-Shwartz and his co-authors. In addition to its applications in specific stages of the self-driving pipeline, deep learning has also been used to perform end-to-end learning, mapping raw pixels from the camera directly to steering commands.
Deep learning for business

To leverage the power of deep learning in business, the first question would be how to choose the problems to solve. In an interview with Andrew Ng, he talked about his opinion on the rule of thumb for choosing such problems.

If we look around, we can easily find that companies, large or small, have already applied deep learning to production with impressive performance. Think about Google, Microsoft, Facebook, Apple, Amazon, IBM, and Baidu. It turns out we are using deep learning-based applications every day.

Nowadays, Google can caption your uploaded images with multiple tags and descriptions. Its translation system is nearly as good as a human translator. Its image search engine returns related images by both image-based and semantic-based queries. Project Sunroof (https://www.google.com/get/sunroof) has been helping homeowners explore whether they should go solar - offering solar estimates for over 43 million houses across 42 states.

Apple is working hard to invest machine learning into its products, including the Core ML framework, Siri, and ARKit (an augmented reality platform) on iOS, and its autonomous solutions including self-driving car applications. Facebook can now automatically tag your friends. Researchers from Microsoft have won the ImageNet competition with better performance than a human annotator and improved their speech recognition system, which has now surpassed humans.

Industry-leading companies have also released their own large-scale deep learning platforms or tools in some way. For example, TensorFlow from Google, MXNet from Amazon, PaddlePaddle from Baidu, and Torch from Facebook. Just recently, Facebook and Microsoft introduced a new open ecosystem for interchangeable AI frameworks. All these toolkits provide useful abstractions for neural networks: routines for n-dimensional arrays (Tensors), simple use of different linear algebra backends (CPU/GPU), and automatic differentiation.

With so many resources and good algorithms being made available, it can be foreseen that the process from theoretical development to industry deployment will be accelerated over time.
Future potential and challenges

Despite the exciting progress and promising results, challenges remain. As we open this Pandora's box of AI, one of the key questions is, where are we going? What can it do? This question has been addressed by people from various backgrounds. In one of the interviews with Andrew Ng, he posed his point of view that while AI is making rapid progress, such momentum will continue or even grow faster until AI reaches a human level of performance. The reasons are mainly threefold: the feasibility of the things a human can do, the massive size of data, and the distinctive insight gained from the data. Still, it sounds very impressive, and might be scary, that one day AI will surpass, and perhaps replace, humans in many areas.

There are basically two main concerns over AI, the positive one and the negative one. As the creator of PayPal, SpaceX, and Tesla, Elon Musk, commented one day:
But right now, most AI technology can only be applied in a limited way within certain areas. In the area of deep learning, there are perhaps more challenges than the success stories touching people's lives. Until now, most of the progress in deep learning has been made by exploring various architectures, but we still lack the fundamental understanding of why and how deep learning has achieved such success. Additionally, there are limited studies on why and how to choose structural features and how to efficiently tune hyper-parameters. Most of the current approaches are still based on validation or cross-validation, which is far from being theoretically grounded and is more on the side of experimental exploration. From a data source perspective, how to deal with fast-moving and streaming data, high dimensional data, structured data in the form of sequences (time series, audio and video signals, DNA, and so on), trees (XML documents, parse trees, RNA, and so on), and graphs (chemical compounds, social networks, parts of an image, and so on) is still in development, especially when computational efficiency is concerned.

Additionally, there is a need for multi-task unified modeling. As the Google DeepMind research scientist Raia Hadsell summarized it:

Until now, many trained models have specialized in just one or a few areas, such as recognizing faces, cars, human actions, or understanding speech, which is far from true AI. Whereas a truly intelligent module would not only be capable of processing and understanding multi-source inputs, but also of making decisions for various tasks or sequences of decisions. The question of how to best apply the knowledge learned from one domain and adapt it quickly to another remains unanswered.

While many optimization approaches have been proposed, such as Gradient Descent or Stochastic Gradient Descent, Adagrad, AdaDelta, or Adam (Adaptive Moment Estimation), some known weaknesses, such as getting trapped in local optima, lower performance, and high computational resource demands, remain. New research in this direction would yield fundamental impacts on deep learning performance and efficiency. It would be interesting to see whether global optimization techniques can be applied to aid deep learning regarding these problems.
Last but not least, there are perhaps more untouched challenges and opportunities in applying deep learning, or developing new deep learning algorithms, for fields that so far have not yet been deeply impacted. From finance to e-commerce, social networks to bioinformatics, we have seen tremendous growth in the use of deep learning. Powered by deep learning, we are seeing applications, startups, and services which are changing our lives at a much faster pace.
Summary

In this chapter, we have introduced the high-level concepts of deep learning and AI in general. We talked about the history of deep learning, its ups and downs, and its recent rise. From there, we dived deeper to discuss the differences between shallow algorithms and deep algorithms. We specifically discussed the two viewpoints for understanding deep learning: the neural point of view and the feature representation learning point of view. We then gave several successful applications across different fields. In the end, we talked about the challenges that deep learning still faces and the remaining questions compared to human-level AI.

In the next chapter, we will help you set up the development environment and get our hands dirty.
Getting Yourself Ready for Deep Learning

Due to recent achievements of Artificial Neural Networks (ANNs) in different applications of artificial intelligence (AI), such as computer vision, natural language processing (NLP), and speech recognition, deep learning has emerged as one of the most commonly used techniques for real-world implementations. This chapter focuses on how to set yourself up for experimenting with and applying deep learning techniques in the real world. We will answer the key questions one commonly encounters when getting started with deep learning. We will specifically answer the following questions:

What skills are needed to get going with deep learning?
What are the core concepts from linear algebra required to understand deep learning?
What hardware requirements exist for practical deep learning systems?
What software frameworks exist today that help developers with the development of their deep learning applications?
How do you set up a deep learning environment over a cloud-based graphics processing unit (GPU) instance such as AWS?
Basics of linear algebra

One of the most fundamental skills required to get going with deep learning is a foundational understanding of linear algebra. Though linear algebra is itself a vast subject, and covering it fully is beyond the scope of this book, we will go through some important aspects of linear algebra in this chapter. Hopefully, this will give you a sufficient understanding of some of the core concepts and how they interplay with deep learning methodologies.
Data representation

In this section, we will look at the core data structures and representations most commonly used in linear algebra. This is not meant to be a comprehensive treatment, but serves to highlight the most important concepts for understanding deep learning:

Vectors: One of the most fundamental representations in linear algebra is a vector. A vector can be defined as an array of objects, or more specifically an array of numbers that preserves its ordering. Each number can be accessed in a vector by its index location. For example, consider a vector containing the seven days of the week encoded from 1 to 7, where 1 represents Sunday and 7 represents Saturday. Using this notation, a particular day of the week, say Wednesday, can be directly accessed from the vector as x[4]:

Matrices: These are two-dimensional representations of numbers, or basically vectors of vectors. Each matrix, m, is composed of a certain number of rows, r, and a specified number of columns, c. Each of the r rows is a vector of c numbers, and each of the c columns is also a vector of r numbers. Matrices are particularly useful for working with images. Though real-world images are three-dimensional in nature, most computer vision problems can be solved with two-dimensional representations of images. As such, a matrix representation of an image is a natural fit:

Identity matrices: An identity matrix is defined as a matrix which, when multiplied with a vector, does not change the vector. Typically, an identity matrix has all elements as 0 except those on its main diagonal, which are all 1s:
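These representations can be illustrated with a short NumPy sketch, assuming NumPy is available; the day-of-week vector mirrors the example above, while the matrix values are invented for illustration:

```python
import numpy as np

# A vector: days of the week, 1 = Sunday ... 7 = Saturday
days = np.array([1, 2, 3, 4, 5, 6, 7])
print(days[3])  # index 3 holds 4, i.e. Wednesday

# A matrix: 2 rows, 3 columns (values illustrative)
m = np.array([[1, 2, 3],
              [4, 5, 6]])
print(m.shape)  # (2, 3)

# An identity matrix leaves any vector unchanged under multiplication
I = np.eye(3)
v = np.array([2.0, -1.0, 5.0])
print(I @ v)  # identical to v
```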
Data operations

In this section, we will look at some of the common operations defined over vectors and matrices.

Matrix transpose: Matrix transpose is a mirror image of the matrix along its main diagonal. Mathematically, it is defined as follows:

    (A^T)[i][j] = A[j][i]

Matrix multiplication: Matrix multiplication is one of the most fundamental operations that can be applied to any two matrices. A matrix, A, of shape m x n can be multiplied by another matrix, B, of shape n x p, if and only if the number of columns of A equals the number of rows of B. The resultant matrix, C, has the shape m x p. The multiplication is defined as follows:

    C[i][j] = sum over k of A[i][k] * B[k][j]

Matrix multiplication generally has very useful properties. For example, matrix multiplication is distributive:

    A(B + C) = AB + AC

Matrix multiplication is associative:

    A(BC) = (AB)C

Matrix multiplication is not commutative, which means, in general, AB != BA. However, the dot product between two vectors is commutative:

    x^T y = y^T x
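These shape rules and properties can be verified numerically with NumPy (a quick check on random matrices, assuming NumPy is available):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((2, 3))
B = rng.random((3, 4))
C = rng.random((3, 4))

# (2, 3) @ (3, 4) -> (2, 4): columns of A must match rows of B
assert (A @ B).shape == (2, 4)

# transpose mirrors indices: (A^T)[i][j] == A[j][i]
assert A.T[2, 1] == A[1, 2]

# distributive and associative properties
assert np.allclose(A @ (B + C), A @ B + A @ C)
D = rng.random((4, 2))
assert np.allclose(A @ (B @ D), (A @ B) @ D)

# matrix product is not commutative, but the vector dot product is
x, y = rng.random(3), rng.random(3)
assert np.isclose(x @ y, y @ x)
```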
Matrix properties

In this section, we will look at some of the properties of matrices and vectors which are very useful for deep learning applications.

Norm: Norm is an important property of a vector or matrix that measures the size of the vector or matrix. Geometrically, it can also be interpreted as the distance of a point, x, from the origin. An L^p norm is therefore defined as follows:

    ||x||_p = (sum over i of |x[i]|^p)^(1/p)

Though a norm can be computed for various orders of p, the most popularly known norms are the L1 and L2 norms. The L1 norm is usually considered a good choice for sparse models:

    ||x||_1 = sum over i of |x[i]|

Another norm popular in the machine learning community is the max norm, also referred to as L^inf. This simply equates to the value of the largest-magnitude element of the vector:

    ||x||_inf = max over i of |x[i]|

So far, all the previous norms apply to vectors. When we want to compute the size of a matrix, we use the Frobenius norm, defined as follows:

    ||A||_F = sqrt(sum over i,j of A[i][j]^2)

Norms are also used to compute the dot product of two vectors directly:

    x^T y = ||x||_2 ||y||_2 cos(theta)

Trace: Trace is an operator that is defined as the sum of all the diagonal elements of a matrix:

    Tr(A) = sum over i of A[i][i]

The trace operator is quite useful in computing the Frobenius norm of a matrix, as follows:

    ||A||_F = sqrt(Tr(A A^T))

Another interesting property of the trace operator is that it is invariant to the matrix transpose: Tr(A) = Tr(A^T). Hence, it is often used to manipulate matrix expressions to yield meaningful identities.

Determinant: A determinant of a matrix is a scalar value which is simply the product of all the eigenvalues of the matrix. Determinants are generally very useful in the analysis and solution of systems of linear equations. For instance, according to Cramer's rule, a system of linear equations has a unique solution, if and only if, the determinant of the matrix composed of the system of equations is non-zero.
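The norm, trace, and determinant definitions above can be checked numerically with NumPy (the vector and matrix values are illustrative):

```python
import numpy as np

x = np.array([3.0, -4.0])
# L1, L2 and max norms of a vector
assert np.isclose(np.linalg.norm(x, 1), 7.0)       # |3| + |-4|
assert np.isclose(np.linalg.norm(x, 2), 5.0)       # sqrt(9 + 16)
assert np.isclose(np.linalg.norm(x, np.inf), 4.0)  # largest magnitude

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
# Frobenius norm via the trace identity ||A||_F = sqrt(Tr(A A^T))
assert np.isclose(np.sqrt(np.trace(A @ A.T)), np.linalg.norm(A, "fro"))
# trace is invariant under transpose
assert np.isclose(np.trace(A), np.trace(A.T))
# determinant equals the product of the eigenvalues
assert np.isclose(np.linalg.det(A), np.prod(np.linalg.eigvals(A)).real)
```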
Deep learning with GPU

As the name suggests, deep learning involves learning deeper representations of data, which requires large amounts of computational power. Such massive computational power is usually not possible with modern-day CPUs. GPUs, on the other hand, lend themselves very nicely to this task. GPUs were originally designed for rendering graphics in real time. The design of a typical GPU allows for a disproportionately higher number of arithmetic logical units (ALUs), which allows them to crunch a large number of computations in real time.

GPUs used for general purpose computing have a high data-parallel architecture, which means they can process a large number of data points in parallel, leading to higher computational throughput. Each GPU is composed of thousands of cores. Each of these cores consists of a number of functional units which contain an ALU among other modules. Each of these functional units executes exactly the same instruction set, allowing for massive data parallelism in GPUs. In the next section, we compare and contrast the design of a GPU with a CPU.

The following table illustrates the design differences of a CPU versus a GPU. As shown, GPUs are designed to execute a large number of threads, optimized to execute identical instructions in parallel. Hence, each of the GPU cores is rather simple in design. CPUs, on the other hand, are designed to operate with fewer cores, but their core design can handle highly complex instructions, which is usually not possible on GPUs. Hence, CPUs can be thought of as command units for handling diverse tasks, as opposed to GPUs, which are specialized units:

| GPU | CPU |
| Large number of simple cores | Fewer number of complex cores |
| Higher level of multi-threading optimization | Single-thread optimization |
| Good for special-purpose computing | Good for general-purpose computing |

In terms of relative performance, GPUs have much lower latency than CPUs for performing highly data-parallel tasks. This is especially true if the GPU has enough device memory to load all the required data for peak computation. However, for a head-to-head core comparison, CPUs have much lower latency, as each CPU core is much more complex and advanced in design as opposed to a GPU core. As such, the design of the algorithm can greatly determine the suitability of a GPU versus a CPU. Erik Smistad and his co-authors outlined five different factors that determine the suitability of algorithms towards a GPU: data parallelism, thread count, branch divergence, memory usage, and synchronization. The following table, by Dutta-Roy, illustrates the impact of these factors on the suitability of a GPU. As shown, any algorithm which falls under the High column for factors such as data parallelism is better suited to a GPU than others:
Deep learning hardware guide

There are a few other important hardware aspects to consider while setting up your own system for deep learning application development. In this section, we will outline some of the most important aspects of GPU computing.
CPU cores

Most deep learning applications will not use more than one CPU core, unless they are used within a parallelization framework such as Message-Passing Interface (MPI), MapReduce, or Spark. For example, a team at Yahoo! uses Spark with Caffe for parallelizing network training across multiple GPUs and CPUs, a system released as CaffeOnSpark (https://github.com/yahoo/CaffeOnSpark). In most normal settings, one CPU core is enough for deep learning application development.

CPU cache size

CPU cache size is an important CPU component that is used for high-speed computation. A CPU cache is often organized in a hierarchy of cache layers, from L1 to L4 - L1 and L2 being smaller and faster cache layers as opposed to the larger and slower layers L3 and L4. In an ideal setting, every piece of data needed by the application resides in the cache, and hence no read is required from RAM, thereby making the overall operation faster.

However, this is hardly the scenario for most deep learning applications. For example, for a typical ImageNet experiment with a batch size of 128, we need more than 85MB of CPU cache to store all the information for one mini-batch [13]. Since such datasets are not small enough to be cache-only, a RAM read cannot be avoided. Hence, modern-day CPU cache sizes have little to no impact on the performance of deep learning applications.
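A quick back-of-the-envelope check shows why a mini-batch cannot live in a CPU cache; assuming standard ImageNet-sized float32 inputs (3 x 224 x 224, dimensions chosen for illustration), a batch of 128 inputs alone occupies tens of megabytes, far beyond typical L1-L3 cache sizes:

```python
# Memory footprint of one mini-batch of raw ImageNet-sized inputs
batch, channels, height, width = 128, 3, 224, 224
bytes_per_float = 4  # float32

batch_bytes = batch * channels * height * width * bytes_per_float
print(batch_bytes / 2**20)  # roughly 73.5 MiB for the inputs alone
```

Activations, labels, and bookkeeping push the total further, consistent with the 85MB figure above.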
RAM size

As we saw previously in this section, most deep learning applications read directly from RAM instead of the CPU caches. Hence, it is often advisable to keep the CPU RAM almost as large, if not larger, than the GPU RAM.

The size of the GPU RAM depends on the size of your deep learning model. For example, ImageNet-based deep learning models have a large number of parameters taking 4 GB to 5 GB of space, hence a GPU with at least 6 GB of RAM would be an ideal fit for such applications. Paired with a CPU with at least 8 GB, or preferably more, CPU RAM, this will allow application developers to focus on key aspects of their application without worrying about RAM performance issues.
Hard drive

Typical deep learning applications require large sets of data, in the 100s of GB. Since this data cannot all reside in RAM, there is an ongoing data pipeline at work. A deep learning application loads mini-batch data from GPU RAM, which in turn keeps on reading data from CPU RAM, which loads data directly from the hard drive. Since GPUs have a large number of cores, and each of these cores takes mini-batches of their own data, they constantly need to be reading a large volume of data from the disk to allow for high data parallelism.

For example, in AlexNet's Convolutional Neural Network (CNN)-based model, roughly 300MB of data needs to be read every second. This can often cripple the overall application performance. Hence, a solid state drive (SSD) is often the right choice for most deep learning application developers.
Cooling systems

Modern-day GPUs are energy efficient and have in-built mechanisms to prevent them from overheating. For instance, when a GPU increases its speed and power consumption, its temperature rises as well. Typically at around 80°C, its in-built temperature control kicks in, which reduces its speed, thereby automatically cooling the GPU. The real bottleneck in this process is the poor design of the pre-programmed schedules for fan speeds.

In a typical deep learning application, an 80°C temperature is reached within the first few seconds of the application, thereby lowering the GPU performance from the start and providing poor GPU throughput. To complicate matters, most of the existing fan scheduling options are not available in Linux, where most of the current-day deep learning applications run.

A number of options exist today to alleviate this problem. First, a Basic Input/Output System (BIOS) upgrade with a modified fan schedule can provide the optimal balance between overheating and performance. Another option is to use an external cooling system, such as a water cooling system. However, this option is mostly applicable to GPU farms where multiple GPU servers are running. External cooling systems can also be a bit expensive, so cost becomes an important factor in choosing the right cooling system for your application.
Deep learning software frameworks

Every good deep learning application needs several components to be able to function correctly. These include:

A model layer which allows a developer to design his or her own model architecture with flexibility
A GPU layer that makes it seamless for application developers to choose between GPU/CPU for their application
A parallelization layer that allows the developer to scale his or her application to run on multiple devices or instances

As you can imagine, implementing these modules is not easy. Often, a developer needs to spend more time on debugging implementation issues rather than the legitimate model issues. Thankfully, a number of software frameworks exist today which make deep learning application development practically a first-class citizen of their software development language.

These frameworks vary in architecture, design, and features, but almost all of them provide immense value to developers by providing easy and fast ways to implement their deep learning applications. In this section, we will take a look at some popular deep learning software frameworks and how they compare with each other.
TensorFlow - a deep learning library

TensorFlow is an open source software library for numerical computation using data flow graphs. Designed and developed by Google, TensorFlow represents the complete data computation as a flow graph. Each node in this graph can be represented as a mathematical operator. An edge connecting two nodes represents the multi-dimensional data that flows between the two nodes.

One of the primary advantages of TensorFlow is that it supports CPUs and GPUs as well as mobile devices, thereby making it almost seamless for developers to write code against any device architecture. TensorFlow also has a very big community of developers, providing huge momentum behind the development of the framework.
Caffe

Caffe was designed and developed at the Berkeley Artificial Intelligence Research (BAIR) Lab. It was designed with expression, speed, and modularity in mind. It has an expressive architecture that allows for a very configurable way of defining models and optimization parameters without needing any additional code. This configuration also allows easy switching from CPU to GPU mode and vice-versa with a single flag change.

Caffe also boasts good performance benchmark numbers when it comes to speed. For instance, on a single NVIDIA K40 GPU, Caffe can process over 60 million images per day. Caffe also has a strong community, ranging from academic researchers to industrial research labs using Caffe across heterogeneous application stacks.
MXNet

MXNet is a multi-language machine learning library. It offers two modes of computation:

Imperative mode: This mode exposes an interface much like a regular NumPy-like API. For example, to construct a tensor of zeros on both a CPU and a GPU using MXNet, you use the following code block:

    import mxnet as mx
    tensor_cpu = mx.nd.zeros((100,), ctx=mx.cpu())
    tensor_gpu = mx.nd.zeros((100,), ctx=mx.gpu(0))

In the example shown previously, MXNet specifies the location of where to hold the tensor, either in the CPU or in a GPU device at location 0. One important distinction with MXNet is that all computations happen lazily instead of instantly. This allows MXNet to achieve incredible utilization of the device, unlike any other framework.

Symbolic mode: This mode exposes a computation graph much like TensorFlow. Though the imperative API is quite useful, one of its drawbacks is its rigidity. All computations need to be known beforehand along with pre-defined data structures. The symbolic API aims to remove this limitation by allowing MXNet to work with symbolic variables instead of fixed data types. The symbols can then be compiled to executable functions as follows:

    import mxnet as mx
    x = mx.sym.Variable("X")
    y = mx.sym.Variable("Y")
    z = (x + y)
    m = z / 2
Torch

Torch is a Lua-based deep learning framework developed by Ronan Collobert, Clement Farabet, and Koray Kavukcuoglu. It was initially used by the CILVR Lab at New York University. Torch is powered by C/C++ libraries under its hood and also uses Compute Unified Device Architecture (CUDA) for its GPU interactions. It aims to be the fastest deep learning framework while also providing a simple, C-like interface for rapid application development.

Theano

Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Some of the key features of Theano are its very tight integration with NumPy, making its usage very natural for a large number of Python developers. It also provides a very intuitive interface for using either a GPU or a CPU. It has efficient symbolic differentiation, allowing it to provide derivatives for functions with one or many inputs. It is also numerically stable and has dynamic code generation, leading to faster expression evaluation. Theano is a good framework choice if you have advanced machine learning expertise and are looking for a low-level API for fine-grained control of your deep learning application.
Microsoft Cognitive Toolkit

Microsoft Cognitive Toolkit is also known as CNTK; it is the latest entry into the set of popular deep learning frameworks. CNTK has two major functionality profiles:

Support for multiple features, such as:
    CPU/GPU for training and prediction
    Both Windows and Linux operating systems
    Efficient recurrent network training through batching techniques
    Data parallelization using one-bit quantized singular value decomposition (SVD)
Efficient modularization that separates:
    Computation networks
    Execution engine
    Learning algorithms
    Model configuration
Keras

Keras is a deep learning framework that is probably different from every other framework described previously. Most of the described frameworks are low-level modules that directly interact with the GPU using CUDA. Keras, on the other hand, can be understood as a meta-framework that interacts with other frameworks, such as Theano or TensorFlow, to handle its GPU interactions or system-level access management. As such, it is highly flexible and very user-friendly, allowing developers to choose from a variety of underlying model implementations. Keras community support is gaining momentum, and as of September 2017, the TensorFlow team plans to ship Keras as a subset of the TensorFlow project.
Framework comparison

Though a number of deep learning software frameworks exist, it is hard to understand their feature parity. The following table outlines each framework's feature parity:

| Framework  | Languages               | Community support | Modeling flexibility | Easy configuration | Speed     | GPU parallelization | Tutorials |
| TensorFlow | Python                  | Excellent         | Strong               | Excellent          | Strong    | Strong              | Strong    |
| Caffe      | C++                     | Excellent         | Strong               | Excellent          | Strong    | Good                | Strong    |
| MXNet      | R, Python, Julia, Scala | Excellent         | Excellent            | Excellent          | Excellent | Excellent           | Good      |
| Torch      | Lua, Python             | Strong            | Strong               | Strong             | Excellent | Excellent           | Good      |
| Theano     | Python, C++             | Excellent         | Excellent            | Strong             | Strong    | Strong              | Excellent |
| CNTK       | C++                     | Strong            | Strong               | Strong             | Good      | Strong              | Strong    |
| Keras      | Python                  | Excellent         | Strong               | Excellent          | Good      | Strong              | Strong    |
Recently, Shaohuai Shi and their co-authors, in their paper (https://arxiv.org/pdf/), also presented a comprehensive evaluation of several popular frameworks, including: Caffe, CNTK, TensorFlow, and Torch. They first benchmark the performance of these frameworks on three popular types of neural networks: fully connected neural networks (FCN), CNNs, and recurrent neural networks (RNN). They also benchmark the performance of these frameworks using multiple GPUs as well as CPUs.

In their paper, they outline their comparative evaluation of these frameworks. Their experimental results show that all frameworks can utilize GPUs very efficiently and show performance gains over CPUs. However, there is still no clear winner among them, which suggests there are still improvements to be made across all of these frameworks.
Setting up deep learning on AWS

In this section, we will show two different ways to set up a deep learning system from scratch on Amazon Web Services (AWS).
Setup from scratch

In this section, we will illustrate how to set up a deep learning environment on an AWS EC2 GPU instance, g2.2xlarge, running Ubuntu Server 16.04 LTS. For this example, we will use a pre-baked Amazon Machine Image (AMI) which already has a number of software packages installed, making it easier to set up an end-to-end deep learning system. We will use a publicly available AMI Image, ami-b03ffed, which has the following pre-installed packages:

CUDA 8.0
Anaconda 4.20 with Python 3.0
Keras/Theano
1. The first step to setting up the system is to set up an AWS account and spin up a new EC2 GPU instance using the AWS web console (http://console.aws.amazon.com/), as shown in the following figure:
2. We pick a g2.2xlarge instance type from the next page, as shown in the following figure:
3. After adding storage as shown in the figure, we now launch a cluster and assign an EC2 key pair that allows us to SSH and log in to the box using the provided key pair file:
4. Once the EC2 box is launched, the next step is to install the relevant software packages. To ensure proper GPU utilization, it is important to ensure graphics drivers are installed first. We will upgrade and install the NVIDIA drivers as follows:

While the NVIDIA driver ensures that the GPU can now be utilized by applications, it does not provide any programming interface for facilitating easy programming on the device.
Various different software libraries exist today that help with this task. Open Computing Language (OpenCL) and CUDA are the more commonly used in industry. In this book, we use CUDA as an application programming interface for the NVIDIA graphics drivers. To install the CUDA driver, we first SSH into the EC2 instance, download CUDA 8.0 to our $HOME folder, and install it from there:

Once the installation is done, you can run the following command to validate the installation:
Now your EC2 box is fully configured to be used for deep learning development. However, for someone who is not very familiar with deep learning implementation details, building a deep learning system from scratch can be a daunting task.

To ease this development, a number of advanced deep learning software frameworks exist, such as Keras and Theano. Both of these frameworks are based on a Python development environment, hence we first install a Python distribution on the box, such as Anaconda:

Finally, Keras and Theano are installed using Python's package manager, pip:

Once the pip installation is completed successfully, the box is now fully prepared for deep learning development.
Setup using Docker

The previous section describes getting started from scratch, which can be tricky sometimes given the continuous changes to software packages and changing links on the web. One way to avoid dependence on links is to use a container technology like Docker.
In this chapter, we will use the official NVIDIA-Docker image that comes pre-packaged with all the necessary packages and deep learning frameworks to get you quickly started with deep learning application development:

1. We first install Docker Community Edition as follows:
2. We then install NVIDIA-Docker and its plugin:
3. To validate if the installation is done correctly, we use the following command:
4. Once it's set up correctly, we can use the official TensorFlow or Theano Docker image:
5. We can run a simple Python program to check if TensorFlow works properly:
You should see the TensorFlow output on the screen, as shown in the following figure:
Summary

In this chapter, we have summarized the key concepts required to get going with real-world implementations of deep learning systems. We described the core concepts of linear algebra that are crucial to understanding the foundations of deep learning. We provided a hardware guide outlining best practices for setting up a GPU-based system, with specific recommendations for each component. We outlined a list of the most popular deep learning software frameworks and also provided a feature-level parity comparison as well as performance benchmarks across them. Finally, we demonstrated how to set up a cloud-based deep learning application on AWS.

In the next chapter, we will introduce neural networks and provide a kick-starter module for understanding them well.
Getting Started with Neural Networks

In this chapter, we will be focusing on the basics of neural networks, including input/output layers, hidden layers, and how the networks learn through forward and backpropagation. We will start with the standard multilayer perceptron networks, talk about their building blocks, and illustrate how they learn step-by-step. We will also introduce a few popular standard models, such as Convolutional Neural Networks (CNN), Restricted Boltzmann Machines (RBM), and recurrent neural networks (RNN), as well as its variation, Long Short-Term Memory (LSTM). We will outline the key, critical components for the successful application of these models, and explain some important concepts to help you gain a better understanding of why these networks work so well in certain areas. In addition to the theoretical introduction, we will also show example code snippets using TensorFlow on how to construct layers and activation functions, and how to connect different layers. In the end, we will demonstrate an end-to-end example of MNIST classification using TensorFlow.

With the setup you learned from the previous chapter, it's time we jump into some real examples to get our hands dirty. The outline of this chapter is as follows:

Multilayer perceptrons:
    The input layer
    The output layer
    Hidden layers
    Activation functions
How a network learns
Deep learning models:
    Convolutional Neural Networks
    Restricted Boltzmann Machines
    RNN/LSTM
    MNIST hands-on classification example
Multilayer perceptrons

The multilayer perceptron is one of the simplest networks. Essentially, it is defined as having one input layer, one output layer, and a few hidden layers (more than one). Each layer has multiple neurons and the adjacent layers are fully connected. Each neuron can be thought of as a cell in these giant networks. It determines the flow and transformation of the incoming signals. Signals from the previous layer are pushed forward to the neurons of the next layer through the connection weights. For each artificial neuron, it calculates a weighted sum of all incoming inputs by multiplying each signal with the corresponding weight and adding a bias. The weighted sum will then go through a function called an activation function to decide whether it should be fired or not, which results in the output signals for the next level.

For example, a fully-connected, feed-forward neural network is depicted in the following diagram. As you may notice, there is an intercept node on each layer (x0 and a0). The non-linearity of the network is mainly contributed by the shape of the activation function.

The architecture of this fully-connected, feed-forward neural network essentially looks like the following:
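The behavior of a single artificial neuron described above can be sketched in a few lines of plain Python; the input values, weights, and bias below are invented for illustration, and sigmoid is used as the activation function:

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of inputs plus a bias, passed through sigmoid."""
    z = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid activation

out = neuron([1.0, 2.0], [0.5, -0.25], 0.1)
print(out)  # a value strictly between 0 and 1
```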
The input layer

The input layer is often defined as the raw data. For text data, this can be words or characters. For an image, this can be raw pixel values from different color channels. Also, with varying dimensions of the input data, it forms different structures, such as a one-dimensional vector-like structure.
The output layer

The output layer is basically the output value of the network and is formed depending on the problem setting. In unsupervised learning, such as encoding or decoding, the output can be the same as the input. For a classification problem, the output layer can have n neurons for an n-way classification and utilize a softmax function to output the probability of each class. Overall, the output layer maps to your target space and the activation within it will change accordingly, based on your problem setting.
Hidden layers

Hidden layers are the layers in between the input layer and the output layer. Neurons on the hidden layers can take various forms, such as a max-pooling layer, a convolutional layer, and so on, all performing different mathematical functionality. If you think of the whole network as a pipeline of mathematical transformations, the hidden layers are each transformed and then composed together to map the input data to the output space. We will introduce more variations of hidden layers when we talk about CNN and RNN in later sections of this chapter.
Activation functions

The activation function in each artificial neuron decides whether the incoming signal has reached the threshold and should output signals for the next level. It is crucial to set up the right activation function because of the gradient vanishing issue, which we will talk about later.

Another important feature of an activation function is that it should be differentiable. The network learns from the errors that are calculated at the output layer. A differentiable activation function is needed to perform backpropagation optimization, which propagates backwards in the network to compute gradients of error (loss) with respect to the weights, and then optimizes the weights accordingly, using gradient descent or any other optimization technique to reduce the error.

The following table lists a few common activation functions. We will dive into them a bit deeper, talk about the differences between them, and explain how to choose the right activation function:
Sigmoid or logistic function

A sigmoid function has a distinctive S shape and it is a differentiable real function for any real input value. Its range is between 0 and 1. It is an activation function of the following form:

    σ(x) = 1 / (1 + e^(-x))

Its first derivative, which is used during backpropagation of the training step, has the following form:

    σ'(x) = σ(x) * (1 - σ(x))

The implementation is as follows:

    def sigmoid(x):
        return tf.div(tf.constant(1.0),
                      tf.add(tf.constant(1.0), tf.exp(tf.negative(x))))

The derivative of the sigmoid function is as follows:

    def sigmoidprime(x):
        return tf.multiply(sigmoid(x),
                           tf.subtract(tf.constant(1.0), sigmoid(x)))

However, a sigmoid function can cause the gradient vanishing problem or saturation of the gradient. It is also known to have a slow convergence. Therefore, in practical use, it is not recommended to use sigmoid as the activation function; ReLU has become more popular.
Tanh or hyperbolic tangent function

The mathematical formula is as follows:

    tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

Its output is centered at zero with a range of -1 to 1. Therefore, optimization is easier and thus, in practice, it is preferred over the sigmoid activation function. However, it still suffers from the vanishing gradient problem.
ReLU

The Rectified Linear Unit (ReLU) has become quite popular in recent years. Its mathematical formula is as follows:

    f(x) = max(0, x)

Compared to sigmoid and tanh, its computation is much simpler and more efficient. It was shown to improve convergence by a large margin (for example, a factor of 6 in Krizhevsky and his co-authors' well-known work, 2012), possibly due to the fact that it has a non-saturating form. Also, unlike tanh or sigmoid functions which involve expensive exponential operations, ReLU can be achieved by simply thresholding at zero. Therefore, it has become very popular over the last couple of years. Almost all deep learning models use ReLU nowadays. Another important advantage of ReLU is that it avoids or rectifies the vanishing gradient problem.

Its limitation resides in the fact that its direct output is not in the probability space. It cannot be used in the output layer, but only in the hidden layers. Therefore, for classification problems, one needs to use the softmax function on the last layer to compute the probabilities for the classes. For a regression problem, one should simply use a linear function. Another problem with ReLU is the dying neuron problem. For example, if a large gradient flows through a ReLU, it may cause the weights to be updated in such a way that the neuron will never be active on any further data points. To fix this problem, another modification was introduced called Leaky ReLU, which tackles the problem of dying neurons.
Leaky ReLU and maxout

A Leaky ReLU will have a small slope α on the negative side, such as 0.01. The slope α can also be made into a parameter of each neuron, such as in PReLU neurons (P stands for parametric). The problem with these activation functions is the inconsistency of their effectiveness in solving the dying neuron problem.

Maxout is another attempt to solve the dying neuron problem in ReLU. It takes the form max(w1^T x + b1, w2^T x + b2). From this form, we can see that both ReLU and Leaky ReLU are just special cases of this form, that is, for ReLU, it's max(0, w^T x + b). Although maxout benefits from the linearity and having no saturation, it has doubled the number of parameters for every single neuron.
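The three variants, and the way ReLU falls out of maxout as a special case, can be sketched in a few lines of Python (the single-input maxout with scalar weights below is a simplification for illustration):

```python
def relu(x):
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    return x if x > 0 else alpha * x

def maxout(x, w1, b1, w2, b2):
    # maxout over two linear pieces (scalar, single-input case)
    return max(w1 * x + b1, w2 * x + b2)

print(relu(-3.0))        # 0.0
print(leaky_relu(-3.0))  # approximately -0.03
# ReLU is the special case maxout(x, 0, 0, 1, 0) = max(0, x)
print(maxout(-3.0, 0.0, 0.0, 1.0, 0.0))
```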
Softmax

When using ReLU as the activation function for a classification problem, a softmax function is used on the last layer. It helps to generate probability scores (summing to 1) for each of the classes:

    softmax(x)_j = e^(x_j) / sum over k of e^(x_k)
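A minimal Python sketch of the softmax formula, with the common max-subtraction trick for numerical stability (the input scores are illustrative):

```python
import math

def softmax(xs):
    # subtract the max for numerical stability; the result sums to 1
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)  # largest score gets the largest probability
```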
Choosing the right activation function

In most cases, we should always consider ReLU first. But keep in mind that ReLU should only be applied to the hidden layers. If your model suffers from dead neurons, then think about adjusting your learning rate, or try Leaky ReLU or maxout.

It is not recommended to use either sigmoid or tanh as they suffer from the vanishing gradient problem and also converge very slowly. Take sigmoid for example. Its derivative is never greater than 0.25 anywhere, which makes terms during backpropagation even smaller. While for ReLU, its derivative is one at every point above zero, thus creating a more stable network.

Now you have gained a basic knowledge of the key components in neural networks, let's move on to understanding how the networks learn.
How a network learns

Suppose we have a two-layer network. Let a0 represent the input, a1 and a2 the outputs of the two layers, and let the connection weights and bias values be w1, b1 and w2, b2. We will also use σ, the sigmoid function, as the activation function.
Weight initialization

After the configuration of the network's structure, training starts with initializing the weights' values. A proper weight initialization is important, since all the training is about adjusting the coefficients so that they capture the patterns in the data in the best way to approximate the target values. In most cases, weights are initialized randomly. In some finely-tuned settings, weights are initialized using a pre-trained model.
Forward propagation

Forward propagation is basically calculating the input data multiplied by the network's weights plus the bias, and then going through the activation function to the next layer:

    z1 = a0 * w1 + b1
    a1 = σ(z1)
    z2 = a1 * w2 + b2
    a2 = σ(z2)

An example code block using TensorFlow can be written as follows:

    # dimension variables (values illustrative; originals elided)
    dim_in = 1
    dim_middle = 5
    dim_out = 1

    # declare network variables
    a_0 = tf.placeholder(tf.float32, [None, dim_in])
    y = tf.placeholder(tf.float32, [None, dim_out])
    w_1 = tf.Variable(tf.random_normal([dim_in, dim_middle]))
    b_1 = tf.Variable(tf.random_normal([1, dim_middle]))
    w_2 = tf.Variable(tf.random_normal([dim_middle, dim_out]))
    b_2 = tf.Variable(tf.random_normal([1, dim_out]))

    # build the network structure
    z_1 = tf.add(tf.matmul(a_0, w_1), b_1)
    a_1 = sigmoid(z_1)
    z_2 = tf.add(tf.matmul(a_1, w_2), b_2)
    a_2 = sigmoid(z_2)
Backpropagation

All the network weights are learned from the training data iteratively by updating the weights/parameters to reflect the relationship between the input and the output. The gradients tell us the direction in which each weight and bias should be adjusted to reduce the error.
Calculating errors

The first thing backpropagation needs is a measure of how wrong the network's output is compared to your target value. The forward pass provides a_2 as the network's output, so we compute the following error:

    error = a2 - y

This is written in code as follows:

    # define error, which is the difference between the activation
    # function output from the last layer and the label
    error = tf.subtract(a_2, y)
Backpropagation

With the errors, backpropagation starts propagating the deltas backwards to determine the direction in which the gradients should change. First, we need to compute the deltas for the weights and biases. Note that z2 is used to propagate to w2 and b2, and z1 is used to propagate to w1 and b1:

    d_z2 = error * σ'(z2)
    d_b2 = d_z2
    d_w2 = a1^T * d_z2
    d_a1 = d_z2 * w2^T
    d_z1 = d_a1 * σ'(z1)
    d_b1 = d_z1
    d_w1 = a0^T * d_z1

This is written in TensorFlow code as follows:

    d_z_2 = tf.multiply(error, sigmoidprime(z_2))
    d_b_2 = d_z_2
    d_w_2 = tf.matmul(tf.transpose(a_1), d_z_2)
    d_a_1 = tf.matmul(d_z_2, tf.transpose(w_2))
    d_z_1 = tf.multiply(d_a_1, sigmoidprime(z_1))
    d_b_1 = d_z_1
    d_w_1 = tf.matmul(tf.transpose(a_0), d_z_1)
Updating the network

Now the deltas have been computed, it's time to update the network's parameters. In most cases, we use a type of gradient descent. Let η represent the learning rate; the parameter update formula is:

    θ = θ - η * dθ

This is written in TensorFlow code as follows:

    eta = tf.constant(0.01)  # learning rate; value illustrative
    step = [
        tf.assign(w_1, tf.subtract(w_1, tf.multiply(eta, d_w_1))),
        tf.assign(b_1, tf.subtract(b_1,
                  tf.multiply(eta, tf.reduce_mean(d_b_1, axis=[0])))),
        tf.assign(w_2, tf.subtract(w_2, tf.multiply(eta, d_w_2))),
        tf.assign(b_2, tf.subtract(b_2,
                  tf.multiply(eta, tf.reduce_mean(d_b_2, axis=[0]))))
    ]
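The same forward/error/backpropagate/update cycle can be written end to end in plain NumPy, which makes each step explicit; the hidden size, learning rate, batch size, and the target function below are illustrative choices, not the book's values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

rng = np.random.default_rng(0)
w1, b1 = rng.normal(size=(1, 8)), np.zeros((1, 8))
w2, b2 = rng.normal(size=(8, 1)), np.zeros((1, 1))
eta, batch = 0.5, 16

for step in range(2000):
    a0 = rng.random((batch, 1))        # inputs in [0, 1)
    y = a0 * 0.8 + 0.1                 # illustrative target function
    # forward pass
    z1 = a0 @ w1 + b1; a1 = sigmoid(z1)
    z2 = a1 @ w2 + b2; a2 = sigmoid(z2)
    # backward pass
    error = a2 - y
    d_z2 = error * sigmoid_prime(z2)
    d_w2 = a1.T @ d_z2
    d_z1 = (d_z2 @ w2.T) * sigmoid_prime(z1)
    d_w1 = a0.T @ d_z1
    # parameter update (gradient descent, averaged over the batch)
    w2 -= eta * d_w2 / batch; b2 -= eta * d_z2.mean(axis=0)
    w1 -= eta * d_w1 / batch; b1 -= eta * d_z1.mean(axis=0)

def pred(x):
    return sigmoid(sigmoid(np.array([[x]]) @ w1 + b1) @ w2 + b2)[0, 0]

print(pred(0.1), pred(0.9))  # should approach 0.18 and 0.82
```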
Automatic differentiation

TensorFlow provides a very convenient API that can help us to directly derive the deltas and update the network parameters:

    # Define the cost as the square of the errors
    cost = tf.square(error)

    # The Gradient Descent Optimizer will do the heavy lifting
    learning_rate = 0.01  # value illustrative; original elided
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

    # Define the function we want to approximate
    def linear_fun(x):
        y = x * 2 + 1  # coefficients illustrative; originals elided
        return y.reshape(y.shape[0], 1)

    # Other variables during learning (batch sizes illustrative)
    train_batch_size = 100
    test_batch_size = 50

    # Normal TensorFlow - initialize values, create a session
    # and run the model
    sess = tf.Session()
    sess.run(tf.initialize_all_variables())
    for i in range(10000):  # iteration count illustrative
        x_value = np.random.rand(train_batch_size, 1)
        y_value = linear_fun(x_value)
        sess.run(optimizer, feed_dict={a_0: x_value, y: y_value})
        if i % 1000 == 0:
            test_x = np.random.rand(test_batch_size, 1)
            # res (definition elided in the original) holds the value
            # to report, for example the mean cost
            res_val = sess.run(res,
                               feed_dict={a_0: test_x, y: linear_fun(test_x)})
            print(res_val)
In addition to this basic example, let's now talk about a few important concepts you might encounter in practice.
Vanishing and exploding gradients

These are very important issues in many deep neural networks. The deeper the architecture, the more likely it suffers from these issues. What is happening is that, during the backpropagation stage, weights are adjusted in proportion to the gradient value. So we may have two different scenarios:

If the gradients are too small, this is called the vanishing gradient problem. It makes the learning process very slow or even stops the updates entirely. For example, using sigmoid as the activation function, where its derivative is always smaller than 0.25, after a few layers of backpropagation, the lower layers will hardly receive any useful signal from the errors, thus the network is not updated properly.

If the gradients get too large, this is called the exploding gradient problem, and training becomes unstable. This often happens in very deep networks and can be mitigated by bounding the learning rates.
[ 69 ]
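The effect of the sigmoid's bounded derivative can be made concrete with a few lines of arithmetic. A minimal sketch in plain Python (the layer count is illustrative):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)

# The derivative of the sigmoid peaks at 0.25 (at x = 0).
peak = sigmoid_prime(0.0)

# Backpropagating through n sigmoid layers multiplies in at most 0.25
# per layer, so even in the best case the gradient scale decays
# geometrically with depth.
def best_case_gradient_scale(n_layers):
    return peak ** n_layers

print(best_case_gradient_scale(10))  # already below 1e-6
```

Even in this best case, ten sigmoid layers shrink the gradient by roughly a factor of a million, which is why the lower layers learn so slowly.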
Optimization algorithms

Optimization is the key to how a network learns. Learning is basically an optimization process. It refers to the process that minimizes the error or cost, or finds the locus of least error. It then adjusts the network coefficients step by step. A very basic optimization approach is the gradient descent we used in the previous section, but there are multiple variations that do a similar job with different advantages. TensorFlow provides multiple options for you to choose as the optimizer, for example, GradientDescentOptimizer, AdagradOptimizer, MomentumOptimizer, AdamOptimizer, FtrlOptimizer, and RMSPropOptimizer. For the API and how to use them, please see this page:

https://www.tensorflow.org/versions/master/api_docs/python/tf/train#optimizers

These optimizers should be sufficient for most deep learning techniques. If you aren't sure which one to use, use GradientDescentOptimizer as a starting point.
Regularization

Like in many other machine learning approaches, overfitting is something that should always be controlled and minimized, especially given that the network may have millions of parameters. One of the methods to deal with overfitting is regularization, which is typically done by augmenting the cost function with a penalty term, such as L1 or L2 regularization, which prevents the weights from getting too big. Take L2 regularization as an example. It is achieved by augmenting the cost function with the squared magnitude of all weights in the network. What it does is heavily penalize peaky weight vectors and prefer diffuse weight vectors. That is, we encourage the network to use all of its inputs a little by diffusing the weight values more evenly. Overly large weights mean the network depends too much on a few heavily-weighted inputs, which can make it difficult to generalize to new data. During gradient descent, L2 regularization essentially causes every weight to decay towards zero, which is why it is also called weight decay.

Another commonly used type of regularization is L1 regularization, which often leads to sparse weight vectors. It helps to identify which features are more useful for prediction by shrinking the weights of the others towards zero. It may help the network to be more resistant to noise in the inputs, but empirically L2 regularization performs better.
Max-norm is another way of constraining the weights by enforcing an absolute upper-bound on the magnitude of the incoming weight vector for every neuron. That is, during the gradient descent step, we normalize the weight vector back whenever it exceeds the bound. This is called projected gradient descent. This sometimes stabilizes the training of the network, since the weights cannot grow out of control (they are always bounded), even if the learning rates are set too high.

Dropout is a very different kind of technique for preventing overfitting, and it is often used together with the techniques we have previously mentioned. During training, dropout is achieved by only keeping a neuron active with some probability and setting it to zero otherwise. A pre-set hyperparameter is used to generate a random sampling of which neurons/nodes should be set to zero (dropped out). A keep probability of 0.5 is often used in practice. Intuitively, dropout makes the network sample a different sub-network at every step, while all of these sub-networks share weights; only the active part of the network is updated in each batch. Overall, it prevents overfitting by providing a way of approximately combining exponentially many different network architectures efficiently. For more details, one can refer to Hinton's dropout paper.
Deep learning models

In this section, we will dive into three popular deep learning models: CNNs, Restricted Boltzmann Machines (RBM), and the recurrent neural network (RNN).
Convolutional Neural Networks

Convolutional Neural Networks are a biologically-inspired variant of the multilayer perceptron and have been proven very effective in areas such as image recognition and classification. ConvNets have been successfully applied to identifying faces, objects, and traffic signs, as well as powering vision in self-driving cars. CNNs exploit spatially-local correlation by enforcing a local connectivity pattern between neurons of adjacent layers. In other words, the inputs of hidden units in one layer are from a subset of units in the previous layer, units that have spatially contiguous receptive fields.

LeNet was one of the very first convolutional neural networks, developed by Yann LeCun in 1988. It was mainly used for character recognition tasks such as reading zip codes, digits, and so on. In 2012, Alex Krizhevsky and Hinton won the ImageNet competition with an astounding improvement, dropping the classification error from 26% to 15% using a CNN, which started the recent wave of deep learning applications.
There are four fundamental building components in a CNN:

- Convolution layer (CONV)
- Activation layer (nonlinearity, for example, ReLU)
- Pooling or sub-sampling layer (POOL)
- Fully Connected layer (FC, using softmax)

The most common form of a ConvNet is a stack of a few pairs of CONV-ReLU layers, each pair followed by a POOL layer. This pattern repeats until the image has been spatially merged into a greatly condensed feature map. Then, at the last layer, it transitions into a fully-connected layer, which often utilizes softmax to output probabilities, especially if it is a multi-way classification problem, as shown in the following figure:
Convolution

Convolution involves a few concepts, such as the convolution kernel, stride, and padding. For a two-dimensional image, convolution happens as follows. Suppose you have a weight matrix and an image (shown as values at each pixel) as shown in the following figure. The weight matrix (often called a kernel or filter) is applied to the image by sliding it across the image. If the weight matrix moves 1 pixel at a time, it is called a stride of 1. At each placement, the numbers (pixel values) from the original image are multiplied by the weights of the matrix that is currently aligned with that position.
The sum of all the products is divided by the kernel's normalizer, and the result is placed in the new image at the corresponding position. The kernel is then shifted to the next pixel and the process is repeated until the whole image has been covered. As we can see from the following figure, a stride of 2 would result in the following:

So with a larger stride number, the image shrinks very quickly. To keep the original size of the image, we can pad 0s (rows and columns) on the borders of the image. This is called the padding. The larger the stride is, the larger the padding we would need:
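The interaction between kernel size, stride, and zero-padding can be captured with the standard output-size formula, out = (W - F + 2P) / S + 1. A small sketch in plain Python, assuming square inputs and kernels (the concrete numbers below are illustrative):

```python
def conv_output_size(input_size, kernel_size, stride, padding):
    """Spatial size of a convolution output for a square input.

    input_size: width/height of the (square) input
    kernel_size: width/height of the (square) kernel
    stride: how many pixels the kernel moves per step
    padding: number of zero rows/columns added on each border
    """
    return (input_size - kernel_size + 2 * padding) // stride + 1

# A 5x5 kernel on a 32x32 image, stride 1, no padding: output shrinks to 28x28.
print(conv_output_size(32, 5, 1, 0))  # 28
# Adding 2 pixels of zero-padding keeps the original 32x32 size.
print(conv_output_size(32, 5, 1, 2))  # 32
# A stride of 2 roughly halves the output.
print(conv_output_size(32, 5, 2, 2))  # 16
```

The second call shows exactly the trade-off described above: padding compensates for the shrinkage the kernel would otherwise cause.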
Pooling/subsampling

The pooling layer serves to progressively reduce the size of the representation and the number of parameters and computation in the network. For color images, pooling is done independently on each channel. The most commonly applied form of pooling layer is max pooling. There are also other types of pooling units, such as average pooling or L2-norm pooling. You may find some earlier networks using average pooling. As typically shown by benchmark performance, average pooling has recently fallen out of favor. It should be noted that there are only two commonly seen variations of max pooling in practice: 3 x 3 with stride = 2 (also called overlapping pooling), and, even more commonly, 2 x 2 pooling with stride = 2. Pooling sizes with larger receptive fields are too destructive.

The following figure shows the max pooling case:
Fully connected layer

Neurons in a fully connected layer have full connections to all activations in the previous layer, which is different from CONV layers. In CONV layers, neurons are connected only to a local region in the input, and many of the neurons in a CONV volume share parameters. Fully connected layers are often used as the final layers, with softmax as the output activation function to produce classification probabilities.
Overall

A classical architecture of a ConvNet can be described as follows:

1. For each input image, we first pass it through a convolution layer, and the convolved result is activated (that is, CONV + ReLU).
2. The obtained activation maps then pass through a layer that extracts salient features while reducing dimensionality, that is, POOL. The pooling results are summarized feature maps with a much reduced number of parameters.
3. CONV (+ReLU) and POOL layers can be repeated a few times before the output is handed to the fully connected layers. This increases the depth of the network, which increases its capability of modeling complex patterns. Also, the different levels of filters learn the network's hierarchical representation of different levels of patterns. Please refer to Chapter 1, Why Deep Learning?, for more details about representation learning.
4. The output layer is also a fully connected layer, but with softmax at the end to help compute the probability-like outputs.
The output is then compared with the ground truth labels, which are used to compute the loss. Usually, the loss function is defined as the mean squared error for optimization purposes. Errors are then backpropagated to update the weights of each involved layer. To take a deep-dive into ConvNet applications from a practical point of view, please refer to Chapter 4, Deep Learning in Computer Vision, for details.
Restricted Boltzmann Machines

An RBM is a neural network with only two layers: the visible layer and the hidden layer. Each visible node/neuron is connected to each hidden node. The restriction means that there is no intra-layer communication, that is, there are no connections between visible-visible nodes or between hidden-hidden nodes. It was one of the earliest models introduced in the area of unsupervised deep learning and has been successfully applied to dimensionality reduction, classification, feature learning, and anomaly detection. The following figure shows the basic architecture of an RBM:

It is relatively easy to express an RBM in a mathematical form, as it just has a few parameters:

- The weight matrix W describes the connection weights between the visible and the hidden nodes. Each entry w_ij is the weight of the connection between visible node i and hidden node j.
- Two bias vectors, one for the visible layer and one for the hidden layer. Vector a holds the visible biases, with each element a_i corresponding to the bias value of the ith visible node. Similarly, vector b holds the hidden biases, with each element b_j corresponding to the jth hidden node.
Compared to commonly used neural networks, there are some noticeable differences:

- An RBM is a generative, stochastic neural network. The parameters are adjusted to learn the probability distribution of the training data.
- An RBM is an energy-based model. The energy function produces a scalar value which basically corresponds to how likely a configuration of the model is.
- It encodes the output in a binary mode, not as probabilities.
- Neural networks usually perform weight updates by gradient descent with backpropagation, but RBMs use contrastive divergence (CD). We will talk about CD in more detail in the following section.
Energy function

An RBM is an energy-based model. The energy function produces a scalar value that essentially corresponds to the probability of a configuration of the model. From Geoffrey Hinton's tutorial on training RBMs, the energy function is as follows:

    E(v, h) = - sum_i a_i * v_i - sum_j b_j * h_j - sum_{i,j} v_i * w_ij * h_j

The calculation is simple. Basically, you add up the products between the biases and their corresponding units (visible or hidden) to calculate their contributions to the energy function. The third term is the energy contribution of the connections between the visible and hidden nodes. During model training, the energy is expected to decrease, that is, the model parameters (W, a, and b) are being updated towards configurations of low energy.
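The three terms of the energy function can be computed directly with a few dot products. A minimal NumPy sketch; the weights, biases, and binary states below are made-up illustrative values, not from the book:

```python
import numpy as np

def rbm_energy(v, h, W, a, b):
    """E(v, h) = -a.v - b.h - v.W.h for binary visible v and hidden h."""
    return -np.dot(a, v) - np.dot(b, h) - np.dot(v, W).dot(h)

v = np.array([1.0, 0.0, 1.0])      # visible node states
h = np.array([1.0, 1.0])           # hidden node states
W = np.array([[0.2, -0.1],
              [0.4, 0.3],
              [-0.5, 0.6]])        # 3 visible x 2 hidden connection weights
a = np.array([0.1, 0.1, 0.1])      # visible biases
b = np.array([-0.2, 0.2])          # hidden biases

print(rbm_energy(v, h, W, a, b))   # -0.4
```

Lower energy corresponds to configurations the model considers more probable, which is what training pushes towards.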
Encoding and decoding

The training of an RBM can be thought of as two passes, the forward encoding pass (construction) and the backward decoding pass (reconstruction). In an unsupervised setting, where we would like to train the model to describe the input data, the forward and backward passes happen as follows.

In a forward pass, the raw input values from the data (for example, pixel values from an image) are represented by the visible nodes. Then they are multiplied with the weights and added to the hidden bias values (note, the visible biases are not used in the forward pass). The resulting values are passed through an activation function to obtain the final output. If there are further layers connected, this activation result will be used as the input and moves forward:

In our simple RBM case, which has only one visible and one hidden layer, the activation values of the hidden layer become the inputs of a backward pass. They are multiplied by the weight matrix, traverse through the edges backwards, and populate back to the visible nodes. At each visible node, all the incoming values are added up, together with the visible bias value (note, the hidden biases are not used in the backward pass):

Since the weights of an RBM are randomized at the beginning, in the first few rounds the reconstruction error, which is computed as the difference between the reconstructed values and the original input values, can be large. So usually it needs a few iterations to minimize such errors until a minimum error is reached. The forward and backward passes help the model learn the joint probability distribution of the input data and the activation results (the output of the hidden layer). This is why an RBM is thought of as a generative learning model.

The question now is how to update the network's parameters. First, errors are computed using KL-divergence. To learn more about KL-divergence, readers can refer to the book Information Theory, Inference, and Learning Algorithms by David MacKay (http://www.inference.org.uk/itprnn/book.pdf, link last checked Jan. 2018). Basically, it computes the divergence of two distributions by taking the integral of the difference between these distributions. Minimizing the KL-divergence means pushing the learned model distribution (in the form of the activation values produced by the hidden layer) towards the input data distribution. In many deep learning algorithms, gradient descent is used, such as stochastic gradient descent. However, an RBM uses a method for approximate maximum-likelihood learning, called contrastive divergence.
Contrastive divergence (CD-k)

Contrastive divergence can be thought of as an approximate maximum-likelihood learning algorithm. It computes the divergence/difference between the positive phase (energy of the first encoding) and the negative phase (energy of the last encoding). It is equivalent to minimizing the KL-divergence between the model distribution and the (empirical) data distribution. The variable k is the number of times you run contrastive divergence. In practice, k = 1 seems to work surprisingly well.

Basically, the gradients are estimated using the difference between the two phases: the positive phase associated gradient and the negative phase associated gradient. The positive and negative terms do not reflect the sign of each term in the formula, but rather reflect their effect on the model's probability density. The positive associated gradient increases the probability of the training data (by reducing the corresponding free energy), while the second term decreases the probability of the samples generated by the model. A pseudo code snippet in TensorFlow can be written as follows:

    # Define Gibbs Sampling function
    def sample_prob(probs):
        return tf.nn.relu(tf.sign(probs - tf.random_uniform(tf.shape(probs))))

    hidden_probs_0 = sample_prob(tf.nn.sigmoid(
        tf.matmul(X, W) + hidden_bias))
    visible_probs = sample_prob(tf.nn.sigmoid(
        tf.matmul(hidden_probs_0, tf.transpose(W)) + visible_bias))
    hidden_probs_1 = tf.nn.sigmoid(
        tf.matmul(visible_probs, W) + hidden_bias)

    # positive associated gradients: increases the probability of training data
    w_positive_grad = tf.matmul(tf.transpose(X), hidden_probs_0)
    # decreases the probability of samples generated by the model
    w_negative_grad = tf.matmul(tf.transpose(visible_probs), hidden_probs_1)

    W = W + alpha * (w_positive_grad - w_negative_grad)
    vb = vb + alpha * tf.reduce_mean(X - visible_probs, 0)
    hb = hb + alpha * tf.reduce_mean(hidden_probs_0 - hidden_probs_1, 0)
In the code snippet above, X is the input data. For example, MNIST images have 784 pixels, so the input X is a vector of 784 entries and, accordingly, the visible layer has 784 nodes. Also note that in an RBM the input data is encoded as binary. For MNIST data, one can use one-hot encoding to transform the input pixel values. In addition, alpha is the learning rate, W is the weight matrix, vb is the bias of the visible layer, hb is the bias of the hidden layer, and sample_prob is the Gibbs-Sampling function that decides which nodes to turn on.
Stacked/continuous RBM

A deep-belief network (DBN) is simply several RBMs stacked on top of each other. The output of the previous RBM becomes the input of the following RBM. In 2006, Hinton proposed a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time. DBNs learn hierarchical representations of the input data and are able to reconstruct the data, therefore they are very useful, especially in unsupervised settings.

For continuous inputs, one can refer to a different type of model called the continuous restricted Boltzmann machine, which utilizes a different type of contrastive divergence sampling. Such a model can deal with inputs like image pixels or word-count vectors that are normalized to decimals between zero and one.
RBM versus Boltzmann Machines

Boltzmann Machines (BMs) can be thought of as a particular form of log-linear Markov random field, for which the energy function is linear in its free parameters. To increase the representational capacity of the model so that it can describe complicated distributions, one can consider increasing the number of variables that are never observed, that is, hidden variables, or in this case, the hidden neurons. RBMs are built on top of BMs, in which a restriction is applied to enforce no visible-visible and no hidden-hidden connections.
Recurrent neural networks (RNN/LSTM)

In a convolutional neural network or a typical feedforward network, information flows through a series of mathematical operations without any feedback loops or any consideration of the ordering of past inputs. Therefore, they are not capable of handling inputs that come in a sequence.

However, in practice, we have lots of sequential and time-dependent data, which includes text, genomes, handwriting, the spoken word, and numerical time-series data emerging from sensors, the stock market, and government agencies. It is not only the order that matters. The next value in the sequence often depends heavily on the past context (long or short). For example, to predict the next word in a sentence, information is required, not just from the words nearby, but sometimes from words much earlier in the sentence. This helps to outline the subject and content of the text.

The recurrent neural network (RNN) is a form of artificial neural network designed specifically for this kind of data. It takes an input sequence (this sequence can be of arbitrary length) and iteratively loops over its elements, meaning that any configuration/state of the network is impacted not only by the current input, but also by its context in the past.
Cells in RNN and unrolling

All recurrent networks can be thought of as the same repeating module/cell applied along the dimension of time. This repeating module/cell can simply be a single tanh layer. One way to understand this is to unroll, or unravel, the architecture into a time sequence and treat each time step as a layer. We can then see that the depth of an RNN is essentially determined by the length of the time steps or of the sequence. The first element of the sequence, such as the first word of a sentence, is equivalent to the first layer. The following figure shows the unrolling scheme:
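The unrolling idea can be sketched with a minimal vanilla RNN cell applied repeatedly over time steps. The names W_xh and W_hh and all sizes below are assumptions for illustration; the key point is that the same weights are reused at every step:

```python
import numpy as np

np.random.seed(0)
hidden_size, input_size, seq_len = 4, 3, 5
W_xh = np.random.randn(input_size, hidden_size) * 0.1   # input-to-hidden
W_hh = np.random.randn(hidden_size, hidden_size) * 0.1  # hidden-to-hidden (shared)
b_h = np.zeros(hidden_size)

def rnn_forward(inputs):
    """Unroll the same cell over the sequence; one 'layer' per time step."""
    h = np.zeros(hidden_size)      # initial state
    states = []
    for x_t in inputs:             # each x_t acts like a layer in the unrolled view
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
        states.append(h)
    return states

sequence = [np.random.randn(input_size) for _ in range(seq_len)]
states = rnn_forward(sequence)
print(len(states), states[-1].shape)  # one hidden state per time step
```

A five-element sequence thus produces an unrolled network that is effectively five layers deep, exactly as the figure describes.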
Backpropagation through time

In feedforward networks, backpropagation (BP) starts with calculating the error at the final output layer and then moving backwards, layer by layer. At each step, it calculates the partial derivatives of the error with respect to the weights. Then, through an optimization approach (for example, gradient descent), these derivatives are used to adjust the weights up or down to reduce the error.

Similarly, in recurrent networks, after the unrolling of the network along time steps, BP can be thought of as an extension in the time dimension; this is called backpropagation through time, or BPTT. The calculation is very similar, only that a series of layers is replaced by a series of time steps of a similar nature.
Vanishing gradient and LSTM

Similar to other deep architectures, the deeper the recurrent network goes in time, the more it suffers from the vanishing gradient problem. What happens is that the weights of the network change less and less as the gradient shrinks during backpropagation. Given that the network's weights are initialized randomly, with non-moving weights we learn very little from the data. This so-called vanishing gradient problem affects RNNs as well.

Each of the time steps in an RNN can be thought of as a layer. Then, during backpropagation, errors travel from one time step to the previous ones. So the network can be thought of as being as deep as the number of time steps. In many practical problems, such as word sentences, paragraphs, or other time-series data, the sequences fed into an RNN can be very long. The reason an RNN is supposed to be good at sequence-related problems is its ability to retain important information from previous inputs and use this past context information at the current step. If the sequences are very long and the gradients computed during training/BPTT either vanish (as a result of the multiplication of many values between 0 and 1, a consequence of the chain rule) or explode, the network learns very slowly, if at all.
We can then use the following code to create a session and print the evaluated nodes:

    sess = tf.Session()
    print(sess.run(node1), sess.run(node2))

The output of the preceding code shows the evaluated values of the nodes.
Handwritten digits recognition

The challenge of handwritten digit recognition is to recognize the digit from images of handwritten digits. It is useful in many scenarios, for example, recognizing zip codes on envelopes. In this example, we will use the MNIST dataset to develop and evaluate our neural network model for handwritten digit classification.

MNIST is a computer vision dataset hosted at http://yann.lecun.com/exdb/mnist/. It consists of grayscale images of handwritten digits together with the correct labels. Each image is 28 pixels by 28 pixels. Sample images are shown as follows:

The MNIST data is split into three parts: 55,000 images of training data, 10,000 images of test data, and 5,000 images of validation data. Each image is accompanied by its label, which represents a digit. The goal is to classify the images as digits, in other words, to associate each image with one of the ten classes.

We can represent an image using a 1 x 784 vector of floating point numbers between 0 and 1. The number 784 is the number of pixels in the 28 x 28 image. We obtain the 1 x 784 vector by flattening the 2D image into a 1D vector. We can represent the label as a 1 x 10 vector of binary values, with one and only one element being 1 and the rest being 0. We are going to build a deep learning model using TensorFlow to predict the 1 x 10 label vector given the 1 x 784 data vector.

We first import the data:

    from tensorflow.examples.tutorials.mnist import input_data
    mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
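The flattening and one-hot label encoding described above can be sketched in NumPy; the zero array below is just a stand-in for a real MNIST image:

```python
import numpy as np

image = np.zeros((28, 28), dtype=np.float32)  # stand-in for one MNIST image
flat = image.reshape(1, 784)                  # the 1 x 784 input vector

def one_hot(digit, num_classes=10):
    """1 x 10 label vector with a single 1 at the digit's index."""
    label = np.zeros((1, num_classes), dtype=np.float32)
    label[0, digit] = 1.0
    return label

label = one_hot(7)
print(flat.shape, label.shape)  # (1, 784) (1, 10)
```

This is exactly what the `one_hot=True` flag to `read_data_sets` does for the labels behind the scenes.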
We then define some basic building blocks for our CNN:

Weights:

    def weight_variable(shape):
        # initialize the weights with a small noise for symmetry breaking
        initial = tf.truncated_normal(shape, stddev=0.1)
        return tf.Variable(initial)

Bias:

    def bias_variable(shape):
        # Initialize the bias to be slightly positive to avoid dead neurons
        initial = tf.constant(0.1, shape=shape)
        return tf.Variable(initial)

Convolution:

    def conv2d(x, W):
        # First dimension in x is batch size
        return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

Max pooling:

    def max_pool_2x2(x):
        return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                              strides=[1, 2, 2, 1], padding='SAME')
Now we build the neural network model using the preceding building blocks. Our model consists of two convolution layers, each followed by pooling, and a fully connected layer at the end. The graph can be illustrated as follows:

The following code implements this architecture:

    x = tf.placeholder(tf.float32, shape=[None, 784])
    y_ = tf.placeholder(tf.float32, shape=[None, 10])  # ground truth label

    # First convolution layer
    W_conv1 = weight_variable([5, 5, 1, 32])
    b_conv1 = bias_variable([32])
    # first dimension of x_image is batch size
    x_image = tf.reshape(x, [-1, 28, 28, 1])
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
    h_pool1 = max_pool_2x2(h_conv1)

    # Second convolution layer
    W_conv2 = weight_variable([5, 5, 32, 64])
    b_conv2 = bias_variable([64])
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
    h_pool2 = max_pool_2x2(h_conv2)

    # Fully connected layer
    W_fc1 = weight_variable([7 * 7 * 64, 1024])
    b_fc1 = bias_variable([1024])
    h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
We can also reduce overfitting using dropout:

    keep_prob = tf.placeholder(tf.float32)
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
We now build the last layer, the readout layer:

    W_fc2 = weight_variable([1024, 10])
    b_fc2 = bias_variable([10])
    # Readout layer
    y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
Now we define the cost function and the training parameters:

    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
Next, we define the evaluation:

    correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
Lastly, we can finally run the graph in a session:

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for i in range(20000):
            batch = mnist.train.next_batch(50)
            if i % 100 == 0:
                train_accuracy = accuracy.eval(feed_dict={
                    x: batch[0], y_: batch[1], keep_prob: 1.0})
                print('step %d, training accuracy %g' % (i, train_accuracy))
            train_step.run(feed_dict={x: batch[0], y_: batch[1],
                                      keep_prob: 0.5})
        print('test accuracy %g' % accuracy.eval(feed_dict={
            x: mnist.test.images,
            y_: mnist.test.labels,
            keep_prob: 1.0}))
At the end, we achieve 99.2% accuracy on the test data of the MNIST dataset using this simple CNN.
Summary

In this chapter, we started with the basics of multilayer perceptron networks. From there, we talked about the basic structures, such as the input/output layers, as well as various types of activation functions. We also gave detailed steps on how a network learns, with the focus on backpropagation and a few other important components. With these fundamentals in mind, we introduced three popular types of networks: CNNs, Restricted Boltzmann Machines, and recurrent neural networks (with their variation, LSTM). For each particular network type, we gave detailed explanations of the key building blocks of the architecture. At the end, we gave a hands-on example illustrating the use of TensorFlow for an end-to-end application. In the next chapter, we will talk about the applications of these networks in computer vision, including popular network architectures, best practices, and real-world examples.
Deep Learning in Computer Vision

In the previous chapter, we covered the basics of how a neural network works and how it is applied to solve various artificial intelligence (AI) tasks. As outlined in that chapter, one of the most popular application areas of deep learning is computer vision, powered by the convolutional neural network, also known as the CNN. This chapter covers CNNs in more detail. We will go over the concepts that make up the core of a CNN, and how they can be used to solve real-world computer vision problems. We will specifically answer the following questions:

- How did CNNs originate and what is their historical significance?
- What core concepts form the basis for understanding CNNs?
- What are some of the popular CNN architectures in use today?
- How do you implement the basic functionality of a CNN using TensorFlow?
- How do you fine-tune a pre-trained CNN and use it in your application?
Origins of CNNs

Walter Pitts and Warren McCulloch are often credited with the first computer model, in 1943, that was inspired by the neural network-based structure of the human brain. They proposed a technique that inspired the notion of logic-based design and provided a formalism under which later work arrived at finite automata. The McCulloch-Pitts network was a directed graph where each node was a neuron and edges were marked as either excitatory (1) or inhibitory (0), and it was used to replicate the human thought process.

One of the challenges with this design was learning the weights, or parameters, as they would later be defined. Henry J. Kelley provided the first version of a continuous backpropagation model in 1960, followed by an improvement from Arthur Bryson. The chain rule was developed by Stuart Dreyfus as an improvement on the original backpropagation model. Though the models and training methods were defined early on, their inefficiency led to a delayed adoption by the research community.

The earliest working implementations of deep learning came from Ivakhnenko and Lapa in 1965. They used models with polynomial activation functions, which were further statistically analyzed. From each layer, they selected a statistically best feature and forwarded it to the next layer, which was often a slow, manual process. In their 1971 paper, they also described a deep neural network with eight layers trained by the group method of data handling algorithm. However, none of these systems was particularly designed for machine perception or vision tasks. The earliest inspiration for that came from Hubel and Wiesel [5, 6] in the 1950s and 1960s. They showed that the visual cortices in both cats and monkeys contain neurons that individually respond to small regions of the visual field, also known as the receptive field.

One of the key observations made by them was that neighboring cells have similar and overlapping receptive fields, and that such receptive fields are tiled across the whole visual cortex. They also discovered that cells in the visual cortex are a mixture of simple cells and complex cells. Simple cells respond to straight edges that have particular orientations within their receptive fields. Complex cells, on the other hand, are formed due to the projection of various simple cells. Though they respond to the same orientation of edges as their corresponding simple cells, they integrate the responses of all of their underlying simple cells over a wider receptive field. This causes complex cells to be translation-invariant to the exact location of the edges in their receptive field, which is one of the core architectural principles behind the design and massive popularity of CNNs in practice today.

The first real-world system inspired by the work of Hubel and Wiesel was the neocognitron, developed by Kunihiko Fukushima. The neocognitron is often referred to as the first CNN implementation in the real world. The neocognitron's primary focus was to learn handwritten digits from zero to nine. In this particular design, the neocognitron consisted of nine layers, each containing two groups of cells: a layer of simple cells, S-cells, and a layer of complex cells, C-cells. Each layer is further divided into a varying number of planes; however, each plane in a layer has the same number of units, also referred to as processing elements. Each layer has a different number of simple and complex planes. For example, U_S1 has 12 planes of 19 x 19 simple cells.

The correct classification of the input digit is measured by the response of the cells in the rightmost C-cell layer. These early models provided a strong foundation for the mathematical modeling of the visual cortex. However, learning the parameters of such models was still considered a challenging task. It was not until 1985 that Rumelhart, Williams, and Hinton applied backpropagation-based techniques to learning neural networks. They showed that, using backpropagation, neural networks can learn interesting distributed representations. The success of backpropagation continued when Yann LeCun demonstrated the practical application of backpropagation to handwritten digit recognition from bank checks. These advancements demanded more computing resources than were available at the time. However, not long after, the development of modern-day graphics processing units (GPUs) in the early 2000s increased computation speeds by 1000 times. This paved the way for the real-world application of deep CNN models on large image datasets in a feasible amount of time.
Convolutional Neural Networks

We learned in the previous chapter how a regular neural network works: it is organized in layers, where each layer is composed of a number of neurons that have weights and biases learned from training data. Neurons in each layer are connected to all the neurons in the next layer through weighted edges. Each neuron also has a pre-selected activation function. For every input it receives, a neuron computes a dot product with its learned weights and passes the result through its activation function. Though this architecture works well for small-scale datasets, it has a scaling challenge:

Imagine your task is to classify color images with such a network. The input images are 32 x 32 x 3, meaning they have three colors, red, green, and blue (RGB), and each color image is 32 pixels wide and 32 pixels high. If you were to input this image and connect it to a fully connected layer, each neuron would have 32 x 32 x 3 = 3072 edges or weights, as shown in the preceding figure. To learn good weight parameters for so many edges, you would need lots of data and computing power. This challenge only grows if you increase the size of the image from 32 x 32 to, for example, 200 x 200. Not only will this pose computing challenges, this parameter explosion will invariably lead to overfitting, which is a very common and challenging machine learning problem.
CNNs are specifically designed to overcome all the aforementioned problems. CNNs work well if the input to the network has an apparent grid-like structure, as is found in images. Unlike regular networks, CNNs organize the input data in a three-dimensional, tensor-like structure representing width, height, and depth. To prevent a parameter explosion, each volume in one layer is connected only to a spatially relevant region of the volume in the previous layer. This ensures that, as the number of layers increases, each neuron has a limited influence over its local location (see the following figure). Finally, the output layer is able to reduce the high-dimensional input volume to a single vector of class scores:
One of the most important concepts in deep learning, and the one after which the CNN is named, is the convolution operation. A convolution operation can be described as the blending of two signals, one of them being a filter applied to a real-valued signal. To describe this with an example, let's say you have a lake with calm water in front of you. You decide to pick up a rock from the shore and throw it into the water. When this rock hits the water surface, it creates ripples that spread across the lake, generating an effect of the rock on the water's surface. In terms of convolution, you can explain this resulting effect as the result of convolving the rock signal with the water surface. The process of convolution measures the effect of one signal when it is merged with another signal. One of its primary applications is in finding patterns in signals.

One such example is the averaging filter applied to an image. Sometimes, when an image is captured in a noisy environment, you might want to apply a blurring effect, also known as the averaging filter. Convolution is often the most commonly-used tool to achieve this effect. As shown in the following figure, the matrix on the left (input data), when convolved with a smaller matrix (kernel) on the right, generates an output, also known as a feature map:
Mathematically, convolution can be defined as follows:

    s(t) = (x * w)(t) = sum_a x(a) * w(t - a)

Here, x is the input data, w is the kernel, and s is the feature map.
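The definition above can be checked with a tiny discrete 1-D example; `np.convolve` computes exactly this sum, and an averaging (blurring) kernel like the one discussed earlier makes the result easy to verify by hand:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])  # input signal
w = np.ones(3) / 3.0                # 3-tap averaging (blurring) kernel

# 'valid' keeps only positions where the kernel fully overlaps the input.
s = np.convolve(x, w, mode='valid')
print(s)  # [2. 3.]
```

The two outputs are the averages of (1, 2, 3) and (2, 3, 4), which is precisely the blurring effect of the averaging filter.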
Data transformations

Often, in any real-world implementation of a CNN, data processing and transformation is a key step to achieving good accuracy. In this section, we will cover some of the basic but important data transformation steps that are practically useful.

Input preprocessing

Let's assume the training data, X, has N images and each has D flattened-out pixels. The following three preprocessing steps are usually employed on X:
- Mean subtraction: In this step, we compute a mean image across the whole training dataset and subtract this mean image from each image in the dataset. This step has the effect of centering the data along the origin for all feature dimensions. To implement this step in Python:

    import numpy as np
    mean_X = np.mean(X, axis=0)
    centered_X = X - mean_X
- Normalization: The mean subtraction step is often followed by a normalization step, which has the effect of scaling each feature dimension to a similar range. This is done by dividing each feature column by its standard deviation. The preceding figure illustrates the effect of centering and normalization on the input data. The code for obtaining the normalized data is as follows:

    std_X = np.std(centered_X, axis=0)
    normalized_X = centered_X / std_X
- PCA whitening: One other important transformation used in deep learning networks, in general, is whitening using Principal Component Analysis (PCA). Although it is seldom used with CNNs, it is an important step worth describing here. Whitening can be understood as the process of de-correlating the data by computing the covariance matrix of the input, which can also be used for dimensionality reduction of high-dimensional data, if desired. The preceding figure shows the geometric interpretation of this step. It can also be implemented in Python as follows:

    # Compute the covariance matrix from the centered data
    cov_matrix = np.dot(centered_X.T, centered_X) / centered_X.shape[0]
    # Perform Singular Value Decomposition
    U, S, V = np.linalg.svd(cov_matrix)
    # Compute the decorrelated data (without dimensionality reduction)
    decorr_X = np.dot(centered_X, U)  # decorrelate the data
    # Compute the whitened data
    whitened_X = decorr_X / np.sqrt(S + 1e-5)
Data augmentation

One of the most common tricks for improving recognition accuracy is to augment the training data in one or more of the following ways:

- Translation and rotation invariance: For the network to learn invariance to translation as well as rotation, it is often a good idea to augment the dataset with randomly rotated or mirrored versions of the images with different perspectives or orientations. For instance, you can take the input image and flip it horizontally or vertically. Along with horizontal and vertical flips, you can translate the whole image by a few pixels, among other possible transformations.
- Scale invariance: One of the limitations of CNNs is their ineffectiveness in recognizing objects at different scales. To address this shortcoming, it is often a good idea to augment the training dataset with crops of the image at different scales. These are random crops and sub-sampled versions of the training image. You can also take the random crops and up-sample them to the original width and height of the input image.
- Color perturbation: One of the more interesting data augmentations is perturbing the color values of the images directly.
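The first two augmentations above, flips and random crops, are easy to sketch in NumPy. The 4 x 4 single-channel "image" below is a toy stand-in for a real photo:

```python
import numpy as np

np.random.seed(1)
image = np.arange(16).reshape(4, 4)  # toy single-channel image

# Horizontal flip: mirror the columns (translation/rotation-style augmentation).
flipped = image[:, ::-1]

# Random crop: take a random 3x3 window (scale-style augmentation); the crop
# would then be up-sampled back to the original size in a real pipeline.
crop_size = 3
top = np.random.randint(0, image.shape[0] - crop_size + 1)
left = np.random.randint(0, image.shape[1] - crop_size + 1)
crop = image[top:top + crop_size, left:left + crop_size]

print(flipped.shape, crop.shape)  # (4, 4) (3, 3)
```

Applying a handful of such random transformations to every training image effectively multiplies the size of the dataset without collecting new labels.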
Network layers

As introduced in the previous chapter, a typical CNN architecture consists of a series of layers, each of which transforms an input image volume into an output volume. Each of these layers may belong to one of a handful of classes, and each class of layer plays a distinct role in the network. The following figure shows an example of a typical ConvNet stack, which is composed of an input layer, convolutional layer, pooling layer, and fully connected layer (FC). A typical ConvNet can have various arrangements of [INPUT -> CONV -> POOL -> FC]. In this section, we will describe each of these layer types and their significance for image recognition problems:
Convolution layer

One of the core building blocks of CNNs, the convolution layer is responsible for applying a specified convolution filter to the input image. This filter is applied on each sub-region of the image, which is further defined by the local receptive field or volume of the neuron. Each filter application produces a single scalar output, which, when combined across all positions of the input, produces a feature map. For example, if you use eight filters to convolve a 32 x 32 image at every single pixel location, you will produce eight output feature maps, each of size 32 x 32. In this case, each of the feature maps is the convolution result corresponding to a particular filter. The preceding figure illustrates this concept in more detail.

One of the important questions one can ask at this stage is, how does one choose a particular filter to convolve the image with? To answer this question: the filter is, in fact, a learnable parameter of the model and is learned during the training of the network. As such, the design of the filters becomes an extremely important aspect of the performance of the network. A typical filter may be five-by-five along the width and height dimensions. However, a combination of such filters applied at every possible location of the image leads to a potentially huge feature space. The training process for such filters is as follows. First, you decide the number and size of the filters to apply at every layer of the network. At the start of the training process, the starting values of each of these filters are chosen randomly. During the forward pass of the backpropagation algorithm, each filter is convolved at every possible location of the input image to generate the feature maps. These feature maps then act as the input to the subsequent layers, eventually leading to the extraction of higher-level features as we move deeper into the network.

One important issue with connecting the full input volume to the output layer at every possible location is the rapid explosion of parameters. For example, if the input volume has a size of [32 x 32 x 3], and the filter size is 5 x 5, then each neuron in the output volume will be connected to a [5 x 5 x 3] region in the input volume, generating a total of 5*5*3 = 75 weights (and +1 bias parameter). To reduce this parameter explosion, we sometimes use a parameter referred to as the stride length. The stride length is the gap between two subsequent filter applications, and a larger stride reduces the size of the output volume significantly.
One of the post-processing layers often used together with the convolution layer is the Rectified Linear Unit (ReLU). ReLU computes the function f(x) = max(0, x). One of the advantages of ReLU is that it greatly accelerates the convergence of optimization methods such as stochastic gradient descent (SGD):
Pooling or subsampling layer

A pooling or subsampling layer often immediately follows a convolution layer in a CNN. Its role is to downsample the output of a convolution layer along both the spatial dimensions of height and width. For example, a 2 x 2 pooling operation applied to 12 feature maps of size 32 x 32 will produce an output volume of size [16 x 16 x 12] (see the following figure). The primary function of the pooling layer is to reduce the number of parameters to be computed by the network. This also has the added benefit of controlling overfitting, thereby increasing the overall generalization power of the network. There are multiple techniques used to perform pooling. Some of these pooling techniques are:

- Max pooling: In this case, a feature map patch of a chosen field (2 x 2 in the previous example) is replaced by a single value, which is the maximum of the four values in that patch
- Average pooling: In this case, a feature map patch of a chosen field (2 x 2 in the preceding example) is replaced by a single value, which is the average of the four values in that patch

In general, a pooling layer acts as follows:

- It accepts an input volume of size W1 x H1 x D1
- It requires two parameters: the spatial extent of the pooling field, F, and the stride, S
- It produces a volume of size W2 x H2 x D2, where W2 = (W1 - F)/S + 1, H2 = (H1 - F)/S + 1, and D2 = D1
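The 2 x 2 max pooling with stride 2 described above can be sketched for a single feature map in NumPy (the values in the toy map are illustrative):

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max pooling with stride 2 on a single 2-D map."""
    h, w = feature_map.shape
    out = np.zeros((h // 2, w // 2))
    for i in range(0, h - 1, 2):
        for j in range(0, w - 1, 2):
            out[i // 2, j // 2] = feature_map[i:i + 2, j:j + 2].max()
    return out

fmap = np.array([[1, 3, 2, 1],
                 [4, 6, 5, 0],
                 [7, 2, 9, 8],
                 [1, 0, 3, 4]], dtype=float)
print(max_pool_2x2(fmap))  # [[6. 5.]
                           #  [7. 9.]]
```

Each 2 x 2 patch collapses to its maximum, halving both spatial dimensions exactly as in the W2 = (W1 - F)/S + 1 formula with F = 2, S = 2.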
Fully connected or dense layer

One of the final layers of a CNN is often the fully connected layer, which is also known as a dense layer. Neurons in this layer are fully connected to all the activations in the previous layer. The output of this layer is the class score, where the number of neurons in this output layer usually equals the number of classes to be predicted.

Using the combination of all the layers described previously, a CNN converts an input image to the final class scores. Each layer works in a different way and has different parameter requirements. The parameters in these layers are learned through a gradient-based algorithm in a backpropagation manner.
Network initialization

One of the most seemingly trivial, yet crucial, aspects of CNN training is network initialization. Every CNN layer has certain parameters or weights that get updated during the training step. The most popular algorithm used to train the network is SGD. Inputs to SGD include an initial set of weights, a loss function, and labeled training data. SGD will use the initial weights to compute the loss over a batch of training data and adjust the weights to reduce that loss. The adjusted weights are then carried over to the next batch of examples, following the previously computed gradient. As can be seen from this process, the choice of the initial weights of the network plays a crucial role in the quality and speed of convergence of the network. Hence, a number of strategies have been applied to resolve this issue. Some of them are as follows:

- Random initialization: In this scheme, all weights are randomly assigned an initial value. One good practice when making such a random assignment is to sample the weights from a zero-mean, unit-variance Gaussian distribution. The idea behind randomization is simple: if all neurons had identical weight values in the beginning, every neuron would compute exactly the same value and propagate the same gradient during each training iteration. This means every neuron would learn the same thing and the network would not be diverse enough to learn interesting patterns from the data. To ensure diversity in the network, random weights are used. This ensures that the weights are assigned asymmetrically, leading to diverse behavior by the network. One trick with random initialization is to keep the variance in check. If you define the weights randomly, the distribution of a neuron's output will have a variance that grows with the number of inputs. One recommendation is to normalize the weights by the square root of the number of inputs to the layer.
- Sparse initialization: Used by Sutskever et al., in this scheme we choose a neuron randomly and connect it to K neurons in the previous layer. Weights for each neuron are assigned randomly as before. A typical number for K is 10-15. The core intuition behind this scheme is to decouple the number of connections from the number of units in the previous layer.
- Initializing the bias: It is often possible, and good practice, to initialize the biases to 0. In the case of ReLU, you might want to choose a small constant value such as 0.01 to ensure that some gradients are propagated forward.
- Batch normalization: Invented by Ioffe and Szegedy, batch normalization aims to be a more robust solution to the problem of network initialization. The central idea behind this scheme is to force the whole network to normalize its activations towards a unit Gaussian distribution at the beginning of the training phase. It takes two learnable parameters, gamma and beta, and generates a batch-normalized version y_i of input x_i as:

    x_hat_i = (x_i - mean_B) / sqrt(var_B + epsilon)
    y_i = gamma * x_hat_i + beta

Here, mean_B and var_B are the mean and variance of the current mini-batch.
Regularization

One of the big challenges in training CNNs is overfitting. Overfitting can be described as the phenomenon where the CNN, or in general any learning algorithm, performs very well in optimizing the training error but is not able to generalize well on unseen test data. The most common trick used in the community to solve this problem is regularization, which is simply adding a penalty to the loss function being optimized. There are various ways of regularizing the network. Some of these techniques are explained as follows:

- L2 regularization: One of the most popular forms of regularization, an L2 regularizer implements a squared penalty on the weights, meaning the higher the weights, the higher the penalty. This ensures that, once the network is trained, the values of the optimal weights are smaller. Intuitively, this means that a network with smaller weights makes use of all of its inputs a little, and its decisions are more diverse. Having higher weights makes the network depend too much on a few neurons with high weights, eventually making the model biased. The following figure illustrates this concept very well: after L2 regularization, you see more weights concentrated around -0.2 to +0.2, as opposed to -0.5 to +0.5 before regularization.
- L1 regularization: One of the issues with L2 regularization is that even though the resulting weights are small, they are mostly non-zero. This means the network takes a small amount of input from most of the features, which becomes a problem when you are dealing with noisy inputs. You would want to completely eliminate taking noisy inputs, in other words, you would like to place a 0 weight on such inputs. This paves the way for L1 regularization. In this case, you add a first-order penalty on the weights instead of a second-order penalty such as L2. The resulting effect can be seen in the preceding figure. You can see only a few weights are non-zero, suggesting that the network has learned sparse weights, which are more robust to noisy inputs. You can also combine both the L1 and L2 regularization schemes, which is also referred to as elastic net regularization.
- Max-norm constrained regularization: In this scheme, you constrain the maximum possible norm of the weight vector to a pre-specified value. This ensures that the network weights are properly bounded, and the updates from the learning algorithm always remain bounded as well, even if the learning rates are set too high, preventing the weights of the network from exploding.
- Dropout regularization: One of the more recent advances in regularization is dropout. The idea here is to use a parameter, p, which defines the probability with which you select the activations from the previous layer for the next layer's computation. The following figure shows an example of dropout. With a dropout value of p = 0.5 and four neurons, you randomly select two (0.5 * 4) neurons whose activations will be forwarded to the next layer. Since you are dropping activations during training, you need to scale the remaining activations appropriately so that the testing phase remains unchanged. To do this, you also use an inverted dropout, which scales the activations by a factor of 1/p:
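Inverted dropout can be sketched in a few lines of NumPy. Dividing the surviving activations by p at training time keeps their expected value unchanged, which is why no extra scaling is needed at test time:

```python
import numpy as np

np.random.seed(0)

def inverted_dropout(activations, p=0.5):
    """Keep each activation with probability p and rescale survivors by 1/p."""
    mask = (np.random.rand(*activations.shape) < p).astype(float)
    return activations * mask / p

a = np.ones(10000)
dropped = inverted_dropout(a, p=0.5)

# Roughly half of the activations are zeroed and the survivors become 2.0,
# so the mean of each activation is preserved (close to 1.0).
print(dropped.mean())
```

Without the 1/p rescaling, the expected activation at training time would be p times smaller than at test time, and the two phases would no longer match.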
You can add a dropout layer in TensorFlow using the following code:

    dropout_rate = 0.5
    fc1 = tf.layers.dropout(fc1, rate=dropout_rate, training=is_training)
Loss functions

So far, we have seen that a CNN is trained using a gradient-based algorithm that minimizes a loss function given the training data. There are multiple ways to define this loss function. In this section, we will look at some of the commonly used loss functions for CNN training:

- Cross-entropy loss: This is one of the most popularly used loss functions for CNNs. It is based on the notion of cross-entropy, which is a measure of the distance between a true distribution p and an estimated distribution q, and can be defined as H(p, q) = - sum_x p(x) * log q(x). Using this measure, the cross-entropy loss between the one-hot ground truth and the softmax of the class scores f can be defined as follows:

    L_i = -log( e^{f_{y_i}} / sum_j e^{f_j} )
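The cross-entropy loss over softmax scores is easy to evaluate directly. A minimal sketch in plain Python; the score values are illustrative:

```python
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_loss(scores, correct_class):
    """-log of the softmax probability assigned to the correct class."""
    probs = softmax(scores)
    return -math.log(probs[correct_class])

scores = [10.0, -5.0, 5.0]  # raw class scores from the network
loss = cross_entropy_loss(scores, correct_class=0)
print(round(loss, 4))       # ~0.0067: confident and correct, so loss is tiny
```

Because the correct class already dominates the softmax here, the loss is close to zero; a confident wrong prediction would instead give a very large loss.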
- Hinge loss: Hinge loss can be simply defined as follows:

    L_i = sum_{j != y_i} max(0, f_j - f_{y_i} + delta)

Let us understand this loss function with a working example. Let's assume we have three classes, and for a given input, the CNN outputs three scores for the classes in the following order: [10, -5, 5]. Let us also assume that the correct class for this input is the first class, whose score value is 10, and that the margin delta is 10. In this case, the hinge loss would be computed as follows:

    L = max(0, -5 - 10 + 10) + max(0, 5 - 10 + 10) = 0 + 5 = 5

As shown in the preceding computation, the total loss for this set of scores amounts to 5. Since the loss value is bigger than zero, the score of the correct class does not exceed the scores of the incorrect classes by at least the margin in this case.
Model visualization

One of the important aspects of a CNN is that, once it is trained, it learns a set of feature maps, or filters, which act as feature extractors on natural images. As such, it would be great to visualize these filters and get an understanding of what the network has learned through its training. Fortunately, this is a growing area, with lots of techniques that make it possible to visualize the filters learned by a CNN. There are two primary ways of visualizing the network:

- Layer activation: This is the most common form of network visualization, where one visualizes the activations of the feature maps during the forward pass of the network. This visualization is important for multiple reasons:
  - It allows you to see how each learned filter responds to each input image. You can use this response to get a qualitative understanding of what the filter has learned to respond to.
  - You can easily debug the network by observing how the filters respond: are they learning anything useful, or are they simply blank, suggesting that dropping them would not hurt network performance? The following figure shows these activations in more detail:
- Filter visualization: Another common use case of visualization is to visualize the actual filter values themselves. Remember that CNN filters can also be understood as feature detectors, which, when visualized, demonstrate what kind of image feature each filter looks for. For example, the preceding figure illustrates that CNN filters can be trained to extract edges in different orientations as well as particular color combinations. Noisy filter values can also be detected through this visualization technique to provide feedback on whether a network's training is incomplete or its regularization is poorly set:
Handwritten digit classification example

In this section, we will show how to implement a CNN to recognize the 10 classes of handwritten digits using TensorFlow. We will use the MNIST dataset for this challenge, which consists of 60,000 training examples and 10,000 test examples of handwritten digits, where each image is a 28 x 28-pixel monochrome image.
Let us assume our model function takes its input in a features variable and its labels in a labels variable. We begin by importing the necessary modules and loading the features from the pre-loaded dataset into the features variable:

    import numpy as np
    import tensorflow as tf

    mnist = tf.contrib.learn.datasets.load_dataset("mnist")
    features = mnist.train.images  # Returns np.array

    # Input Layer
    INPUT = tf.reshape(features, [-1, 28, 28, 1])
We will use a network architecture that has two convolution layers, two pooling layers, and two fully-connected layers in the following order: [INPUT -> CONV1 -> POOL1 -> CONV2 -> POOL2 -> FC1 -> FC2]. We use 32 filters, each of size 5 x 5, for CONV1, and 2 x 2 filters for POOL1 with a stride of 2. This is implemented in TensorFlow as follows:

    CONV1 = tf.layers.conv2d(
        inputs=INPUT,
        filters=32,
        kernel_size=[5, 5],
        padding="same",
        activation=tf.nn.relu)
    POOL1 = tf.layers.max_pooling2d(inputs=CONV1, pool_size=[2, 2], strides=2)

We use 64 filters of size 5 x 5 for CONV2 and 2 x 2 POOL2 filters, again with a stride of 2. We also connect these layers to the previous ones, as follows:

    CONV2 = tf.layers.conv2d(
        inputs=POOL1,
        filters=64,
        kernel_size=[5, 5],
        padding="same",
        activation=tf.nn.relu)
    POOL2 = tf.layers.max_pooling2d(inputs=CONV2, pool_size=[2, 2], strides=2)
The output of POOL2 is a two-dimensional matrix per feature map, which needs to be flattened before it can be passed to a fully-connected layer. Once flattened, we connect it to a fully-connected layer with 1024 neurons:

    POOL2_FLATTENED = tf.reshape(POOL2, [-1, 7 * 7 * 64])
    FC1 = tf.layers.dense(inputs=POOL2_FLATTENED, units=1024,
                          activation=tf.nn.relu)
To improve the network's generalization ability, we need to add a regularization scheme. We use a dropout layer with a dropout rate of 0.4 and connect it to the final fully connected layer. Finally, this last layer is configured with 10 neurons, one neuron for each digit class:

    DROPOUT = tf.layers.dropout(
        inputs=FC1, rate=0.4,
        training=mode == tf.estimator.ModeKeys.TRAIN)
    FC2 = tf.layers.dense(inputs=DROPOUT, units=10)
Now that the network is fully configured, we need to define a loss function and start the training. As described previously, we choose the cross-entropy loss, as follows:

    # Calculate Loss (for both TRAIN and EVAL modes)
    onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=10)
    loss = tf.losses.softmax_cross_entropy(
        onehot_labels=onehot_labels, logits=FC2)
We now set up the learning with a gradient descent optimizer:

    # Configure the Training Op (for TRAIN mode)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
    train_op = optimizer.minimize(
        loss=loss,
        global_step=tf.train.get_global_step())

    train_input_fn = tf.estimator.inputs.numpy_input_fn(
        x={"x": features},
        y=labels,
        batch_size=100,
        num_epochs=None,
        shuffle=True)
    mnist_classifier = tf.estimator.EstimatorSpec(
        mode=mode, loss=loss, train_op=train_op)
    mnist_classifier.train(
        input_fn=train_input_fn,
        steps=20000)
Fine-tuning CNNs

Though CNNs can be trained given enough computing power and labeled data, training a high-quality CNN takes lots of iterations and patience. It is not always easy to optimize a huge number of parameters, often in the range of millions, while training a CNN from scratch. Moreover, a CNN is especially suited to problems with large datasets; often, you are faced with a problem that has only a small dataset, and training a CNN on such a dataset may lead to overfitting. Fine-tuning a CNN is one technique that addresses this pitfall. Fine-tuning a CNN means that you never train the CNN from scratch. Instead, you start from a previously trained CNN model and finely adapt and change its weights to better suit your application. This strategy has multiple advantages:

- It exploits the large number of pre-trained models easily available for adaptation
- It reduces the compute time, since the network has already learned stable filters, and it converges quickly to the final weights
- It can also work with small amounts of training data and prevents overfitting

There are multiple ways of fine-tuning a CNN. They are listed as follows:

- CNN feature extractor: Often, you are faced with an image classification task with a specific number of classes, say 20. Given this task, one obvious question is, how do you take advantage of existing pre-trained CNN models, which may have as many as 1000 classes, for fine-tuning? The CNN feature extractor is a technique which answers this question. In this technique, we take a pre-trained CNN model, such as AlexNet, which has 1000 classes, and remove the last fully connected layer from this fully trained network. We then perform a forward pass for each input image and record the activations of the last convolutional layers, such as Conv5, or even the penultimate fully connected layer, such as fc6. For example, if you choose fc6, the total number of activation values is 4096, which now acts as a 4096-dimensional feature vector. This feature vector can be used with any existing machine learning classifier, such as an SVM, to train a simple 20-class classification model.
- CNN adaptation: Sometimes, you would like to take advantage of the fully connected layers of the pre-trained network and move away from using a separate classifier. In such scenarios, you can replace the last fully connected layer of the pre-trained network with your own version of a fully connected layer, configured with the number of output classes your application needs, for example, 20 classes in the preceding example. Once this network is configured, you copy over the weights from the pre-trained network and use them to initialize the new network. Finally, the new network is run through a backpropagation-based training step so that its weights adapt to your application's labeled data. This method has the obvious advantage of not needing an extra classifier for your task, and, with little modification, you are able to re-purpose pre-trained models, which usually works very well. This strategy also works with a small amount of training data, since the pre-trained network has already been trained with large volumes of data.
- CNN retraining: This strategy is useful when you need to train the whole CNN from scratch. Usually, the full training of a CNN may span multiple days, which is not very practical. To avoid this multi-day training, it is often recommended to initialize the weights of your network with a pre-trained model and start the full network training from this pre-trained starting point. This ensures that training does not need to re-learn the basic filters that have already been learned by lots of pre-trained models, thereby not wasting precious compute hours on re-learning low-level features. Usually, it is advisable to use a smaller learning rate when retraining the full network in this manner.
Popular CNN architectures

Designing a perfect CNN architecture involves a large amount of experimentation and compute power. Hence, it is often non-trivial to achieve an optimal CNN architecture design. Fortunately, a number of CNN architectures exist today that act as a good starting point for many developers, saving them the effort of designing a CNN network from scratch. In this section, we will go over some of the popular CNN architectures known today.
AlexNet

One of the earliest works in popularizing the use of CNNs in large-scale image classification, AlexNet was proposed by Alex Krizhevsky and their co-authors in 2012. It was submitted as an entry to the ImageNet challenge in 2012 and significantly outperformed its runner-up, with a 16% top-5 error rate. AlexNet consists of eight layers in the following order:

[INPUT -> CONV1 -> POOL1 -> CONV2 -> POOL2 -> CONV3 -> CONV4 -> CONV5 -> POOL3 -> FC6 -> FC7 -> FC8]

CONV1 is a convolution layer with 96 filters of size 11 x 11. CONV2 has 256 filters of size 5 x 5, CONV3 and CONV4 have 384 filters of size 3 x 3, followed by CONV5 with 256 filters of size 3 x 3. All pooling layers, POOL1, POOL2, and POOL3, have 3 x 3 pooling filters. Both FC6 and FC7 have 4096 neurons, with the last layer being FC8 with 1000 neurons, which is equal to the 1000 output classes in the benchmark dataset. It is a very popular architecture and is often the first CNN architecture applied to large-scale image recognition tasks today.
Visual Geometry Group

This architecture from Simonyan and their co-authors [17] was the runner-up in the ImageNet challenge in 2014. It is designed on the core idea of deeper networks with much smaller filters. Though this improves accuracy, the network has a rather large number of parameters (~140M) and uses a lot more memory than AlexNet. Visual Geometry Group (VGG) has smaller filters than AlexNet, where each filter is of size 3 x 3 but with a lower stride of one; a stack of these effectively captures the same receptive field as a larger 7 x 7 filter would. It typically has 16-19 layers, depending on the particular VGG configuration. The following figure illustrates the architecture:
GoogLeNet

While VGG was the runner-up in the ImageNet 2014 challenge, GoogLeNet, also known as Inception, was the winning submission. It has 22 layers in total, with no fully connected layer at all. One of its primary contributions is greatly reducing the number of network parameters to 5M, down from 60M in AlexNet. Though the number of parameters is decreased, it is computationally more expensive than the earlier feed-forward networks. The following figure illustrates the architecture:
ResNet

ResNet is currently the state-of-the-art architecture for large-scale image recognition. One of the most important insights in network design is that, usually, the deeper the network, the better the performance. However, with increasing depth of the network, the problem of vanishing gradients is also amplified, since each layer successively computes its gradient with respect to the gradient from the following layer. The larger the number of layers, the smaller the gradients become, eventually vanishing to 0. To avoid this problem, ResNet introduces a shortcut connection, where instead of computing the gradient over a layer transformation F(x) alone, you now compute the gradient over F(x) + x, where x is the original input to the block. This alleviates the effect of the gradient getting successively smaller, since the identity path passes the gradient through largely unchanged. The advantage of this strategy is that you can now create deeper networks with as many as 150 layers, which was not possible before.

The following figure illustrates the network in more detail:
Summary

In this chapter, we described the concepts of convolutional neural networks in detail. We provided a summary of the history of the CNN and how it emerged. We covered the basics of CNNs, ranging from network architectures, layers, and loss functions to regularization techniques. We also outlined practical recommendations for these concepts and illustrated how to implement them using TensorFlow. We also outlined how pre-trained models can speed up development. Finally, we illustrated popular CNN architectures that are often the initial choice of developers for many image recognition tasks. In the next chapter, we will look at how deep learning techniques are used for natural language processing.
NLP - Vector Representation

Natural language processing (NLP) is one of the most important applications of machine learning. Understanding complex language utterances is also a crucial part of artificial intelligence (AI). The applications of NLP are almost everywhere, as we communicate mostly through language and store human knowledge mostly in written language. This includes web search, advertisement, emails, customer service, machine translation, and so on. Some of the hot research areas include language modeling (speech recognition, machine translation), word-sense learning and disambiguation, reasoning over knowledge bases, acoustic modeling, part-of-speech tagging, named-entity recognition, sentiment analysis, chatbots, question/answering, and others. Each of these tasks requires a deep understanding of the task or the application, as well as efficient and fast machine learning models.

Similar to computer vision, extracting useful features from text is a fundamental step. Recently, deep learning approaches have made significant progress in learning good representations of language from raw text data. This chapter will describe the basics of applying deep learning to NLP. We will talk about three state-of-the-art embedding tools: Word2Vec, GloVe, and FastText. We will show an example of training Word2Vec in TensorFlow, and how to do visualization. We will also talk about the differences between Word2Vec, GloVe, and FastText, and how to use them in applications such as text classification.

Traditional NLP

Extracting useful information from raw, text-based data is not a trivial task. For a basic application, such as document classification, the common way of feature extraction is called bag of words (BoW), in which the frequency of each word acts as a feature for training a classifier. We will briefly talk about BoW in the following section, as well as the tf-idf approach, which is intended to reflect how important a word is to a document in a corpus.
Bag of words

BoW is mainly used for categorizing documents. It is also used in computer vision. The idea is to represent the document as a bag of words, disregarding the grammar and the order of the word sequence.

After preprocessing of the text, often called the corpus, a set of vocabulary is generated, and the BoW representation of each document is built on it.

Take the following two text samples as an example:

the quick brown fox jumps over the lazy dog
never jump over the lazy dog quickly

The corpus (text samples) then forms a dictionary, with the words as keys and their column IDs as values:

{ 'brown': 0, 'dog': 1, 'fox': 2, 'jump': 3, 'jumps': 4, 'lazy': 5, 'never': 6, 'over': 7, 'quick': 8, 'quickly': 9, 'the': 10 }

The size of the vocabulary (V = 11) is the number of unique words in the corpus. Sentences will then be represented as length-11 vectors, where each entry corresponds to one word in the vocabulary. The value of each entry is determined by counting the number of times the corresponding word occurs in the sentence.

In this case, the two sentences will be encoded as 11-element vectors, like so:

Sentence 1: [1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 2]
Sentence 2: [0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1]

Each element of the vector represents the number of times a word in the dictionary appears in the corresponding sentence. Therefore, in the first sentence, there is 1 count for brown (at position 0 in the vector), 1 count for dog (at position 1), 1 count for fox, and so on, while jump, never, and quickly never occur, so their entries are 0. In the second sentence, brown and fox do not occur, so we get 0 at those positions, 1 count for dog, and so forth.
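The encoding above can be reproduced in a few lines of plain Python. This is a minimal sketch of the idea (the function names are mine), not an excerpt from the book's code:

```python
def build_vocab(corpus):
    """Map each unique word in the corpus to a column ID, in sorted order."""
    words = {w for sentence in corpus for w in sentence.split()}
    return {w: i for i, w in enumerate(sorted(words))}

def bow_vector(sentence, vocab):
    """Count the occurrences of each vocabulary word in the sentence."""
    vec = [0] * len(vocab)
    for w in sentence.split():
        vec[vocab[w]] += 1
    return vec

corpus = ["the quick brown fox jumps over the lazy dog",
          "never jump over the lazy dog quickly"]
vocab = build_vocab(corpus)
print(bow_vector(corpus[0], vocab))  # [1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 2]
```

Note that sorting the vocabulary reproduces the alphabetical column order of the dictionary shown above.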
Weighting the terms tf-idf

In most languages, some words tend to appear more often than others but may not carry much differentiating information regarding the actual meaning of a document. Examples are words such as is, the, and a, which are all very common in English. If we only consider the raw word frequency as previously, we might not be able to effectively differentiate between different classes of documents, as such words are shared by almost all documents.

One approach to tackle this problem is called term frequency and inverse document frequency (tf-idf). Like its name suggests, it takes into account two terms: term frequency (tf) and inverse document frequency (idf).

With tf, tf(t, d), the simplest choice is to use the raw count of a term in a document; that is, the number of times that term t occurs in document d. However, to prevent a bias towards longer documents, a common way is to take the raw frequency divided by the maximum frequency of any term in the document:

tf(t, d) = f(t, d) / max{ f(t', d) : t' in d }

In this equation, f(t, d) is the number of occurrences (raw count) of term t in document d.

The idf measures how much information the word provides; that is, whether the term is common or rare across all documents. One common way to determine this is to take the log of the inverse proportion of documents containing the term:

idf(t, D) = log( N / |{ d in D : t in d }| )

Here, N is the total number of documents in the corpus D. By multiplying both values, the tf-idf score is calculated as:

tfidf(t, d, D) = tf(t, d) x idf(t, D)

A tf-idf score is often used as a weighting factor in information retrieval and text mining.
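These formulas translate directly into code. The following sketch implements the max-normalized tf and log idf variants given above (the helper names and the toy corpus are mine):

```python
import math

def tf(term, doc):
    """Raw count of term in doc, divided by the max count of any term in doc."""
    counts = {w: doc.count(w) for w in doc}
    return counts.get(term, 0) / max(counts.values())

def idf(term, docs):
    """Log of (number of docs / number of docs containing the term)."""
    n_containing = sum(1 for d in docs if term in d)
    return math.log(len(docs) / n_containing)

def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

docs = [s.split() for s in
        ["the quick brown fox", "the lazy dog", "the dog barks"]]
print(tf_idf("dog", docs[1], docs))  # log(3/2), since "dog" is in 2 of 3 docs
print(tf_idf("the", docs[1], docs))  # 0.0, since "the" is in every doc
```

Note how a word that appears in every document, such as the, gets an idf of log(1) = 0 and is thus weighted out entirely.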
Deep learning NLP

Deep learning has brought multiple benefits to solving various problems in natural language. In this section, we will cover the main reasons for using deep learning for NLP: the motivation behind distributed representations, word embeddings and the popular ways of training word embeddings, and their applications.
Motivation and distributed representation

Like in many other cases, the representation of the data, which is how the information is encoded and shown to machine learning algorithms, is often the most important and fundamental part of all pipelines of learning or AI. The effectiveness and scalability of the representation largely determine how well the downstream machine learning model and application can perform.

As mentioned in the previous section, traditional NLP often uses one-hot encoding to represent a word in a fixed vocabulary and uses BoW to represent documents. Such an approach treats each word, for example, house, road, tree, as an atomic symbol. The one-hot encoding will generate a representation like [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]. The length of the representation is the size of the vocabulary. With such a representation, one often ends up with huge, sparse vectors. For example, in a typical speech application, the vocabulary size can be from 20,000 to 500,000. However, it has an obvious problem, which is that the relationship between any pair of words is ignored, for example, motel [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0] and hotel [0 0 0 0 0 0 0 1 0 0 0 0 0 0 0] = 0. Also, the encodings are actually arbitrary, for example, in one setting, a word may be represented as Id453, meaning that the 453rd entry of the long sparse vector is 1. Such representations provide no useful information to the system regarding the interactions or similarities that may exist between individual symbols.

This makes the learning of the model difficult, because the model will not be able to leverage much of what it has learned about one word when it is processing data about a closely related word. Therefore, discrete IDs separate the actual meaning of the word from its representation. Although some statistical information can be calculated at the document level, information at the atomic-symbol level is extremely limited. This is where distributed vector representation, and deep learning in particular, comes to help.

Deep learning algorithms attempt to learn multiple levels of representation of increasing complexity/abstraction.
There are multiple benefits of using deep learning to solve NLP problems:

- As features are often directly derived from the data and the problem, it improves on the incompleteness and domain-specificity of hand-crafted features. Handcrafting features is often very time consuming and may need to be performed again for each task or domain-specific problem. Features learned from one domain often show little generalization ability towards other domains or tasks. On the contrary, deep learning learns information from the data across multiple levels, in which the lower levels correspond to more general information that can be leveraged by other domains directly or after fine-tuning.
- Learning features automatically makes the overall system far more efficient to build than hand-designed, nearest-neighbour-like or clustering-like methods over atomic symbol representations.
- With words being embedded continuously, NLP systems can be incrementally trained and easily improved. The distributional and continuous nature of the vector space provides a foundation that allows NLP systems to perform more complex reasoning and knowledge derivation tasks.
- Learning can be done unsupervised. Given the current scale of data, there is a great need for unsupervised learning, and it is often unrealistic to acquire labels in many practical use cases.
- Deep learning learns multiple levels of representation. This is one of the most important advantages of deep learning, for which the learned information is constructed level-by-level through composition. The lower levels of representation can often be shared across tasks.
- It naturally handles the recursivity of human language. Human sentences are composed of words and phrases with a certain structure. Deep learning, especially recurrent models, is able to capture the sequential nature of human language to a much better extent.
Word embeddings

The very fundamental idea of distribution-based representations is that words with similar meanings tend to occur in similar contexts; a word can be represented by the company it keeps. As said by J. R. Firth in 1957: "You shall know a word by the company it keeps."

This is perhaps one of the most successful ideas of modern statistical NLP. The definition of neighbors can vary to take into account local or larger contexts in order to get a more syntactic or semantic representation.
Idea of word embeddings

First of all, a word is represented as a vector. A word embedding can thus be considered as a parameterized function mapping each word in some language to a high-dimensional vector space (for example, vectors with 200 to 500 dimensions). You may also consider W as a lookup table of size V x N, where V is the size of the vocabulary and N is the size of the dimension, and each row corresponds to one word. For example, we might find:

W(dog) = (0.2, -0.4, 0.7, ...)
W(mat) = (0.0, 0.6, -0.1, ...)

Here, W is often initialized to have a random vector for each word; then we let the network learn and update W in order to perform some tasks.
For example, we can train the network to tell whether an n-gram (a sequence of n words) is valid. Let's say we get one sequence of five words, a dog barks at strangers, and we take this as an input with a positive label (meaning valid). We then replace some of the words in this sentence with randomly chosen words and treat the changed sentence as a negative case, since the altered 5-gram will most likely be nonsensical:

As shown in the preceding figure, we train the model by feeding each word of the n-gram through the lookup matrix W and getting the vectors that represent the words. The vectors are then combined through an aggregation module, and we compare the result with the target value. A perfect prediction would result in the following:

R(W(a), W(dog), W(barks), W(at), W(strangers)) = 1
R(W(a), W(cat), W(barks), W(at), W(strangers)) = 0

The differences/errors between the target values and the predicted values will be used for updating both W and R (the aggregation function, for example, sum).
The learned word embeddings have interesting properties.

First, the locations of the words in the embedding space are determined by their meanings, such that words with close meanings are clustered together:

Second, which is even more interesting, is that word vectors have linear relationships. The relationship between words can often be captured by the linear difference between pairs of word vectors. For example, starting from the location of the word king and moving the same distance and direction as between man and woman, one will get the word queen, that is:

W(king) - W(man) + W(woman) ~ W(queen)

Researchers found that, if training on a large amount of data, the resulting vectors can reflect very subtle semantic relationships, such as a city and the country it belongs to. For example, France is to Paris as Germany is to Berlin.

Another example is to find a word that is similar to small in the same sense as biggest is similar to big. One can simply compute the vector X = vector(biggest) - vector(big) + vector(small), and the word closest to X turns out to be smallest. Many other kinds of semantic relationships can be captured, such as opposite and comparative relations. Some nice examples can be found in Mikolov's publication, Efficient Estimation of Word Representations in Vector Space (https://arxiv.org/pdf/1301.3781.pdf), as shown in the following figure:
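The analogy arithmetic can be made concrete with toy vectors and cosine similarity. The tiny 3-dimensional embeddings below are invented purely for illustration (real embeddings have hundreds of dimensions and are learned, not hand-set):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Invented toy embeddings: dimension 0 ~ "royalty", dimension 1 ~ "gender"
emb = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, -0.8, 0.1],
    "man":   [0.1, 0.8, 0.3],
    "woman": [0.1, -0.8, 0.3],
    "apple": [0.0, 0.1, 0.9],
}

# king - man + woman should land closest to queen
target = [k - m + w for k, m, w in zip(emb["king"], emb["man"], emb["woman"])]
best = max((w for w in emb if w not in ("king", "man", "woman")),
           key=lambda w: cosine(target, emb[w]))
print(best)  # queen
```

Note that the query words themselves are excluded from the candidates, just as real analogy evaluations do, since the nearest vector to king - man + woman is frequently king itself.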
Advantages of distributed representation

There are many advantages of using a distributed word vector representation for NLP problems. With the subtle semantic relationships being captured, there is great potential for improving many existing NLP applications, such as machine translation, information retrieval, and question answering systems. Some obvious advantages are:

- Capturing local co-occurrence statistics
- Produces state-of-the-art linear semantic relationships
- Efficient use of statistics
- Can train on (comparably) little data, as well as on gigantic amounts of data
- Fast, since only non-zero counts matter
- Good performance with small (100-300) dimension vectors that are important for downstream tasks
Problems of distributed representation

Keep in mind that no approach can solve everything, and similarly, a distributed representation is not a universal remedy. To use it properly, we need to understand some of its known issues:

Similarity and relatedness are not the same: With the great evaluation results presented in some publications, there is no guarantee of success in practical applications. One reason is that the current standard evaluation is often based on the degree of correlation against a set of word pairs scored by humans. It is possible that the representations from the model correlate well with human evaluation, but do not boost performance given a specific task. This is perhaps caused by the fact that most evaluation datasets don't distinguish between word similarity and relatedness. For example, car and bicycle are similar, whereas car and petrol are related but dissimilar.

Word ambiguity: This problem occurs when words have multiple meanings. For example, the word bank has the meaning of sloping land in addition to the meaning of a financial institution. In this way, there is a limit to representing a word as one vector without considering word ambiguity. Some approaches have been proposed to learn multiple representations for each word. For example, Trask and their co-authors proposed a method that models multiple embeddings for each word based on supervised disambiguation (https://arxiv.org/abs/1511.06388). One can refer to such papers when the use case is necessary for dealing with ambiguity.
Commonly used pre-trained word embeddings

The following table lists some commonly used pre-trained word embeddings:

Word2Vec (2013)
URL: https://code.google.com/archive/p/word2vec
Comments: A multilingual pre-trained vector is available at https://github.com/Kyubyong/wordvectors

GloVe (2014)
URL: http://nlp.stanford.edu/projects/glove
Comments: Developed by Stanford, it is claimed to be better than Word2Vec. GloVe is essentially a count-based model that combines global matrix decomposition and local context window information.

FastText (2016)
URL: https://github.com/icoxfog417/fastTextJapaneseTutorial
Comments: In FastText, the atomic unit is character n-grams, and a word vector is represented by the aggregation of its character n-grams. Learning is quite fast.

LexVec (2016)
URL: https://github.com/alexandres/lexvec
Comments: LexVec performs a factorization of the positive pointwise mutual information (PPMI) matrix using window sampling and negative sampling (WSNS). Salle and others, in their work from 2016, suggest that LexVec matches and often outperforms competing models in word similarity and semantic analogy tasks.

Meta-Embeddings (2016)
URL: http://cistern.cis.lmu.de/meta-emb/
Comments: By Yin and others, 2016. It combines different public embedding sets to generate better vectors (meta-embeddings).

In the following sections, we will mainly talk about three popular ones: Word2Vec, GloVe, and FastText. In particular, we will dive deeper into Word2Vec for its fundamental ideas, its two distinct models, the process of training, and how to leverage open source pre-trained Word2Vec representations.
Word2Vec

Word2Vec is a group of efficient predictive models for learning word embeddings from raw text. It maps words to vectors. In the mapped vector space, the words that share common contexts are located close to one another. In this section, we will discuss Word2Vec and its two specific models in detail. We will also describe how to train a Word2Vec model using TensorFlow.
Basic idea of Word2Vec

A Word2Vec model only has three layers; the input layer, the projection layer, and the output layer. There are two models that come with it, namely the Continuous Bag of Words (CBOW) model and the Skip-Gram model. They are very similar, but differ in how the input layer and the output layer are constructed. The Skip-Gram model has each target word as the input and predicts the context/surrounding words as the output. On the other hand, CBOW starts from the source context words, does an aggregation over these multiple inputs, and predicts the target word. The following figure illustrates the differences:
Take the CBOW model as an example. Each word in the training set is represented as a one-hot encoded vector; x is the one-hot encoding for a context word, and the target word is likewise represented by a one-hot encoding, y. The hidden layer has N nodes. The matrix W, of size V x N, represents the weight matrix (connections) between the input layer and the hidden layer, where each row represents the weights corresponding to one word in the vocabulary. This weight matrix is what we are interested in learning, because it contains the vector encodings of all of the words in our vocabulary (as its rows). W', of size N x V, is the output matrix (connections) between the hidden layer and the output layer, and each of its columns corresponds to one word in the vocabulary. This is the matrix producing the output word scores. In the Skip-Gram model, the input is the target word vector and the outputs are its context words; for the same target word, multiple training pairs, one per context word, are generated from its surrounding window. The goal is that, through the transformation of the network, given an input one-hot encoded word (a length-V vector), the prediction (also a length-V vector) should have a high number at the entry corresponding to the one-hot encoding of the true context word.
The word windows

Remember that from the preceding discussion, we know that the meaning of a word can be represented by its context, or more specifically, by its nearby words. Therefore, we can use a window to determine how many surrounding words (before and after) we'd like to take into account together with the central word, as shown in the following figure:

In this case, the window size is 2. In order to learn the target word, sits, nearby words up to two words away from the target will be included in the generated training samples. Then we slide the window along the corpus and repeat.
Generating training data

In the Skip-Gram model, we generate pairs of words for training as follows.

A positive training pair can be generated as follows:

From this illustration, one can easily see what the network is supposed to learn from the number of times each combination of (target, context) shows up. For example, the model may see more samples of (quick, brown) than of (quick, lazy). Therefore, at testing, if you give the model the word quick as input, it will output a much higher probability for brown.

From the way the training data is generated, one can also see that the trained neural network will not know anything about the offset of each context word relative to the target word. Take the example shown in the preceding figure: the model will not take into account that, for the target word fox, quick is further away than brown, and that jumps is after fox while quick is before the target. The information that has been learned is only that all these context words occur in the vicinity of the target word, regardless of their order or position. Therefore, the predicted probability of a context word given an input word only represents the probability that it occurs in the vicinity of the input word.
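The pair generation described above can be sketched as a simple sliding window. This is a minimal illustration of the idea, not the generate_batch function used later in the TensorFlow code:

```python
def skipgram_pairs(words, window=2):
    """Generate (target, context) pairs with up to `window` words
    on each side of the target."""
    pairs = []
    for i, target in enumerate(words):
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, words[j]))
    return pairs

sent = "the quick brown fox jumps over the lazy dog".split()
pairs = skipgram_pairs(sent, window=2)
print(pairs[:4])
# [('the', 'quick'), ('the', 'brown'), ('quick', 'the'), ('quick', 'brown')]
```

Notice that the pairs carry no offset information: (fox, quick) and (fox, brown) look identical to the model, exactly as discussed above.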
Negative sampling

From the loss function, we can see that computing the softmax layer can be very expensive. The cross-entropy cost function requires the network to produce probabilities, which means the output scores from the network need to be normalized to generate the actual probabilities for each class (for example, each word in the vocabulary for the Skip-Gram model). The normalization requires computing the score of the hidden-layer output against every column of the context-word matrix. In order to deal with this issue, Word2Vec uses a technique called negative sampling (NEG), which is similar to noise-contrastive estimation (NCE).

The core idea of NCE is to convert a multinomial classification problem, such as the case of predicting a word based on the given context, where each word can be considered as a class, into a binary classification problem of good pairs versus bad pairs.

During training, the network is fed with the good pair, which is the target word paired with a word from its context window, and a few randomly generated bad pairs, which consist of the target word and a randomly selected word from the vocabulary.

The network is therefore forced to distinguish the good pairs from the bad ones, which ultimately leads to the learning of the representations based on context. Essentially, NEG keeps a softmax-like objective, but evaluated only over the sampled words instead of the entire vocabulary. The goal is to force the embedding of a word to be similar to the embeddings of the words in its context, and different from the embeddings of words that are far away from the context.

The sampling process selects the out-of-window context pairs following a specifically designed distribution, such as the unigram distribution, which can be formalized as:

P(w_i) = f(w_i) / sum_j f(w_j)

In some implementations of Word2Vec, the frequency is raised to the power of 3/4, which is an empirical value:

P(w_i) = f(w_i)^(3/4) / sum_j f(w_j)^(3/4)

In this equation, f(w_i) is the word frequency. Selecting by this distribution essentially favors frequent words being sampled more often, while the 3/4 power dampens their dominance. Changing the sampling strategy can have a significant impact on the learning results.

The following figure illustrates the pairing of the target word with a few words out of context, randomly selected from the vocabulary:
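The 3/4-power unigram distribution above can be sketched in a few lines of Python (a minimal illustration; real implementations precompute a large sampling table for speed):

```python
import random

def neg_sampling_dist(freqs, power=0.75):
    """Probability of drawing each word as a negative sample,
    proportional to frequency ** power."""
    weights = {w: f ** power for w, f in freqs.items()}
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}

# Invented toy frequencies
freqs = {"the": 1000, "dog": 50, "aardvark": 2}
dist = neg_sampling_dist(freqs)

# "the" is 500x more frequent than "aardvark", but after the 3/4 power
# it is only about 105x more likely to be drawn.
print(dist["the"] / dist["aardvark"])
sample = random.choices(list(dist), weights=list(dist.values()), k=1)[0]
```

This is exactly the dampening effect mentioned above: frequent words still dominate the negative samples, just less overwhelmingly than under the raw unigram distribution.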
Hierarchical softmax

Computing the softmax over the vocabulary is expensive because, for each output word, we have to compute the normalizing denominator. However, the denominator is the sum of the inner products between the hidden-layer output, h, and the output embedding, v'_w, of every word, w, in the vocabulary, V.

To solve this problem, many different approaches have been proposed. Some are softmax-based approaches, including hierarchical softmax, differentiated softmax, CNN softmax, and so on, while others are sampling-based approaches. Readers can refer to http://ruder.io/word-embeddings-softmax/index.html#cnnsoftmax for a deeper understanding of approximating the softmax function.

Softmax-based approaches are methods that keep the softmax layer intact but modify its architecture to improve its efficiency (for example, hierarchical softmax). Sampling-based approaches, however, will completely remove the softmax layer and instead optimize a newly designed loss function that approximates the softmax; for example, by approximating the denominator that is expensive to compute, like NEG does. A good explanation can be found in Yoav Goldberg and Omer Levy's paper, word2vec Explained: Deriving Mikolov et al.'s Negative-Sampling Word-Embedding Method, 2014 (https://arxiv.org/abs/1402.3722).

For hierarchical softmax, the main idea is to build a Huffman tree based on word frequencies, where each word is a leaf of the tree. The computation of the softmax value for a particular word then turns into multiplying the probabilities of the nodes on the path in the tree, from the root to the leaf that represents the word. At each subtree split point, we calculate the probability of going either to the right branch or to the left branch, and the sum of the probabilities of the right and left branches equals one; this guarantees that the sum of all the probabilities of the leaf nodes equals one. With a balanced tree, this can reduce the computational complexity from O(V) to O(log V), where V is the vocabulary size.
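Such a frequency-based tree can be illustrated with a small Huffman construction using heapq. This is purely illustrative (the actual word2vec implementation builds its coding tree differently, in C); it shows how frequent words end up near the root, so the expected path length stays logarithmic:

```python
import heapq

def huffman_depths(freqs):
    """Build a Huffman tree from word frequencies; return each word's
    depth, i.e. the number of binary decisions from root to its leaf."""
    # (frequency, tie-breaker, {word: depth-so-far}) triples
    heap = [(f, i, {w: 0}) for i, (w, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        merged = {w: d + 1 for w, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

# Invented toy frequencies
freqs = {"the": 100, "dog": 30, "cat": 25, "lazy": 20, "aardvark": 5}
depths = huffman_depths(freqs)
print(depths["the"], depths["aardvark"])  # 1 4
```

The most frequent word sits one decision below the root, while the rarest word is deepest; since each leaf decision is just a sigmoid, scoring a word costs only its depth rather than a full pass over the vocabulary.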
Other hyperparameters

On top of the novelty of the algorithms, such as the Skip-Gram model (with negative sampling), CBOW (with hierarchical softmax), NCE, and GloVe, compared to traditional count-based approaches, there are also many new hyperparameters or preprocessing steps that can be tuned to improve performance. For example: subsampling, removing rare words, using dynamic context windows, using context distribution smoothing, adding context vectors, and more. Each one of them, if used properly, can greatly help boost performance, especially in practical settings.
Skip-Gram model

We now focus on an important model in Word2Vec, the Skip-Gram model. As described at the beginning of this section, the Skip-Gram model predicts the context words given the input target word. The word embeddings are the weights connecting the input layer and the hidden layer. Next, we explain the Skip-Gram model in detail.

Well, we can't feed a word directly as a text string into the network. Instead, we need something numerical. Suppose we have a vocabulary of 10,000 unique words; by using one-hot encoding, we can represent each word as a vector of length 10,000, with one entry at the position corresponding to the word itself equal to 1, and zeros at all the other positions.

The input of the Skip-Gram model is a single target word represented as a one-hot encoding with a length equal to the size of the vocabulary, V, and the outputs are determined by the generated (target, context) pairs.
The hidden layer, in this case, does not have any activation function. The connection between the input layer and the hidden layer is the weight matrix W of size V x N, where V is the vocabulary size and N is the number of hidden-layer nodes; W has V rows, that is, one for every word in the vocabulary, and N columns, one for every hidden unit. Since the input is a one-hot vector, multiplying it with W simply selects the corresponding row of W, and that row will be the embedding vector of the word. There is another auxiliary matrix, W', of size N x V, connecting the hidden layer and the output layer; during training, the similarity of a word's embedding with the columns for its context words is maximized, while the similarity with a non-context word (an out-of-window word) is minimized.

The output layer is a softmax regression classifier. It takes a vector of arbitrary real-valued scores, z, and squashes it to a vector of values between zero and one that sum to one:

softmax(z)_j = exp(z_j) / sum_k exp(z_k)

The output from the hidden layer (the word vector of the input target word, h) is multiplied with the output matrix W', where each column of it corresponds to a word (let's say the column for a word w is v'_w). The dot product v'_w . h results in a value which, after normalization by applying the softmax function, represents the probability of having the word w as the context word, given the input target word. The following is an illustration of computing the probability of each output word:

p(w | target) = exp(v'_w . h) / sum over k in V of exp(v'_k . h)
The loss function to be minimized is the negative log probability of the true context word under this softmax:

L = -log p(w | target)

Remember that the loss function is optimized, often together with a regularization term, averaged over all training examples.

From an information theory point of view, this is essentially a cross-entropy loss function. We can understand it in the following way. The cross-entropy is defined as:

H(p, q) = - sum_j p_j log(q_j)

In this equation, p is the actual distribution and q is the estimated distribution. In our case, the actual distribution p is a delta (one-hot) function, with 1 only at the entry of the true context word, y, so its own entropy term contributes 0, and the estimated distribution q is essentially the softmax output. Therefore, the preceding equation can be simplified as:

H(p, q) = -log(q_y)

So, minimizing the loss function is equivalent to minimizing the cross-entropy value.

From a probabilistic interpretation point of view, q can be interpreted as the (normalized) probability assigned to each class. We are actually minimizing the negative log-likelihood of the correct class, that is, performing maximum likelihood estimation (MLE).
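Numerically, the softmax and the reduced cross-entropy loss above look like this (a minimal sketch of the equations with made-up scores):

```python
import math

def softmax(scores):
    """Squash arbitrary real-valued scores into probabilities summing to one."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    """With a one-hot actual distribution p, H(p, q) reduces to -log q_y."""
    return -math.log(probs[target_index])

scores = [2.0, 1.0, 0.1]        # arbitrary real-valued network scores
probs = softmax(scores)         # values in (0, 1) summing to one
loss = cross_entropy(probs, 0)  # small, since class 0 already scores highest
print(probs, loss)
```

Raising the score of the target class lowers the loss, which is exactly what gradient descent on this objective does through the weights and embeddings.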
Continuous Bag-of-Words model

For the Continuous Bag-of-Words (CBOW) model, the idea is even more straightforward, because the model learns from the surrounding context words. The inputs are still basically the surrounding context window; the difference with Skip-Gram is that we aggregate the context words (adding their one-hot encodings) first, then use the result as the input to the network. These words will then be processed through the embedding matrix, and the output is the target word in the center of the window.
Training a Word2Vec using TensorFlow

In this section, we will explain step-by-step how to build and train a Skip-Gram model using TensorFlow. For the detailed tutorial and source code, please refer to https://www.tensorflow.org/tutorials/word2vec:

1. We download the dataset from http://mattmahoney.net/dc/text8.zip.

2. We read in the content of the file as a list of words.

3. We set up the TensorFlow graph. We create placeholders for the input words and the context words, which are represented as integer indices into the vocabulary:

train_inputs = tf.placeholder(tf.int32, shape=[batch_size])
train_context = tf.placeholder(tf.int32, shape=[batch_size, 1])

Note that we train in batches, so batch_size refers to the size of the batch. We also create a constant to hold the validation set indices, where valid_examples is an array of the integer indices of the words in the vocabulary chosen for validation:

valid_dataset = tf.constant(valid_examples, dtype=tf.int32)

We perform validation by computing the similarity between the words in the validation set and the word embeddings in the vocabulary, and finding the words in the vocabulary that are most similar to the validation words.

4. We set up the embedding matrix variable and look up the embeddings of the input batch:

embeddings = tf.Variable(
    tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
embed = tf.nn.embedding_lookup(embeddings, train_inputs)
5. We create the weights and biases that connect the hidden layer to the output softmax layer. The weights variable is a matrix of size vocabulary_size x embedding_size, where vocabulary_size is the size of the output layer and embedding_size is the size of the hidden layer. The size of the biases variable is the size of the output layer:

weights = tf.Variable(
    tf.truncated_normal([vocabulary_size, embedding_size],
                        stddev=1.0 / math.sqrt(embedding_size)))
biases = tf.Variable(tf.zeros([vocabulary_size]))
hidden_out = tf.matmul(embed, tf.transpose(weights)) + biases

Now, we apply softmax to hidden_out and use the cross-entropy loss to optimize the weights, biases, and embeddings. In the following code, we also specify a gradient descent optimizer with a learning rate of 1.0:

train_one_hot = tf.one_hot(train_context, vocabulary_size)
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=hidden_out,
                                            labels=train_one_hot))
optimizer = tf.train.GradientDescentOptimizer(1.0).minimize(cross_entropy)
For efficiency, we can change the loss function from cross-entropy to NCE loss, a form of the noise-contrastive estimation originally proposed by Michael Gutmann and their co-authors:

nce_loss = tf.reduce_mean(
    tf.nn.nce_loss(weights=weights,
                   biases=biases,
                   labels=train_context,
                   inputs=embed,
                   num_sampled=num_sampled,
                   num_classes=vocabulary_size))
optimizer = tf.train.GradientDescentOptimizer(1.0).minimize(nce_loss)
For validation, we compute the cosine similarity between the words in the validation set and the word embeddings in the vocabulary. Later, we will print out the words in the vocabulary that are most similar to the validation words. The cosine similarity between embeddings A and B is defined as:

similarity = (A . B) / (||A|| ||B||)

This translates into the following code:

norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keep_dims=True))
normalized_embeddings = embeddings / norm
valid_embeddings = tf.nn.embedding_lookup(
    normalized_embeddings, valid_dataset)
similarity = tf.matmul(
    valid_embeddings, normalized_embeddings, transpose_b=True)
6. Now we are ready to run the TensorFlow graph:

with tf.Session(graph=graph) as session:
    # We must initialize all variables before we use them
    init.run()
    print('Initialized')

    average_loss = 0
    for step in range(num_steps):
        # This is your generate_batch function that generates input
        # words and context words (labels) in a batch from data
        batch_inputs, batch_context = generate_batch(
            data, batch_size, num_skips, skip_window)
        feed_dict = {train_inputs: batch_inputs,
                     train_context: batch_context}

        # We perform one update step by evaluating the optimizer op
        # and include it in the list of returned values for
        # session.run()
        _, loss_val = session.run([optimizer, cross_entropy],
                                  feed_dict=feed_dict)
        average_loss += loss_val

        if step % 2000 == 0:
            if step > 0:
                average_loss /= 2000
            # The average loss is an estimate of the loss over
            # the last batches
            print('Average loss at step ', step, ': ', average_loss)
            average_loss = 0

    final_embeddings = normalized_embeddings.eval()
7. In addition, we want to print out the words that are most similar to our validation words; we do this by calling the similarity operation we defined earlier and sorting its results. Note that this is an expensive operation, so we do it only once every 10,000 steps:

if step % 10000 == 0:
    sim = similarity.eval()
    for i in range(valid_size):
        # reverse_dictionary maps codes (integers) to words (strings)
        valid_word = reverse_dictionary[valid_examples[i]]
        top_k = 8  # number of nearest neighbors
        nearest = (-sim[i, :]).argsort()[1:top_k + 1]
        log_str = 'Nearest to %s:' % valid_word
        for k in range(top_k):
            close_word = reverse_dictionary[nearest[k]]
            log_str = '%s %s,' % (log_str, close_word)
        print(log_str)

It is interesting to see that, at the beginning of training, the words nearest to one of the validation words are lanthanides, dunant, jag, wheelbase, torso, bayesian, hoping, and serena, but after 30,000 steps, the nearest words become six, nine, zero, two, seven, eight, three, and five. We can use t-SNE, by Maaten and Hinton, from Visualizing Data using t-SNE (2008), http://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf, to visualize the embeddings in two dimensions:

from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

tsne = TSNE(perplexity=30, n_components=2, init='pca',
            n_iter=5000, method='exact')
plot_only = 500
low_dim_embs = tsne.fit_transform(final_embeddings[:plot_only, :])
# reverse_dictionary maps codes (integers) to words (strings)
labels = [reverse_dictionary[i] for i in range(plot_only)]

plt.figure(figsize=(18, 18))  # in inches
for i, label in enumerate(labels):
    x, y = low_dim_embs[i, :]
    plt.scatter(x, y)
    plt.annotate(label, xy=(x, y), xytext=(5, 2),
                 textcoords='offset points', ha='right', va='bottom')

In the following figure, we visualize the Word2Vec embeddings, and find that words with similar meanings are close to each other:
Using existing pre-trained Word2Vec embeddings

In this section, we will be going through the following topics:

- Word2Vec from Google News
- Using the pre-trained Word2Vec embeddings

Word2Vec from Google News

The Word2Vec model trained by Google on the Google News dataset has a feature dimension of 300. The number of features is considered a hyperparameter which you can, and perhaps should, experiment with in your own applications to see which setting yields the best results.

In this pre-trained model, some stop words, such as a, and, and of, are excluded, but others, such as the and also, are included. Some misspelled words are included too, for example, both mispelled and misspelled, the latter being the correct one.

You can find an open source tool at https://github.com/chrisjmccormick/inspect_word2vec to inspect the word embeddings in the pre-trained model.

Using the pre-trained Word2Vec embeddings

In this section, we will briefly explain how to use the pre-trained vectors. Before reading this section, download the Word2Vec pre-trained vectors from https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit, and load the model:

from gensim.models import KeyedVectors

# Load the pre-trained model
model = KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin', binary=True)

Then, we find the top 5 words that are similar to woman and king, but dissimilar to man:

model.wv.most_similar(positive=['woman', 'king'], negative=['man'], topn=5)

We see the following output:

[(u'queen', ...), (u'monarch', ...), (u'princess', ...),
 (u'crown_prince', ...), (u'prince', ...)]

This makes sense, since queen shares similar attributes with woman and king, but does not share attributes with man.
Understanding GloVe

GloVe is an unsupervised learning algorithm for obtaining vector representations (embeddings) for words. It has been shown that, when trained on similar corpora, the embeddings produced by the two approaches perform very similarly on downstream NLP tasks.

The difference between them is that Word2Vec is a predictive model, which learns the embeddings to improve its predictive ability by minimizing the prediction loss, that is, the loss of (target word | context words). In Word2Vec, this has been formalized as a feed-forward neural network and optimized with SGD to update the network.

On the other hand, GloVe is essentially a count-based model, where a co-occurrence matrix is first built. Each entry in this co-occurrence matrix corresponds to the frequency with which we see the target word (the rows) at the same time that we see the context word (the columns). Then, this matrix is factorized to yield a lower-dimensional (word x features) matrix, where each row now yields a vector representation for the corresponding word. This is where the dimensionality reduction comes in, as the goal is to minimize the reconstruction loss and find the lower-dimensional representation that explains most of the variance in the high-dimensional co-occurrence data.

There are some benefits of using GloVe over Word2Vec, as it is easier to parallelize the implementation, which means the model can be trained on more data.

There are many resources about GloVe online. For implementations using TensorFlow, one can take a look at https://github.com/GradySimon/tensorflow-glove/blob/master/Getting%20Started.ipynb.
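The first step of GloVe, the co-occurrence matrix, is easy to sketch in plain Python; the factorization step, which produces the actual vectors, is omitted here for brevity:

```python
from collections import defaultdict

def cooccurrence(words, window=2):
    """counts[target][context] = how often context appears within
    `window` positions of target in the token stream."""
    counts = defaultdict(lambda: defaultdict(int))
    for i, target in enumerate(words):
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[target][words[j]] += 1
    return counts

words = "the quick brown fox jumps over the lazy dog".split()
counts = cooccurrence(words)
print(counts["over"]["the"])  # 1
```

GloVe additionally weights context words by their distance from the target and applies a weighting function to the counts before factorizing; this sketch keeps only the raw counting idea.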
FastText

FastText (https://fasttext.cc) is a library for the efficient learning of word representations and sentence classification. The main advantage of FastText embeddings over Word2Vec is that they take into account the internal structure of words while learning word representations, which can be very useful for morphologically rich languages, and also for words that occur rarely, since their character n-grams are still shared with other words.

The main difference between Word2Vec and FastText is that for Word2Vec, the atomic entity is each word, which is the smallest unit to train on. On the contrary, in FastText, the smallest unit is character-level n-grams, and each word is treated as being composed of its character n-grams. For example, a word's vector, with an n-gram minimum size of three and a maximum size of six, is composed by aggregating the vectors of all of its character n-grams of those sizes.
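Character n-gram extraction of this kind can be sketched as follows (illustrative only; FastText's own implementation hashes n-grams into buckets rather than enumerating them). Following the convention in the FastText paper, the symbols < and > mark the beginning and end of the word, using where as an example:

```python
def char_ngrams(word, n_min=3, n_max=6):
    """All character n-grams of the bracketed word, sizes n_min..n_max."""
    bracketed = "<" + word + ">"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(bracketed) - n + 1):
            grams.append(bracketed[i:i + n])
    return grams

print(char_ngrams("where", n_min=3, n_max=3))
# ['<wh', 'whe', 'her', 'ere', 're>']
```

Note that the boundary markers make the trigram her of where distinct from the whole word her, which would be represented as <her>.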