Anatomy of Deep Learning Principles: Writing a Deep Learning Library from Scratch

This book introduces the basic principles and implementation process of deep learning in a simple way, and uses Python…

English Pages 664 Year 2023

Table of contents:
Chapter 1 Programming and Math Fundamentals

1.1 Python quick start

1.1.1 Python installation

Python interpreter installation

jupyter notebook programming environment

Anaconda installation tool

1.1.2 Object, print() function, type conversion, comment, variable, input() function

1. Objects

2. Print function print()

3. Type conversion

4. Comments

5. Variables

6. input() function

1.1.3 Operation

Subscript operator []

String formatting

1.1.4 Control Statements

1. if statement

2. while statement

3. for statement

1.1.5 Commonly used Python container types

1. list (list)

index

slice

Traversing all elements with for

2. tuple (tuple)

3. set (set)

4. dict (dictionary)

1.1.6 Functions

The math package

Global and local variables

Anonymous (lambda) functions

Nested functions, closures

yield and generators

1.1.7 Classes and Objects

1.1.8 Getting Started with Matplotlib

subplot()

Axes objects

mplot3d

Displaying images

1.2 The tensor library NumPy

1.2.1 What is a tensor?

1. Vector

The norm of the vector

2. Matrix

3. Three-dimensional tensor

1.2.2 Create ndarray object

1. array()

2. Multidimensional array type ndarray

3. asarray()

4. The tolist() method of ndarray

5. astype() and reshape()

6. arange() and linspace()

7. full(), empty(), zeros(), ones(), eye()

8. Common functions for creating tensors of random values

9. Add, Repeat & Tile, Merge & Split, Edge Padding, Add Axis & Swap Axes

Repeat repeat()

Tile tile()

Merge concatenate()

Stack stack()

column_stack(), hstack(), vstack()

Split split()

Edge Padding

Add Axis

Swap axes

1.2.3 Indexing and slicing of ndarray arrays

1.2.4 Tensor calculation

1. Element-by-element calculation

Hadamard Product

2. Cumulative calculation

3. Dot Product

4. Broadcasting

1.3 Calculus

1.3.1 Functions

1.3.2 The four arithmetic operations and composition of functions

Arithmetic

Composite

1.3.3 Limits, derivatives

1. The limit of the sequence

2. Limit and continuity of function

3. Derivatives of functions

1.3.4 The Four Arithmetic Operations of Derivatives and the Chain Rule

1.3.5 Calculation graph, forward calculation, backpropagation derivation

1.3.6 Partial derivatives and gradients of multivariable functions

1.3.7 Derivative of vector-valued function and Jacobian matrix

1.3.8 Integral

1.4 Probability Basics

1.4.1 Probability

1.4.2 Conditional probability, joint probability, total probability formula, Bayesian formula

1.4.3 Random variables

1.4.4 Probability distribution sequence of discrete random variables

1.4.5 Probability Density of Continuous Random Variables

1.4.6 Distribution functions of random variables

1.4.7 Expectation, variance, covariance, covariance matrix

1. Mean and Expectation

2. Variance, standard deviation

3. Covariance, covariance matrix

Chapter 2 Gradient descent method

2.1 Necessary conditions for function extremum

2.2 Gradient descent method

2.3 Parameter optimization strategy of gradient descent method

2.3.1 Momentum method

2.3.2 Adagrad method

2.3.3 Adadelta method

2.3.4 RMSprop method

2.3.5 Adam method

2.4 Gradient verification

2.4.1 Comparing numerical and analytical gradients

2.4.2 Generic numerical gradients

2.5 Separating the gradient descent algorithm from the parameter optimization strategy

2.5.1 Parameter optimizer

2.5.2 Gradient descent method accepting parameter optimizer

Chapter 3 Linear Regression, Logistic Regression and Softmax Regression

3.1 Linear regression

3.1.1 Dining car profit problem

3.1.2 Machine Learning and Artificial Intelligence

1. Machine Learning

2. The relationship between machine learning and artificial intelligence

3. Classification of machine learning

3.1.3 What is linear regression?

3.1.4 Normal equations to solve linear regression problems

3.1.5 Gradient descent method to solve linear regression problems

3.1.6 Tuning the learning rate

3.1.7 Gradient verification

3.1.8 Prediction

3.1.9 Linear regression with multiple features

1. Multi-feature linear regression

2. Fitting plane

3. Temperature and pressure problems

3.1.10 Normalization of data

3.2 Evaluation of the model

3.2.1 Underfitting and overfitting

3.2.2 Validation set, test set

3.2.3 Learning Curve

3.2.4 Forecasting the output of the dam

3.2.5 Bias and variance (Bias-Variance)

3.3 Regularization

The loss function with the regularization term added

3.5 Logistic regression

3.5.1 Logistic regression

3.5.2 numpy implementation of logistic regression

1. Generate data

2. Code implementation of gradient descent method

3. Calculate the loss function value

4. Decision curve

5. Prediction accuracy

6. Logistic Regression with Scikit-Learn Library

3.5.3 Hands-on: numpy implementation of iris classification

3.6 softmax regression

3.6.1 spiral data set

3.6.2 softmax function

3.6.3 softmax regression

Multi-sample form

3.6.4 Multi-classification cross-entropy loss

3.6.5 Calculate cross entropy loss by weighted sum

3.6.6 Gradient calculation of softmax regression

1. The gradient of the cross-entropy loss on the weighted sum

2. The gradient of the cross-entropy loss with respect to the weight parameter

3.6.7 Implementation of gradient descent method for softmax regression

3.6.8 Softmax regression of spiral data set

3.7 Batch Gradient Descent and Stochastic Gradient Descent

3.7.1 MNIST handwritten digit set

3.7.2 Training logistic regression with partial training samples

3.7.3 Batch Gradient Descent Method and Implementation

Softmax regression of Fashion MNIST training set

3.7.4 Stochastic Gradient Descent

Summary

Chapter 4 Neural Networks

4.1 Neural Network

4.1.1 Perceptrons and neurons

1. Perceptron

2. Neurons

4.1.2 Activation function

1. Step function sign(x)

2. Tanh function

4. ReLU function

4.1.3 Neural Networks and Deep Learning

4.1.4 Forward calculation of multiple samples

4.1.5 Output

4.1.6 Loss function

1. Mean square error loss

2. Binary classification cross entropy loss

3. Multi-classification cross-entropy loss

4.1.7 Neural Network Training Based on Numerical Gradients

4.1.8 Deep Learning

4.2 Reverse derivation

4.2.1 Forward calculation and reverse derivation

4.2.2 Computation graph

4.2.3 The gradient of the loss function with respect to the output

1. The gradient of the binary cross-entropy loss function on the output

2. The gradient of the mean square error loss function on the output

3. The gradient of the multi-class cross entropy loss function on the output

4.2.4 Derivation of back propagation of 2-layer neural network

1. Reverse derivation of single sample

2. Multi-sample vectorized representation of reverse derivation

3. Gradient calculation formula in column vector form

4.2.5 Python implementation of 2-layer neural network

4.2.6 Derivation of backpropagation of any layer neural network

4.3 Implement a simple deep learning framework

4.3.1 Training process of neural network

4.3.2 Code implementation of the network layer

4.3.3 Gradient test of network layer

4.3.4 Neural Network Class

4.3.5 Gradient test of neural network

4.3.6 MNIST data handwritten digit recognition based on deep learning framework

4.3.7 Improved general neural network framework: separate weighted sum and activation function

Gradient Validation

4.3.8 Independent parameter optimizer

4.3.9 Fashion-MNIST classification training

4.3.10 Read and write model parameters

Chapter 5 Basic Techniques for Improving Neural Network Performance

5.1 Data processing

5.1.1 Data Augmentation

5.1.2 Normalization

5.1.3 Feature Engineering

1. Data dimensionality reduction and principal component analysis

2. Whitening

5.2 Parameter debugging

5.2.1 Weight initialization

5.2.2 Optimization parameters

5.3 Batch Normalization

5.3.1 What is batch normalization?

5.3.2 Reverse derivation of batch normalization

5.3.3 Code Implementation of Batch Normalization

5.4 Regularization

5.4.1 Weight regularization

5.4.2 Dropout

5.4.3 Early stopping

Chapter 6 Convolutional Neural Network (CNN)

6.1 Convolution

6.1.1 What is convolution?

span

6.1.2 Convolution of one-dimensional signal

6.1.3 Two-dimensional convolution

span

6.1.4 Multiple input channels and multiple output channels

6.1.5 Pooling

6.2 Convolutional Neural Network

6.2.1 Fully connected neurons and convolutional neurons

6.2.2 Convolutional Layer and Convolutional Neural Network

6.2.3 Reverse derivation and code implementation of convolutional layer and pooling layer

Reverse derivation of convolutional layer

The reverse derivation of the pooling layer

6.2.4 Implementation of convolutional neural network

6.3 Convolution as matrix multiplication

6.3.1 Matrix multiplication of 1D sample convolution

6.3.2 Matrix multiplication of 2D sample convolution

6.3.3 Matrix multiplication for reverse derivation of 1D convolution

6.3.4 Matrix multiplication for reverse derivation of 2D convolution

6.4 Fast convolution based on coordinate index

Gradient Test

Time comparison with non-accelerated convolution

6.5 Typical convolutional neural network structure

6.5.1 LeNet-5

6.5.2 AlexNet

6.5.3 VGG

6.5.4 Gradient Explosion and Vanishing Problems of Deep Neural Networks

6.5.5 Residual Networks (ResNets)

6.5.6 Google Inception Network

6.5.7 Network in Network (NiN)

Chapter 7 Recurrent Neural Network (RNN)

7.1 Sequence problems and models

7.1.1 Stock Price Prediction Problem

7.1.2 Probabilistic sequence model, language model

1. Probabilistic sequence model

2. Language Model

7.1.3 Autoregressive model

7.1.4 Generate autoregressive data

7.1.5 Time window method

7.1.6 Time window sampling

7.1.7 Time window method modeling and training

7.1.8 Long-term forecast and short-term forecast

7.1.9 Stock Price Prediction

7.1.10 k-gram language model

7.2 Recurrent Neural Networks

7.2.1 Acyclic neural network without memory function

7.2.2 Recurrent neural network with memory function

7.3 Backpropagation through time

7.4 Implementation of single-layer recurrent neural network

7.4.1 Initialize model parameters

7.4.2 Forward calculation

7.4.3 Loss function

7.4.4 Reverse derivation

7.4.5 Gradient verification

7.4.6 Gradient descent training

7.4.7 Sampling of sequence data

7.4.8 RNN training and prediction of sequence data

Training on sequence data

Prediction

Training and prediction of stock data

7.5 RNN language model and text generation

7.5.1 Character table

7.5.2 Sampling of character sequence samples

7.5.3 RNN model training and prediction

Prediction

7.6 Exploding and vanishing gradients in RNNs

7.7 Long Short-Term Memory Network (LSTM)

7.7.1 LSTM neuron: cell

7.7.2 Reverse derivation of LSTM

7.7.3 LSTM code implementation

Gradient Test

Text generation

Prediction

7.7.4 Variations of LSTM

7.8 Gated Recurrent Unit (GRU)

7.8.1 Working principle of GRU

7.8.2 GRU code implementation

7.9 Class Representation and Implementation of Recurrent Neural Network

7.9.1 Implementing Recurrent Neural Networks with Classes

7.9.2 Class implementation of recurrent neural network unit

7.10 Multilayer, Bidirectional Recurrent Neural Network

7.10.1 Multilayer Recurrent Neural Network

7.10.2 Training and prediction of multi-layer recurrent neural network

7.10.3 Bidirectional Recurrent Neural Network

7.11 Sequence to sequence (seq2seq) model

Machine translation

7.11.1 Implementation of Seq2Seq model

7.11.2 Seq2Seq for character-level machine translation

1. Character vocabulary

2. Read training samples and build character vocabulary

3. Training character-level Seq2Seq model

7.11.3 Seq2Seq machine translation based on Word2Vec

1. Word vectorization: Word2Vec's skip-gram method

7.11.4 Seq2Seq model based on word embedding layer

1. Word embedding layer

2. Seq2Seq model using word embedding layer

7.11.5 Attention mechanism

Chapter 8 Generative Models

8.1 Generative models

8.2 Autoencoders

8.2.1 Autoencoder

8.2.2 Sparse Encoder

8.2.3 Implementation of Autoencoder

8.3 Variational Autoencoders

8.3.1 What is a variational autoencoder?

8.3.2 Loss function

8.3.3 Reparameterization (parameter resampling)

8.3.4 Reverse Derivation

8.3.5 Implementation of Variational Autoencoder

8.4 Generative Adversarial Networks

8.4.1 Principle of GAN

1. Discriminator and Generator

2. Loss function

3. Training process

8.4.2 Code implementation of GAN training process

8.5 GAN modeling example

8.5.1 GAN modeling of a set of real numbers

1. Real data: a set of real numbers

2. Define discriminator and generator functions

3. Real data iterator, noise data iterator

4. Intermediate result drawing function

5. Training GAN

8.5.2 GAN modeling of two-dimensional coordinate points

1. Real data: coordinate points sampled on the elliptic curve

2. Real data iterator, noise iterator

3. Define the generator and discriminator of the GAN model

4. Training GAN model

8.5.3 GAN modeling of MNIST dataset

1. Read training data

2. Define the data iterator

3. Define the generator and discriminator and its optimizer

4. Training model

8.5.4 GAN training techniques

8.6 GAN loss function and its probability explanation

8.6.1 The global optimal solution of the loss function of GAN

8.6.2 Kullback–Leibler divergence and Jensen–Shannon divergence

8.6.3 Maximum Likelihood Interpretation of GAN

8.7 Improved loss function: Wasserstein GAN (WGAN)

8.7.1 Principle of Wasserstein GAN

8.7.2 WGAN code implementation

8.8 Deep convolutional generative adversarial network (DCGAN)

8.8.1 Transposed convolution of 1D vectors

8.8.2 2D transposed convolution

8.8.3 Implementation of the deep convolutional generative adversarial network (DCGAN)
