Mastering machine learning with Python in six steps: a practical implementation guide to predictive data analytics using Python 9781484228654, 9781484228661, 1484228650, 1484228669

Master machine learning with Python in six steps and explore fundamental to advanced topics, all designed to make you a

English Pages XXI, 358 pages: illustrations; 24 cm [374] Year 2017

Table of contents:
Contents at a Glance
Contents
About the Author
About the Technical Reviewer
Acknowledgments
Introduction
The Best Things in Life Are Free
The Rising Star
Python 2.7.x or Python 3.4.x?
Python from Official Website
Python Identifiers
Code Blocks (Indentation & Suites)
Suites
Basic Object Types
Comments in Python
Multiple Statements on a Single Line
Arithmetic Operators
Comparison or Relational Operators
Assignment Operators
Bitwise Operators
Membership Operators
Identity Operators
Selection
Iteration
Lists
Tuple
Sets
Changing a Set in Python
Set Union
Set Symmetric Difference
Basic Operations
Dictionary
Defining a Function
Scope of Variables
Variable Length Arguments
Module
Opening a File
Exception Handling
Endnotes
Chapter 2: Step 2 – Introduction to Machine Learning
History and Evolution
Artificial Intelligence Evolution
Statistics
Bayesian
Regression
Data Analytics
Descriptive Analytics
Predictive Analytics
Data Science
Statistics vs. Data Mining vs. Data Analytics vs. Data Science
2) Classification
Anomaly Detection
Knowledge Discovery in Databases (KDD)
Preprocessing
Cross-Industry Standard Process for Data Mining
Phase 6: Deployment
Assess
KDD vs. CRISP-DM vs. SEMMA
Data Analysis Packages
Array
Creating NumPy Array
Field Access
Basic Slicing
Advanced Indexing
Array Math
Broadcasting
Data Structures
Reading and Writing Data
Basic Statistics Summary
Viewing Data
Basic Operations
Merge/Join
Join
Grouping
Using Global Functions
Customizing Labels
Line Plots – Using ax.plot()
Multiple Lines on Same Axis
Multiple Lines on Different Axis
Control the Line Style and Marker Style
Line Style Reference
Colormaps Reference
Bar Plots – Using ax.bar() and ax.barh()
Horizontal Bar Charts
Stacked Bar Example Code
Pie Chart – Using ax.pie()
Example Code for Grid Creation
Machine Learning Core Libraries
Endnotes
Machine Learning Perspective of Data
Nominal Scale of Measurement
Ratio Scale of Measurement
Feature Engineering
Handling Categorical Data
Normalizing Data
Exploratory Data Analysis (EDA)
Univariate Analysis
Multivariate Analysis
Correlation Matrix
Pair Plot
Supervised Learning – Regression
Correlation and Causation
Fitting a Slope
R-Squared for Goodness of Fit
Mean Absolute Error
Polynomial Regression
Multivariate Regression
Multicollinearity and Variance Inflation Factor (VIF)
Interpreting the OLS Regression Results
Regression Diagnosis
Outliers
Homoscedasticity and Normality
Over-fitting and Under-fitting
Regularization
Nonlinear Regression
Supervised Learning – Classification
Logistic Regression
Evaluating a Classification Model Performance
ROC Curve
Fitting Line
Stochastic Gradient Descent
Regularization
Multiclass Logistic Regression
Training Logistic Regression Model and Evaluating
Generalized Linear Models
Supervised Learning – Process Flow
Decision Trees
How the Tree Splits and Grows?
Conditions for Stopping Partitioning
Support Vector Machine (SVM)
Key Parameters
k Nearest Neighbors (kNN)
Components of Time Series
Autoregressive Integrated Moving Average (ARIMA)
Running ARIMA Model
Checking for Stationarity
Autocorrelation Test
Build Model and Evaluate
Predicting the Future Values
Unsupervised Learning Process Flow
K-means
Limitations of K-means
Elbow Method
Average Silhouette Method
Key Parameters
Principal Component Analysis (PCA)
Endnotes
Optimal Probability Cutoff Point
Rare Event or Imbalanced Dataset
Known Disadvantages
Which Resampling Technique Is the Best?
Variance
K-Fold Cross-Validation
Ensemble Methods
Bagging
Feature Importance
Extremely Randomized Trees (ExtraTree)
How Does the Decision Boundary Look?
Boosting
Example Illustration for AdaBoost
Boosting Iteration 3
Final Model
Gradient Boosting
Boosting – Essential Tuning Parameters
Xgboost (eXtreme Gradient Boosting)
Ensemble Voting – Machine Learning’s Biggest Heroes United
Hard Voting vs. Soft Voting
Stacking
Hyperparameter Tuning
GridSearch
RandomSearch
Endnotes
Chapter 5: Step 5 – Text Mining and Recommender Systems
Text Mining Process Overview
Data Assemble (Text)
Step 2 – Fetching Tweets
Sentence Tokenizing
Removing Noise
Part of Speech (PoS) Tagging
Stemming
Lemmatization
N-grams
Bag of Words (BoW)
Term Frequency-Inverse Document Frequency (TF-IDF)
Frequency Chart
Word Cloud
Lexical Dispersion Plot
Co-occurrence Matrix
Text Similarity
Text Clustering
Latent Semantic Analysis (LSA)
Latent Dirichlet Allocation (LDA)
Text Classification
Sentiment Analysis
Deep Natural Language Processing (DNLP)
Word2Vec
Recommender Systems
Collaborative Filtering (CF)
Endnotes
Chapter 6: Step 6 – Deep and Reinforcement Learning
Artificial Neural Network (ANN)
What Goes Behind, When Computers Look at an Image?
Perceptron – Single Artificial Neuron
Multilayer Perceptrons (Feedforward Neural Network)
Load MNIST Data
Key Parameters for scikit-learn MLP
Restricted Boltzmann Machines (RBM)
MLP Using Keras
Autoencoders
Dimension Reduction Using Autoencoder
De-noise Image Using Autoencoder
Convolution Neural Network (CNN)
CNN on CIFAR10 Dataset
CNN on MNIST Dataset
Visualization of Layers
Recurrent Neural Network (RNN)
Long Short-Term Memory (LSTM)
Transfer Learning
Reinforcement Learning
Endnotes
Summary
Tips
Don’t Reinvent the Wheels from Scratch
Start with Simple Models
Happy Machine Learning
Index
